Why AI Makes Technical Excellence More Critical, Not Less
The content here is under the Attribution 4.0 International (CC BY 4.0) license
The rise of AI-assisted development tools, such as GitHub Copilot, ChatGPT, and other large language models (LLMs), has transformed the way developers write code. These tools promise increased productivity, faster feature delivery, and reduced cognitive load. However, a fundamental question emerges: if AI can generate code rapidly, does code quality still matter?
The answer, for me, is grounded in extreme programming (XP) principles and empirical evidence. Code quality is not an external attribute that can be bolted onto software after the fact. Rather, it is a foundational element that enables sustainable development (Beck & Andres, 2004). In the AI era, when code generation accelerates, the absence of embedded quality practices will amplify technical debt, maintainability challenges, and security vulnerabilities, as it is already doing (Shin, 2024; Alves et al., 2025).
This post explores why code quality should be treated as a foundation rather than an afterthought, drawing from XP values, principles, and practices to demonstrate how AI amplifies both the benefits of good practices and the consequences of poor ones.
The Foundation Principle: Quality as a Core Value
Extreme programming defines five core values: communication, simplicity, feedback, courage, and respect (Beck & Andres, 2004). These values emphasize that quality is not merely a desirable outcome but an intrinsic aspect of the development process. Kent Beck articulates this through technical practices such as test-driven development (TDD), continuous integration (CI), refactoring, and pair programming.
Extreme programming values and practices
XP emphasizes technical excellence through practices that build quality into every step of development. These practices include TDD, where tests are written before code to ensure correctness and design clarity; CI, which integrates code frequently to detect integration issues early; refactoring, which improves code structure without changing behavior; and pair programming, which fosters knowledge sharing and reduces defects (Beck & Andres, 2004).
The principle of “simplicity” in XP is particularly relevant in the AI era. Simple code is easier to understand, modify, and maintain (Beck & Andres, 2004). When AI generates complex or convoluted code, the cost of maintaining that code increases exponentially. Research has shown that AI-generated code can be as difficult to understand as human-written code, if not more so, especially when the generated code lacks clear structure or contains subtle bugs (Maes, 2025).
The feedback loop, another XP principle, relies on rapid iteration and validation. TDD provides immediate feedback about whether code meets its requirements (Beck, 2003). When developers use AI to generate code without tests, they forfeit this feedback mechanism, leading to brittle software that fails under unexpected conditions.
Why AI Amplifies the Need for Quality Practices
AI tools accelerate code generation, but they do not inherently guarantee correctness, maintainability, or security. Multiple studies have documented the risks of relying on AI without quality practices:
Security Vulnerabilities
Research by Shin (2024) found that AI-generated code can introduce security vulnerabilities that experienced developers must identify and correct. Without a foundation of secure coding practices, developers may inadvertently deploy vulnerable code to production. This is particularly concerning in regulated industries (finance, healthcare, critical infrastructure) where security failures have severe consequences.
Test Quality and Maintenance Burden
A study on AI-generated Python tests revealed that LLMs frequently produce tests containing test smells: code smells that undermine test maintainability and effectiveness (Alves et al., 2025). Common issues include excessive test setup, missing assertions, and brittle tests that couple tightly to implementation details. These problems echo the test anti-patterns documented by the TDD community (Meszaros, 2007).
When developers accept AI-generated tests without review or refactoring, they accumulate technical debt in the test suite itself. This undermines the very purpose of testing: to provide confidence in the system’s behavior and enable safe refactoring.
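To make these smells concrete, here is an illustrative sketch (the `ShoppingCart` class and both tests are invented for this post, not taken from the cited study): a test with no assertions that pokes at private state, next to a refactored test that checks observable behavior.

```python
# Hypothetical example of a test smell and its refactored counterpart.

class ShoppingCart:
    """Minimal cart used only to illustrate the tests below."""
    def __init__(self):
        self._items = []  # internal storage; tests should not depend on this

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Smelly test: it never asserts anything, and it reaches into the private
# attribute, coupling the test to implementation details.
def test_cart_smelly():
    cart = ShoppingCart()
    cart.add("book", 10.0)
    cart._items  # inspected but never asserted

# Refactored test: asserts observable behavior through the public API only,
# so internal storage can change without breaking the test.
def test_cart_total():
    cart = ShoppingCart()
    cart.add("book", 10.0)
    cart.add("pen", 2.5)
    assert cart.total() == 12.5
```

The refactored test survives a change of internal representation (say, from a list to a dictionary), which is exactly the safety net that smelly tests fail to provide.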
Understandability and Cognitive Load
While AI can generate code quickly, understanding that code remains a human responsibility. Research indicates that developers spend significantly more time reading code than writing it (Hermans, 2021). When AI-generated code lacks clarity or deviates from established patterns, it increases cognitive load for the entire team.
The XP practice of collective code ownership assumes that any team member can modify any part of the codebase (Beck & Andres, 2004). This requires code to be simple, well-tested, and consistent. AI-generated code that violates these principles hinders collective ownership and creates knowledge silos.
Extreme Programming Practices in the AI Era
XP practices provide a framework for integrating AI tools responsibly while maintaining quality. The following practices are particularly relevant:
Test-Driven Development (TDD)
TDD remains a cornerstone of quality software development. The red-green-refactor cycle (write a failing test, make it pass, refactor the code) ensures that all code is covered by tests and that the design emerges incrementally (Beck, 2003).
In the AI era, TDD takes on new importance. Developers can use AI to generate tests or implementation code, but the TDD cycle provides a safety net. For example:
- Write a failing test that captures a requirement.
- Use AI to generate implementation code that passes the test.
- Review the AI-generated code for quality, security, and clarity.
- Refactor to improve design while keeping tests green.
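The cycle above can be sketched in miniature. The `slugify` function and its behavior are invented for illustration; the point is the order of operations, with the test written first and the (possibly AI-suggested) implementation reviewed afterwards.

```python
# A minimal sketch of the red-green-refactor cycle with AI assistance.
import re

# Step 1 (red): write a failing test that captures the requirement.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# Step 2 (green): an AI assistant might propose an implementation like
# this; it passes the test but still needs human review for quality,
# security, and clarity.
def slugify(text):
    # Lowercase, collapse runs of non-alphanumerics into single hyphens,
    # and strip hyphens from both ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Steps 3-4 (review, refactor): simplify or restructure the code while
# keeping the test green.
```

Keeping the test authored by a human, rather than generated alongside the code, preserves the feedback loop: the test encodes intent independently of the implementation it checks.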
Research by Mock et al. (2024) explored both fully-automated and collaborative models of AI in TDD. The collaborative model, where developers actively review and modify AI-generated code, produced higher-quality results than the fully-automated approach. This aligns with XP’s emphasis on human judgment and continuous feedback.
AI-assisted TDD workflow
The collaborative TDD workflow with AI involves iterative cycles where the developer writes a test, the AI suggests implementation code, and the developer reviews, modifies, and refactors as needed. This approach leverages AI’s speed while maintaining human oversight over design decisions and quality attributes (Mock et al., 2024; Piya & Sullivan, 2024).
Continuous Integration (CI)
CI ensures that code integrates frequently and that automated tests run on every commit (Fowler & Foemmel, 2006). In the AI era, CI becomes even more critical because AI-generated code may introduce subtle integration issues or break existing functionality.
A robust CI pipeline should include:
- Automated unit tests to verify correctness at the component level.
- Integration tests to ensure components work together.
- Static analysis tools to detect code smells, security vulnerabilities, and violations of coding standards.
- Code coverage metrics to identify untested code paths.
By running these checks on every commit, teams gain immediate feedback about AI-generated code, preventing defects from propagating through the codebase.
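One way to picture such a pipeline is as an ordered list of commands that must all succeed. The sketch below is a hypothetical CI driver; the specific tools named in the stage list (ruff, pytest, coverage) are illustrative choices, not requirements from this post, and real teams would typically express these stages in their CI system's own configuration format instead.

```python
# Hypothetical CI driver: run each check in order, fail fast on the first
# non-zero exit code. Tool names in STAGES are illustrative assumptions.
import subprocess
import sys

STAGES = [
    ["ruff", "check", "."],                     # static analysis
    ["pytest", "tests/unit"],                   # unit tests
    ["pytest", "tests/integration"],            # integration tests
    ["coverage", "report", "--fail-under=80"],  # coverage threshold
]

def run_pipeline(stages):
    """Return True if every stage exits with code 0, False otherwise."""
    for cmd in stages:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return False  # fail fast: later stages are not run
    return True

if __name__ == "__main__":
    sys.exit(0 if run_pipeline(STAGES) else 1)
```

The fail-fast design mirrors CI practice: a failing static-analysis or unit-test stage gives feedback within seconds, before the slower integration stages run.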
Refactoring
Refactoring improves code structure without changing behavior (Fowler et al., 1999). XP emphasizes continuous refactoring to maintain simplicity and prevent technical debt accumulation.
AI-generated code often requires refactoring to improve readability, reduce complexity, or align with team conventions. Developers must treat refactoring as a first-class activity, not an optional cleanup step. The discipline of refactoring ensures that AI-generated code integrates seamlessly with the existing codebase and adheres to quality standards.
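A small sketch of what such a refactoring looks like in practice (the discount rules and both functions are invented for this example): a chain of branches, typical of first-draft generated code, replaced by a data-driven table, with behavior verified unchanged.

```python
# Illustrative refactoring: structure improves, behavior is preserved.
# The discount tiers here are hypothetical.

# Before: duplicated conditionals; every new tier means another branch.
def price_before(amount, customer_type):
    if customer_type == "regular":
        return amount
    elif customer_type == "member":
        return amount - amount * 0.10
    elif customer_type == "vip":
        return amount - amount * 0.20
    return amount

# After: a lookup table; adding a tier is a one-line data change.
DISCOUNTS = {"regular": 0.0, "member": 0.10, "vip": 0.20}

def price_after(amount, customer_type):
    return amount * (1 - DISCOUNTS.get(customer_type, 0.0))

# Tests that pass both before and after the change are what make the
# refactoring safe: same inputs, same outputs, cleaner structure.
assert price_before(100, "vip") == price_after(100, "vip") == 80.0
```

This is why refactoring and testing are inseparable in XP: without the green tests, there is no way to know the restructuring changed only the shape of the code and not its behavior.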
Pair Programming and Code Review
Pair programming and code review are social practices that enhance code quality through collaboration (Beck & Andres, 2004). When one developer generates code with AI, having a second developer review it provides a critical check on correctness, security, and design.
Research suggests that collaborative review of AI-generated code significantly improves quality outcomes (Mock et al., 2024). The reviewer can identify subtle bugs, security vulnerabilities, or design issues that the AI missed.
The Consequences of Neglecting Quality
What happens when teams prioritize speed over quality, relying on AI without embedded quality practices? Empirical evidence and industry experience provide sobering answers:
Technical Debt Accumulation
Technical debt is the cost of rework caused by choosing an expedient solution over a better design (Cunningham, 1992). When AI generates code rapidly but developers skip refactoring, testing, or review, technical debt accumulates at an accelerated pace. This debt manifests as brittle code, difficult-to-diagnose bugs, and increasing maintenance costs.
XP recognizes that technical debt is inevitable but manageable through continuous refactoring and a focus on simplicity (Beck & Andres, 2004). Without these practices, AI-generated code becomes a liability rather than an asset.
Increased Defect Rates
Studies have shown that AI-generated code can introduce defects, particularly in edge cases or when handling complex logic (Shin, 2024; Alves et al., 2025). Without adequate testing, these defects escape to production, leading to customer dissatisfaction, costly bug fixes, and erosion of trust.
XP’s emphasis on TDD and CI provides a defense against this risk. By writing tests first and integrating frequently, teams detect defects early, when they are cheapest to fix.
Security Risks
Security vulnerabilities in AI-generated code pose significant risks, especially in applications that handle sensitive data or operate in regulated environments (Shin, 2024). Without secure coding practices and security-focused code reviews, teams may unknowingly deploy vulnerable systems.
The XP practice of “root-cause analysis” (Beck & Andres, 2004) encourages teams to investigate defects thoroughly and address underlying causes. Applying this practice to security vulnerabilities ensures that teams learn from mistakes and improve their processes.
Code Quality as a Competitive Advantage
Organizations that treat code quality as a foundation gain strategic advantages:
- Faster feature delivery: High-quality code is easier to modify and extend, enabling teams to deliver new features more rapidly (Forsgren et al., 2018).
- Lower maintenance costs: Well-tested, simple code requires less effort to maintain, freeing resources for innovation.
- Higher reliability: Quality practices reduce defect rates, improving customer satisfaction and trust.
- Improved developer experience: Developers working with high-quality codebases report greater satisfaction and productivity (Forsgren et al., 2018).
AI tools amplify these advantages when used within a framework of quality practices. Conversely, teams that neglect quality find that AI accelerates their path to an unmaintainable, defect-ridden codebase.
Conclusion
Code quality is not an external attribute that can be applied after development. It is a foundation that enables sustainable, reliable, and maintainable software. Extreme programming provides a proven framework for building quality into every step of development through practices such as TDD, CI, refactoring, and pair programming (Beck & Andres, 2004).
In the AI era, these practices become more critical, not less. AI tools accelerate code generation, but they do not guarantee correctness, security, or maintainability. Without embedded quality practices, AI amplifies technical debt, defect rates, and security risks.
Teams that treat code quality as a foundation, applying XP principles consistently, will harness AI as a powerful tool for productivity and innovation. Those that prioritize speed over quality will find that AI merely accelerates their journey to an unmaintainable codebase. The choice is clear: quality is not optional. It is the foundation upon which sustainable software is built.
References
- Beck, K., & Andres, C. (2004). Extreme Programming Explained: Embrace Change (The XP Series). Addison-Wesley Professional.
- Shin, W. (2024). A Study on Test-Driven Development Method with the Aid of Generative AI in Software Engineering. International Journal of Internet, Broadcasting and Communication, 16(4), 194–202.
- Alves, V., Bezerra, C., Machado, I., Rocha, L., Virgínio, T., & Silva, P. (2025). Quality Assessment of Python Tests Generated by Large Language Models. ArXiv Preprint ArXiv:2506.14297.
- Maes, S. H. (2025). The gotchas of ai coding and vibe coding. it’s all about support and maintenance. OSF.
- Beck, K. (2003). Test-driven development: by example. Addison-Wesley Professional.
- Meszaros, G. (2007). xUnit test patterns: Refactoring test code. Pearson Education.
- Hermans, F. (2021). The Programmer’s Brain: What every programmer needs to know about cognition. Simon and Schuster.
- Mock, M., Melegati, J., & Russo, B. (2024). Generative AI for Test Driven Development: Preliminary Results. International Conference on Agile Software Development, 24–32.
- Piya, S., & Sullivan, A. (2024). LLM4TDD: Best practices for test driven development using large language models. Proceedings of the 1st International Workshop on Large Language Models for Code, 14–21.
- Fowler, M., & Foemmel, M. (2006). Continuous integration. ThoughtWorks. https://martinfowler.com/articles/continuousIntegration.html
- Fowler, M., Beck, K., Brant, J., Opdyke, W., & Roberts, D. (1999). Refactoring: improving the design of existing code. Addison-Wesley Professional.
- Cunningham, W. (1992). The WyCash portfolio management system. Addendum to the Proceedings on Object-Oriented Programming Systems, Languages, and Applications (Addendum), 29–30.
- Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate: The science of lean software and DevOps: Building and scaling high performing technology organizations. IT Revolution.
About this post
This post's content was assisted by AI, which helped with research, content curation, and code suggestions.