AI and TDD - A match that can work?

Last updated Jun 29, 2025 · Published Jun 22, 2025

The content here is under the Attribution 4.0 International (CC BY 4.0) license

Test-Driven Development (TDD) is a software development approach that emphasizes writing tests before writing the actual code. This method not only ensures that the code meets the requirements but also helps maintain code quality through refactoring. With the advent of AI, TDD can take advantage of it, reducing the friction for developers to adopt the practice.
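To make that rhythm concrete, here is a minimal sketch in TypeScript with Jest (the function and file names are illustrative): the test comes first and fails (red), then the simplest code that passes is written (green), and refactoring follows.

```ts
// fizzBuzz.test.ts — red: this test is written before any implementation exists.
import { fizzBuzz } from "./fizzBuzz";

test("multiples of 3 return 'Fizz'", () => {
  expect(fizzBuzz(3)).toBe("Fizz");
});

test("other numbers are returned as text", () => {
  expect(fizzBuzz(1)).toBe("1");
});
```

```ts
// fizzBuzz.ts — green: the simplest implementation that passes; refactoring comes next.
export function fizzBuzz(n: number): string {
  return n % 3 === 0 ? "Fizz" : String(n);
}
```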

A space dedicated to Test-Driven Development

This blog hosts a dedicated space for TDD-related content, where you can find posts that explore the concept of TDD, its benefits, and how it can be effectively implemented in software development workflows.

Before we start, let’s clarify what AI means for the purpose of this post. AI here refers to the use of LLMs (Large Language Models) such as ChatGPT, Copilot, and others that can assist in generating code and tests, and even suggest improvements to the codebase. Throughout this post, we will refer to AI as the use of LLMs in the context of TDD workflows, and whenever needed we specify which LLM was used.

In this post, we will explore how AI can be integrated into TDD workflows, providing practical examples and insights into the benefits of this approach based on what research has shown.

Research

In the world of software development, TDD has been a game-changer. It encourages developers to think about the requirements and design of their code before implementation. However, writing tests can be time-consuming (Mock et al., 2024; Alves et al., 2025). AI is not the only attempt at improving the TDD process: automatic test case generation, driven by a defined algorithm that generates the test cases, has also been explored and was shown to detect more errors than the traditional TDD approach (Trinh & Truong, 2024).

In that sense, AI can assist in generating tests, suggesting improvements, and even automating parts of the TDD process. Research on writing tests has shown that AI can significantly reduce the time spent on them; however, an experienced developer is still needed to ensure that the tests are meaningful and cover the necessary scenarios (Mock et al., 2024). On top of that, generated code might not be secure (Shin, 2024), which reinforces the need for an expert.

TDD has been put into the context of generative AI in an attempt to improve the way developers write code, mostly targeted at incorporating AI inputs into the TDD steps (Mock et al., 2024; Shin, 2024). The study by Mock et al. (2024) used two different scenarios:

  1. Fully-automated - the developer acts as a navigator, checking the AI-generated code and tests and making sure that they meet the requirements and are secure.
  2. Collaborative model - the AI suggests tests and code based on the developer's demand; the developer modifies the output, and then the flow repeats.

The fully-automated model allows the AI to generate tests and code without human intervention, while the collaborative model involves the AI suggesting tests and code, which the developer can then review and modify. Mock et al. (2024) used a Python script to gather the metrics and integrate with the ChatGPT API; in their experiment, the fully-automated approach was the one that took the least time to complete the task.
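As a minimal sketch of what such an integration can look like (the study used a Python script; here I assume the OpenAI Node SDK, and the model name and prompts are my own illustration, not taken from the paper):

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Fully-automated step: turn a requirement into a generated test file.
// The developer-as-navigator still reviews the output before committing it.
async function generateTests(requirement: string): Promise<string> {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [
      {
        role: "system",
        content: "You write Jest unit tests. Output only the test file.",
      },
      { role: "user", content: `Requirement: ${requirement}` },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```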

(Piya & Sullivan, 2024) also explored the use of LLMs in TDD workflows, focusing on a flow that used the LLM to generate tests and code, which the developer could then review and modify, in addition to that, developers were able to use LLM to adjust the code. Their work used the chatGPT website directly.

Shin (2024) adopted a similar approach; ChatGPT was the model that scored best in terms of requirements, while Copilot and Gemini scored the same. In the next section, we move on to discuss these approaches to combining TDD and LLMs.

Discussion

As interesting as it might sound, the fully-automated approach is a technique yet to be refined. The selling point of vibe-coding is that AI can write code for you (research has shown that it might even generate code that is no more understandable than a human developer's (Maes, 2025)), but it still requires a human to review and modify the code to ensure that it meets the requirements and is secure.

So far, academic research has focused on how AI can assist in TDD workflows, and experiments have been run under controlled environments. Despite positive results with students in brownfield projects (Shihab et al., 2025), research still has a way to go to support practitioners out there, who need to deal with existing spaghetti code bases that are hard to maintain. From this point of view, using AI to create the safety net that TDD builds manually to support software evolution becomes even more urgent. Attention must also be paid to the quality of the test code generated by AI, as a human is still needed to ensure that the tests are meaningful and cover the necessary scenarios (Alves et al., 2025).

Professional software development has long been a team sport, dating back to the agile boom (Zuill & Meadows, 2022). Research suggests that developers read more code than they write (Hermans, 2021). Rather than prioritizing speed alone, LLM development should focus on fostering understandability, maintainability, and a sustainable pace. Achieving a task quickly is a beneficial consequence, but quality should remain a primary concern. Unless the rules of the game change: is it possible to generate things quickly and avoid maintenance altogether?

AI in TDD Workflows in practice

In this section, we will explore how AI can be effectively integrated into TDD workflows, with practical examples and insights based on my own experience.

ReactJs

Last year, I experimented with using AI to generate tests for a ReactJs application. I used a combination of AI tools to generate tests for the components, ensuring that they were covered by unit tests.

A dedicated space for ReactJs

I share my learnings and experiences in a dedicated space for ReactJs, where you can find posts that explore ReactJs in more depth.

The AI (Copilot) was able to generate tests that covered the basic functionality of the components, but it still required manual intervention to ensure that the tests were meaningful; the most common aid needed was identifying the test-doubles required to isolate the components from their dependencies. The flow I followed is similar to what Mock et al. (2024) reported.

A few things to note about this flow:

  • The developer is actively involved in the process, reviewing and modifying the AI-generated tests.
  • The AI is used to generate the initial test cases, which can save time and effort. However, from practical experience, LLMs are not able, in specific cases, to generate the test-doubles needed to isolate the components from their dependencies (see the sketch after this list).
  • The developer is responsible for ensuring that the tests are meaningful, which is a crucial part of TDD.
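To illustrate the test-double work that still falls to the developer, here is a hedged sketch with Jest and React Testing Library; the UserCard component and useUser hook are hypothetical names, and the point is the manually written jest.mock that isolates the component from its data-fetching dependency.

```tsx
// UserCard.test.tsx — a sketch; UserCard and useUser are hypothetical.
import "@testing-library/jest-dom";
import { render, screen } from "@testing-library/react";
import { UserCard } from "./UserCard";

// The test-double the LLM tended to miss: replacing the data-fetching hook
// so the component is isolated from the network.
jest.mock("./useUser", () => ({
  useUser: () => ({ user: { name: "Ada Lovelace" }, loading: false }),
}));

test("renders the user's name", () => {
  render(<UserCard userId="42" />);
  expect(screen.getByText("Ada Lovelace")).toBeInTheDocument();
});
```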

In addition to that, prompts need to be specific to the context of the code being tested and adhere to a certain pattern to make them reproducible. Prompt engineering might help with that; for example, making the LLM use JSON as a response format might limit hallucinations (Boonstra, 2024).
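As a hedged sketch of that idea, assuming the OpenAI Node SDK: the chat completions API accepts a response_format option that forces valid JSON output, and the schema in the system prompt below is my own assumption.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function suggestTest(requirement: string) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    response_format: { type: "json_object" }, // constrains output to valid JSON
    messages: [
      {
        role: "system",
        // Assumed schema: pinning down the shape keeps responses parseable.
        content:
          'Reply as JSON: {"testName": string, "testCode": string, "doubles": string[]}',
      },
      { role: "user", content: `Write a Jest test for: ${requirement}` },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}
```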

Test case quality

Note that throughout the flow, the developer is actively involved in the process, reviewing and modifying the AI-generated code; however, tackling test quality is not explicitly part of it. Research has shown that AI generates test cases with test smells (Alves et al., 2025). This means that AI-generated tests might not be as effective as manually written tests, and it is still important to ensure that the tests are maintainable.

This leads to the point that the TDD workflow needs another step: refactoring the test code at the same pace as the production code, if not an even faster one. Given that AI speeds up the process of writing tests, it makes sense to use part of this gain to pay back that debt, as in the sketch below.
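As a small, hypothetical example of that payback, the refactor below pulls the duplicated rendering boilerplate that generated tests often accumulate into a single helper (Greeting is an illustrative component, and its empty-name behavior is assumed):

```tsx
import { render, screen } from "@testing-library/react";
import { Greeting } from "./Greeting"; // hypothetical component

// Refactor step: the render(<Greeting ... />) boilerplate repeated across
// generated tests lives in one helper, so setup changes happen in one place.
function renderGreeting(name: string) {
  return render(<Greeting name={name} />);
}

test("greets by name", () => {
  renderGreeting("Ada");
  expect(screen.getByText(/Ada/)).toBeTruthy();
});

test("falls back for an empty name", () => {
  renderGreeting("");
  // Assumed behavior: the component shows a generic greeting.
  expect(screen.getByText(/hello there/i)).toBeTruthy();
});
```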

Is it a match?

The once-established TDD workflow, a cycle of writing a test, writing the code to pass the test, and then refactoring, is now being upgraded with a new step: using AI to generate the test and even the production code. The AI can assist in generating tests and suggesting improvements. The shift is that the red-green-refactor cycle has mutated to:

  • red-green-AI-refactor
  • red-AI-green-AI-refactor

LLMs can assist in generating tests, code, and even refactoring, with encouraging positive results; however, all of it requires a human in the loop who can assess the output. This goes back to the point that to practice TDD (or to assess its output), one needs to understand TDD, not only in theory but also in practice, a skill that takes time to develop.

References

  1. Mock, M., Melegati, J., & Russo, B. (2024). Generative AI for Test Driven Development: Preliminary Results. International Conference on Agile Software Development, 24–32.
  2. Alves, V., Bezerra, C., Machado, I., Rocha, L., Virgínio, T., & Silva, P. (2025). Quality Assessment of Python Tests Generated by Large Language Models. arXiv preprint arXiv:2506.14297.
  3. Trinh, T.-B., & Truong, N.-T. (2024). Improving Test-Driven Development with Automated Test Case Generation from Use Case Specifications.
  4. Shin, W. (2024). A Study on Test-Driven Development Method with the Aid of Generative AI in Software Engineering. International Journal of Internet, Broadcasting and Communication, 16(4), 194–202.
  5. Piya, S., & Sullivan, A. (2024). LLM4TDD: Best practices for test driven development using large language models. Proceedings of the 1st International Workshop on Large Language Models for Code, 14–21.
  6. Maes, S. H. (2025). The Gotchas of AI Coding and Vibe Coding: It's All About Support and Maintenance. OSF.
  7. Shihab, M. I. H., Hundhausen, C., Tariq, A., Haque, S., Qiao, Y., & Wise, B. M. (2025). The Effects of GitHub Copilot on Computing Students’ Programming Effectiveness, Efficiency, and Processes in Brownfield Programming Tasks. ICER.
  8. Zuill, W., & Meadows, K. (2022). Software Teaming: A Mob Programming, Whole-Team Approach. Pearson Addison-Wesley.
  9. Hermans, F. (2021). The Programmer’s Brain: What every programmer needs to know about cognition. Simon and Schuster.
  10. Boonstra, L. (2024). Prompt engineering (p. 60). Kaggle. https://www.gptaiflow.tech/assets/files/2025-01-18-pdf-1-TechAI-Goolge-whitepaper_Prompt%20Engineering_v4-af36dcc7a49bb7269a58b1c9b89a8ae1.pdf
