Test-driven development with an AI agent for TypeScript development

Last updated Feb 13, 2026 · Published Feb 14, 2026

The content here is under the Attribution 4.0 International (CC BY 4.0) license

The emergence of AI coding assistants has created opportunities to enhance test-driven development practices. Research by Baudry and Monperrus (2024) demonstrates that generative AI can produce accurate test data, suggesting potential for broader TDD support. In a previous post, I explored the literature and how well TDD principles match AI capabilities. This guide demonstrates how to approach Copilot as an AI agent specifically for TDD workflows, covering configuration and practical implementation.

Understanding AI agents in the context of TDD

AI coding agents are tools that assist developers by generating code, suggesting implementations, and automating repetitive tasks. They are also known for generating large amounts of code, which can lead to cognitive overload.

When applied to TDD, these agents can support the red-green-refactor cycle (Beck, 2002) by generating test cases, suggesting implementations, and identifying edge cases.

The integration of AI in TDD aligns with the practice’s core principles. TDD emphasizes writing tests before implementation, and AI agents can accelerate this process while maintaining the discipline of test-first development.

Prerequisites

Before configuring Copilot for TDD, ensure you have:

  • Visual Studio Code or compatible IDE with Copilot support
  • GitHub Copilot (or another coding assistant; this guide focuses on Copilot)
  • Basic understanding of TDD principles (red, green, refactor cycle)
  • Jest installed in your project (for other programming languages, you should have the corresponding testing framework installed); see the configuration sketch below
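
If Jest is not yet configured for TypeScript, a minimal setup might look like the following (a sketch assuming ts-jest as the transformer; adjust options to your project):

// jest.config.ts: a minimal sketch assuming ts-jest is installed
import type { Config } from 'jest';

const config: Config = {
  preset: 'ts-jest',          // compile TypeScript test files on the fly
  testEnvironment: 'node',
  testMatch: ['**/__tests__/**/*.test.ts'],
};

export default config;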

Step 1: Setting up GitHub Copilot for TDD

GitHub Copilot provides context-aware code suggestions. To optimize it for TDD workflows:

Installation

  1. Install the GitHub Copilot extension in Visual Studio Code
  2. Sign in with your GitHub account that has Copilot access
  3. Verify the extension is active

To follow along, we will use a katas playground repository, specifically the solutions/string-calculator subfolder, which is written in TypeScript.

Configuration for TDD

Create a .github/copilot-instructions.md file in your repository root:

This repository contains a TypeScript codebase focused on TDD katas.

Create a .github/agents/tdd-coach.md file in your repository root with the following content to guide Copilot’s suggestions towards TDD patterns:

---
name: TDD Coach
description: An AI agent that assists developers in following test-driven development practices. Provides suggestions for writing tests, implementing code to pass tests, and refactoring while maintaining code behavior. Offers guidance on test design and test quality, especially for test smells.
---

Focus on the following instructions:
- Test case generation: Suggests test cases based on comments and code context
- Implementation suggestions: Provides minimal code implementations to pass tests
- Refactoring support: Offers suggestions for improving code quality while maintaining test coverage
- Edge case identification: Recommends additional test cases for edge scenarios
- Follows TDD principles: Encourages the red-green-refactor cycle and test-first development. At each step, waits for user input
- Provides feedback on test quality and coverage

When writing tests:
- Follow the Arrange-Act-Assert pattern
- Use descriptive test names that explain the behavior being tested
- Write one assertion per test when possible
- Consider edge cases and boundary conditions
- Use appropriate test doubles (mocks, stubs, fakes) based on the testing strategy

When suggesting implementations:
- Write minimal code to pass the current test
- Avoid over-engineering solutions
- Follow SOLID principles
- Consider existing patterns in the codebase

Using Copilot in the TDD cycle

Red phase (Writing failing tests):

  1. Start by writing a test that you trust; take the time to write it with the dependencies and the assertion you want
  2. Run the tests and see it fail (true red phase)

Example workflow in TypeScript with Jest:

test('should calculate the sum of two positive numbers', () => {
  const calculator = new StringCalculator();
  const result = calculator.add('2,3');
  expect(result).toBe(5);
});

Run the test suite and confirm it fails for the right reason: the actual result differs from the expected value (or the class is not implemented yet).
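
At this stage the StringCalculator class may not exist yet. A stub like the one below (hypothetical; names and paths are illustrative) lets the test compile while still failing:

export class StringCalculator {
  add(numbers: string): number {
    // Deliberately unimplemented so the test fails for the right reason
    throw new Error('Not implemented');
  }
}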

Green phase (Implementing code):

  1. Navigate to the implementation file
  2. Write a function signature or class definition
  3. Use Copilot to suggest the minimal implementation (see the sketch after this list)
  4. Run tests to verify the implementation passes
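
For the test above, the minimal implementation Copilot suggests might look like this (a sketch, not necessarily its exact output):

export class StringCalculator {
  add(numbers: string): number {
    // Just enough behavior to turn the failing test green
    return numbers
      .split(',')
      .map(Number)
      .reduce((sum, n) => sum + n, 0);
  }
}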

Refactor phase:

  1. Use Copilot Chat to ask for refactoring suggestions
  2. Prompt: “Refactor this method to improve readability while maintaining behavior”
  3. Review suggestions and apply incrementally (an example sketch follows)
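
For example, asking Copilot Chat to refactor the add method above might produce a suggestion along these lines (illustrative; actual output varies):

export class StringCalculator {
  add(numbers: string): number {
    return this.parse(numbers).reduce((sum, n) => sum + n, 0);
  }

  // Naming the parsing step improves readability without changing behavior
  private parse(numbers: string): number[] {
    return numbers.split(',').map(Number);
  }
}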

Step 2: Copilot Chat for Test Strategy (Optional)

While inline Copilot suggestions are the primary workflow, you can optionally use Copilot Chat (in VS Code or GitHub.com) for higher-level test strategy discussion before implementation.

Using Copilot Chat for test design

When planning test strategy, open Copilot Chat and ask:

@workspace I need to implement a validation function for email addresses.
Based on the project's testing patterns, what test cases should I consider?
Consider edge cases, invalid formats, and boundary conditions.

Copilot Chat analyzes your codebase context and suggests test cases aligned with your project’s patterns. However, the primary workflow relies on inline Copilot suggestions rather than chat-based interaction.
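
You can then turn the suggested cases into a test skeleton. The sketch below assumes a hypothetical isValidEmail function; the cases shown are illustrative:

import { isValidEmail } from '../src/isValidEmail';

describe('isValidEmail', () => {
  it('should accept a standard address', () => {
    expect(isValidEmail('user@example.com')).toBe(true);
  });

  it('should reject an address without an @ symbol', () => {
    expect(isValidEmail('user.example.com')).toBe(false);
  });

  it('should reject an empty string', () => {
    expect(isValidEmail('')).toBe(false);
  });
});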

Alternative: Using Claude as the LLM

If you prefer Claude’s reasoning capabilities, some Copilot-compatible IDEs (through enterprise policies) may allow model selection. The instructions remain identical: configure .github/copilot-instructions.md as shown in Step 1, and the workflow proceeds with whichever model is configured.

Step 3: Copilot-Driven TDD Workflow

Core workflow pattern

  1. Define behavior with comments - Write descriptive test names and requirement comments to clarify intent
  2. Let Copilot suggest test structure - Accept or modify inline suggestions for test bodies
  3. Review and adjust - Verify test quality, assertions, and edge case coverage
  4. Implement minimally with Copilot - Let Copilot suggest minimal implementations to pass tests
  5. Run and validate - Execute tests and verify behavior matches requirements
  6. Refactor with Copilot support - Use Copilot’s suggestions to improve code quality while maintaining test coverage

Step 4: Practical Example - Calculator with Copilot TDD

To illustrate the complete Copilot-driven TDD workflow:

Phase 1: Test Case Design

Start with a comment describing test intent:

// Test: Calculator should add two positive numbers
// Expected: 2 + 3 = 5
test('should add two positive numbers', () => {
  // Copilot suggests the implementation pattern
  const calc = new Calculator();
  const result = calc.add(2, 3);
  expect(result).toBe(5);
});

// Test: Calculator should handle division by zero
// Expected: throws error
test('should handle division by zero', () => {
  // Copilot auto-completes based on pattern
  const calc = new Calculator();
  expect(() => calc.divide(5, 0)).toThrow('Division by zero');
});

Copilot recognizes the comment pattern and suggests test structure automatically.

Phase 2: Test Implementation

Let Copilot complete remaining test cases based on the established pattern.
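
For instance, after the add and divide tests, Copilot might propose further cases that follow the same structure (subtract and multiply are hypothetical methods here):

test('should subtract two numbers', () => {
  const calc = new Calculator();
  expect(calc.subtract(5, 3)).toBe(2);
});

test('should multiply two numbers', () => {
  const calc = new Calculator();
  expect(calc.multiply(4, 3)).toBe(12);
});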

Phase 3: Red Phase - Verify Tests Fail

Run tests to confirm they fail before implementing Calculator class.

Phase 4: Green Phase - Implementation

Navigate to Calculator.ts and start typing:

class Calculator {
  add(a: number, b: number): number {
    // Copilot suggests: return a + b;
  }

  divide(a: number, b: number): number {
    // Copilot suggests:
    // if (b === 0) throw new Error('Division by zero');
    // return a / b;
  }
}

Copilot provides minimal implementations that pass tests.
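
Accepting those suggestions yields an implementation along these lines (a sketch):

class Calculator {
  add(a: number, b: number): number {
    return a + b;
  }

  divide(a: number, b: number): number {
    // Guard against division by zero, as required by the failing test
    if (b === 0) throw new Error('Division by zero');
    return a / b;
  }
}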

Phase 5: Refactor Phase

After all tests pass, refactor with Copilot suggestions:

// Consider: Extract validation logic
private validateDivisor(b: number): void {
  if (b === 0) throw new Error('Division by zero');
}

Copilot suggests improvements while maintaining test coverage.
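
The divide method then delegates to the extracted helper, keeping the tests green (a sketch):

divide(a: number, b: number): number {
  this.validateDivisor(b);
  return a / b;
}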

Custom Agent Configurations with Copilot

For teams adopting Copilot-driven TDD, establish shared guidelines:

Repository-level instructions (.github/copilot-instructions.md):

This file is the primary configuration for team TDD standards. Store tested patterns and conventions here:

# Project TDD Standards

## Testing Framework and Configuration

Testing framework: Jest
Test location: __tests__/ directory
Naming convention: [component].test.ts

## Required Test Structure

- Describe blocks for each component/function
- Nested describe blocks for different scenarios
- Test names should complete the phrase "it should..."
- Use AAA pattern: Arrange, Act, Assert

## Mocking Strategy

- Use jest.mock() for external dependencies
- Prefer dependency injection over global mocks
- Clean up mocks in afterEach blocks
- Mock API responses consistently across tests

## Copilot-Driven TDD Patterns

When using Copilot inline suggestions:
- Start test files with comment describing test intent
- Let Copilot generate test structure based on established patterns
- Review suggestions for domain-specific edge cases
- Maintain 80%+ mutation testing score

## Common Test Patterns

### Testing API endpoints
- Test success path first
- Then test each error scenario
- Mock external service calls
- Verify status codes and response structures

### Testing async operations
- Use async/await syntax
- Test success and rejection paths
- Use jest.useFakeTimers() for timeout scenarios
- Clean up timers in afterEach

### Generating test fixtures
- Create reusable factory functions
- Place factories in __fixtures__/ directory
- Reference factories by descriptive names

This file becomes Copilot’s context for all inline suggestions across your team, ensuring consistent TDD patterns.
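
As a concrete illustration, a test file following these standards might look like the sketch below (UserService, apiClient, and fetchUser are hypothetical names):

import { UserService } from '../src/userService';
import { apiClient } from '../src/apiClient';

// External dependency mocked per the mocking strategy above
jest.mock('../src/apiClient');

describe('UserService', () => {
  afterEach(() => {
    jest.clearAllMocks(); // clean up mocks between tests
  });

  describe('when the API call succeeds', () => {
    it('should return the user profile', async () => {
      // Arrange
      (apiClient.get as jest.Mock).mockResolvedValue({ id: 1, name: 'Ada' });
      const service = new UserService(apiClient);

      // Act
      const user = await service.fetchUser(1);

      // Assert
      expect(user.name).toBe('Ada');
    });
  });
});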

Limitations and Considerations with GitHub Copilot

While Copilot enhances TDD workflows, awareness of limitations is essential:

Copilot suggestions require critical evaluation:

  • Copilot analyzes patterns from your codebase and training data, but doesn’t understand domain semantics
  • Generated tests may miss business-critical edge cases specific to your application
  • Human expertise remains necessary for comprehensive test design

Context window limitations:

  • Copilot works with the current file and limited surrounding context
  • Complex refactorings or cross-module patterns may not be suggested optimally
  • For multi-file architectural decisions, manual design is often necessary

Over-reliance risks:

  • Developers may accept suggestions without critical evaluation
  • Generated code patterns may work but introduce subtle assumptions
  • Test coverage metrics can be misleading if Copilot-generated tests don’t test meaningful behavior

Domain-specific challenges:

  • Copilot excels with common patterns (CRUD operations, utility functions, API handlers)
  • Domain-specific validation logic, complex algorithms, or business processes benefit from manual design
  • Proprietary business rules should be explicitly coded rather than suggested by Copilot

Complementary to expertise, not a replacement:

  • TDD requires understanding of testing principles, design patterns, and domain knowledge
  • Copilot accelerates implementation of well-understood patterns but does not replace developer judgment
  • The red-green-refactor cycle is most effective when developers actively design tests, not passively accept suggestions

Research on AI-assisted software development (Baudry & Monperrus, 2024) indicates that while AI can generate accurate code patterns, human oversight of suggestion quality remains critical.

Measuring effectiveness

Track these metrics to evaluate AI agent impact on TDD workflows:

  • Test quality: Mutation testing scores measure how well tests detect introduced defects. A mutation testing score above 80% indicates robust test coverage.
  • Development velocity: Time from test writing to passing implementation. This metric helps identify whether AI assistance reduces development friction.
  • Refactoring frequency: Number of refactoring cycles enabled by test coverage. Higher frequency suggests developers feel confident making changes with test safety nets.
  • Code quality: Metrics such as cyclomatic complexity (aim for values below 10 per method) and code duplication (target less than 3% duplication).

Compare these metrics before and after agent adoption to assess quantifiable impact. Research suggests that TDD practices reduce defect rates (Beck, 2002), and AI assistance should maintain or improve these outcomes.

References

  1. Baudry, B., & Monperrus, M. (2024). Generative AI for Test Data Generation. IEEE Software, 41(3), 34–41.
  2. Beck, K. (2002). Test Driven Development: By Example. Addison-Wesley Professional.
