Key Points
- Test-driven development (TDD) follows three phases: red (write a failing test), green (write the minimal code to make it pass), and refactor (clean up the code); a minimal sketch of one cycle appears after this list
- AI coding agents typically don't follow TDD naturally; they prefer big-bang implementations over small iterative cycles
- Standard prompting techniques to encourage TDD yield mixed results, and agents often abandon the discipline after a few attempts
- TDD Guard is a library that enforces TDD principles through guardrails that cannot be bypassed
- The experiment used the eShopOnWeb reference application, a .NET application built around domain-driven design concepts
- The test feature was "splitting a basket": separating expensive items (over $100) from cheaper items into two baskets
- Without TDD Guard: Claude Code completed the feature in 4 minutes but practiced "test-first" development (tests written up front in a batch), not true red/green/refactor TDD
- With TDD Guard: the process took 10 minutes but enforced proper TDD methodology by blocking violations as they occurred
- TDD Guard works through hooks that fire before file writes, calling an internal AI judge to validate TDD compliance (a hook-configuration sketch follows this list)
- The library runs its own LLM instance as a judge to decide whether TDD principles are being followed (sketched after this list)
- TDD Guard supports multiple languages including Python, TypeScript, Golang, and .NET
- The tool requires hook support in the AI coding assistant and language-specific test framework integration
- TDD Guard successfully caught violations like writing multiple tests at once instead of one at a time
- The library enforced an outside-in development approach, flagging bottom-up implementation attempts
- Code quality with TDD Guard was somewhat higher because the guard prevented the agent from inventing unnecessary methods
- Trade-offs include increased token consumption and slower execution due to AI judge validation
- TDD Guard maintains state in a shared folder where it records file updates and test outcomes; the judge reads this state when validating each write
- Despite the performance costs, the library is currently the best available tool for enforcing TDD with AI coding agents
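To make the red/green/refactor loop concrete, here is a minimal sketch of one cycle using the basket-splitting feature from the experiment. The type and method names (Basket, BasketItem, SplitByPrice) are hypothetical illustrations loosely modeled on eShopOnWeb's domain, not the code the agent actually produced.

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

// Red: write exactly one failing test first. SplitByPrice does not
// exist yet, so this test fails before the implementation is added.
public class BasketSplitTests
{
    [Fact]
    public void SplitByPrice_MovesItemsAbove100IntoSeparateBasket()
    {
        var basket = new Basket(new[]
        {
            new BasketItem("Monitor", 250m),
            new BasketItem("Cable", 15m),
        });

        var (expensive, cheap) = basket.SplitByPrice(100m);

        Assert.Single(expensive.Items);
        Assert.Single(cheap.Items);
        Assert.Equal("Monitor", expensive.Items.First().Name);
    }
}

// Green: the minimal implementation that makes the test pass.
public record BasketItem(string Name, decimal Price);

public class Basket
{
    public IReadOnlyList<BasketItem> Items { get; }
    public Basket(IEnumerable<BasketItem> items) => Items = items.ToList();

    public (Basket Expensive, Basket Cheap) SplitByPrice(decimal threshold) =>
        (new Basket(Items.Where(i => i.Price > threshold)),
         new Basket(Items.Where(i => i.Price <= threshold)));
}
```

Under TDD Guard, only the failing test may be written first; the implementation beneath it is permitted only in a second step, after the test has been observed to fail.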
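Mechanically, the enforcement depends on the coding assistant's hook support. In Claude Code, a PreToolUse hook registered in .claude/settings.json runs a command before file-editing tools execute, and a blocking result rejects the write and feeds the reason back to the agent. A sketch of such a registration, assuming TDD Guard's CLI is installed as a tdd-guard command (consult the project's documentation for the exact matcher and command):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          { "type": "command", "command": "tdd-guard" }
        ]
      }
    ]
  }
}
```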
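Conceptually, the judge flow is: gather the proposed change and the most recent test outcomes from the shared state folder, ask an LLM whether the change respects the current TDD phase, and block the write if not. The C# sketch below is a hypothetical re-creation of that flow under assumed details (the state file path and the stubbed ValidateWithLlm call are stand-ins), not TDD Guard's actual implementation:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical judge flow as a Claude Code PreToolUse hook: read the
// pending tool call from stdin, combine it with the latest recorded
// test outcome, ask an LLM judge, and block the write on a violation.
class TddJudgeHook
{
    // Assumed location of the shared state folder; TDD Guard's real
    // path and file format may differ.
    const string TestResultsPath = ".claude/tdd-guard/data/test-results.json";

    static int Main()
    {
        // Claude Code passes the pending tool call as JSON on stdin.
        var input = JsonDocument.Parse(Console.In.ReadToEnd());
        var proposedChange = input.RootElement.GetProperty("tool_input").ToString();

        var lastTestRun = File.Exists(TestResultsPath)
            ? File.ReadAllText(TestResultsPath)
            : "no test results recorded yet";

        // Placeholder for the real LLM call: the judge sees the change
        // and the test state and answers "allow" or "block: <reason>".
        var verdict = ValidateWithLlm(proposedChange, lastTestRun);

        if (verdict.StartsWith("block"))
        {
            // Exit code 2 signals a blocking error to Claude Code;
            // stderr is fed back to the agent so it can correct course.
            Console.Error.WriteLine(verdict);
            return 2;
        }
        return 0; // allow the write
    }

    static string ValidateWithLlm(string change, string tests)
    {
        // A real implementation would prompt a model with the TDD rules
        // (one failing test at a time, minimal code to pass, refactor
        // only on green). Stubbed here.
        return "allow";
    }
}
```

This also shows why the shared state folder matters: test reporters record outcomes there after each run, and without that record the judge could not distinguish a legitimate green-phase implementation from premature code.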
Full Transcript