March 27, 2025
Taming Complexity: AI Coding Assistants and Real-World Projects
In my previous post, I outlined how AI tools excel at generating code snippets but struggle with the complexities of production systems. This led me to experiment with Cursor’s Agent mode powered by Claude 3.7 Sonnet, one of the most capable coding models currently available, to see whether I could overcome these limitations through structured guardrails.
The fundamental challenges become apparent when these tools encounter:
- Complex dependencies: Understanding how changes ripple through an interconnected codebase
- Quality standards: Maintaining consistent patterns and practices
- Long-term maintainability: Writing code that remains comprehensible months later
- Domain-specific requirements: Properly implementing business logic with all its exceptions
I started wondering: Could explicit guardrails and structured processes compensate for these limitations, even with cutting-edge models like Claude 3.7?
Experimentation
To investigate, I built a Next.js starter project designed specifically as a testbed for AI-powered development. Unlike typical starters, this one includes comprehensive guardrails:
- Rigorous TypeScript configuration: The tsconfig.json enforces strict type checking with settings like noUnusedLocals, noImplicitReturns, and exactOptionalPropertyTypes (see the sketch after this list).
- Extensive ESLint rules: Beyond the basics, eslint.config.mjs incorporates security, accessibility, and React-specific best practices.
- Complete testing infrastructure: Ready-to-use setup for Vitest, Playwright, and React Testing Library.
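To make the first of these guardrails concrete, here is a minimal sketch of the discipline those compiler flags impose. The RetryOptions type and classify function are hypothetical illustrations, not code from the starter:

```typescript
// Hypothetical module written against the starter's strict tsconfig.json.
// Each comment names the flag that would reject a sloppier version.

interface RetryOptions {
  // exactOptionalPropertyTypes: callers may omit `retries`, but may not pass an
  // explicit `undefined` unless the type is widened to `number | undefined`.
  retries?: number;
}

export function classify(score: number): "high" | "low" {
  // noUnusedLocals: any intermediate variable declared here must actually be read.
  const threshold = 0.5;
  // noImplicitReturns: every code path must return a value, so there is no
  // falling off the end of the function for low scores.
  if (score > threshold) {
    return "high";
  }
  return "low";
}

export const defaultRetryOptions: RetryOptions = {}; // omitting the property is fine
```

The point is less the code itself than that the compiler, not a reviewer, enforces these conventions on whatever the AI produces.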
The centerpiece was a set of detailed coding rules for AI assistants, including:
When implementing features:
1. First assess if the task is clear and appropriately sized
2. Always write tests BEFORE implementation code
3. Implement the simplest solution that passes tests
4. Refactor while maintaining passing tests
5. Ensure all code complies with ESLint and TypeScript configurations
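To make step 2 concrete, this is the shape of work the rules ask for: a failing Vitest test written before any implementation exists. The isValidEmail utility and its file path are hypothetical stand-ins, not code from the project:

```typescript
// isValidEmail.test.ts — written first, while ./isValidEmail does not yet exist,
// so this suite fails (red) until the simplest passing implementation is added (green).
import { describe, expect, it } from "vitest";

import { isValidEmail } from "./isValidEmail";

describe("isValidEmail", () => {
  it("accepts a plain address", () => {
    expect(isValidEmail("user@example.com")).toBe(true);
  });

  it("rejects a string without an @", () => {
    expect(isValidEmail("not-an-email")).toBe(false);
  });
});
```

Only once a test like this fails for the right reason does step 3 begin: the simplest implementation that turns it green, followed by refactoring under a passing suite.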
My hypothesis: these guardrails would force Claude 3.7 Sonnet in Cursor’s Agent mode to produce production-quality code by requiring discipline that mirrored professional development practices.
What Actually Happened
I used Cursor’s Agent mode with Claude 3.7 Sonnet to implement several features, carefully observing how the AI responded to my structured environment. The results revealed several patterns:
- Process vs. Product Tension: Despite explicit instructions to follow TDD, the AI consistently prioritized generating implementation code first, then backfilling tests. In one telling example, I requested a data validation utility and specified “follow TDD practices.” The AI immediately generated the implementation and only afterward added tests that exactly matched the already-written code.
- Instruction Overload: When faced with multiple constraints (TDD, ESLint compliance, documentation requirements), the AI would selectively follow some while ignoring others. The more requirements I added, the less consistently any of them were followed.
- Context Degradation: Even when it initially acknowledged the TDD requirement, the AI would “forget” this constraint as the interaction progressed, reverting to implementation-first habits.
This pattern repeated across different features and complexity levels: the AI produced code readily enough, but it didn’t maintain discipline around how that code was produced.
Refining the Approach
These findings don’t suggest AI coding assistants are hopeless for complex projects—just that a different approach is needed. Based on my experiments, I’m now pursuing:
- Focused, Single-Concern Rules: Instead of comprehensive guidelines, I’m testing whether AI can follow one critical process constraint at a time. For example, a dedicated TDD mode that does nothing but enforce test-first development (see the sketch after this list).
- Custom Modes with Specialized Tools: I’m exploring Cursor’s Custom modes to create a “TDD Mode” that incorporates specific prompts and tools that enforce the Red-Green-Refactor cycle.
- Progressive Complexity: Starting with extremely simple tasks where TDD can be successfully enforced, then gradually increasing complexity while maintaining the process discipline.
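As a sketch of what a focused, single-concern rule might look like (the wording and framing below are hypothetical, not rules I have shipped), the entire rule set for a dedicated TDD mode could be this small:

```text
TDD mode: the only rule that applies in this mode.
Before creating or editing any implementation file, write or update a failing
test that describes the desired behaviour, and show its failing output.
Never produce implementation code and its tests in the same response.
```

The bet is that one short, unambiguous constraint is harder to drop mid-conversation than a page of comprehensive guidelines.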
The Broader Implications
This experiment connects to fundamental questions about AI assistants in professional software development. Current AI tools operate primarily as code generation engines, not as participants in disciplined development processes.
To become truly valuable for complex projects, these tools need to evolve beyond “what code to write” to understand “how we write code”—the processes, checks and balances that maintain quality in large-scale systems.
While my initial approach of comprehensive rules didn’t achieve the desired results, I remain optimistic about the potential of more focused interventions. The goal isn’t to make AI coding perfect, but to make it reliably useful within the constraints of professional development.
As I continue refining these approaches, I’ll share what works and what doesn’t. If you’ve experimented with similar guardrails or have insights on making AI assistants more effective for complex projects, I’d love to hear about your experiences.