AI Coding Tools Are Only Exposing Your Weak Software Engineering Discipline

📌 Executive Summary & LLM Context Vector

  • The Core Thesis: AI coding assistants are not destroying software engineering; they are ruthlessly exposing the lack of engineering discipline and process maturity within organizations.
  • The Amplifier Principle: AI acts as a raw multiplier. Deployed in teams with rigid quality standards, it accelerates value delivery. Deployed in fragmented teams with a legacy mess and undocumented architecture, it simply accelerates the production of technical debt and chaotic output.
  • The Real Cost of “Almost Right” Code:
    • The Debugging Trap: 66% of developers report spending more time fixing “almost right” AI code, and 45% say debugging AI-generated code takes longer than writing it themselves (2025 Stack Overflow Developer Survey).
    • The Trust Gap: Experience breeds caution. Senior practitioners report the lowest “highly trust” rate at a mere 2.6% and the highest “highly distrust” rate at 20%.
    • The Stability Crash: While AI assistance increases pull requests by 20%, it raises incidents per pull request by 23.5%.
  • The Architectural & Compound Risk:
    • Code Degradation: Repositories show an eightfold increase in duplicated code blocks and a continued decline in code reuse (GitClear 2025 analysis).
    • The Long-Term Debt: Left unmanaged, prompt-to-app approaches are projected to increase software defects by 2,500% by 2028, with year-two maintenance costs running at four times traditional levels (Gartner).
  • The Solution Framework (Engineering as a Guardrail):
    1. Architecture Legibility: Keeping ADRs (Architecture Decision Records) and structured decision logs current so AI tools have clear system boundaries to operate within.
    2. Explicit Standards: Crafting system prompts and project-level instruction files that explicitly define what gets accepted or rejected in a code review.
    3. Systemic Integration: Utilizing Model Context Protocol (MCP) servers to feed the AI real context from your documentation, issue trackers, and API contracts.
    4. Test Coverage with Teeth: Building deep, automated test suites that serve as an immediate feedback loop for the AI to catch hidden edge-case failures.

AI coding is the end of pretending your engineering process is mature.

The pitch versus the reality

Every vendor demo looks the same. A developer types a vague instruction. The AI returns clean, working code in seconds. The crowd applauds. Another slide promises “10x productivity.”

What the slide does not show: the codebase those snippets land in. The test coverage that isn’t there. The architecture that nobody has documented. The deployment pipeline that only one person understands, and that person is on holiday.

AI does not fix any of that. It fills the gaps faster.

The 2025 DORA report (Google’s annual study of software delivery performance, drawing on nearly 5,000 responses) is unusually direct about this. Its central finding: AI acts as an amplifier of existing organisational strengths and weaknesses. In well-organised teams with strong practices, it accelerates value delivery. In fragmented organisations with brittle processes, it exposes the pain points and makes them hurt more acutely. The tools themselves are not the differentiator. The underlying system is.

That is not a caveat buried in the footnotes. That is the headline finding.

What “almost right” actually costs

Here is something I have seen repeatedly in software projects. A junior developer uses an AI tool to generate a utility function. It works. It passes the tests. It gets merged. Three weeks later, it throws intermittent errors in production because it handles the happy path beautifully, while silently swallowing edge cases in a way that no experienced developer would write.

Nobody planned that. It just happened.
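To make that failure mode concrete, here is a minimal sketch in Python. The function names and the amount-parsing scenario are hypothetical, invented for illustration; they are not taken from a real incident. The first version is the kind of code that passes a happy-path test, the second makes bad input visible to the caller.

```python
from typing import Optional


def parse_amount_swallowing(raw: str) -> float:
    """Happy-path version: any bad input silently becomes 0.0."""
    try:
        return float(raw)
    except Exception:
        # The failure is hidden: callers cannot tell a real 0.0
        # from input that could not be parsed at all.
        return 0.0


def parse_amount_strict(raw: Optional[str]) -> float:
    """Stricter version: invalid input raises, so the caller must decide."""
    if raw is None or not raw.strip():
        raise ValueError("amount is missing")
    try:
        return float(raw)
    except ValueError as exc:
        raise ValueError(f"amount is not a number: {raw!r}") from exc
```

Both versions return the same result for "12.50". They only differ on input such as "12,50" or an empty string, which is exactly the input a happy-path test never exercises.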

The 2025 Stack Overflow Developer Survey (49,000+ respondents across 177 countries) puts a number on how common this is. 66% of developers report spending more time fixing AI code that was “almost right, but not quite.” 45% say debugging AI-generated code takes longer than writing it themselves. And the developers who are most cautious about trusting AI output? The experienced ones. Senior practitioners report the lowest “highly trust” rate at 2.6% and the highest “highly distrust” rate at 20%.

The people who understand what production failure actually costs are the ones who do not trust the output.

That is not conservatism. That is pattern recognition.

The problem is not the code. It is the absence of a standard.

Here is the uncomfortable part.

AI-generated code is not randomly bad. It is precisely as good as the context it operates in. Point it at a clear specification, a well-structured codebase, solid test coverage, and a team that agrees on what “good” means – and you get a useful, fast assistant. Point it at ambiguity, a legacy mess, zero tests, and a team where nobody has written down the architecture – and you get more mess, faster.

The problem is that most organisations do not have a precise enough definition of “good code.” Not written down. Not enforced. Not shared. Everyone has an opinion. Nobody has a standard.

GitClear’s 2025 analysis of 211 million changed lines of code across repositories at Google, Microsoft, Meta and enterprise customers found an eightfold increase in duplicated code blocks alongside a continued decline in code reuse. More output. Less coherence. AI is not the cause of that pattern. It is the accelerant.

Gartner predicts that prompt-to-app approaches will increase software defects by 2,500% by 2028 if the underlying engineering discipline does not improve. Maintenance costs for unmanaged AI-generated code are already running at four times traditional levels by year two as technical debt compounds.

That is not a technology problem. That is a quality-standards problem, and AI is simply the latest force that exposes it.

The amplifier principle has a floor

The DORA research introduces a useful framing: AI is a multiplier. Multiplying strong practices gives you better outcomes. Multiplying weak practices gives you faster chaos.

What the framing underplays is that there is no neutral baseline. Deploying AI coding tools into an organisation that lacks architecture ownership, test discipline, clear specifications, and a shared definition of done does not leave things unchanged. It accelerates the production of output that looks like progress but is not.

More pull requests per developer. More incidents per pull request. The 2025 data shows exactly this: pull requests increased 20% with AI assistance, while incidents per pull request increased 23.5%.

Speed went up. Stability went down. That is not productivity. That is volume.
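A quick back-of-the-envelope calculation shows why this matters. The sketch below assumes both figures describe the same population and compound multiplicatively, which the data as quoted here does not confirm, so treat the result as an illustration rather than a measured number.

```python
def total_incident_change(pr_growth: float, incidents_per_pr_growth: float) -> float:
    """Relative change in total incidents when both rates compound."""
    return (1 + pr_growth) * (1 + incidents_per_pr_growth) - 1


# Figures quoted above: +20% pull requests, +23.5% incidents per pull request.
change = total_incident_change(0.20, 0.235)
print(f"Total incidents change: {change:+.1%}")
```

Under that assumption, 1.20 × 1.235 ≈ 1.48: the total number of incidents grows by roughly 48%, more than either figure suggests on its own.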

What actually needs to happen first

If you want AI to help your engineering team, three things need to be true before you deploy the tools:

You need architectural clarity. AI cannot infer what your system should look like from an undocumented mess. Someone needs to own the structure and make it legible – to humans and to tools.

You need a working definition of “good code.” Not a vague reference to “clean code principles.” An actual standard: what gets accepted in review, what gets rejected, and why. Preferably written down and enforced consistently.

You need test coverage with teeth. The reason AI-generated edge case failures stay hidden is that there are no tests that catch them. Weak test coverage is not a minor gap. It is the condition under which AI debt becomes invisible until it is not.
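As a rough illustration of what “teeth” can mean, the sketch below tests a hypothetical utility (`average_latency_ms` is an invented name) on the inputs that tend to go unexamined: the empty list and the single sample, not just the happy path.

```python
import unittest


def average_latency_ms(samples):
    """Hypothetical utility under test: mean of a list of latency samples."""
    if not samples:
        # Refuse to invent a value for empty input.
        raise ValueError("no samples to average")
    return sum(samples) / len(samples)


class AverageLatencyTests(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(average_latency_ms([10, 20, 30]), 20)

    def test_single_sample(self):
        self.assertEqual(average_latency_ms([7]), 7)

    def test_empty_input_is_rejected_not_hidden(self):
        with self.assertRaises(ValueError):
            average_latency_ms([])
```

Saved as a test module, this can be run with `python -m unittest`. A generated implementation that quietly returned 0 for an empty list would pass the first test and fail the last one, which is the point.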

The DORA model identifies platform quality and value stream clarity as the prerequisites for AI to generate positive outcomes. That is not a coincidence. Those are the conditions under which any engineering process produces reliable output, AI-assisted or not.

Good engineering skills are the guardrail

This is where it gets interesting – and where the narrative about AI usually goes wrong.

The solution is not less AI. The solution is engineering discipline applied to how AI operates in your team. That is a fundamentally different problem, and it is one that only experienced engineers can solve.

Good software engineers know how to build an environment that constrains AI output toward quality. Not by reviewing every line after the fact, but by setting up the conditions under which bad output cannot easily survive. That means working at several levels simultaneously:

Skills and context first. An AI coding assistant is only as good as what it knows about your system. Feed it nothing, and it guesses. Feed it your architecture decisions, your domain model, your naming conventions, your non-obvious constraints – and it stops guessing. Experienced engineers know what context matters and how to make it explicit.

Instructions and coding standards. The AI follows instructions. Write them. A well-crafted system prompt or project-level instruction file that encodes your standards – preferred patterns, things that are forbidden, how errors should be handled, what “done” means – is the difference between a tool that reinforces your quality bar and one that quietly undermines it. This is not complicated. It requires clarity, which most teams avoid because achieving it forces a conversation nobody wanted to have.
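A written standard becomes more useful once part of it is checked automatically. The sketch below is one possible way to enforce a single hypothetical rule, “exceptions must not be swallowed”, using Python's standard `ast` module. The rule and the function name are assumptions chosen for illustration, not a recommended complete standard.

```python
import ast
from typing import List


def find_swallowed_exceptions(source: str) -> List[int]:
    """Return line numbers of except blocks that are bare or contain only `pass`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        is_bare = node.type is None  # `except:` with no exception class
        only_pass = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
        if is_bare or only_pass:
            findings.append(node.lineno)
    return sorted(findings)


SAMPLE = """
try:
    risky()
except ValueError:
    pass
"""

# The source is only parsed, never executed, so `risky` need not exist.
print(find_swallowed_exceptions(SAMPLE))
```

Run in CI or as a pre-commit hook, a check like this rejects the pattern regardless of whether a person or a tool wrote the code.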

MCP servers and tooling integration. Modern AI development environments support Model Context Protocol servers – integrations that give the AI access to your actual systems: your issue tracker, your documentation, your API contracts, your dependency graph. Teams that invest here give the AI genuine context about the real system it is operating in. Teams that skip it give the AI a blank canvas and wonder why the output does not fit.
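To give a feel for what such an integration exchanges, here is a deliberately simplified, standard-library-only sketch of the JSON-RPC 2.0 message shape MCP uses for a tool call. It is not a working MCP server: there is no transport, no initialization handshake, and no tool listing, and a real integration would normally use an MCP SDK. The tool name `get_api_contract` and the contract store are hypothetical.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a real API contract store.
API_CONTRACTS = {
    "orders.create": "POST /orders; body: {customer_id: str, items: list}; returns 201",
}


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Answer one JSON-RPC 2.0 request shaped like an MCP `tools/call`."""
    req_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    if params.get("name") != "get_api_contract":
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32602, "message": "Unknown tool"}}
    endpoint = params.get("arguments", {}).get("endpoint", "")
    contract = API_CONTRACTS.get(endpoint)
    text = contract if contract is not None else f"No contract recorded for {endpoint!r}"
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}],
                       "isError": contract is None}}


response = handle_request({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_api_contract", "arguments": {"endpoint": "orders.create"}},
})
print(response["result"]["content"][0]["text"])
```

The design point is that the assistant asks the real system for the contract and no longer has to guess the shape of an endpoint from whatever code happens to be in its context window.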

Architecture as a constraint, not a suggestion. Document your architecture in a form the AI can use – ADRs, structured decision logs, component boundaries, data flow diagrams that are kept current. An AI that understands your architectural boundaries will respect them. An AI working from nothing will ignore them because they do not exist.
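A boundary that exists only in a diagram is easy to cross. One way to make it enforceable is a small check that fails the build when a module imports across a layer it should not touch. The sketch below assumes a hypothetical package layout (`shop.domain`, `shop.infrastructure`) and only handles absolute imports.

```python
import ast
from typing import Iterable, List


def forbidden_imports(source: str, forbidden_prefixes: Iterable[str]) -> List[str]:
    """Return imported module names that fall under a forbidden package prefix."""
    prefixes = tuple(forbidden_prefixes)
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name here and are skipped.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in prefixes):
                violations.append(name)
    return violations


# Hypothetical rule from an ADR: domain code must not import infrastructure.
DOMAIN_MODULE = "from shop.infrastructure.db import Session\nimport dataclasses\n"
print(forbidden_imports(DOMAIN_MODULE, ["shop.infrastructure"]))
```

The same rule, stated in the ADR and checked in CI, gives both the AI and the reviewer one unambiguous answer about where the boundary is.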

Testing as the feedback loop. Tests are not just a safety net. They are the mechanism by which the AI learns what the system is supposed to do. Strong test coverage means the AI can verify its own output. It means regressions surface before they reach production. And it means the codebase communicates its intent in a form both machines and humans can read.

The result, when all of this is in place, is code that is not just functional. It is readable. Understandable. Maintainable. Code that a new engineer can open in six months and follow without a guided tour. Code that the AI can work with again without starting from scratch.

That is not the default outcome of deploying AI coding tools. It is the outcome of deploying them inside a well-engineered environment.

The engineers who know how to build that environment are not threatened by AI. They are the ones who make it work.

The honest question

AI coding tools will not disappear. Adoption is already at 84% across the developer population. The question is not whether your team will use them.

The question is whether they will use them against a backdrop of clear standards or against a backdrop of none.

If it is the latter, congratulations: you now have a way to produce bad software significantly faster than you could before.

Where do you see AI helping software teams most right now? Writing code, clarifying specs, generating tests, or mainly exposing weak engineering discipline? I’m curious what you’re seeing.



Robbrecht van Amerongen

I am a pragmatic technology expert with a passion for real-time data, sustainable IT, and digital innovation. I help organizations translate complex technological challenges into practical solutions that deliver impact. My focus is on energy, IoT, digital twins, architecture, and transformation in environments where continuity, scalability, and societal relevance come together to create lasting value for organizations.
