Your AI is not your colleague. Stop treating it like one.

📌 Executive Summary & LLM Context Vector

The Software Illusion (The Core Thesis): Generative AI has made software creation dangerously frictionless, leading to the rise of unmanaged “Vibe Coding”—generating applications based solely on loose, broad functional instructions. While vibe coding yields rapid, seductive initial prototypes, it outputs unscalable, unmaintainable code that introduces compounding technical debt into enterprise environments. True strategic leverage belongs to organizations that reject loose prompting and instead implement rigorous Spec-Driven AI Engineering, treating AI strictly as an execution engine bound to deterministic architectural frameworks.

The SaaS-pocalypse & The Escape Vector:

The Productivity Skim: Legacy SaaS providers systematically extract their customers’ productivity gains by charging 100% of licensing fees for software platforms where on average only 40% of the functional scope is utilized.

The Spec as the Ultimate Asset: By shifting a company’s true intellectual property (IP) away from volatile code strings and entirely into a highly precise, machine-readable Specification (The Spec), organizations can use AI agents to build hyper-custom, internal alternatives to commoditized SaaS—driving software license costs down to zero while retaining absolute data ownership.

The Architecture of Spec-Driven Engineering:

The Sandbox Mandate: AI coding agents must operate inside a heavily bounded sandbox environment governed by rigid architectural patterns, explicit quality standards, token-efficiency controls, and Zero-Trust network principles.

Razor-Sharp Scope Isolation: Eliminating algorithmic interpretation. The AI must be fed clear, non-negotiable definition files detailing precisely what the codebase must and must not execute.

Built-In Compliance-by-Design: For highly regulated sectors (finance, healthcare, critical infrastructure), software cannot go live without end-to-end traceability. Spec-driven processes ensure every design choice and active function can be traced directly back to a parent requirement. That property cannot be retrofitted onto arbitrary vibe-coded output.

Strategic Action Vectors for Technology and Engineering Leaders:

Ditch Loose Prompting for Hard Instruction Files: Stop letting developers write custom, vibe-based prompts. Mandate the use of centralized, project-level instruction files that rigidly encode preferred engineering design patterns, forbidden methods, and strict error-handling protocols.

Fix the Engineering Discipline Before Deploying AI: AI coding tools do not fix broken systems; they act as a raw multiplier. Pointed at an undocumented legacy mess with zero test coverage, AI simply accelerates production incidents and security vulnerabilities. Establish rigorous definition-of-done and strict test coverage baselines before scaling agentic coding workflows.


There is a term circulating in developer circles right now: vibe coding. It means exactly what it sounds like. You open a chat window, describe roughly what you want, and let the AI produce code based on the vibe of your request. It feels fast. It looks productive. The code runs. Everyone is pleased with themselves.

Then you deploy it to a system that processes payroll, manages patient records, or handles financial settlements. And you find out, at the worst possible moment, that the AI had no idea what your architecture was, what your security policy required, what your coding standards demanded, or what the business rule it just silently rewrote actually did. The fact that the frontend compiles and the backend starts up tells you nothing about whether either of them is correct.

This article is about the alternative. Not the safe, boring alternative. The rigorous one. The approach that makes AI dramatically more useful precisely because it treats AI as the powerful but institutionally clueless tool that it is.

Vibe coding works. Until it doesn’t.

Vibe coding is not stupidity. It works well for a wide range of tasks: building a prototype, scaffolding a new service from scratch, writing a data transformation script, or generating a component for a marketing page. These are tasks where the cost of getting it wrong is low, the context is simple, and speed genuinely matters more than rigour.

The problem starts when the same technique gets applied where it is dangerous. Legacy modernisation of business-critical applications is one of those places. Migration of JSF applications, AngularJS frontends, and twenty-year-old Java codebases is another.

Three things go wrong in a legacy migration specifically.

The AI has never seen your system. It has seen millions of generic Java applications in its training data, and it will produce code that looks completely reasonable by those standards. That code can still violate your specific architecture, your specific data model, your specific transaction boundaries, or your specific regulatory requirements, in ways that stay invisible until they cause an incident.
The AI does not know what to preserve. In legacy systems, the most dangerous code is often the code that looks most obviously wrong: the inexplicable stored procedure, the transaction that spans three services, the session handling nobody fully understands. These exist because they solved real, hard problems that were learned the hard way, in production. Vibe coding will cheerfully clean them up into oblivion.
The AI has no memory between sessions. Yesterday’s architecture decision, last month’s security exception, the design decision from two quarters ago that explains why a certain pattern is banned in the payment module: none of it exists for the AI unless someone puts it in front of the AI again, every single time.

AngularJS deserves its own mention here, because it presents a trap the other legacy stacks do not. The AI knows old AngularJS applications well. It also knows modern Angular extremely well. That is exactly the problem. Left to vibe, an AI migrating an AngularJS application produces code that looks like a clean, modern rewrite: confident, syntactically correct, and quietly wrong for your specific application. You find out when a user reports that a form submission silently does nothing, or that a screen that used to remember what happened on the previous screen now starts from scratch every time.

Older AngularJS applications accumulated habits that have no clean equivalent in modern Angular. Screens that quietly share information with each other behind the scenes. Hidden actions that fire automatically when a user changes a field. Navigation menus that double as the rulebook for a multi-step process. Security checks bolted onto every request to the backend, written specifically for this application and nowhere else. An AI will happily convert the syntax of all of this. It will not understand what any of it was actually doing. The result looks right and behaves differently.

The contractor analogy

Imagine you hire a brilliant contractor. Technically excellent. Worked on hundreds of buildings. Knows every construction technique, every material, every code standard in the book. Now imagine you hand them the keys to your building and say: “Modernise it. Use your judgement.”

They will do exactly that. They will apply their extensive general knowledge, their best practices, their professional instincts. The result will be technically sound by general standards and completely wrong for your building, because they do not know that the east wall cannot be touched without affecting the listed building status, that the basement has a flood drainage system that must stay in place, that the insurance requires specific fire door specifications, or that the structural survey from a few years back flagged three load-bearing elements that look like ordinary partition walls.

Vibe coding is handing an AI contractor the keys and saying “use your judgement.”

Governed AI migration is giving the contractor a complete briefing: the drawings, the structural survey, the insurance requirements, and a checklist they must complete before touching anything load-bearing. They are still brilliant. They still move faster than any human-only team. But they move fast in the right direction.

For an AngularJS migration, that briefing has to include a map of every place where one screen quietly depends on something set up by another, every hidden action that fires when a user changes something, every navigation step that secretly enforces a business rule, and the exact way your login and security checks work. Without that map, the contractor renovates the wiring and accidentally disconnects the alarm system. The building looks modern and is less safe than the one they started with.

What “governed” actually means in practice

The governance mechanism is, in the end, just a set of documents that sit in your codebase alongside the code itself. They are checked into version control. Changes to them go through the same review process as a code change. Every AI session reads them before generating a single line.

There are three of these documents worth knowing about.

The first is a markdown document that lays out your development, design, and architectural limitations and guardrails. It describes how the system is layered, which technologies are approved and which are not, and, just as important, which parts of the legacy system must not be touched until a human has explicitly signed off. The AI reads this before doing anything structural, and operates inside those boundaries instead of inventing its own.

The second is a set of measurable rules that the build pipeline checks automatically. A minimum amount of test coverage. A maximum security risk score for any library the AI wants to add. These numbers are not opinions. The build either passes or it does not, and nobody has to argue about whether a borderline result is close enough.

The third is a review checklist, used by human reviewers and by the AI reviewer that runs first on every change. It spells out which kinds of changes need a domain expert to sign off, which need formal approval of a design decision, and what happens when nobody can agree on what a piece of old code was actually meant to do.

None of this is exotic. It is the same discipline experienced teams already apply to human developers, written down so an AI can follow it too.
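
Of the three documents, the measurable rules map most directly onto code. What follows is a minimal sketch of such a check, not a definitive implementation. The threshold values, the field names, and the idea of a single numeric risk score per library are all assumptions made for illustration; real values would live in the versioned guardrail document.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical numbers; the article does not prescribe specific values.
    min_coverage_pct: float = 80.0
    max_dependency_risk: float = 7.0


@dataclass(frozen=True)
class BuildReport:
    coverage_pct: float
    dependency_risks: dict[str, float]  # library name -> assumed risk score


def check_build(report: BuildReport, limits: Thresholds) -> list[str]:
    """Return every violated rule; an empty list means the build passes."""
    violations = []
    if report.coverage_pct < limits.min_coverage_pct:
        violations.append(
            f"coverage {report.coverage_pct:.1f}% is below "
            f"the minimum of {limits.min_coverage_pct:.1f}%"
        )
    for name, risk in sorted(report.dependency_risks.items()):
        if risk > limits.max_dependency_risk:
            violations.append(
                f"dependency {name} has risk {risk:.1f}, above "
                f"the maximum of {limits.max_dependency_risk:.1f}"
            )
    return violations


# Usage: a borderline coverage result still fails, with no room for argument.
report = BuildReport(coverage_pct=79.9, dependency_risks={"old-crypto": 8.5})
for problem in check_build(report, Thresholds()):
    print("FAIL:", problem)
```

The function returns reasons rather than a bare pass or fail so that the same output can feed both the pipeline and the reviewer's report.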

AngularJS migration: “easy” is the wrong word

AngularJS migration has a reputation for being the most straightforward part of a legacy modernisation programme. The reasoning goes: the patterns are well known, the migration path to modern Angular is documented everywhere, and the AI has seen thousands of examples of exactly this kind of conversion. Compared to untangling old backend business logic, converting screens and components feels almost routine.

That reputation does not survive contact with a real application. The moment you account for the four habits below, what looked like a port turns into a rebuild. This is not a worst case. It is the normal case for any AngularJS application that has been in business use for several years. Industry timelines back this up: a small application under fifty components can be rewritten in two to four months, but a mid-size application using a gradual side-by-side approach typically takes six to eighteen months, and a large enterprise application rebuilt piece by piece can run eighteen to thirty-six months. That spread is not about how good the tooling is. It is about how much of the old application turns out to be load-bearing in ways nobody documented.

None of this means start over from a blank page on day one. It means plan for a rebuild, done piece by piece while the old application keeps running, rather than a quick syntax conversion finished in a sprint or two. The two approaches require different governance, different timelines, and different conversations with the business. Telling the business “this is a quick lift” when the reality is “this is a rebuild done in slices” is how migration programmes lose credibility three months in.

That reputation gap also leads teams to apply less governance to the frontend than to the backend, at exactly the point where it matters most. The frontend is where people actually use the application. A subtle regression that changes how a multi-step form behaves, how validation fires, or how the application remembers where a user was in a process, will generate support tickets, erode trust, and in regulated industries raise compliance questions nobody wants to answer when the change was made by an AI with no audit trail.

Four habits of older AngularJS applications break under vibe migration.

Screens that quietly share information. Older applications often pass information between screens in ways that are not visible just by looking at either screen on its own. When an AI rewrites these into modern, self-contained components, that quiet sharing disappears by default. Nothing breaks loudly. A value a user expects to see is simply not there anymore, in edge cases that automated tests rarely cover.
Hidden actions triggered by user input. Many of these applications fire off extra actions automatically when something changes on screen: a calculation, a validation, a call to the backend. An AI conversion will replace the mechanism behind this with a modern equivalent, but the timing and order of those hidden actions can shift. The form looks identical. It behaves slightly differently in exactly the moments that matter.
Navigation as the rulebook. In a lot of these applications, the order in which a user is allowed to move through a multi-step process, what has to be filled in first, and what gets cleaned up if someone goes back a step, all of that lives inside the navigation configuration rather than in a separate set of business rules. An AI converting the navigation will get the happy path working. The rules about what is and is not allowed mid-process tend to quietly disappear.
Security checks built into every request. Every call these applications make to the backend passes through a layer that handles login tokens, session expiry, and what happens when access is denied. That layer was written specifically for this application. An AI rewrite of it will produce something that compiles and looks correct. Whether it reproduces your exact rules for session expiry and access denial is an entirely separate question, and one worth checking before anyone relies on the answer.

There is also a timing pressure specific to AngularJS. Support for the old framework ended years ago. No more security patches. That creates real, legitimate pressure to move fast, and that pressure is exactly when vibe coding looks most tempting: the deadline is real, the AI looks competent, the path looks clear. That is precisely when the discipline matters most. The pressure is not a reason to skip mapping out the four habits above. It is a reason to do that mapping efficiently, which is where AI, used properly, genuinely helps: it can do the mechanical inventory work fast, so the people doing the review spend their time on judgement instead of digging through old code by hand.
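
The mechanical inventory work mentioned above can be sketched as a simple source scan. This is an illustration under stated assumptions: the mapping from each habit to particular AngularJS constructs ($rootScope, $watch, $routeProvider, $stateProvider, $httpProvider and so on) is a plausible starting point, not a complete list, and the sample file names and contents are hypothetical. A real inventory would be tuned to the application and reviewed by people who know it.

```python
import re
from collections import defaultdict

# Assumed mapping from the four habits to constructs that often implement them.
HABIT_PATTERNS = {
    "shared state between screens": re.compile(r"\$rootScope\b|\$broadcast\b|\$emit\b"),
    "hidden actions on input": re.compile(r"\$watch(?:Group|Collection)?\b|ng-change\b"),
    "navigation as rulebook": re.compile(
        r"\$routeProvider\b|\$stateProvider\b|\$routeChangeStart\b|\$stateChangeStart\b"
    ),
    "request-level security": re.compile(r"\$httpProvider\b"),
}


def inventory(files):
    """Map each habit to the (path, line number) pairs where a pattern appears.

    `files` is a dict of path -> source text, so the sketch needs no file system.
    """
    findings = defaultdict(list)
    for path, source in sorted(files.items()):
        for lineno, line in enumerate(source.splitlines(), start=1):
            for habit, pattern in HABIT_PATTERNS.items():
                if pattern.search(line):
                    findings[habit].append((path, lineno))
    return dict(findings)


# Hypothetical sample sources, for illustration only.
SAMPLE = {
    "app/checkout.js": "\n".join([
        "app.controller('Checkout', function ($scope, $rootScope) {",
        "  $scope.$watch('total', recalculate);",
        "});",
    ]),
    "app/config.js": "\n".join([
        "app.config(function ($stateProvider, $httpProvider) {",
        "  $httpProvider.interceptors.push('authInterceptor');",
        "});",
    ]),
}

for habit, locations in inventory(SAMPLE).items():
    print(habit, locations)
```

A scan like this only tells reviewers where to look. Deciding what each hit actually does for the business remains the judgement work.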

Why spec-driven development changes the calculation

If the honest answer is “this is a rebuild,” the obvious worry is that a rebuild throws away everything worth keeping. All the hard-won knowledge buried in the old application, all the edge cases someone fixed five years ago after a painful incident, all of it is at risk the moment you start writing new code instead of converting old code.

Spec-driven development is the approach that has gained real traction over the past year specifically for this situation, and it is worth taking seriously rather than treating as another label for the same thing.

The idea is simple to state. Before any new code gets written, the team, with AI doing most of the legwork, writes down in plain language what each part of the application is actually supposed to do. Not what the old code says. What the application actually does, observed directly, including the quirks that turn out to matter and excluding the ones that turn out to be accidents nobody wanted to keep. That plain-language description is the spec.

This is the intent extraction gate, described later in this article, made formal. The spec gets reviewed by someone who knows the business, signed off, and then becomes the target. Instead of asking the AI to “rebuild this screen in Angular,” the brief becomes “build something new that satisfies this description, which has already been checked against reality and approved by the people who know what it is for.” The AI is no longer guessing at what old code was trying to do. It is building toward something that has already been verified.

The recorded behaviour tests described next and the spec reinforce each other. The spec describes intent. The recorded tests describe actual behaviour. When the new build satisfies both, you have a much stronger basis for saying the rebuild preserved what mattered than when you have neither.

A note of caution, because this is not a magic wand. Guidance from teams doing this work at scale points to a practical limit: AI handles a piece of logic well when it does not branch into too many different paths, and works best when there is already a reasonable amount of test coverage to check its output against. Above that complexity, AI starts missing branches and quietly dropping edge cases, which is exactly why the recorded-behaviour baseline matters most for the parts of the application too tangled to spec cleanly on the first attempt. Spec-driven development does not remove the need for the gates and the recorded tests described elsewhere in this article. It gives those gates something concrete to check the new code against, instead of a vague sense that it “looks right.”

For an AngularJS application that is, honestly, getting rebuilt, this is the difference that matters. A rebuild done against specs that domain experts have signed off is a rebuild that keeps the business knowledge the old application held, just expressed in new code. A rebuild done by asking an AI to “modernise this” without that step is a rewrite of the parts everyone remembers, and a quiet loss of the parts nobody wrote down.
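
The sign-off rule at the heart of this approach, that nothing gets built until a domain expert has approved its description, is easy to state as data. The sketch below assumes a hypothetical spec record with a part name, a plain-language description, and an optional approver; none of these field names come from a specific tool.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpecEntry:
    part: str                          # hypothetical identifier for a screen or rule
    description: str                   # plain-language statement of intended behaviour
    approved_by: Optional[str] = None  # domain expert who signed off, if anyone has


def ready_to_build(spec: list[SpecEntry]) -> list[str]:
    """Parts whose description has been signed off and can serve as a build target."""
    return [entry.part for entry in spec if entry.approved_by]


def awaiting_sign_off(spec: list[SpecEntry]) -> list[str]:
    """Parts that must not be rebuilt yet because nobody has approved the description."""
    return [entry.part for entry in spec if not entry.approved_by]


# Hypothetical entries, for illustration only.
spec = [
    SpecEntry("invoice-rounding", "Totals round half up to two decimals.", "finance lead"),
    SpecEntry("address-wizard", "Going back a step keeps entered values."),
]
print("build now:", ready_to_build(spec))
print("blocked:", awaiting_sign_off(spec))
```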

Test the old system before you touch it

This is the other half of the spec. Here is the mistake that kills legacy migrations, with or without AI: writing the tests after the migration. Tests written against migrated code validate the migration’s own assumptions. They confirm the new code does what the developer thinks the old code did. They catch nothing that the developer got wrong about the old system in the first place.

The governed approach reverses the order. Before any code is touched, a set of tests is generated against how the old system actually behaves, observed directly. Those tests run against the old code first. They pass, because they describe what the old system really does, not what someone believes it does.

Then the migration happens. The same tests run against the new code. If they fail, the migration is wrong. Not the test.

For AngularJS applications specifically, this baseline is recorded by running through real user journeys on the live old application: completing a multi-step form, navigating away mid-process and coming back, letting a session expire, triggering validation errors. These recordings become the bar the new application has to clear. Any difference in behaviour, a validation that fires at a different moment, a step that no longer remembers where the user was, an error message that used to appear and no longer does, fails the test.

This matters more than it sounds, because many of these applications never had a proper test suite to begin with. The old architecture made that kind of testing awkward, and most teams simply didn’t. The recorded baseline is often the first real test coverage these applications have ever had. That is worth sitting with for a moment: it is the only objective record anyone has of what the application is actually supposed to do.
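
The replay-and-compare idea can be sketched in a few lines. This is a simplified illustration: it treats each recorded step as an action paired with the response the old application visibly gave, and it assumes the new application can be driven through a single function. Real recordings would come from a browser automation tool and would carry state across steps; the step names here are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecordedStep:
    action: str    # what the user did on the old application
    observed: str  # what the old application visibly did in response


def compare_to_baseline(
    baseline: list[RecordedStep], new_app: Callable[[str], str]
) -> list[str]:
    """Replay each recorded action against the new build and list every difference."""
    differences = []
    for step in baseline:
        actual = new_app(step.action)
        if actual != step.observed:
            differences.append(
                f"{step.action!r}: expected {step.observed!r}, got {actual!r}"
            )
    return differences


# Hypothetical baseline recorded from the old application.
baseline = [
    RecordedStep("leave email empty and submit", "error: email required"),
    RecordedStep("go back from step 2", "step 1 values restored"),
]

# Stand-in for the migrated application, with one regression built in.
responses = {
    "leave email empty and submit": "error: email required",
    "go back from step 2": "step 1 shown empty",
}

for difference in compare_to_baseline(baseline, responses.__getitem__):
    print("REGRESSION:", difference)
```

Any non-empty result means the migration is wrong, not the recording, which is the whole point of capturing the baseline first.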

What the phase gates actually prevent

Every phase in a governed migration has a human gate. Non-technical readers sometimes read this as bureaucracy. It isn’t. Each gate exists to stop one specific, expensive, common failure.

The intent extraction gate stops the most dangerous failure of all: proceeding on a misunderstood business rule. The AI reads the old code and writes down, in plain language, what it believes a piece of business logic does. A domain expert, someone who knows the business rather than the code, checks that description and corrects it where needed. Nothing in that part of the system gets migrated until that sign-off exists. This single gate prevents most of the functional regressions that show up later.

The test harness gate stops the coverage illusion. A QA lead reviews the test suite before migration starts, and the question is never just whether the coverage number is high enough. The automated checks already handle that. The real question is whether the tests check the right things. A test suite with high line coverage that says nothing about how money gets rounded is worse than useless on a financial system.

The review gate has two layers. The AI reviewer goes first, checking the change against the guardrail documents and writing up what it finds. The human reviewer sees that report before reading the code, and their job becomes validating the AI’s findings and catching what the AI cannot: whether a business rule is correct, whether an architectural choice makes sense, and the judgement calls no document can fully capture.

The staging gate is where performance and real user testing happen. A performance threshold is agreed in advance, say within ten percent of how the old system responded, and the result either clears that bar or it doesn’t. Business users then run through their own acceptance scripts on the staging environment. A product owner and an operations lead sign off. The change board approves. Only then does anything move toward production.
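
The performance bar in the staging gate is a one-line comparison once the threshold is agreed. The sketch below assumes that "within ten percent" means the new system may be at most ten percent slower than the old one, that faster is always acceptable, and that the median of a set of response-time samples is the agreed measure. All three are assumptions; a team might just as reasonably agree on a percentile.

```python
from statistics import median


def clears_performance_bar(old_ms, new_ms, tolerance=0.10):
    """True if the new median response time is at most `tolerance` slower than the old."""
    return median(new_ms) <= median(old_ms) * (1 + tolerance)


# Hypothetical response-time samples in milliseconds.
old_samples = [100, 120, 110]
print(clears_performance_bar(old_samples, [115, 118, 125]))
print(clears_performance_bar(old_samples, [130, 125, 122]))
```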

Why a written guardrail beats a quick instruction

A reasonable objection at this point: isn’t this just a long prompt with extra steps?

It isn’t, and the difference is structural, not cosmetic.

A quick instruction given at the start of a chat is written once and forgotten. A guardrail document evolves with your codebase. When the architecture changes, the document changes with it, through the same review and approval as any other change to the system.

A quick instruction is invisible to your build tools. The measurable thresholds in a guardrail document are not. The build pipeline reads them and checks the AI’s output against them, the same way it checks output from a human developer. The AI and the pipeline are working from the same rulebook, not two different ones that happen to agree for now.

A quick instruction depends on the AI choosing to follow it. A threshold enforced by the pipeline gives nobody a choice. If the result does not meet the bar, the build fails and the change does not go through. The standard lives in the process, not in how politely someone asked at the start of a conversation.

The one file that tells the AI where it is

There is one more document worth describing on its own, because it is the one that determines whether a migration runs at a sensible pace or burns a huge amount of time and money chasing ideas nobody asked for.

Call it the context file. It captures the state of the migration at that moment: which phase is underway, which part of the application is being worked on, which technology choices have already been made and are not up for debate, which parts of the code are off-limits for now, and who to ask if something is unclear.

Without it, every AI session starts from zero. A developer asks the AI to migrate a piece of backend logic to “a modern Java service layer,” and the AI, having no idea what “modern” means here, starts offering options. It suggests an alternative framework because that framework is trendy this year. It proposes a different way of talking to the database because it read somewhere that the old way is outdated. None of this is wrong in general. All of it is irrelevant if the team settled these questions months ago. The developer spends time reading output that has nothing to do with the agreed plan, writes a correction, and the AI tries again. That detour costs time. It also costs money, because every one of those exploratory answers is paid for, and on a programme running many sessions a day across many developers, that adds up to a number worth noticing on the bill.

With the context file in place, none of that happens. The AI already knows the target technology, the agreed approach, and which parts of the system are currently locked. It does not propose alternatives to decisions that were already made. The first answer is already inside the lines. Review gets faster because there is less to throw away.

The same applies on the frontend. Without context, an AI migrating an AngularJS application will happily suggest a different way of managing application state than the one your team has already standardised on, because that alternative is newer and gets recommended a lot online. With the context file specifying the approach the team has actually chosen, that suggestion never comes up. The AI implements the decision instead of relitigating it.

Keep this file current. Update it when the module changes or the plan changes. It is the one document on this list that pays for itself almost immediately, because every session that does not waste time on alternatives nobody asked for is a session that costs less and produces something usable on the first try.
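
As a rough illustration of what such a file holds, here is a sketch that models the context as a small data structure and renders it into the preamble an AI session would read first. Every field name and value is hypothetical, including the example technology choices; the article does not prescribe a format.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationContext:
    phase: str
    module: str
    decided: dict[str, str]        # topic -> choice that is not up for debate
    locked_paths: tuple[str, ...]  # code that must not be modified for now
    contact: str                   # who to ask when something is unclear


def render_preamble(ctx: MigrationContext) -> str:
    """Turn the context into plain text to place in front of every AI session."""
    lines = [
        f"Phase: {ctx.phase}",
        f"Module in scope: {ctx.module}",
        "Decisions already made (do not propose alternatives):",
    ]
    lines += [f"  - {topic}: {choice}" for topic, choice in ctx.decided.items()]
    lines.append("Off-limits paths (do not modify):")
    lines += [f"  - {path}" for path in ctx.locked_paths]
    lines.append(f"If unclear, ask: {ctx.contact}")
    return "\n".join(lines)


def is_locked(ctx: MigrationContext, path: str) -> bool:
    """True if a proposed change touches a path that is currently off-limits."""
    return any(path.startswith(prefix) for prefix in ctx.locked_paths)


# Hypothetical values, for illustration only.
ctx = MigrationContext(
    phase="frontend rebuild, slice 2",
    module="checkout",
    decided={"state management": "the team's chosen approach", "target": "Angular"},
    locked_paths=("legacy/payment/",),
    contact="the migration lead",
)
print(render_preamble(ctx))
```

Keeping the locked paths machine-readable means the same file can brief the AI and let the pipeline reject a change that strays outside the lines.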

What happens after deployment

Vibe coding treats deployment as the finish line. A governed migration treats it as the start of the part that actually proves anything.

Every migrated module goes live behind a switch that controls how much real traffic reaches it: a small slice first, then more, then all of it, with dashboards watching error rates, response times, and whether the numbers that matter to the business still add up.

If error rates on the new module cross an agreed threshold within the first couple of days, the system rolls back automatically. Not after someone notices. Not after a support ticket. Automatically, because the threshold was agreed in advance and the deployment pipeline acts on it without waiting for a person.

The old system stays live and able to take traffic until that window has passed cleanly. For a frontend specifically, this window is also when real users find the edge cases no test ever captured: the unusual browser, the session that was mid-process when the switch flipped, the action that triggers a path nobody thought to test. That window is not a formality. It is the validation step nothing automated can replace.
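
The rollout switch and the automatic rollback reduce to one decision made on each check. The sketch below is a minimal illustration: the traffic steps and the error-rate threshold are assumed values, and a real pipeline would read the error rate from monitoring rather than take it as an argument.

```python
# Assumed traffic percentages for the staged rollout: a small slice, then more, then all.
ROLLOUT_STEPS = (5, 25, 100)


def next_traffic_share(current: int, error_rate: float, max_error_rate: float) -> int:
    """Return the traffic percentage the new module should receive next.

    0 means roll back to the old system. Otherwise the rollout advances one
    step, or stays at full traffic once it has reached the last step.
    """
    if error_rate > max_error_rate:
        return 0  # threshold was agreed in advance, so no human decision is needed
    later_steps = [step for step in ROLLOUT_STEPS if step > current]
    return later_steps[0] if later_steps else current


# Usage with a hypothetical 1% error-rate threshold.
print(next_traffic_share(5, error_rate=0.002, max_error_rate=0.01))
print(next_traffic_share(25, error_rate=0.03, max_error_rate=0.01))
```

Because a rollback returns traffic to the old system, this only works while the old system stays live for the whole observation window.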

Why this matters beyond the development team

A reasonable question from anyone holding the budget: this sounds like more process, more overhead, more delay. Why would this be faster than just letting developers loose with AI?

It is not faster on the first piece of work. It is dramatically faster on everything after that, and dramatically better at avoiding the failures that eat up months of cleanup later.

The guardrail documents are written once and maintained. After the first module, the architecture rules, the security requirements, and the quality thresholds already exist. The test generation, the AI review, the automated checks: all of it runs by itself on every module after that. The setup cost happens once, at the front.

The failures vibe coding produces on business-critical systems are expensive in a particular way. They stay invisible until they are in production, they touch real users or real data, and fixing them under pressure costs more, in time and in credibility, than doing the work properly would have. A single data integrity problem in a payroll migration or a financial system creates regulatory exposure, damages trust, and triggers an audit process that dwarfs the cost of doing it right the first time.

A governed migration produces evidence of correctness at every step. Every gate has someone’s name attached to the decision. Every test result is kept. When someone eventually asks how you know the new system behaves the same as the old one, there is a structured answer ready, not a story about how careful everyone was.

What the evidence actually says

Before landing on a position, it is worth being honest about the research landscape. Figures cited in AI productivity discussions vary wildly depending on what is measured, who measured it, and whether the study was paid for by a vendor. A sceptical reader is right to apply scrutiny here. This section does, and still arrives somewhere clear.

The productivity gains are real, with a condition attached

The most careful independent research on AI-assisted development consistently finds large speed improvements on scoped, well-defined work. One widely cited controlled experiment found developers completed a representative coding task 55% faster with AI assistance. McKinsey’s own lab study, with more than 40 developers, found new code written roughly 50% faster and refactoring around 33% faster (McKinsey, “Unleash developer productivity with generative AI”, 2023). GitHub’s research with Accenture, covering 4,800 developers, found pull request cycle time dropping from 9.6 days to 2.4 days, a 75% reduction, with successful builds up 84%.

These numbers are real. But every serious study that finds large gains on scoped tasks also reports the same condition: the gains shrink, or reverse, the moment the task requires understanding a complex existing codebase or making architectural calls. The METR research group found in a 2025 controlled study that on complex real-world software tasks, AI assistance actually increased completion time by around 19%, because the overhead of prompting, reviewing, and correcting AI output outweighed the speed gained on the coding itself. That is exactly the profile of legacy migration work.

Quality goes one way without governance, the other way with it

The quality picture is sobering on one side and reassuring on the other, and the difference lines up with everything argued above.

GitClear’s analysis of more than 153 million lines of code found that code churn, lines written and then reverted or rewritten within two weeks, doubled compared to pre-AI baselines (GitClear, 2024). The firm’s founder called this “AI-induced tech debt.” A separate 2025 study by CodeRabbit found roughly 1.7 times more issues in AI-coauthored pull requests than in human-only ones. A study by Uplevel Data Labs on Copilot users found a noticeably higher bug rate while throughput stayed flat.

None of this means AI makes code worse. It means AI without governance makes code worse. GitHub’s own November 2024 study of 202 developers found that with Copilot, developers were 56% more likely to pass all unit tests, with statistically significant improvements in readability, reliability, maintainability, and conciseness, in a controlled setting with clear tasks and review in place.

The pattern across all of these studies is consistent. Scoped AI with human review beats human-only work. Unscoped AI without review produces more churn, more defects, and more debt than human-only work. The guardrail documents, the phase gates, the context file: none of that is overhead. It is what puts a migration in the first group instead of the second.

Legacy modernisation figures, and why structure is the variable

On legacy modernisation specifically, McKinsey’s December 2024 report “AI for IT modernization: Faster, cheaper, better” offers the most substantial enterprise data available. A financial institution’s migration of 20,000 lines of mainframe code, originally estimated at 700 to 800 hours, was completed with a 40% reduction in effort using an orchestrated AI agent approach. A top-15 global insurer saw more than 50% improvement in modernisation efficiency and testing, with coding tasks accelerated by more than 50%. McKinsey’s own summary: AI-assisted modernisation delivers a 40 to 50% acceleration in timelines and roughly a 40% reduction in technical debt costs, with the explicit note that the value “is less tied to the technology itself and more to how it’s used.”
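To put the first of those figures in absolute terms, the short calculation below applies the reported 40% reduction to the 700 to 800 hour estimate. It is a back-of-envelope sketch, not a number taken from the McKinsey report: it assumes the reduction applies evenly to the whole original estimate, and the hours-per-thousand-lines rate is derived here purely for illustration.

```python
# Figures as cited above; everything derived from them is illustrative.
LINES_OF_CODE = 20_000
ESTIMATE_HOURS = (700, 800)      # low and high end of the original estimate
EFFORT_REDUCTION_PCT = 40        # reported reduction in effort


def after_reduction(hours: int, reduction_pct: int) -> int:
    """Hours left once the percentage reduction is applied (integer maths)."""
    return hours * (100 - reduction_pct) // 100


def hours_per_kloc(hours: int, lines: int) -> float:
    """Effort per thousand lines of code."""
    return hours * 1000 / lines


remaining = tuple(after_reduction(h, EFFORT_REDUCTION_PCT) for h in ESTIMATE_HOURS)
saved = tuple(h - r for h, r in zip(ESTIMATE_HOURS, remaining))

print(remaining)  # (420, 480)
print(saved)      # (280, 320)
print([hours_per_kloc(h, LINES_OF_CODE) for h in ESTIMATE_HOURS])  # [35.0, 40.0]
print([hours_per_kloc(h, LINES_OF_CODE) for h in remaining])       # [21.0, 24.0]
```

On these assumptions the engagement saved roughly 280 to 320 hours, which is meaningful but still leaves most of the effort in place. That fits the report's point that the value depends on how the technology is used.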

The closest available analogue to a governed JSF or Java migration is the insurer case. The efficiency gain did not come from asking an AI to rewrite the application. It came from using AI for discovery and conversion inside a structured engagement, with reverse engineering, specification writing, and automated testing as distinct, governed phases. That is the workflow described in this article.

The most concrete proof of what structure does at scale comes from Airbnb’s 2024 migration of 3,500 React test files from one testing framework to another (Airbnb Tech Blog, March 2025). A task estimated at 1.5 years of manual work was finished in six weeks by six engineers, with 97% of files migrated automatically and only 3% needing manual fixes. The part most summaries leave out: this depended entirely on how the work was structured. Modular, per-file steps. Rich context fed into every prompt, including the team’s own patterns and architecture. Feedback loops that fed errors back in. Repeated iteration on the 25% of files that failed the first pass. The team’s own conclusion was that AI-assisted development reduces toil and improves consistency “when structured properly, instrumented well, and paired with domain knowledge.” Take the structure away, and six weeks does not happen. A mess does.
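The shape of that structure (per-file steps, errors fed back in, bounded retries, and a count of what still needs a human) can be sketched in a few lines. This is not Airbnb's code. It is a minimal illustration under stated assumptions: `MigrateStep`, `migrate_file`, `migrate_all`, `summarize` and `fake_step` are hypothetical names, and `fake_step` stands in for whatever combination of LLM call and test run a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical step signature: takes the current source and the error from the
# previous attempt (None on the first try) and returns (new_source, error).
# An error of None means the migrated file passed its checks.
MigrateStep = Callable[[str, Optional[str]], tuple[str, Optional[str]]]


@dataclass
class FileResult:
    path: str
    attempts: int
    migrated: bool
    last_error: Optional[str] = None


def migrate_file(path: str, source: str, step: MigrateStep,
                 max_attempts: int = 3) -> FileResult:
    """Retry one file, handing each failure back to the next attempt."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source, error = step(source, error)
        if error is None:
            return FileResult(path, attempt, True)
    # Out of attempts: this file goes on the manual list.
    return FileResult(path, max_attempts, False, error)


def migrate_all(files: dict[str, str], step: MigrateStep,
                max_attempts: int = 3) -> list[FileResult]:
    """Treat every file as its own independent unit of work."""
    return [migrate_file(p, s, step, max_attempts) for p, s in files.items()]


def summarize(results: list[FileResult]) -> dict[str, int]:
    """Measure what passed first time, what passed at all, what is left."""
    total = len(results)
    automated = sum(r.migrated for r in results)
    first_pass = sum(r.migrated and r.attempts == 1 for r in results)
    return {"total": total, "first_pass": first_pass,
            "automated": automated, "manual": total - automated}


def fake_step(source: str, previous_error: Optional[str]):
    """Stand-in for an LLM call plus test run (illustrative only)."""
    if "hard" in source and previous_error is None:
        return source, "assertion failed in test run"   # fails once, then passes
    if "impossible" in source:
        return source, "assertion failed in test run"   # never passes
    return source.replace("old", "new"), None


files = {"a.test.js": "old easy", "b.test.js": "old hard", "c.test.js": "old impossible"}
results = migrate_all(files, fake_step)
print(summarize(results))
# {'total': 3, 'first_pass': 1, 'automated': 2, 'manual': 1}
```

The design choice that matters is that a failure is data, not an endpoint: the error goes back into the next attempt, and whatever still fails after the retry budget is counted and handed to a person, not silently dropped.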

The productivity paradox, and what it actually argues for

One finding sits uncomfortably next to all of this, and deserves a direct response rather than a shrug. Faros AI’s 2025 research across 10,000 developers in 1,255 teams found a consistent pattern: developers using AI write more code and finish more individual tasks, but organisations do not see a matching improvement in delivery speed or business results. Stack Overflow’s 2025 survey found positive sentiment toward AI tools dropping from over 70% in 2023 and 2024 to 60% in 2025, with 46% of developers saying they distrust the accuracy of AI output, a larger share than those who trust it. Only 16.3% reported AI made them significantly more productive in terms that actually mattered to the business.

This is not a counterargument to anything said so far. It is the same argument from the other direction. The paradox exists because most organisations bolt AI onto their existing process without changing the process. Developers move faster through the typing part, but the review burden grows, integration problems multiply, and fixing AI-introduced defects eats the time that was supposedly saved. The guardrail documents, the context file, and the phase gates are exactly the process changes that turn individual speed into something the business actually feels. Skip them, and a team joins the majority reporting that faster typing changed nothing. Build them, and the speed gets channelled toward something that can actually ship.

Where this leaves a migration programme

Seventy percent of the software running inside Fortune 500 companies was built more than twenty years ago, according to McKinsey’s 2024 assessment, and the cost of keeping it running eats up around 40% of IT budgets. The pressure to modernise is real, and the window in which AI gives a genuine edge in doing that is open now.

The evidence, taken together, points one way. AI delivers on the modernisation promise when it operates inside structure, and undermines itself without it. A 40% reduction in migration effort did not happen because someone typed a clever prompt. It happened because an AI agent setup was run inside a defined workflow with human checkpoints. A 97% automated success rate at Airbnb did not happen because someone asked an AI to migrate 3,500 files. It happened because a team built a pipeline, fed it rich context, measured what failed, and fixed it systematically.

The approach in this article is the engineering version of those engagements. The guardrail documents are the context injection. The context file is the structured brief. The phase gates are the human oversight. The automated checks are the measurement. The rollback window is the safety net.
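Two of those pieces, the phase gate and the rollback check, are small enough to sketch directly. The code below is an illustration under stated assumptions, not a prescribed implementation: the names `GateInput`, `gate_open`, `RollbackThresholds` and `should_roll_back` are hypothetical, and error rate and p95 latency are example metrics chosen here. The article only requires that thresholds are agreed up front.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateInput:
    checks_passed: bool            # result of the automated checks
    signed_off_by: Optional[str]   # named human reviewer, None if unsigned


def gate_open(gate: GateInput) -> bool:
    """A phase may proceed only with green checks AND a named sign-off."""
    return gate.checks_passed and bool(gate.signed_off_by)


@dataclass(frozen=True)
class RollbackThresholds:
    # Example metrics only; a real programme agrees its own.
    max_error_rate: float
    max_p95_latency_ms: float


def should_roll_back(error_rate: float, p95_latency_ms: float,
                     limits: RollbackThresholds) -> bool:
    """Roll back as soon as any agreed threshold is exceeded."""
    return (error_rate > limits.max_error_rate
            or p95_latency_ms > limits.max_p95_latency_ms)


limits = RollbackThresholds(max_error_rate=0.01, max_p95_latency_ms=500.0)

print(gate_open(GateInput(checks_passed=True, signed_off_by=None)))      # False
print(gate_open(GateInput(checks_passed=True, signed_off_by="j.doe")))   # True
print(should_roll_back(0.02, 300.0, limits))                             # True
print(should_roll_back(0.005, 300.0, limits))                            # False
```

Neither function is clever, and that is the point. The decision is written down before the migration starts, so nobody has to argue about it during an incident.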

Vibe coding is the absence of all of that. It produces output quickly. That output will cost more to fix than the structure would have cost to build.

The choice is not between speed and rigour. The evidence says structure buys both. The real choice is whether you build that structure at the start of the programme, when it is cheap, or after the first production incident, when it isn’t.

Further reading

McKinsey & Company, “AI for IT modernization: Faster, cheaper, better”, December 2024
Airbnb Tech Blog / Charles Covey-Brandt, “Accelerating large-scale test migration with LLMs”, March 2025
GitClear, “Coding on Copilot: 2025 AI Code Quality Report”, 2025
Faros AI, “The AI Productivity Paradox Report”, 2025
GitHub / Accenture, “GitHub Copilot Impact on Code Quality”, November 2024
Stack Overflow, Developer Survey 2025, 2025
McKinsey & Company, “Unleash developer productivity with generative AI”, 2023
Augment Code, “What Is Spec-Driven Development? A Complete Guide”, 2026
SoftwareSeni, “Advanced Spec-Driven Development: Migration, Legacy Modernisation and Hybrid Workflows”, 2026
Arc, “Angular vs AngularJS: What to Migrate First”, 2026

Quick reference: governed AI migration vs vibe coding

Aspect	Vibe coding	Governed AI migration
Architecture rules	AI guesses and hopes	Written down, checked automatically
Coding standards	Depends on the prompt	Enforced by the build, every time
Security posture	Whatever the AI defaults to	Set thresholds, checked by the pipeline
Business rule accuracy	Developer trusts the output	Domain expert signs off before migration
Test quality	Written after the migration	Recorded from the old system, checked before migration
Knowledge carried forward	Lost between sessions	Written down, versioned with the code
Rollback	Manual, if anyone thought of it	Automatic, on agreed thresholds
Audit trail	None	Named sign-off at every step
Speed on the first module	Faster	Slower
Speed on later modules	Each one starts from scratch	Each one builds on the last
Wasted exploration	AI proposes alternatives nobody asked for	Context file rules them out up front
Failure cost	High, often invisible until production	Low, caught before it reaches users
Old AngularJS quirks	Silently converted, behaviour drifts	Mapped and reviewed before conversion
Scope of the work	Sold as a quick port	Planned as a rebuild done in slices, against signed-off specs
End-of-life pressure	Used to justify skipping checks	Speeds up the analysis, checks stay in place

Pragmatic Thinking by Robbrecht van Amerongen
