AI in Software Engineering: Staying the Pilot, Not Becoming the Passenger

A senior developer on a team I work with recently spent three hours debugging code that Claude wrote in two minutes. The AI solved the original problem perfectly, but every small fix it made introduced three new edge cases.

This isn't a surprise. It's a pattern I see repeatedly as teams adopt AI tooling. And it's exactly where the biggest opportunity and the biggest risk of AI in software engineering converge.

The 40% Paradox

Anthropic research shows developers with AI tools become 40% more productive. That sounds impressive. But the same research reveals something else: after six months, those same developers struggle to solve problems without their digital assistant.

We gained speed. We traded away understanding.

That trade-off doesn't show up in any sprint report. Velocity goes up. The on-call incidents that come six months later don't get attributed to AI adoption. The connection between "we started relying heavily on AI" and "nobody on the team can debug this system without it" is invisible until a critical failure makes it undeniable.

Marcus Fontoura of Microsoft puts it well in his book Human Agency in the Digital World: are your developers pilots or passengers? A pilot understands the flight system and can disengage autopilot the moment conditions demand it. A passenger just sits there and hopes the machine knows where to go.

Teams that benefit from AI over the long term are the ones who stay pilots.

"The Tail Wagging the Dog"

There's a pattern I see in teams that lean too heavily on AI: a "simple" API fix that touches dozens of files takes months because nobody really understands the generated code. When the AI suggests a solution, people check whether tests pass, not whether the logic is correct. Debugging becomes trial-and-error with the AI rather than understanding the root cause.

This is "the tail wagging the dog" problem: the AI ends up directing the team instead of the team directing the AI.

It turns out there's a geometric reason for this. Anthropic research explains that AI models can't distinguish between "good" and "bad" code paths, because the two look identical in the model's representation space. Think of a brilliant junior engineer who nails the main task but misses the side effects entirely.

The difference is that a junior engineer grows. They get code reviews, make mistakes, and learn the system. AI doesn't. Every prompt is the first prompt. The model has no institutional memory of what broke last quarter or why that specific design decision was made.

The Interface Mistake: Not Every Problem Needs a Chatbot

A classic example: a team spent three months building a natural-language AI interface, only to find that users actively avoided it. The reason? Their actual workflow relied on structured forms and dropdowns, not free-form conversation.

Everyone's building chatbots because they demo well in board presentations. But your CFO doesn't want to "chat" about invoice approvals: they want dropdown menus, form validation, and clearly labeled fields.

The pattern repeats across industries. A natural-language interface feels modern and impressive. But the people who use it every day don't want to wonder whether they phrased their question correctly. They want reliable, predictable tools that do what they expect. AI should make those tools smarter, not replace them with a conversation.

The right use of AI is to fill existing structures more intelligently, not to replace your domain models with natural language.
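To make "fill existing structures" concrete, here is a minimal Python sketch under stated assumptions. It imagines a hypothetical invoice-approval form whose fields map to an `InvoiceApproval` dataclass; the model call itself is out of scope, so the AI's proposal is faked as a plain dict. The names `InvoiceApproval`, `CURRENCIES`, `APPROVERS`, and `prefill_from_suggestion` are illustrative, not taken from any real system. The point is that the suggestion is treated as untrusted input and checked against the domain rules before it pre-fills anything.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Hypothetical domain rules: the same constraints the existing form enforces.
CURRENCIES = {"USD", "EUR", "ILS"}
APPROVERS = {"alice", "bob"}  # values of an existing dropdown (illustrative)


@dataclass(frozen=True)
class InvoiceApproval:
    """Hypothetical domain model behind the existing approval form."""
    invoice_id: str
    amount: Decimal
    currency: str
    approver: str


def prefill_from_suggestion(suggestion: dict) -> tuple[dict, list[str]]:
    """Turn an untrusted AI suggestion into form defaults.

    Only values that pass the domain rules are pre-filled. Anything else
    is left blank and reported, so the user still completes the normal
    form with its dropdowns and validation.
    """
    fields: dict = {}
    rejected: list[str] = []

    invoice_id = suggestion.get("invoice_id")
    if isinstance(invoice_id, str) and invoice_id.startswith("INV-"):
        fields["invoice_id"] = invoice_id
    else:
        rejected.append("invoice_id")

    try:
        amount = Decimal(str(suggestion.get("amount")))
        # Reject NaN/Infinity before comparing, then require a positive value.
        if amount.is_finite() and amount > 0:
            fields["amount"] = amount
        else:
            rejected.append("amount")
    except InvalidOperation:  # not parseable as a number at all
        rejected.append("amount")

    # Dropdown-backed fields: accept only values the form already offers.
    for name, allowed in (("currency", CURRENCIES), ("approver", APPROVERS)):
        value = suggestion.get(name)
        if isinstance(value, str) and value in allowed:
            fields[name] = value
        else:
            rejected.append(name)

    return fields, rejected


# Example: the model guessed a plausible amount but invented an approver.
fields, rejected = prefill_from_suggestion(
    {"invoice_id": "INV-1042", "amount": "1250.00", "currency": "USD", "approver": "carol"}
)
print(rejected)  # ['approver']

# The user picks the approver from the dropdown; the domain model is built as usual.
approval = InvoiceApproval(**fields, approver="alice")
print(approval)
```

In this sketch the form and its rules stay the source of truth: the model only proposes defaults, and a wrong guess degrades to an empty field the user fills in, rather than to a wrong approval.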

The 70% Nobody's Talking About

Researchers at both OpenAI and Anthropic have recently said, in one form or another: "I don't write code anymore. 100% of it is AI-generated." Anthropic itself reports that 70–90% of its own codebase is AI-generated.

The underreported part: the infrastructure wasn't built for this. GitHub pull requests, code reviews, CI pipelines: all were designed for humans who think in hours and days, not for agents committing code every few minutes. The review process that worked when a developer committed a few hundred lines per day breaks down when an agent can generate thousands in an hour. The bottleneck shifts from generation to evaluation, and most teams haven't restructured for that.

This also explains the paradox of Electron in Claude Code. Anthropic spent $20,000 on a "swarm" of AI agents to build a C compiler (software that turns source code into machine code that runs on bare metal). Yet they built their own desktop app in Electron, a framework about as far from the metal as you can get. Native APIs have become so complex that even the largest companies choose web technologies. Our workflows keep running ahead of our ability to adapt our tools to them.

The companies designing workflows built for agents (not for humans working alongside AI) are the ones that will grow 3–5x in the next year.

How to Stay the Pilot

The approach I recommend:

Verify, don't just run. When AI suggests a solution, ask "why?", not just "does it work?" Your AI assistant is a brilliant junior colleague, not an oracle. If a developer can't explain why the solution works, it doesn't go to production, regardless of who wrote it.

Use AI for the refactors humans skip. Renaming poorly named variables across an entire codebase, splitting 3,000-line files into modules, removing duplicated logic: these are where AI agents excel. The cost of code has dropped so far that zero tolerance for small code friction is now achievable. Use that leverage for the cleanup that always gets deprioritized.

Adapt your workflows. PR reviews, CI pipelines, even how you define acceptance criteria: all of it needs to evolve for a world where much of the code is AI-generated. Processes paced for human contributors quickly show their limits once agents start committing every few minutes.

Protect the architects. AI can write code. Only humans can decide what to build, why, and how it connects to business outcomes. That judgment is the scarcest resource on any team, and the one most at risk when AI makes generating output feel effortless.

Takeaways

The 40% productivity gain is real, but so is the dependency risk. The same research that shows AI boosts output also shows that, after six months, developers struggle to work without it. Speed without understanding is a liability that surfaces late, and expensively.

AI in software is an interface problem as much as a capability problem. Chatbots impress in demos; structured workflows serve the actual users. Match the interface to how your team actually works, not to what looks impressive in a board meeting.

The teams winning with AI aren't the ones using it most. They're the ones using it most deliberately. Verify outputs. Adapt workflows. Protect the architects. Use AI to do more of the right things, not to stop thinking about what the right things are.

AI in software engineering is a powerful lever, provided you're the one holding the handle. The question isn't "should we use AI?" It's "how do we stay the architects?" Once you have a clear answer to that, the 40% productivity gain is just the beginning.

Join the discussion on socials:

LinkedIn · Facebook (Hebrew)