AI in Production: Between Promise and Reality - Avoiding the Trap

There's a very specific moment that everyone who works with AI in production knows. Everything works perfectly in the demo, the team is excited, and then the system meets the real world - and something breaks. Not dramatically. No alarms, no red lights. Quietly, in corners nobody thought to check.

This article synthesizes insights from hands-on experience with AI systems in production, focusing on challenges that don't appear in vendor documentation: model blind spots, hidden security risks, and architecture decisions that look technical but are actually business decisions.

A Cow on a Highway: The Fundamental Problem That Hasn't Budged

AI models used to identify cows on highways as cars. Why? Because they learned to recognize the asphalt, not the vehicle. Cow on asphalt equals Mazda 3. Makes perfect sense, really.

We fixed it, pushed better data, and felt like we'd beaten the system. But the fundamental problem didn't budge an inch - AI is still limited to what it absorbed during training.

A cow on a highway is easy to spot. But today we're pushing models into complex domains, and that's where things get interesting. The mistakes aren't funny or obvious anymore. They're subtle, quiet, and look completely legitimate - until someone gets hurt.

That's exactly what makes it scary.

When AI Spits Out the "Right" Answer - Just Not Right for You

Here's something that happened to me recently. We needed a complex find-replace algorithm. The AI produced shell code - worked perfectly. But then we dropped it into our specific use case, and it crawled. We're talking turtle-with-bad-knees slow.

The frustrating part? It had all our context, knew exactly what we needed. But what it learned in AI school didn't cover this particular corner. So it just threw the "correct" generic solution at us, because that's what it had in the stack. It had no way to generate the right solution for us.

And that's not a bug. That's the model's expected behavior.

Your AI in production is probably making decisions like this right now. It spits out answers that look correct because that's what it knows, and you'll only discover the gap in edge cases - when a customer is angry, when performance tanks, or when someone asks a question the model simply doesn't know it can't answer.

The business risk here isn't theoretical. This is technical debt accumulating silently, decisions that look automated and efficient but are actually generating fragility that will surface at exactly the worst possible moment.

Who's the User? The Question That Defines Your Architecture

A week ago Anthropic published a blog post about the advantages of HTML over Markdown and sparked a heated debate. Now that the dust has settled, it's worth noticing something many people missed: every single example they gave was an output intended for humans.

And that brings us back to the most fundamental question in architecture: who is the user? Who consumes the output?

For machines and agents, Markdown and JSON are king. They carry the same content in fewer tokens than HTML markup, and models tend to handle them well. For humans, HTML can be the right choice - but with serious caveats.
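To make the cost point concrete, here is a rough sketch that renders one small record three ways and compares sizes. Everything in it is an assumption for illustration: the record, the helper names, and above all the use of character count as a stand-in for tokens. Real tokenizers differ by model, so treat the numbers as a direction, not a measurement.

```python
import json

# Hypothetical payload an agent might pass along.
record = {"title": "Q3 report", "items": ["revenue up", "churn flat", "costs down"]}


def as_html(r: dict) -> str:
    """Render the record as minimal HTML (heading plus bullet list)."""
    items = "".join(f"<li>{item}</li>" for item in r["items"])
    return f"<h1>{r['title']}</h1><ul>{items}</ul>"


def as_markdown(r: dict) -> str:
    """Render the same record as Markdown."""
    items = "\n".join(f"- {item}" for item in r["items"])
    return f"# {r['title']}\n{items}"


def as_json(r: dict) -> str:
    """Render the same record as compact JSON."""
    return json.dumps(r, separators=(",", ":"))


# Character count is only a crude proxy for token count.
sizes = {
    "html": len(as_html(record)),
    "markdown": len(as_markdown(record)),
    "json": len(as_json(record)),
}

if __name__ == "__main__":
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{name}: {size} chars")
```

The gap widens as markup gets heavier (attributes, nested wrappers, inline styles), which is usually what generated HTML looks like.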

Because even if we set aside efficiency and cost savings for a moment, there's a security risk that doesn't get enough airtime. LLM-generated JavaScript embedded in HTML is a reliable path to data leaks and XSS vulnerabilities, with a high risk of reputational damage. This isn't a theoretical scenario - it's an attack vector waiting to be exploited.

So the rule is simple: Markdown for AI-to-AI communication, and HTML for user interfaces - carefully, with sanitization, and with full awareness of the risks. This isn't just about efficiency. It's about building a product, something stable - not a fragile toy.
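What "with sanitization" can look like in practice: a minimal allowlist sketch using only Python's standard library. The tag list, the attribute list, the URL prefixes, and the names AllowlistSanitizer and sanitize_llm_html are all assumptions made up for this example. It illustrates the idea (keep what you explicitly allow, drop everything else) and is not a substitute for a maintained, audited sanitizer plus a Content Security Policy.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a deliberately small allowlist; extend it for your own UI.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "br", "code", "pre", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("https://", "http://", "mailto:")
DROP_WITH_CONTENT = {"script", "style"}  # remove the element and its body
VOID_TAGS = {"br"}  # tags that never get a closing tag


class AllowlistSanitizer(HTMLParser):
    """Re-emits only allowlisted tags and attributes; escapes all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and every other attribute
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize_llm_html(raw: str) -> str:
    """Return model-generated HTML reduced to the allowlist above."""
    parser = AllowlistSanitizer()
    parser.feed(raw)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    untrusted = (
        '<p onclick="steal()">Hi <b>there</b><script>alert(1)</script>'
        '<a href="javascript:alert(1)">x</a>'
        '<a href="https://example.com">ok</a></p>'
    )
    print(sanitize_llm_html(untrusted))
```

The design choice that matters is the direction of the list. A blocklist tries to enumerate every dangerous construct and will miss one; an allowlist only has to enumerate what your interface actually needs.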

Why You Need an Expert in the Room

The common thread across all these challenges - models that fail silently, architecture decisions that look technical but impact security and costs, generic solutions masquerading as tailored ones - is that AI alone won't tell you there's a problem.

That's why you need an expert in the room. Someone whose gut fires when they see "correct" output that's disconnected from reality. Exactly the way your gut would fire if you saw a cow reversing out of a parking spot. Someone who knows the domain well enough to spot the algorithm's blind spots before they crash into your customers.

This doesn't mean don't use AI. The opposite. It means use it the way you'd use any powerful tool - with an experienced person who knows when the tool fits and when it's dangerous. A developer who takes AI output and ships it straight to production without asking "why?" isn't efficient - they're a liability.

The right approach is bionic: people using AI to be better, not people surrendering their thinking because the AI "already did the work."

Takeaways

The fundamental problem hasn't changed. AI is limited to what it learned in training. It doesn't know what it doesn't know, which means its mistakes look like correct answers. The more complex the domain, the higher the risk.

Architecture decisions are business decisions. The choice between HTML and Markdown, between a generic solution and a tailored one, between development speed and security - each one impacts costs, business risk, and time-to-market. Treating these as purely technical decisions means missing the bigger picture.

Human expertise isn't a bottleneck - it's a safety net. The expert in the room isn't the person slowing down the process. They're the person preventing you from discovering problems through your customers. Investing in expertise is the highest-ROI move you can make when running AI in production.

Ultimately, the question isn't whether to use AI. The question is whether you're building a product or a toy. A product demands thoughtful architecture, awareness of blind spots, and someone who knows the domain well enough to recognize when the AI is wrong.

Where has AI surprised you - for better or worse - in production this year?
