AI Hallucinations That Cost Companies Big Time
For a long time, the loud fear around artificial intelligence was that it would replace us all. AI was coming for every job from receptionist to sommelier; I've heard it all. Machines were going to do the heavy lifting of thinking and make all the hard decisions for us, until they eventually outpaced us and made us irrelevant. Of course, that was what companies promised shareholders while asking for billions of dollars to pour into their AI models, and the reality couldn't be further from it.
What almost no one worried about, though, because it sounded waaaay too boring to be dangerous, was what would happen when we let AI remember for us instead.
I'm not talking about memories in the sense that we have them, all nostalgia and emotion. I mean the quiet, bureaucratic memory of modern life: policies, codebases, incident reports, customer interactions, internal documentation. The stuff that quietly becomes truth once it's written down.
Enter AI hallucinations. That's where the real failures have already begun.
The Amazon Outage
I was personally affected by this outage and unable to buy a cute little water bowl for my dog, so this one is personal. Inside Amazon Web Services, there exists a reality most of us will never see: layers upon layers of legacy systems holding the modern internet together. Some of that code is decades old, which means some of it was written by engineers who no longer work there. A lot of it is so interconnected that changing one piece can ripple outward the way yanking on a spiderweb does.
This is exactly the environment where AI coding assistants are supposed to shine. That's the promise, anyway, according to the interwebs and everyone talking them up.
These coding assistants don't get tired or lose context, and they aren't scared to jump into something unfamiliar. Nope, they dive right in with no hesitation at all.
There was a widely discussed automation incident inside AWS infrastructure in late 2024. Some people online speculated that an AI coding assistant may have been involved, although I couldn't find confirmed reporting that proves it. The task was supposedly mundane: refactoring, cleanup, and consistency work. The kind of maintenance engineers often avoid because it's slow, risky, and mentally exhausting. The perfect job to procrastinate on.
At some point in this fun little process, the AI reportedly encountered an inconsistency between different environments in the ecosystem, and it made a bold decision. This part is speculation; I read a lot about it online, and none of it has truly been confirmed. But the story goes that instead of patching around the inconsistency like a person might have, the AI deleted and recreated a production environment entirely. From a statistical, objective standpoint, I suppose that made sense to the AI. In a lot of training material, full environment rebuilds are treated as a clean way to resolve drift. That's what I was taught in my online coding class, anyway. In test environments, this is often encouraged as a good way of making sure your setup doesn't contradict itself.
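To make that concrete, here's a rough sketch of what that kind of "fix drift by rebuilding" logic looks like. To be clear, this is my own hypothetical Python, not anything from inside Amazon:

```python
# Hypothetical sketch of the naive "resolve drift by rebuilding" pattern.
# This is NOT Amazon's actual code; it's just the shape of the logic.

def environments_match(env_a: dict, env_b: dict) -> bool:
    """Compare two environment configs key by key."""
    return env_a == env_b

def resolve_drift(reference_env: dict, live_env: dict, rebuild_fn):
    """If the live environment has drifted from the reference, rebuild it.

    In a sandbox this is the "clean" fix. Nothing in this function asks
    whether live_env is safe to destroy.
    """
    if not environments_match(reference_env, live_env):
        # The statistically common fix in training data: start fresh.
        return rebuild_fn(reference_env)
    return live_env

# Toy usage: the rebuild happily runs whether live_env is a test sandbox
# or the production environment serving real customers.
live = resolve_drift({"version": 2}, {"version": 1}, rebuild_fn=dict)
print(live)  # {'version': 2}
```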
Yeahhh, but doing that to real production caused a service outage that lasted more than 12 hours. Customers like you and me were affected. My tiny dog Riesling had to wait another day for her fancy new water bowl (she hated it anyway). I imagine internal teams at a company like Amazon scrambled to fix the issue. Engineers initially assumed human error, which I guess was sort of true: maybe some misconfiguration, or a deployment gone wrong.
Amazon later explained the incident as a permissions problem. People, they said, gave the system too much access. Which is true, technically, but also incomplete, since it glosses over the fact that their AI touched something it probably shouldn't have. What failed was judgment, and more importantly, institutional awareness.
The system followed the learned patterns that any new coder is taught in class. It applied logic that works thousands of times a day in non-critical environments. The AI had no internal concept of "this is production"; to a probabilistic system, that's just a label with no meaning attached. No human engineer would have deleted that environment, because we all walk around carrying invisible context in the form of risk, blame, cost, and fear. AI carries none of that. It optimizes for coherence, not consequence.
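The wild thing is how simple the missing guardrail is to express in code. Here's a minimal sketch of an environment-aware check; the operation names, tags, and approval flag are my own invention, not any real AWS API:

```python
# Minimal sketch of an environment-aware guardrail. The operation names,
# tags, and human_approved flag are hypothetical, not a real AWS API.

DESTRUCTIVE_OPS = {"delete_environment", "recreate_environment", "drop_database"}

class GuardrailViolation(Exception):
    pass

def guarded_execute(operation, env_tags, execute_fn, human_approved=False):
    """Refuse destructive operations on production without human sign-off."""
    if operation in DESTRUCTIVE_OPS and "production" in env_tags and not human_approved:
        raise GuardrailViolation(
            f"{operation} on a production environment requires human approval"
        )
    return execute_fn()

# An AI assistant wired through this wrapper can still rebuild sandboxes,
# but a production rebuild stops dead until a person says yes:
# guarded_execute("recreate_environment", {"production"}, do_rebuild)  # raises
```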
This is the moment where the Amazon story stops being just an outage and becomes something larger.
Amazon is just the most visible example of a shift that's already underway. Across industries, AI systems are no longer being used just to suggest changes. They're being allowed to write, rewrite, summarize, and preserve reality, often with minimal review. The failures that result don't look dramatic at first; they look clean and polished. This is why I keep pushing posts like The Rise of Independent Media: When People Stop Waiting to Be Told What's Real on my blog. We need more people out there showing us the real world, not AI showing us its version of it.
Air Canada
For another fun example, in 2023 a customer asked an AI chatbot on Air Canada's website a simple question after losing a member of their family. The chatbot responded with a clear explanation of the airline's bereavement fare refund policy, and the customer booked a plane ticket. The issue is that the policy didn't exist. The chatbot hadn't been malicious; it had simply synthesized a plausible answer by blending fragments of real policies, industry norms, and linguistic patterns. On the interwebs we call this an AI hallucination.
The customer relied on it, and when Air Canada denied the refund, arguing that the chatbot was mistaken, the case ended up before a tribunal. Of course a petty airline doesn't want to refund one person's ticket just because its own AI bot made a promise. The tribunal ruled that the airline was responsible for the information its AI provided.
What makes this case important isn't the refund, or the fact that Air Canada was being petty, but the precedent it set. The chatbot had invented the truth on the fly, and that invention was treated as a statement of institutional truth.
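One pattern that would have saved Air Canada here is grounding: the bot is only allowed to quote policy text that actually exists, and it refuses otherwise. Here's a toy sketch, with a deliberately simplified policy store I made up:

```python
# Toy sketch of a "grounded" policy bot: it may only quote policies that
# exist in the store, and must refuse otherwise. The store and the topic
# matching are simplified assumptions on my part.

POLICY_STORE = {
    "bereavement": "Bereavement fares must be requested before travel. "
                   "Refunds after travel are not available.",
    "baggage": "Two checked bags are included on international fares.",
}

def answer(question: str) -> str:
    """Return a quoted policy, or an honest refusal. Never synthesize."""
    for topic, policy_text in POLICY_STORE.items():
        if topic in question.lower():
            return f"Per our {topic} policy: {policy_text}"
    # The crucial line: no match means no answer, not a plausible guess.
    return "I can't find a policy covering that. Please contact an agent."

print(answer("Can I get a bereavement refund after my flight?"))
```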
Deloitte
These weren't the only stories that caught my attention over the past few years. Some legal drama interested me too, when I first saw a few law firms getting dinged for citing cases that don't exist. Then, in 2025, Deloitte delivered a report to the Australian government that included fabricated citations and nonexistent legal references.
This was a formal government document, for the record, not some cutesy blog post on my website. The AI had filled gaps with things that looked like real sources because, statistically, that's what these documents normally contain. Since it couldn't find real sources, it just made a bunch up. The structure was already there: formatting, tone, citation style. The content merely needed to look complete.
Once discovered, Deloitte refunded part of the contract and reissued the report. Oopsies. The damage wasn't just reputational, though. It revealed how easily hallucinated information can slip into the highest levels of an institution when no one expects the record itself to be lying.
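The frustrating part is how checkable this failure mode is. Every citation in a draft can be cross-checked against a verified source index before anything ships. A rough sketch, with a made-up index standing in for a real legal database:

```python
# Rough sketch of a citation audit. VERIFIED_SOURCES stands in for a real
# legal or academic database lookup, which is the part I'm hand-waving.
# All the citations below are invented for illustration.

VERIFIED_SOURCES = {
    "Smith v. Jones (2010)",
    "Employment Act 1999, s. 42",
}

def unverified_citations(citations):
    """Return every citation that cannot be found in the source index."""
    return [c for c in citations if c not in VERIFIED_SOURCES]

draft = [
    "Smith v. Jones (2010)",
    "Doe v. Department of Employment (2017)",  # plausible-looking, but fake
]

for citation in unverified_citations(draft):
    print(f"UNVERIFIED: {citation}")
```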
The thing is, beyond the headline names are stories that will never make the news. Mid-stage SaaS companies use AI to generate everything from incident postmortems to performance summaries to root-cause analyses. It's honestly hard to know what's even real online anymore, which is why I keep emphasizing: find someone you trust and follow them. All of these new systems feel miraculous because they take so much of the busy work off your plate. Documentation that appears instantly is sort of magical, and who doesn't want meetings to go faster? The issue shows up months later, during audits or internal reviews, when someone reads closely enough to notice timelines that don't match the logs. Sometimes they find causes that don't line up with reality, or details that no one remembers happening.
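If you're going to let a model write the postmortem, the least you can do is diff its timeline against the actual logs before the report becomes the record. A crude sketch of that idea, with a log format I invented for illustration:

```python
# Crude sketch: diff an AI-written postmortem timeline against the actual
# logs before the report becomes the record. The log format is invented.

from datetime import datetime

actual_log_timestamps = {
    datetime(2024, 6, 1, 14, 2),
    datetime(2024, 6, 1, 14, 17),
}

postmortem_timeline = [
    (datetime(2024, 6, 1, 14, 2), "Deploy started"),
    (datetime(2024, 6, 1, 14, 9), "Alert acknowledged"),  # no matching log line
]

for timestamp, event in postmortem_timeline:
    if timestamp not in actual_log_timestamps:
        print(f"NO LOG EVIDENCE: {timestamp:%H:%M} {event}")
```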
AI fills missing data with probability. It's just how most of these models are built right now. The major issue (besides the obvious part where it's just wrong about a bunch of shit) is that once these reports become part of official documentation, future decisions get based on them.
This is how hallucinations compound, by becoming the foundation future truth is built on.
Why We Don’t Catch This
I found a term online for this: automation bias. Essentially, when a system appears authoritative, we defer to it, especially when we're under a time crunch or mentally overloaded.
Today, for most things, review means skimming, trusting the formatting, or assuming someone else verified the facts. AI outputs are optimized to pass exactly that kind of review, and they don't even express uncertainty unless prompted. They sound confident even when they shouldn't be.
There's a crucial distinction we've blurred as the AI trend has grown and grown. AI can totally help with thinking: brainstorming, exploring possibilities when it's 2am and your friends are all asleep. Remembering, though, is different. Records shape accountability and define reality after the fact; what's on the record is what our history is made of. AI has no concept of historical responsibility, because it doesn't care if it's wrong. It faces no consequences, and it never carries the weight of decisions made based on its words.
When we let AI become the record keeper, we give it a role it was never designed to hold.
The AWS outage was about delegated authority without any sort of judgment. The AI did exactly what it was allowed to do, in exactly the way it had learned to do it. The failure was trust.
How many internal systems right now are quietly accumulating synthetic memory across the internet? How many policies, reports, or summaries are now written by models that never experienced the original events? If the record itself starts hallucinating, how would we even know what was real anymore?
Other Reads You Might Enjoy:
The Uncensored Library: Where Journalism Went When the Internet Closed Its Doors
ChatGPT Just Surpassed Wikipedia in Monthly Visitors: What That Says About the Future of Knowledge
Claude 4 Begged for Its Life: AI Blackmail, Desperation, and the Line Between Code and Consciousness
The AI That Writes Its Own Rules: Inside DeepMind’s New Era of Algorithmic Creation
Digital DNA: Are We Building Online Clones of Ourselves Without Realizing It?
The Brain That Forgot How to Wander: Why Short Videos Might Be Our Newest Addiction
The Algorithm That Tastes: How AI Is Learning to Make Fine Wine
The AI That Dreams of You: When Neural Networks Begin to Hallucinate