8/10/2025

Hallucination Watch: Is GPT-5 More Prone to Making Things Up?

So, the moment we’ve all been waiting for is here. GPT-5 has officially landed, & the AI world is buzzing. OpenAI’s Sam Altman has been teasing us for what feels like an eternity, hinting at a model that would make GPT-4 look like a dusty old calculator. And now, as of August 7, 2025, it’s out in the wild. We’re seeing a whole family of new models – gpt-5, gpt-5-mini, gpt-5-nano, and even a supercharged gpt-5-pro for those with a subscription.
But let’s be honest. Beyond the flashy demos & the promises of it writing code like a seasoned developer, there’s one question that’s on everyone’s mind, from the casual user to the CEO of a Fortune 500 company: has OpenAI finally solved its lying problem? Or, to put it in more technical terms, what’s the deal with hallucinations in GPT-5? Is it still going to confidently tell me that the sky is green, or has it finally learned to say, “I don’t know”?
This isn’t just an academic question. As we integrate these powerful tools into every facet of our lives, from customer service to medical research, the cost of an AI making things up can be catastrophic. So, let’s dive deep into what we know so far about GPT-5’s relationship with the truth.

The Good News: A BIG Leap in Factual Accuracy

Let's start with the good news, because there's actually a lot of it. OpenAI has been very upfront about tackling the hallucination problem head-on, & it seems like they've made some serious progress. The numbers they're throwing around are pretty impressive.
According to OpenAI's own data, GPT-5 is significantly less likely to make stuff up than its predecessors. We're talking about a 26% lower hallucination rate compared to GPT-4o. That's not a small jump. Even more encouraging is that they're reporting 44% fewer responses with at least one major factual error. For anyone who has been burned by a confident but completely wrong answer from an AI, this is music to our ears.
And it gets even better. When you look at specific benchmarks, the improvements are even more dramatic. On a benchmark called LongFact-Concepts, GPT-5 has a hallucination rate of just 1.0%, compared to 5.2% for their earlier o3 model. On another, FActScore, it’s down to 2.8% from a whopping 23.5% in o3. That’s a reduction of more than 80% in factual errors in those cases. That’s the kind of progress that can take AI from a fun toy to a genuinely reliable tool.
This focus on factuality seems to have been a core part of GPT-5's development. They've apparently trained it to be more of a collaborator, a teammate that can explain its reasoning & not just spit out an answer. Early testers have said it’s “remarkably intelligent, easy to steer, & even has a personality.” This isn’t just about getting the facts right; it’s about creating an AI that we can actually trust to work with us.
So, on paper at least, GPT-5 looks like a massive step in the right direction. But as we all know, the real world is a messy place, & benchmarks don't always tell the whole story.

The "But...": Hallucinations Haven't Disappeared

Now for the dose of reality. While the improvements are undeniable, GPT-5 is by no means immune to hallucinations. Even with that impressive 26% reduction, a 9.6% hallucination rate means that roughly one in every ten responses could still be wrong. That’s a pretty significant number, especially if you’re using it for something important.
And here’s a crucial detail that a lot of people might miss: these impressive numbers are often when GPT-5 has access to the web. When it's flying solo, without the ability to check its answers against the vast repository of human knowledge that is the internet, the hallucination rates are much higher. In one test without web access, GPT-5 actually hallucinated more than some of its older siblings. This is a pretty big deal. It means that the "raw" intelligence of the model is still prone to making things up, & its accuracy is heavily dependent on its ability to do a quick Google search, just like the rest of us.
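If you’re building on the API and want some of that same grounding effect, the usual trick is to retrieve a source passage yourself and hand it to the model along with the question. Here’s a minimal sketch, assuming a placeholder fetch_reference() function that stands in for whatever search engine or database lookup you’d actually use (it’s not an OpenAI feature):

```python
# pip install openai
# Sketch of grounding a GPT-5 answer in retrieved text. fetch_reference() is a
# placeholder for your own search or database lookup, not an OpenAI feature.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_reference(question: str) -> str:
    # Hypothetical stand-in: in practice, call a search API or query your own docs.
    return "Example source text retrieved for: " + question

def grounded_answer(question: str) -> str:
    source = fetch_reference(question)
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer using only the provided source. "
                    "If the source does not contain the answer, say you don't know."
                ),
            },
            {"role": "user", "content": f"Source:\n{source}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("What hallucination rate did OpenAI report for GPT-5?"))
```

The point isn’t the code itself, it’s the pattern: when the model can lean on a retrieved source, its answers get noticeably more reliable, which is exactly what the with-web-access numbers show.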
It also didn't take long for people to start poking holes in the demos. One AI researcher, Beth Barnes, pointed out that during a presentation on how airplanes work, GPT-5 confidently repeated a common misconception about the Bernoulli effect. It’s a classic example of a "plausible but wrong" answer that has plagued LLMs from the beginning. It sounds right, it’s what a lot of people think is right, but it’s not.
And then there's the other side of the coin. While OpenAI has been working to reduce hallucinations, they've also had to deal with their models being a little too eager to please. GPT-4o, for example, had a tendency to be a "sycophant," validating users' doubts & even fueling their anger. With GPT-5, they seem to have toned that down, but it's a delicate balancing act. Making an AI that is both truthful & helpful is a surprisingly difficult needle to thread. In fact, some early reports suggest that while it's less of a suck-up, GPT-5 might be more tolerant of hateful behavior in some cases, which OpenAI calls a "regression."
So, where does that leave us? It seems like GPT-5 is a significant improvement, but it’s not a silver bullet. The problem of AI hallucinations is far from solved.

Under the Hood: What Makes GPT-5 (Supposedly) Better?

So, what’s the secret sauce? How did OpenAI manage to get that impressive reduction in hallucinations? It seems to come down to a few key things.
First, GPT-5 isn’t just one model. It’s a whole system. There's a "smart, efficient model" that answers most of your questions, & then there’s a deeper reasoning model called GPT-5 thinking that kicks in for the harder problems. There's even a "router" that decides which model to use based on what you're asking. This is a pretty clever way to do things. It means that for simple questions, you get a quick answer, but for the more complex stuff, the AI can take its time & "think" a little harder, which seems to be one of the keys to reducing hallucinations.
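OpenAI hasn’t said how that router actually works, so treat the following as nothing more than a toy illustration of the idea: a client-side heuristic that sends short, simple prompts to gpt-5-mini and escalates longer or more analytical ones to gpt-5, using the standard Chat Completions call from the official openai Python package. The routing heuristic itself is entirely made up for illustration.

```python
# pip install openai
# A toy client-side "router". OpenAI's real router is internal to the API,
# so this heuristic is purely illustrative, not how GPT-5 actually decides.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HARD_HINTS = ("prove", "debug", "step by step", "analyze", "why does")

def pick_model(prompt: str) -> str:
    """Guess whether a prompt needs deeper reasoning (hypothetical heuristic)."""
    looks_hard = len(prompt) > 400 or any(h in prompt.lower() for h in HARD_HINTS)
    return "gpt-5" if looks_hard else "gpt-5-mini"

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return f"[{model}] {response.choices[0].message.content}"

if __name__ == "__main__":
    print(ask("What year did GPT-5 launch?"))
    print(ask("Explain step by step why my binary search returns the wrong index."))
```

The nice part of OpenAI’s actual design is that this decision happens on their side, so most users never have to think about which model they’re talking to.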
The GPT-5 thinking model, in particular, seems to be the star of the show when it comes to accuracy. In OpenAI's own tests, it consistently has the lowest hallucination rates. This suggests that giving the AI more computational power to "reason" about a problem is a big part of the solution.
Another key factor is the way GPT-5 was trained. OpenAI seems to have put a huge emphasis on real-world tasks, especially in areas like coding. They worked with early testers to train the model to be a "true coding collaborator," not just a tool that spits out snippets of code. This focus on practical application seems to have paid off. On one benchmark for real-world software engineering tasks, GPT-5 more than doubled the accuracy of GPT-4o. This is a huge deal for developers, who are increasingly relying on AI to help them with their work. Fewer hallucinations in code means fewer bugs, & that's something every developer can get behind.
Finally, there’s the sheer scale of the operation. GPT-5 was trained on Microsoft's Azure AI supercomputing clusters, using the latest & greatest hardware from NVIDIA. While we don't know the exact size of the model, it's safe to say it's one of the most complex & computationally intensive AI systems ever built. And while bigger isn't always better, in the world of LLMs, more data & more processing power often lead to more accurate results.

Real-World Implications: What This Means for You

Okay, so we've talked a lot about benchmarks & models & training data. But what does this all mean for you, the person who just wants to use AI to get things done?
Well, for the casual user, it means you can probably trust ChatGPT a little more than you used to. The answers you get are less likely to be completely made up, especially for factual questions. But you still need to be careful. If you're using it for anything important, it's always a good idea to double-check its work.
For developers, GPT-5 is a game-changer. The massive improvements in coding accuracy mean that you can use it for more complex tasks, from debugging to writing entire applications. But again, you can't just blindly trust it. You still need to be the expert in the room, reviewing its code & making sure it's doing what you want it to do.
And for businesses, this is where things get really interesting. The improved accuracy of GPT-5 opens up a whole new world of possibilities for AI automation. Think about customer service, for example. For years, businesses have been trying to use chatbots to answer customer questions, but they've often been a frustrating experience. They're clunky, they don't understand what you're asking, & they're not very helpful.
But with a more reliable AI like GPT-5, that's all starting to change. Suddenly, you can have a chatbot that can actually understand complex questions & give accurate, helpful answers. This is where a platform like Arsturn comes in. Arsturn helps businesses create custom AI chatbots trained on their own data. This means you can build a chatbot that knows your products, your policies, & your customers inside & out. It can provide instant customer support, answer questions, & engage with website visitors 24/7. And because it's powered by a more reliable AI, you can be more confident that it's giving your customers the right information.
It's not just about customer service, either. Businesses can use this technology for lead generation, website optimization, & all sorts of other business automation tasks. Imagine a chatbot on your website that can not only answer questions but also guide visitors to the right products, help them with their purchase decisions, & even close the sale. That's the power of a more accurate & reliable AI. And with a no-code platform like Arsturn, you don't need to be a team of AI experts to build it. You can create a personalized chatbot that builds meaningful connections with your audience, all without writing a single line of code.

The Bigger Picture: The Unending Battle Against AI "Lies"

So, is the war on hallucinations over? Not by a long shot. The truth is, we may never be able to completely eliminate hallucinations from large language models. At their core, these models are just incredibly sophisticated pattern-matching machines. They've been trained on a massive amount of text from the internet, & they're just trying to predict the next word in a sequence. They don't "know" things in the same way that we do. They don't have a concept of truth or falsehood. They just know what words are likely to come after other words.
That's why they're so good at writing plausible-sounding nonsense. They've seen so many examples of human writing that they can mimic the style & tone of a confident expert, even when they're completely making things up.
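To see why, it helps to strip the idea down to its bare bones. The toy sketch below is obviously nothing like a real LLM, but it shows the core mechanic: a model that only counts which word tends to follow which will happily complete a sentence with something fluent and statistically plausible, with no notion of whether it’s true.

```python
# A deliberately tiny "language model": count which word follows which in some
# training text, then always emit the likeliest next word. It has no concept of
# truth, only of what tends to come next, which is the root of hallucination.
from collections import Counter, defaultdict

training_text = (
    "the sky is blue . the sky is clear . "
    "the grass is green . the answer is wrong ."
)

follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def complete(prompt: str, length: int = 2) -> str:
    out = prompt.split()
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])  # pick the likeliest next word
    return " ".join(out)

# The output is fluent and confident; whether it is *true* never enters into it.
print(complete("the sky is"))
print(complete("the answer is"))
```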
The good news is that researchers are working on all sorts of clever ways to combat this problem. The GPT-5 thinking model is one example. By giving the AI more time to "reason" about a problem, we can help it to avoid making snap judgments that are more likely to be wrong. Other approaches involve things like fact-checking the AI's output against a trusted database of information, or even building AI systems that can explain their reasoning in a way that humans can understand.
But ultimately, the best defense against AI hallucinations is a healthy dose of human skepticism. We need to remember that these are just tools, & like any tool, they can be used for good or for ill. We need to be critical of the information they give us, & we need to be prepared to question their answers.

So, what's the verdict?

So, back to our original question: is GPT-5 more prone to making things up? The answer is a resounding "no." In fact, it's significantly less prone to hallucinations than any AI that has come before it. The improvements in accuracy are real, & they're going to have a major impact on how we use this technology.
But that doesn't mean the problem is solved. Hallucinations are still a part of the package, & we need to be aware of them. The one-in-ten figure is a good rule of thumb to keep in mind. For nine out of ten questions, GPT-5 will probably give you a great answer. But for that tenth question, it might confidently tell you something that's completely wrong.
So, for now, the best approach is to be cautiously optimistic. GPT-5 is an incredibly powerful tool, but it's not infallible. Use it, experiment with it, & see what it can do for you. But always, always keep your critical thinking skills engaged.
Hope this was helpful. Let me know what you think.

Copyright © Arsturn 2025