Here’s the thing about hype: it’s a double-edged sword. For months, the tech world was buzzing with whispers about GPT-5. The successor to the model that arguably brought AI into the mainstream was supposed to be another massive leap, a revolutionary step towards something truly special. The expectations were, to put it mildly, through the roof.
Then it dropped. And almost immediately, another narrative started to form, one of disappointment. It wasn’t that GPT-5 was bad, per se. It’s just that for a growing number of users, especially those who had spent time with Anthropic's Claude Opus, it felt… underwhelming.
The sentiment isn’t just a matter of scattered complaints on social media. It’s a genuine, palpable feeling among developers, writers, & creators who rely on these tools daily. They’re finding that while GPT-5 is competent, Claude Opus often feels more capable, more intuitive, & more… intelligent.
So what's the real story here? Why are so many users who have tried both models leaning towards Claude? Honestly, it’s not one single thing. It’s a combination of mismatched expectations, subtle but crucial differences in performance, & a sense that OpenAI might have overpromised & underdelivered. Let's break it down.
The Great Expectation Mismatch
Remember the jump from GPT-3.5 to GPT-4? It was MONUMENTAL. GPT-3.5 was impressive, but GPT-4 was a different beast entirely. It could reason better, follow complex instructions with greater accuracy, & its creative writing abilities were a huge step up. It felt like a true generational leap.
Naturally, everyone expected the same kind of jump from GPT-4 to GPT-5. The community was bracing for another "wow" moment. Instead, what they got felt more like an incremental update, a GPT-4.5 masquerading as a full-blown successor. Many users on platforms like Reddit have described the release as feeling more like a "cost optimization release" than a groundbreaking new model. This sentiment is pretty widespread; people were geared up for a revolution & got an evolution instead.
This is a classic case of hype backfiring. OpenAI, with its history of dramatic reveals, has conditioned us to expect mind-blowing progress. When GPT-5 arrived & was "just" a bit better—and in some cases, arguably worse in specific areas—it was bound to disappoint. The feeling wasn't just that GPT-5 wasn't a huge leap, but that it didn't even consistently outperform its predecessors or competitors in ways that mattered to real-world users.
One user on the tech-focused news aggregator Hacker News pointed out that as AI models improve, the performance gap between different companies' offerings keeps narrowing. GPT-5, Claude Opus, Gemini 2.5 Pro—they're all clustered at the high end. This makes the competition fiercer & any perceived lack of significant improvement more glaring. When you're at the top, just being "good" isn't good enough anymore. You have to be exceptional.
The Generalist vs. The Specialist: Where Claude Opus Shines
This is probably the most significant factor in the GPT-5 vs. Claude Opus debate. It’s a story about two different philosophies of intelligence.
GPT-5, from many user accounts, seems to have been trained to be an incredible performer on popular tasks. If you ask it to write a Next.js app or perform a common coding task, it’s fast & efficient. It’s been optimized for the mainstream, the 80% of use cases that most people will throw at it.
Claude Opus, on the other hand, appears to be a much better generalizer. This is a HUGE deal. Generalization is the ability to take what you've learned & apply it to new, unseen situations. It’s the difference between memorizing facts for a test & truly understanding the subject matter.
A fantastic example of this comes from a developer who works with a niche, low-code platform. This platform has its own unique stack & scripting language, something that a model like GPT-5 would have had very little, if any, training data on. The developer found that they could feed the documentation to Claude Opus, & the model could then learn the rules & start writing working code. It could generalize from the docs to a practical application. GPT-5, when faced with the same task, was reportedly unable to make that leap. It couldn't get beyond its training set.
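A rough sketch of how such a documentation-grounding test can be set up: build a prompt that contains the niche platform's docs & the task, then send the identical payload to each model & compare the code that comes back. Everything below (the function name, the toy doc snippet, the message format) is illustrative of the general pattern, not the developer's actual setup.

```python
# Sketch: probe whether a model can generalize from documentation it has
# (almost certainly) never seen in training. We paste the niche platform's
# docs into the prompt and ask for working code. The docs string and task
# here are invented placeholders.

def build_generalization_probe(docs: str, task: str) -> list:
    """Build a chat payload that grounds the model in unseen documentation."""
    system = (
        "You are working with a proprietary scripting language. "
        "Rely ONLY on the documentation provided; do not assume it "
        "resembles any mainstream language."
    )
    user = f"Documentation:\n{docs}\n\nTask: {task}\nReturn only code."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_generalization_probe(
    docs="loop(n) { ... }  repeats the enclosed block n times",
    task="Print 'hi' three times.",
)
# The same payload can then be sent to each model (e.g. via the Anthropic
# or OpenAI Python SDKs) and the replies compared: does the answer use
# constructs that appear only in the docs, or fall back to Python/JS habits?
```

The useful signal is whether the generated code sticks to the documented constructs or silently substitutes syntax from mainstream languages the model already knows.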
This is a recurring theme. For users who are not working in the most popular frameworks or who have unique, complex problems, Claude Opus consistently comes out on top. It has a knack for reasoning through novel problems that GPT-5 seems to struggle with. This adaptability is what makes it feel "smarter" to many. It doesn't just retrieve information; it seems to understand it on a deeper level.
The "Small Model Smell" & The Problem with Benchmarks
Another common complaint about GPT-5 is what some have called the "small model smell." It’s a bit of an abstract concept, but it refers to a certain rigidity & lack of creativity that can be characteristic of less powerful models. It suggests that the model is sticking very closely to its training data & is less capable of genuine "thought" or "reasoning."
This feeling is backed up by some more concrete criticisms. One article on Medium took a deep dive into OpenAI's own marketing materials for GPT-5 & found some glaring issues. The author pointed to a graph presented by OpenAI during a livestream that was, frankly, a mess. The bar heights didn't match the scores they were supposed to represent, making GPT-5's performance look more impressive than it actually was. What's more, the graph conveniently left out comparisons to key competitors like Claude Opus.
When the author ran their own benchmarks, they found that while GPT-5 did score well, Google's Gemini 2.5 Pro was actually better in several key areas: it was faster, with higher median accuracy & a higher success rate, all for the same cost. The conclusion was stark: GPT-5 wasn't the undisputed king that the marketing suggested. In fact, for many real-world tasks, you'd be better off using a different model, sometimes even an older OpenAI model like o3-mini, which offered a better balance of cost, speed, & performance.
This reliance on questionable benchmarks & flashy presentations has eroded some trust. Users are now more skeptical of the official claims & are relying more on their own head-to-head comparisons. And in those comparisons, the "PhD-level intelligence" that OpenAI promised with GPT-5 often fails to materialize. The same Medium article showed the model failing to answer basic 9th-grade level questions correctly. When a model is hyped as "super-intelligent" but can't handle high-school homework, the disappointment is inevitable.
The Coding Conundrum
Nowhere is the battle between these two AI giants more intense than in the world of coding. Developers have become some of the most avid users of large language models, using them as pair programmers, debuggers, & code generators.
Here again, the story is not a simple one of GPT-5 vs. Claude Opus. It’s more nuanced & depends heavily on the specific context & tools being used. Some developers, particularly those using tools like Cursor, have found GPT-5 to be incredibly powerful. However, many others, especially those trying to solve more complex or "agentic" coding tasks, find Claude Opus to be superior.
One of the interesting discoveries is that the same model can perform very differently depending on the platform you use to access it. A user on Hacker News noted that using Opus 4.1 through GitHub Copilot was a terrible experience, even worse than older models. But when they used the same model through a dedicated tool like Claude Code, the results were excellent. This suggests that the way a model is prompted & integrated into a workflow (the "system prompt" & surrounding tooling) plays a MASSIVE role in its performance.
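That observation can be made concrete with a small sketch. The harness names & wrapper strings below are invented for illustration (real products' system prompts aren't public), but they show how one model can receive very different instructions depending on the tool in front of it:

```python
# Sketch of why the same model can feel different across tools: each
# harness wraps the user's request in its own system prompt and context.
# Both wrapper strings are made up for illustration.

def wrap_request(harness: str, user_request: str) -> dict:
    """Simulate how different harnesses package one request for one model."""
    system_prompts = {
        "generic-copilot": "Complete the code. Be brief.",
        "dedicated-cli": (
            "You are an agentic coding assistant. Read the repo context, "
            "plan your edits, run the tests, and iterate until they pass."
        ),
    }
    return {
        "model": "same-underlying-model",  # identical weights in both cases
        "system": system_prompts[harness],
        "user": user_request,
    }

a = wrap_request("generic-copilot", "Fix the failing auth test.")
b = wrap_request("dedicated-cli", "Fix the failing auth test.")
assert a["model"] == b["model"]    # same model...
assert a["system"] != b["system"]  # ...very different instructions
```

Same weights, same user request, very different framing: that alone can account for a chunk of the "this model is terrible in tool X" reports.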
This is where companies building on top of these foundational models can make a huge difference. For businesses looking to leverage AI for their own needs, like customer service or lead generation, the raw power of the model is only half the equation. The other half is the user experience & the ability to customize the AI for specific tasks.
This is where a platform like Arsturn comes into the picture. Businesses often need more than just a generic chatbot. They need an AI that understands their specific products, services, & customers. Arsturn helps businesses create custom AI chatbots trained on their own data. This is the key. Instead of relying on a model that might be great at writing a Python script but knows nothing about your company's return policy, you can build a chatbot that is an expert in your business. It can provide instant customer support, answer highly specific questions, & engage with website visitors 24/7 with relevant, accurate information. This level of customization sidesteps the entire "which model is best for everything" debate & focuses on what's best for a specific business need.
The ability of a model like Claude to generalize from provided documentation is exactly what makes tools like this so powerful. When an AI can learn your business inside & out, it becomes a truly valuable asset.
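The general pattern behind this kind of grounding is easy to sketch: retrieve the most relevant snippet from the business's own documents, then inject it into the prompt. The toy retriever below uses plain word overlap purely to stay self-contained; production systems use embeddings, & Arsturn's internals are not public, so treat this as the shape of the idea only:

```python
# Toy sketch of grounding a chatbot in a company's own documents:
# retrieve the most relevant snippet, then put it in the prompt so the
# answer reflects the business rather than the model's general training
# data. Word overlap stands in for real embedding-based retrieval.
import string

def tokenize(text: str) -> set:
    """Lowercase, strip punctuation, split into a set of words."""
    clean = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(clean.split())

def retrieve(question: str, docs: list) -> str:
    """Return the document sharing the most words with the question."""
    q = tokenize(question)
    return max(docs, key=lambda d: len(q & tokenize(d)))

docs = [
    "Returns are accepted within 30 days with a receipt.",
    "Shipping is free on orders over $50.",
]
question = "Do you accept returns without a receipt?"
snippet = retrieve(question, docs)
# The retrieved snippet is then injected into the model's prompt:
prompt = f"Answer using only this company policy:\n{snippet}\n\nQ: {question}"
```

Swap the overlap scorer for embedding similarity & the `docs` list for a real knowledge base, & you have the skeleton of a business-specific assistant.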
The Intangibles: Feel, Flow, & Trust
Beyond benchmarks & feature lists, there's an intangible quality to these AI models that heavily influences user preference. Many users describe Claude models as just "getting it." There’s a flow to the conversation, an intuitiveness that makes interaction feel more natural & less like you're fighting with a machine.
Part of this might be the larger context windows that Anthropic's models often feature, allowing for more detailed instructions & a better memory of the conversation. But it's more than that. It's about the tone, the style of the responses, & the model's ability to grasp nuance & intent.
When a user feels like they have to "babysit" a model, constantly re-phrasing prompts & correcting its mistakes, the magic quickly fades. The ideal AI assistant is one that feels like an extension of your own thought process. For many, Claude Opus is closer to that ideal than GPT-5.
This also ties back to the trust issue. When OpenAI releases what many perceive to be a botched graph, it creates a sense of skepticism. It makes users wonder what else is being exaggerated. Anthropic, by contrast, has generally been more measured in its marketing, letting the performance of its models speak for itself. This has helped them build a loyal following of users who trust the product to deliver.
In the fast-evolving world of AI, businesses need a partner they can rely on. When it comes to automating critical functions like lead generation or customer engagement, you need an AI that is not only powerful but also reliable & transparent. This is the value proposition of conversational AI platforms. For instance, Arsturn allows businesses to build no-code AI chatbots that can boost conversions & provide personalized customer experiences. It’s not about chasing the latest, buzziest model. It’s about building meaningful connections with your audience through a tool that is trained on your data & speaks with your voice. This creates a level of trust & personalization that a one-size-fits-all model can't replicate.
So, is GPT-5 a disappointment? The answer is a classic "it depends." If you were expecting a world-changing leap in intelligence that would render all other models obsolete, then yes, you were probably disappointed. The reality is that the AI landscape is more competitive than ever, & the crown for "best model" is very much up for grabs.
The real reason so many users who have tried Claude Opus are let down by GPT-5 is that they’ve had a taste of a different kind of intelligence. They've experienced a model that excels at generalization, that can tackle novel & complex problems with a surprising degree of insight, & that, for many, simply feels smarter & more intuitive to work with.
GPT-5 is a powerful tool, no doubt. But it’s not the universally superior model that many hoped it would be. Its struggles with niche tasks, the questions surrounding its real-world performance versus the marketing hype, & the fact that it feels more like an incremental improvement have all contributed to this sense of disappointment.
Ultimately, the conversation is shifting. It's moving away from "which model is the smartest?" & towards "which model is the right tool for my specific problem?" And for a growing number of users, from solo developers to large enterprises, the answer to that question is increasingly Claude Opus or a customized solution built on a flexible platform. The future of AI isn't about one model to rule them all; it's about a diverse ecosystem of specialized tools designed to solve real-world problems.
Hope this was helpful & gives you a clearer picture of what's going on in the AI world right now. Let me know what you think.