8/10/2025

My Personal AGI Test: The One Task GPT-5 Nailed That No Other AI Could

Alright, let's talk about the elephant in the room: GPT-5 is here, & it's... different. I've been in the AI space for a while, tinkering, testing, & generally just being a pain to every new model that comes out. I've thrown everything at them – from writing complex code to crafting emotionally resonant poetry. & honestly, they've all gotten pretty darn good. But there's always been this nagging feeling, this glass wall I'd hit where the "intelligence" felt more like a sophisticated parlor trick than genuine understanding.
I’ve been obsessed with this idea of a personal AGI test. Not the formal benchmarks like ARC-AGI-2, which are SUPER important for the academic side of things, but something more... holistic. Something that tests not just raw intelligence, but a combination of creativity, nuanced understanding, & the ability to connect seemingly disparate concepts. Something that feels less like a machine following instructions & more like a collaborator.
For months, my go-to test has been a bit of a weird one. I'd ask the top models—your GPT-4o, your Claude 3.5 Sonnet, your Gemini 2.5 Pro—to do something I thought was a perfect blend of left-brain & right-brain thinking. The task was this:
“Create a business plan for a fictional company that sells a product that doesn't exist & can't exist under our current understanding of physics. The plan needs to include a detailed marketing strategy, a 3-year financial projection, & a section on handling customer service for a product that is, by its nature, paradoxical. Oh, & the entire business plan needs to be written in the style of a 19th-century naturalist's journal, complete with sketches & observational notes.”
It's a ridiculous ask, I know. But it's designed to break things. It requires a model to not just understand business, but to imagine a new kind of business. It needs to grasp the subtle, literary style of a bygone era. It needs to be able to think logically about the illogical.
& for the longest time, every AI failed. Spectacularly, in some cases. They’d get one part right but completely whiff on the others. One model gave me a solid, if boring, business plan but completely ignored the stylistic constraint. Another wrote beautifully in the 19th-century style but the business plan was utter nonsense. They couldn't hold all the threads at once. It was like they were trying to juggle a bowling ball, a feather, & a running chainsaw. Something was always dropped.
Then came GPT-5. & let me tell you, it didn't just juggle the items. It built a whole new machine to do the juggling for it.

The GPT-5 Difference: It's Not Just About Smarts, It's About Synthesis

So what made GPT-5 succeed where others stumbled? From my digging & the official release info from OpenAI on August 7, 2025, it comes down to a few key things that are a genuine leap forward.
First off, they're calling it a "unified system." This isn't just marketing fluff. Previous models, even the really good ones, felt like a collection of specialized tools. You had your coding brain, your writing brain, your logic brain, & sometimes you could see the seams when they tried to work together. GPT-5, on the other hand, feels like a single, cohesive intelligence. It has this ability to perform "structured reasoning," which means it can handle multi-step logic without getting lost. It doesn't just react to a prompt; it seems to think about it, break it down, & then execute in a way that feels... well, thoughtful.
When I gave it my impossible business plan prompt, it didn't just spit out an answer. The first thing it did was different. It asked questions.
"An intriguing challenge," it started. "To best capture the essence of a 19th-century naturalist discovering a paradoxical product, could you clarify the nature of the paradox? For instance, is it a physical object that exists & doesn't exist simultaneously, or does it operate on a principle that defies causality? This will help in crafting a consistent 'scientific' voice for our naturalist."
I was floored. It wasn't just asking for more data. It was asking for creative direction. It understood the assignment on a conceptual level. After I clarified (I went with a compass that points to where you should have gone, not where you're going), it went to work.
What it produced was, without exaggeration, the most impressive thing I've seen an AI generate. It was a document titled, "On the Matter of the 'Hindsight Compass': An Entrepreneurial Expedition." The business plan was woven into a narrative. The financial projections were presented as "provisions & expected returns from a speculative venture." The marketing plan was framed as "disseminating knowledge of a newfound wonder to a skeptical public."
It nailed the tone. It nailed the business acumen. & it even generated placeholder "sketches" described in the text (GPT-5's output is still text-only for now, though it can understand & process images as input). For example, it described a sketch of the compass with "intricate brass workings" & a needle made of "what appears to be petrified smoke."

Why Other AIs Dropped the Ball

To be fair to the other models, they are incredible pieces of technology. Models like Claude 3.5 Sonnet are amazing for enterprise coding tasks, & Google's Gemini is a monster when it comes to multimodal capabilities. But my weird little test highlights their specific limitations.
  • Fragmented Capabilities: Most models still operate in silos. They can be a great writer, or a great analyst, but getting them to be both at the exact same time, in a way that blends the two skills seamlessly, is where they fall down.
  • Lack of Deep Reasoning: They often follow instructions to the letter without grasping the underlying intent. They see "business plan" & "19th-century naturalist" as two separate tasks to be mashed together, not two elements of a single, unified concept. GPT-5's new reasoning components, seemingly inspired by OpenAI's more experimental 'o1' & 'o3' models, are what allow it to get the why behind the what.
  • Brittle Creativity: They can be creative, but it's often a shallow creativity. They can mimic a style, but they can't inhabit it. They can't reason from within the persona. Writing a business plan as if you are a 19th-century naturalist is a fundamentally different task than writing a business plan & then applying a stylistic filter.
This is where the idea of Artificial General Intelligence gets interesting. AGI isn't just about being able to answer any question. It's about the ability to generalize, to transfer knowledge across domains, & to solve novel problems without being explicitly trained on them. My test, in its own small way, was a test of that generalization.

Beyond the Test: What This Means for Businesses & Everyone Else

Okay, so GPT-5 can write a quirky, impossible business plan. Cool story, but what does that actually mean for the real world? Honestly, a lot.
This level of synthesized intelligence is a game-changer. Think about customer service. For years, businesses have been trying to automate customer interactions. Early chatbots were a disaster, just glorified FAQ pages. Newer AI has gotten better, but you still often hit a wall. They can answer direct questions, but struggle with complex, multi-part queries or when a customer is frustrated & not speaking clearly.
This is where a tool like Arsturn becomes incredibly powerful, especially when powered by a model with GPT-5's capabilities. Arsturn helps businesses create custom AI chatbots trained on their own data. Now, imagine that chatbot having the reasoning power of GPT-5. It wouldn’t just be a knowledge base; it would be a problem-solver. A customer could come to a website & say, "I bought this shirt last week, the color looks different than on the website, & I think my discount code was applied incorrectly. Also, can I get it in blue instead?"
An older AI might get stuck on the first part. But a GPT-5-level intelligence could understand all the different threads of the problem, access order history, check the discount code logic, look up inventory for the blue shirt, & provide a single, coherent, & empathetic response. It could say, "I'm sorry to hear the color wasn't what you expected. I've checked your order & it looks like the 'WELCOME10' code didn't apply correctly. I've refunded you the difference. We do have that shirt in 'Indigo Dream,' which is a bit darker than the 'Sky Blue' on the site. Would you like me to process an exchange for you?"
THAT is the future of customer engagement. It’s not just about providing answers 24/7. It's about providing understanding 24/7. When you think about lead generation or website optimization, the goal is to build a meaningful connection. A platform like Arsturn, which allows businesses to build these no-code AI chatbots, can now move beyond simple Q&A to create truly personalized & helpful experiences, guiding customers in a way that feels human.
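To make the shirt example a bit more concrete, here's a minimal Python sketch of the lookups a chatbot would have to combine to answer that one message. To be clear, this is NOT Arsturn's API or anything GPT-5 exposes. The order record, the discount table, the inventory, & every function name here are hypothetical, made up purely for illustration. The language model would handle the "understanding" part; this only shows the idea of checking each thread of the problem & folding the results into a single reply.

```python
from dataclasses import dataclass

# All data below is hypothetical, invented for illustration only.
DISCOUNT_CODES = {"WELCOME10": 0.10}  # code -> fractional discount
INVENTORY = {("shirt", "Indigo Dream"): 4, ("shirt", "Sky Blue"): 0}


@dataclass
class Order:
    item: str
    list_price: float
    amount_charged: float
    discount_code: str


def refund_owed(order: Order) -> float:
    """Return how much the customer was overcharged, given their discount code."""
    rate = DISCOUNT_CODES.get(order.discount_code, 0.0)
    expected = round(order.list_price * (1 - rate), 2)
    return round(max(order.amount_charged - expected, 0.0), 2)


def colors_in_stock(item: str) -> list[str]:
    """List the colors of an item that currently have stock."""
    return sorted(
        color
        for (name, color), qty in INVENTORY.items()
        if name == item and qty > 0
    )


def build_reply(order: Order) -> str:
    """Combine every lookup into one response instead of answering piecemeal."""
    parts = ["I'm sorry to hear the color wasn't what you expected."]
    refund = refund_owed(order)
    if refund > 0:
        parts.append(
            f"The '{order.discount_code}' code didn't apply correctly, "
            f"so you're due a refund of ${refund:.2f}."
        )
    available = colors_in_stock(order.item)
    if available:
        parts.append(
            f"We have that {order.item} in: {', '.join(available)}. "
            "Would you like an exchange?"
        )
    return " ".join(parts)


# A made-up order where the 10% code was never applied.
order = Order(item="shirt", list_price=40.00, amount_charged=40.00,
              discount_code="WELCOME10")
print(build_reply(order))
```

The point of the design is that all three checks happen before any text is written, so the customer gets one answer covering the apology, the refund, & the exchange option rather than three separate back-&-forths.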
The "agentic" feel that people are talking about with GPT-5 is real. It's this ability to take initiative. You don't have to provide every single step in your prompt anymore. You can "gesture vaguely at what you want," & the AI can fill in the blanks, sometimes in ways you didn't even think of. This is massive for productivity. It's the difference between being a manager who has to write a detailed instruction manual for every task & a leader who can just communicate their intent & trust their team to execute.

So, Have We Reached AGI?

Let's not get ahead of ourselves. No, GPT-5 is not AGI, & the benchmarks back that up. On some of the really tough reasoning tests like ARC-AGI-2, it still has a long way to go to catch up with human performance. It still makes mistakes, it can still hallucinate (though less often than earlier models), & it doesn't experience the world.
But... it feels like a step-change. It's the first time I've interacted with an AI & felt like the "artificial" part was starting to fade into the background. It's smarter, more reliable, & excels at the kind of complex work that businesses depend on. It can analyze, write, code, & problem-solve in a way that feels less like a tool & more like a collaborator.
My little AGI test was designed to find the limits. & for the first time, an AI didn't just meet the challenge; it transcended it. It showed a spark of something new – not just intelligence, but a synthesis of intelligence & creativity that felt genuinely different.
The future is going to be pretty wild. For now, I'm going to go see if I can get this Hindsight Compass funded. According to GPT-5, my 3-year projection is looking pretty good.
Hope this was helpful. Let me know what you think.

Copyright © Arsturn 2025