8/12/2025

Why GPT-5 Can't Remember Your Project Context Like Previous Models Did

Alright, let's talk about GPT-5. The hype was HUGE, right? We were all expecting this monumental leap towards AGI, an AI that would not just answer our questions but anticipate them, remember our entire project history, and basically finish our sentences. Instead, for a lot of us who use these tools daily, the launch felt… off.
If you've been working on a big project—coding a complex app, writing a novel, or analyzing a mountain of research—you might have felt it too. A weird sense of digital amnesia. You feed it a long document, have a detailed conversation, and then suddenly, it forgets a key detail from 10 messages ago. It’s like talking to someone who’s brilliant for five minutes but has the short-term memory of a goldfish.
Turns out, you’re not going crazy. For many users, especially those who were paying for ChatGPT Plus, the AI's effective memory didn't just fail to improve; it actually got worse. This is the great, frustrating irony of the GPT-5 launch. It’s a phenomenon some are calling “AI shrinkflation”—you’re paying the same, but the product feels like it’s been quietly downsized.
So, what in the world is going on? The answer is a messy mix of business decisions, surprising technical hurdles, & a fundamental shift in how OpenAI wants us to interact with its AI.

The Great Context Downgrade: What Actually Happened?

The root of the problem isn't that the underlying technology took a step backward. It's that OpenAI completely changed how it serves its models to us, the users.
Before the "GPT-5" launch, if you were a ChatGPT Plus subscriber, you had access to a variety of powerful models. One of the workhorses for people doing heavy-duty tasks was a model called o3. While it wasn't officially advertised with a specific token limit in the main interface, the community figured out it had a pretty generous context window, somewhere in the ballpark of 64,000 tokens. This was enough to handle hefty codebases, long research papers, or a significant chunk of a book. It was reliable.
Then came the big GPT-5 announcement. But here’s the twist: GPT-5 isn’t really one giant, new model. It’s better to think of it as a smart “router.” When you type a prompt, this router looks at your request & decides which internal model is best for the job. Is it a simple question? It’ll use a fast, cheap model. Is it a complex reasoning task? It might call on a more powerful, slower model.
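To make the "router" idea concrete, here's a purely illustrative toy sketch. OpenAI hasn't published how GPT-5's routing actually works, so the heuristics, thresholds, & model names below are all invented for explanation:

```python
# Illustrative only: a toy "router" that picks an internal model per request.
# The heuristics & model names are made up; this is NOT how GPT-5's real
# router decides, just the general shape of the idea.

REASONING_HINTS = ("prove", "debug", "step by step", "analyze", "refactor")

def route(prompt: str) -> str:
    """Pick a hypothetical backend model based on rough request complexity."""
    looks_complex = (
        len(prompt.split()) > 300  # long prompts -> more context to juggle
        or any(hint in prompt.lower() for hint in REASONING_HINTS)
    )
    return "slow-reasoning-model" if looks_complex else "fast-cheap-model"

print(route("What's the capital of France?"))            # fast-cheap-model
print(route("Debug this traceback step by step: ..."))   # slow-reasoning-model
```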
On the surface, this sounds efficient. But the devil is in the details of the new subscription tiers. Here’s how the context window—the AI’s short-term memory—breaks down now in the ChatGPT app:
  • Free Users: 8,000 tokens
  • Plus Subscribers: 32,000 tokens
  • Pro & Enterprise: 128,000 tokens
Do you see the problem? That Plus subscriber who was happily cruising along with a ~64k context window with the old o3 model suddenly had their memory HALVED to 32k overnight. And to make it worse, the old models were retired. This wasn't an upgrade; for anyone whose work depended on long context, it was a forced downgrade, hidden behind the shiny new "GPT-5" label.
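If you want to know whether your project even fits in a 32k window, you can count the tokens yourself. Here's a minimal sketch using the tiktoken library; the exact tokenizer GPT-5 uses isn't documented, so the cl100k_base encoding & the file name are assumptions:

```python
# Rough check of whether a document fits in a given context window.
# Requires: pip install tiktoken. The encoding is an assumption; OpenAI
# hasn't documented exactly which tokenizer the GPT-5 models use.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_window(text: str, window: int = 32_000, reserve_for_reply: int = 2_000) -> bool:
    """Return True if the text (plus room for the model's reply) fits in the window."""
    n_tokens = len(enc.encode(text))
    print(f"{n_tokens:,} tokens vs. a {window:,}-token window")
    return n_tokens + reserve_for_reply <= window

with open("project_notes.md") as f:   # hypothetical file with your project context
    print("Fits:", fits_in_window(f.read()))
```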
It’s no wonder people were frustrated. Context is EVERYTHING for productive, deep work. It’s the difference between an AI that can co-author a complex report with you & one that just keeps asking, "Now, what were we talking about?"

Why Is a Good Memory So Hard for an AI Anyway?

This whole situation raises a bigger question: why is context so difficult for these models in the first place? Why can't we just have an infinite memory? The reason goes back to the very architecture that makes them so smart: the Transformer.
The magic inside a Transformer is a mechanism called "self-attention." In VERY simple terms, for every word (or "token") in your prompt, the model looks at every other word to understand the relationships between them. This is how it figures out that in the sentence "The delivery truck blocked the driveway, so it was late," the word "it" refers to the delivery truck, not the driveway.
It's a brilliant design, but it has a massive scaling problem. The amount of computation required grows quadratically with the length of the input. This is known as O(n²) complexity.
Think of it like a business meeting. If you have 4 people, and everyone needs to speak one-on-one with everyone else, you have 6 conversations. If you double it to 8 people, you don't get 12 conversations; you get 28. If you have 100 people, you have nearly 5,000 conversations. The complexity explodes.
The same thing happens inside the AI. Doubling the context length doesn't double the work; it quadruples it. Processing a 100,000-token prompt isn't 10x harder & more expensive than processing a 10,000-token one; it's closer to 100x. This is the technical & economic wall that all AI companies are up against.
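You can see the quadratic blow-up directly in a naive self-attention implementation: the attention score matrix alone is n × n. Here's a minimal NumPy sketch; production models are far more optimized, but the shape of the problem is the same:

```python
# Naive single-head self-attention: the score matrix is n x n, so doubling
# the sequence length quadruples both the compute & the memory for it.
import numpy as np

def naive_attention(x: np.ndarray) -> np.ndarray:
    """x has shape (n_tokens, d_model). Returns the attended output, same shape."""
    n, d = x.shape
    q, k, v = x, x, x                       # skip learned projections for brevity
    scores = q @ k.T / np.sqrt(d)           # shape (n, n): every token vs. every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

out = naive_attention(np.random.randn(8, 16))   # tiny example: 8 tokens, 16 dims

for n in (10_000, 100_000):
    print(f"{n:>7,} tokens -> score matrix with {n * n:>18,} entries")
```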
But here’s another layer of nuance. Even for models that boast a massive context window—like Gemini 1.5 Pro with its 2 million tokens—there's a difference between the context window & the model's effective working memory. Just because you can stuff 500 pages of a book into the prompt doesn't mean the model can effectively reason about all of it. Research has shown that a model's performance can start to break down long before the hard token limit is hit, especially on tasks that require tracking many different details or plot points. It’s like being able to hold a whole library in your hands but only being able to actually read one paragraph at a time.
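This gap between the advertised window & the usable one is what "needle in a haystack" tests try to measure: bury one specific fact deep inside a long pile of filler & check whether the model can still pull it out. Here's a rough sketch of the idea using the OpenAI Python client; the model name is a placeholder, & the length at which recall actually degrades is something you'd have to measure for yourself:

```python
# Sketch of a "needle in a haystack" recall test: plant one fact in filler
# text & check whether the model retrieves it. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()

NEEDLE = "The project's internal codename is BLUE-HERON-42."
filler = "This paragraph is routine project filler with no key facts. " * 2_000
haystack = filler[: len(filler) // 2] + NEEDLE + " " + filler[len(filler) // 2 :]

resp = client.chat.completions.create(
    model="gpt-5",  # placeholder; test whatever model & tier you actually use
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat is the project's internal codename?",
    }],
)
answer = resp.choices[0].message.content
print("PASS" if "BLUE-HERON-42" in answer else "FAIL", "-", answer)
```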

The Real-World Pain: When Context Fails

This isn't just a theoretical problem. The fallout from this context crunch is tangible & deeply frustrating for professionals who have built workflows around these tools.
  • For Coders: Imagine feeding your entire application's source code to the AI to help you debug a tricky issue. You have a long conversation, pointing out different functions & files. Then, suddenly, the AI starts hallucinating function names or forgetting the purpose of a class you defined 20 messages ago. The "all-seeing" coding partner you relied on is now just a confused junior dev.
  • For Researchers: You upload a 50-page academic paper, hoping the AI can help you summarize the methodology & cross-reference findings. The first few answers are great. But as you dig deeper, the model starts to contradict itself, forgetting key statistics from the introduction or misinterpreting the conclusion. The time you thought you were saving is now spent re-reading & fact-checking the AI.
  • For Writers & Marketers: You're trying to write a long piece of content, like a detailed whitepaper or a chapter of a novel. You've given the AI your style guide, character backstories, & plot outline. Halfway through, the tone starts to drift, a character's personality changes, or it forgets a critical plot point established earlier. The thread is lost, & the creative flow is broken.
Users have reported GPT-5's behavior as "rushed" & its memory as "glitchy." Sometimes it feels like it's having a conversation with someone else entirely. It’s this unreliability that cuts the deepest. When you can’t trust the tool’s memory, you can’t trust the tool.

The Path Forward: Stop Prompting a Model, Start Engineering Context for an Agent

So, are we doomed to AI amnesia? Not necessarily. But we do have to change how we think. The era of just dumping a huge blob of text into a prompt & hoping for the best is ending. The future is about shifting from prompting a model to engineering context for an agent.
What's the difference?
  • A model is reactive. It takes one input & produces one output. It has no memory beyond the current conversation.
  • An agent is autonomous. It can be given a complex goal, break it down into steps, use tools, learn from its mistakes, & maintain a state of memory over time.
GPT-5, with its new router architecture & tool-using capabilities, is clearly designed to be more of an agent. But an agent is only as good as the context it's given. The new, critical skill for working with AI is "Context Engineering." It's the art of giving the AI all the information it needs to solve a problem.
This goes WAY beyond a single prompt. Good context engineering involves managing several layers of information (there's a rough sketch of how they fit together in code after this list):
  1. System Prompt/Instructions: This is the agent's constitution. The core rules, its persona, its goals. You don't just say "You are a helpful assistant." You say, "You are a senior Python developer specializing in the Django framework. You always provide code examples & explain the 'why' behind your suggestions."
  2. State/History (Short-Term Memory): This is the current conversation. It's still important, but you have to be aware of its limits.
  3. Retrieved Information (RAG): This is where you give the agent access to an external brain. Using Retrieval-Augmented Generation, the agent can pull in fresh, relevant information from documents, databases, or APIs precisely when it's needed, instead of trying to hold it all in its short-term memory.
  4. Available Tools: You define what the agent can do. Can it search the web? Can it access your company's CRM? Can it run code? Giving an agent tools is more powerful than giving it raw text.
  5. Long-Term Memory: This is a persistent knowledge base. A place where the agent can store key facts, user preferences, or project summaries across many different conversations.
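Here's what stitching those five layers together can look like in practice. This is a minimal sketch, not a specific framework's API: retrieve_docs() & load_long_term_memory() stand in for whatever retrieval pipeline & memory store you actually use, & "gpt-5" is just a placeholder model name.

```python
# Minimal sketch of assembling layered context for an agent-style call.
# The retrieval & memory functions are placeholders for your real systems.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a senior Python developer specializing in the Django framework. "
    "You always provide code examples & explain the 'why' behind your suggestions."
)

def retrieve_docs(query: str, top_k: int = 3) -> str:
    """Placeholder for your real retrieval step (vector DB, search API, etc.)."""
    return "\n".join(f"<relevant doc {i}>" for i in range(1, top_k + 1))

def load_long_term_memory(user_id: str) -> str:
    """Placeholder for a persistent memory store keyed by user or project."""
    return "Project uses Django 5 with PostgreSQL; client prefers type-hinted code."

def build_messages(user_question: str, history: list[dict]) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},                          # 1. constitution
        {"role": "system",
         "content": "Long-term memory:\n" + load_long_term_memory("alice")},   # 5. long-term memory
        {"role": "system",
         "content": "Retrieved context:\n" + retrieve_docs(user_question)},    # 3. RAG
        *history[-10:],                                                        # 2. trimmed short-term history
        {"role": "user", "content": user_question},
    ]

resp = client.chat.completions.create(
    model="gpt-5",  # placeholder; swap in whatever model/tier you actually use
    messages=build_messages("Why is my Django migration failing?", history=[]),
    tools=[{  # 4. tools the agent is allowed to call
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the project's test suite & return any failures.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
)
print(resp.choices[0].message)
```

The point isn't the specific libraries; it's that the prompt stops being one big blob & becomes an assembled, deliberate package of instructions, memory, retrieved facts, & tools.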
This is a big shift. It means our job is less about writing the perfect magic prompt & more about being an architect of the AI's informational world.

When Your Business Can't Afford Amnesia: The Case for a Stable, Dedicated Knowledge Base

Adapting your personal workflow is one thing. But what if you're a business? You can't build a reliable customer support system or an internal knowledge base on a platform where the core functionality, like the memory, can be cut in half without warning. The volatility is just too high.
This is where a solution like Arsturn comes in. Instead of relying on a general-purpose model's fleeting, shared memory, Arsturn helps businesses create their own custom AI chatbots trained exclusively on their own data.
Think of it as creating a permanent, external memory for your business or project. It doesn’t matter if a public model's context window changes; your Arsturn bot's knowledge, trained on your website content, product documents, support tickets, & FAQs, remains stable & consistent.
Here’s why that’s a game-changer:
  • Amnesia-Proof Knowledge: The AI's brain is built on your information & only your information. Its memory is locked in & doesn't degrade or get confused by the millions of other conversations happening on a public platform.
  • The 24/7 Expert: For your customers, it provides instant, accurate answers drawn directly from your approved sources. For your internal team, it acts as a project expert that never forgets a key spec, a client preference, or a policy update. It's the perfect memory, on demand.
  • No-Code Simplicity: You don't need to be an AI Ph.D. to build this. Arsturn is a no-code platform that lets you build, train, & deploy a highly capable AI chatbot that can engage with website visitors, generate leads, & provide personalized customer support, all based on the stable foundation of your own data.
It’s about taking back control. Instead of being at the mercy of the next big tech update, you're building an AI asset that is reliable, secure, & perfectly tailored to your business needs.
So, here's the thing. GPT-5 is an incredibly powerful system, but its rollout has highlighted a massive disconnect between the needs of serious users & the product strategy of AI companies. The path forward requires us to be smarter "context engineers." But for businesses that depend on consistent, reliable information, the real solution might be to stop renting a slice of a public AI's brain & start building one of your own.
It's a weird shift, for sure. Hope this breakdown was helpful & clears up some of the confusion. Let me know what you think.

Copyright © Arsturn 2025