Here's the thing about building a rig for local AI: it's a completely different beast than building a gaming PC. Sure, there's a lot of overlap. You still need a good CPU, plenty of fast RAM, & a beefy power supply. But when it comes to the GPU, the calculus changes. It's not just about raw framerates anymore. It's about VRAM, memory bandwidth, & specialized cores that can handle the unique demands of large language models.
For a while now, the RTX 4090 has been the undisputed king of consumer-grade AI. Its 24GB of VRAM was a game-changer, opening the door for enthusiasts & small businesses to run some seriously powerful models right from their desktops. But now, there's a new challenger on the horizon: the RTX 5090.
The hype around this card is INSANE. And for good reason. It's the first consumer card built on Nvidia's new Blackwell architecture, which is already making waves in the data center world. The specs are eye-watering, the performance claims are bold, & the price tag is... well, we'll get to that.
So, the big question is: should you stick with the tried-and-true 4090, or is it time to start saving up for the 5090? Let's dive in & figure it out.
The Reigning Champ: A Quick Look at the RTX 4090
It's hard to overstate the impact the RTX 4090 has had on the local AI scene. Before the 4090, if you wanted to run a decent-sized LLM, you were pretty much stuck with renting cloud instances or shelling out for a professional-grade workstation card. The 4090 changed all that.
With its 24GB of GDDR6X VRAM, it was one of the first consumer cards that could comfortably handle 13B models & even 30B-class models, especially with quantization. For anyone using Ollama to experiment with different open-source models, the 4090 has been an absolute dream.
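If you haven't tried it yet, here's roughly what that experimentation looks like with the Ollama Python client. This is a minimal sketch: it assumes `pip install ollama`, a local Ollama server running, & the model tag is just an example.

```python
# Minimal sketch: pull a quantized 13B-class model & chat with it through the
# ollama Python client. Assumes `pip install ollama` & a local Ollama server;
# the model tag is an example (llama2:13b ships as a ~4-bit quant by default).
import ollama

MODEL = "llama2:13b"

ollama.pull(MODEL)  # downloads the model if it isn't already cached locally

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "In one sentence, why does VRAM matter for local LLMs?"}],
)
print(response["message"]["content"])
```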
But it's not just about the VRAM. The 4090's Ada Lovelace architecture brought some serious improvements in AI performance, with 4th-gen Tensor Cores that are specifically designed to accelerate the kind of math that makes LLMs tick. It's a powerful, reliable card that has proven its worth time & time again.
The New Contender: What's the Big Deal with the RTX 5090?
The RTX 5090 is more than just an incremental upgrade. It's a generational leap. Built on the new Blackwell architecture, this card is designed from the ground up for the AI era. Nvidia has been very clear about their focus on AI, & the 5090 is the first consumer-facing product that really shows what they've been working on.
So, what makes it so special? Let's break down the key differences.
Specs & Architecture: A Tale of Two Titans
| Spec | RTX 4090 | RTX 5090 |
|---|---|---|
| Architecture | Ada Lovelace | Blackwell |
| CUDA Cores | 16,384 | 21,760 |
| VRAM | 24 GB GDDR6X | 32 GB GDDR7 |
| Memory Bus | 384-bit | 512-bit |
| Memory Bandwidth | 1,008 GB/s | 1,792 GB/s |
| Tensor Cores | 4th Gen | 5th Gen |
| RT Cores | 3rd Gen | 4th Gen |
| TDP | 450W | 575W |
| MSRP | $1,599 | $1,999 |
Just looking at the numbers, you can see the 5090 is in a different league. Over 5,000 more CUDA cores, a wider memory bus, & a nearly 80% increase in memory bandwidth. That's not a small jump. That's a HUGE leap.
The move to the Blackwell architecture is the real story here. Blackwell introduces 5th-generation Tensor Cores, which are even more efficient at handling the low-precision data types that are common in AI workloads. This means that not only can the 5090 process more data at once, but it can do it faster & more efficiently than the 4090.
VRAM: The Most Important Spec for Your Ollama Rig
If you're building a rig specifically for Ollama, VRAM is king. It's the single most important factor that will determine which models you can run & how fast they'll be. Think of VRAM as your GPU's workspace. The bigger your workspace, the bigger the projects you can take on.
With LLMs, the entire model needs to be loaded into VRAM to run at full speed. If the model is too big to fit, you have to offload parts of it to your system RAM, which is MUCH slower. This is where the 5090's 32GB of GDDR7 memory really shines.
That extra 8GB of VRAM over the 4090 is a massive deal. It's the difference between comfortably running a 30B parameter model & struggling to get it to load. It's the difference between experimenting with the latest & greatest open-source models & being stuck with the smaller, less capable ones. For serious Ollama users, that 32GB is the holy grail.
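To put some rough numbers on that, here's a back-of-the-envelope way to estimate whether a quantized model fits. The 20% overhead factor is an assumption on my part (KV cache, activations, CUDA context), & real usage varies with context length & runtime.

```python
# Back-of-the-envelope VRAM estimate for a quantized model: weights take roughly
# (parameters x bits / 8) bytes, plus overhead for the KV cache, activations &
# CUDA context. The 20% overhead factor is an assumption, not a measured value.
def estimate_vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_gb = params_billions * bits_per_weight / 8  # billions of params -> GB
    return weight_gb * overhead

for label, params, bits in [("13B @ 4-bit", 13, 4), ("30B @ 4-bit", 30, 4), ("70B @ 4-bit", 70, 4)]:
    print(f"{label}: ~{estimate_vram_gb(params, bits):.0f} GB")

# 13B @ 4-bit: ~8 GB   -> easy fit on either card
# 30B @ 4-bit: ~18 GB  -> fits in 24 GB, but roomier in 32 GB
# 70B @ 4-bit: ~42 GB  -> too big for either card without offloading to system RAM
```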
And it's not just the amount of VRAM; it's also the speed. The 5090 uses the new GDDR7 memory standard, which is significantly faster than the GDDR6X in the 4090. This, combined with the wider 512-bit memory bus, gives the 5090 a staggering 1,792 GB/s of memory bandwidth, a 78% increase over the 4090. What does that mean in plain English? The 5090 can feed data to its cores MUCH faster, which is critical when running an LLM means reading through billions of weights for every single token it generates.
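If you want an intuition for what that bandwidth buys you: each generated token requires reading roughly the full set of weights from VRAM, so bandwidth puts a hard ceiling on single-stream tokens per second. A quick sketch, where the 18 GB model size is just an example figure:

```python
# Rough ceiling on single-stream generation speed: every new token requires reading
# (roughly) all of the model's weights from VRAM, so tokens/sec is bounded by
# bandwidth / model size. Ignores compute, KV-cache reads & overlap -- an upper bound only.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_gb = 18.0  # example: a 30B model at 4-bit quantization
print(f"RTX 4090 ceiling: ~{max_tokens_per_sec(1008, model_gb):.0f} tok/s")  # ~56
print(f"RTX 5090 ceiling: ~{max_tokens_per_sec(1792, model_gb):.0f} tok/s")  # ~100
```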
Real-World Performance: What the Early Benchmarks Say
Okay, so the specs are impressive. But how does that translate to real-world performance?
The short answer is: the 5090 is a monster.
In gaming, it's a significant step up from the 4090, especially at 4K. But honestly, if you're just gaming, both of these cards are probably overkill. Where the 5090 really pulls away is in AI & content creation workloads.
One benchmark showed the 5090 with a 154% lead in AI performance over the 4090. Another showed it with a 37% improvement in AI text generation tests using the Llama 3.1 model. In Blender, a popular 3D rendering application, the 5090 is about 35% faster than the 4090.
These numbers are still preliminary, as the card was just announced at CES 2025 with a release date of January 30, 2025. But all the early signs point to a massive performance uplift, especially in the areas that matter most for Ollama users.
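And once the card is in your hands (or if you just want a baseline on your current 4090), it's easy to sanity-check these claims yourself. Here's a minimal sketch using the Ollama Python client; it relies on the `eval_count` & `eval_duration` fields Ollama reports with non-streaming responses, & the model tag is just an example.

```python
# Quick sanity check of generation speed on your own card. Ollama reports
# eval_count (tokens generated) & eval_duration (nanoseconds) with each
# non-streaming response; the model tag is just an example.
import ollama

MODEL = "llama3.1:8b"

response = ollama.generate(model=MODEL, prompt="Write a short paragraph about GPUs.")
tokens = response["eval_count"]
seconds = response["eval_duration"] / 1e9
print(f"{MODEL}: {tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tok/s")
```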
Power, Price, & Practicalities
Of course, all that power comes at a cost. And I'm not just talking about the $1,999 price tag. The RTX 5090 is also a very power-hungry card, with a TDP of 575W. That's a big jump from the 4090's 450W, & it means you'll need a seriously beefy power supply to run it. Nvidia is recommending at least a 1000W PSU.
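If you want to sanity-check your own power supply, a rough budget looks something like this. The CPU & "everything else" wattages are example figures, & the 80% load target is a common rule of thumb, not an official Nvidia sizing formula.

```python
# Rough PSU budget. The CPU & "everything else" wattages are example figures, & the
# 80% load target is a common rule of thumb, not an official Nvidia sizing formula.
gpu_w = 575   # RTX 5090 board power
cpu_w = 150   # mid/high-end desktop CPU under mixed load (example figure)
rest_w = 75   # fans, drives, RAM, motherboard (rough allowance)

system_load = gpu_w + cpu_w + rest_w   # ~800 W
recommended_psu = system_load / 0.8    # keep sustained load around 80% of the PSU's rating
print(f"Estimated load: {system_load} W -> look for a ~{recommended_psu:.0f} W PSU")
# ~800 W -> ~1000 W, which lines up with Nvidia's recommendation; a hungrier CPU or
# transient power spikes are good reasons to go bigger.
```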
Interestingly, despite the higher power draw, the 5090 Founders Edition is actually a slimmer card than the 4090. It features a new dual-slot cooler design, compared to the massive triple-slot cooler on the 4090 Founders Edition. This is a welcome change for anyone who's struggled to fit a 4090 into a standard-sized case.
So, Which One is Right for Your Ollama Rig?
This is the million-dollar question. Or, well, the $2,000 question. And the answer, as always, is: it depends.
The Case for the RTX 5090
If you want the absolute best of the best, with no compromises, the RTX 5090 is the card to get. It's that simple.
- You're a serious AI developer or researcher. If you're fine-tuning models, experimenting with different architectures, or just need to iterate as quickly as possible, the 5090's performance & VRAM will be a massive advantage. The time you save in training & inference will likely be worth the extra cost.
- You want to run the biggest & best open-source models. With 32GB of VRAM, the 5090 will be able to handle models that are simply out of reach for the 4090. If you want to be on the cutting edge of local AI, the 5090 is the only choice.
- You're building a new rig from scratch & have the budget. If you're already investing in a high-end CPU, motherboard, & RAM, it makes sense to pair it with the best GPU you can afford. The 5090 will be a more future-proof investment, ensuring your rig can handle the next generation of LLMs.
The Case for the RTX 4090
The RTX 5090 might be the new king, but that doesn't mean the 4090 is suddenly obsolete. Far from it. The 4090 is still an incredibly capable card, & it might be the smarter choice for a lot of people.
- You're on a tighter budget. The 5090's $1,999 price tag is a lot to swallow. With the release of the 50-series, you can expect the price of used 4090s to drop, making it a much more accessible option.
- You're primarily an enthusiast or hobbyist. If you're just using Ollama to experiment with different models, chat with your own personal AI assistant, or generate some creative text, the 4090 is more than enough. It can handle up to 13B models with ease & can even tackle some larger ones.
- You already have a high-quality 850W-1000W PSU. Upgrading to a 5090 might require you to also upgrade your power supply, adding to the overall cost. If your current setup can handle a 4090, sticking with it could save you a lot of money & hassle.
The Arsturn Connection: Putting Your Local LLM to Work
Okay, so you've built your dream Ollama rig. You've got a powerful LLM running locally, and it's fast, private, & all yours. Now what?
This is where things get REALLY interesting, especially for small business owners, developers, or entrepreneurs. A powerful local LLM isn't just a cool toy; it's a serious business tool waiting to be unleashed. And one of the most practical & high-impact ways to use it is to power a custom AI chatbot for your website.
Here's the thing, though: building a chatbot from scratch is hard. You need to handle the front-end interface, the back-end logic, the API connections... it's a whole thing. And that's where a platform like Arsturn comes in.
Arsturn is a no-code platform that lets you build custom AI chatbots trained on your own data. You can upload your documents, point it to your website, or feed it information from your local LLM. Arsturn handles all the complicated stuff, allowing you to create & deploy a smart, helpful chatbot in minutes, not months.
Imagine having a chatbot on your website that can instantly answer customer questions, provide 24/7 support, & even help with lead generation, all powered by the Ollama rig you just built. That's a pretty cool way to get a real return on your hardware investment. With a powerful GPU like the 5090, you could run a highly specialized model for your industry, & Arsturn could be the bridge that connects that model to your customers. It's a perfect example of how this kind of consumer-grade hardware is starting to enable some seriously professional-grade applications.
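To make the "bridge" idea concrete, the usual pattern is a small HTTP endpoint sitting in front of your local model. Here's a hypothetical sketch using FastAPI & the Ollama Python client; the route name, request shape, & model tag are my own assumptions for illustration, not any vendor's actual integration API.

```python
# Hypothetical sketch: a small HTTP endpoint in front of your local Ollama model that
# an external chatbot platform could call. The route name, request/response shape,
# FastAPI choice & model tag are assumptions for illustration, not any vendor's API.
import ollama
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    response = ollama.chat(
        model="llama3.1:8b",  # swap in your specialized or fine-tuned model
        messages=[{"role": "user", "content": req.message}],
    )
    return {"reply": response["message"]["content"]}

# Run locally with, e.g.:  uvicorn server:app --port 8000   (assuming this file is server.py)
```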
Final Thoughts
The RTX 5090 is, without a doubt, a revolutionary piece of hardware. It represents a new era of consumer-grade AI, & it's going to open up a world of possibilities for anyone interested in running LLMs locally. The performance gains & the massive 32GB of VRAM make it the undisputed champion for any serious Ollama user.
But let's be real: it's also a VERY expensive piece of kit. And for many people, the RTX 4090 is still more than enough. It's a fantastic card that can handle a wide range of models & will continue to be a workhorse in the local AI community for years to come.
Ultimately, the choice between the RTX 5090 & the RTX 4090 comes down to your budget & your specific needs. Are you trying to push the absolute limits of what's possible with local AI, or are you looking for a powerful yet practical solution for your home or small business?
Either way, it's an exciting time to be involved in this space. The hardware is getting more powerful, the software is getting more accessible, & the possibilities are expanding every single day.
Hope this was helpful! Let me know what you think.