8/11/2025

Sharing is Caring: How to Configure Proxmox for Efficient GPU Passthrough to Multiple Ollama VMs

Hey everyone! So, you've got a beefy GPU & you're looking to dive into the world of local AI with Ollama. That's awesome. But then you hit a snag: you want to run multiple Ollama instances, maybe for different models or for different users, but you've only got one GPU. How do you share it efficiently across multiple virtual machines in Proxmox?
Honestly, it's a common question, & the answer isn't as simple as just clicking a button. Standard GPU passthrough is a one-to-one deal – one GPU for one VM. But don't worry, it's not a dead end. Turns out, there are a few ways to tackle this, each with its own set of pros, cons, & technical hurdles.
In this guide, I'm going to walk you through the different methods for sharing a single GPU with multiple VMs or containers in Proxmox, specifically with the goal of running multiple Ollama instances. We'll cover everything from the "enterprise" solution to the more "hacky" but effective methods.

The Challenge: One GPU, Many VMs

Before we dive in, let's be clear about the problem. When you do a standard PCI passthrough in Proxmox, you're essentially giving a physical GPU exclusively to a single virtual machine. This is great for performance, but it's not great for sharing: the device is claimed exclusively for that VM, so a second VM that maps the same GPU won't even start while the first one is running.
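For context, a standard passthrough is just a single mapping in the VM's config. Here's a minimal sketch from the Proxmox shell; the VM ID (101) & PCI address (0000:01:00.0) are placeholders for your own setup, & pcie=1 assumes a q35 machine type:

    # Find the GPU's PCI address on the Proxmox host
    lspci -nn | grep -iE "vga|3d|nvidia"

    # Hand the whole GPU to VM 101 (exclusive: only one VM can use it at a time)
    qm set 101 --hostpci0 0000:01:00.0,pcie=1

Map that same address into a second VM & it won't start while the first one is running, which is exactly the limitation the rest of this guide works around.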
So, what are our options? We're going to explore three main paths:
  1. NVIDIA vGPU: The official, enterprise-grade solution for sharing NVIDIA GPUs.
  2. SR-IOV (Single Root I/O Virtualization): A hardware-based feature that lets you split a GPU into multiple virtual "slices."
  3. LXC Containers: A lightweight alternative to VMs that makes GPU sharing MUCH easier.
Let's break 'em down.

Method 1: NVIDIA vGPU - The "By the Book" Approach

If you've got a supported NVIDIA data center GPU (think Tesla, Quadro, or the A-series cards), then vGPU is the "official" way to go. It's a powerful technology that lets you carve up a single GPU into multiple virtual GPUs (vGPUs) that can be assigned to different VMs.
How it Works:
NVIDIA vGPU works by installing a special driver, the "GPU Manager," on the Proxmox host. This manager software creates virtual GPU profiles that you can then assign to your VMs. Each VM gets its own dedicated slice of the GPU's power, & the GPU Manager handles the scheduling & resource allocation.
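To make that concrete, here's roughly what the assignment looks like from the Proxmox shell once the GPU Manager is installed. The PCI address & the profile name (nvidia-63) are placeholders, & on some newer cards the profiles show up under the GPU's virtual functions rather than the card itself:

    # List the vGPU (mediated device) profiles the GPU Manager exposes
    ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types

    # Human-readable names for each profile (VRAM size, intended workload, etc.)
    cat /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/*/name

    # Attach one profile to VM 100; Proxmox creates the mediated device at VM start
    qm set 100 --hostpci0 0000:01:00.0,mdev=nvidia-63

Each VM gets its own mdev= entry, which is how several VMs end up sitting on the same physical card.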
The Pros:
  • Official Support: This is a fully supported solution from both NVIDIA & Proxmox (with a valid subscription).
  • Granular Control: You can create different vGPU profiles with varying amounts of VRAM & compute power, tailoring each VM's resources to its specific needs.
  • Great for VDI & AI: vGPU is widely used in enterprise environments for virtual desktop infrastructure (VDI) & for sharing GPUs for AI workloads.
The Cons:
  • Cost: This is the biggest drawback. You need a supported NVIDIA data center GPU, which can be expensive. On top of that, you need a vGPU software license from NVIDIA, which is a recurring cost.
  • Complexity: The setup process is more involved than standard passthrough. You'll need to download the vGPU software, install the GPU Manager on the host, & then install specific guest drivers in each VM.
  • Limited Hardware Support: vGPU is only available on specific NVIDIA GPUs. Your shiny new GeForce RTX card won't cut it.
The Gist of the Setup:
  1. Check Your Hardware: Make sure you have a supported NVIDIA GPU.
  2. Get the Goods: You'll need to sign up for an NVIDIA Enterprise account to get the vGPU software & a trial license.
  3. Prep Your Host: This involves enabling virtualization features in your BIOS like VT-d & SR-IOV.
  4. Install the GPU Manager: You'll install the NVIDIA GPU Manager on your Proxmox host.
  5. Create vGPU Profiles: You'll define the different vGPU profiles you want to use.
  6. Assign to VMs: In the Proxmox web interface, you'll add a PCI device to your VM & select the vGPU profile you want to assign.
  7. Install Guest Drivers: Finally, you'll install the NVIDIA guest drivers inside each VM. They need to be compatible with the GPU Manager version running on the host (a quick sanity check is sketched after this list).
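Assuming a Linux guest, that sanity check looks something like this once the guest driver is in & the VM has rebooted:

    # Inside the VM: the vGPU should appear with its profile name & its slice of VRAM
    nvidia-smi

    # While Ollama is answering a prompt, you should see utilization climb here
    watch -n 1 nvidia-smi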
Is it for you? If you're running a business or a serious homelab with a supported NVIDIA card & the budget for licensing, vGPU is a robust & reliable solution. For the average home user, though, the cost is likely a deal-breaker.

Method 2: SR-IOV - The Hardware-Based Split

SR-IOV, or Single Root I/O Virtualization, is a hardware feature that allows a single PCI device, like a GPU, to appear as multiple separate physical devices. These are called Virtual Functions (VFs), & each VF can be passed through to a separate VM.
How it Works:
With SR-IOV, the GPU's own hardware is responsible for splitting itself into multiple VFs. The hypervisor (Proxmox) then just sees these VFs as individual PCI devices that can be passed through to VMs. This is different from vGPU, which relies on a software manager on the host.
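The splitting itself goes through a standard kernel interface, so the sketch below applies to any SR-IOV-capable card. The PCI address (0000:00:02.0, a typical Intel iGPU) & the VF count are examples, & on Intel iGPUs you'll usually also need an out-of-tree i915 SR-IOV module plus matching kernel parameters before this works:

    # How many VFs can this device create?
    cat /sys/bus/pci/devices/0000:00:02.0/sriov_totalvfs

    # Ask the hardware to create 4 VFs (this resets on reboot unless you persist it,
    # e.g. via a systemd unit or a sysfs.conf entry)
    echo 4 > /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs

    # The VFs now show up as PCI devices of their own
    lspci | grep -iE "vga|display"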
The Pros:
  • Near-Native Performance: Because it's a hardware-based solution, SR-IOV has very low overhead & can offer performance that's very close to native.
  • No Licensing Fees (Usually): Unlike vGPU, SR-IOV is a feature of the hardware itself, so you don't typically need to pay for special licenses.
  • Not Just for NVIDIA: While some NVIDIA cards support SR-IOV, it's also available on some Intel & AMD GPUs. In fact, it's a popular way to share newer Intel iGPUs.
The Cons:
  • Limited & Sometimes Experimental Hardware Support: SR-IOV isn't available on most GPUs. It's mainly an enterprise-card feature, though it's starting to show up on consumer hardware, especially Intel's recent integrated GPUs. Getting it to work can still feel a bit experimental, especially on newer hardware, & on Intel iGPUs it usually means running an out-of-tree kernel module.
  • Can be Tricky to Configure: The setup process can be complex & may require you to compile custom kernel modules or fiddle with kernel parameters.
  • VFs are Identical: Unlike vGPU, where you can create different profiles, the VFs created by SR-IOV are typically identical in terms of their capabilities.
The Gist of the Setup:
  1. Check Your Hardware: You'll need to confirm that your GPU & your motherboard's BIOS support SR-IOV.
  2. Enable in BIOS: You'll need to enable SR-IOV, VT-d, & other virtualization features in your BIOS.
  3. Enable on the Host: This usually involves loading specific kernel modules & setting kernel parameters to tell the GPU to create the VFs.
  4. Pass Through the VFs: Once the VFs are created, they'll appear as separate PCI devices in Proxmox, & you can pass them through to your VMs just like you would with a regular GPU (see the sketch after this list).
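As a rough sketch of steps 2 through 4 on an Intel host that boots with GRUB (systemd-boot hosts edit /etc/kernel/cmdline instead, & AMD hosts generally have the IOMMU enabled by default):

    # /etc/default/grub on the Proxmox host: turn the IOMMU on
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

    # Apply the change & reboot
    update-grub

    # After creating the VFs (see the previous sketch), hand one VF to each VM
    qm set 102 --hostpci0 0000:00:02.1

Each VM gets a different VF (0000:00:02.1, 0000:00:02.2, & so on), which is what lets them all share the one physical GPU.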
Is it for you? If you have a supported GPU (especially a newer Intel iGPU) & you're comfortable with some command-line work, SR-IOV is a great way to get hardware-accelerated GPU sharing without the licensing costs of vGPU.

Method 3: LXC Containers - The Simple & Effective Route

Now, for what is often the most practical solution for home users: LXC containers. LXC (Linux Containers) are a lightweight alternative to full-blown virtual machines. They share the host system's kernel, which makes them incredibly efficient, & it's exactly this kernel sharing that makes giving multiple containers access to one GPU a whole lot easier.
How it Works:
Instead of virtualizing the hardware, you're essentially just making the host's GPU available to the containers. You install the GPU drivers on the Proxmox host itself, & then you configure your LXC containers to be able to access the GPU's device files. Because all the containers are using the same host drivers, they can all access the GPU at the same time.
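Concretely, the sharing is a handful of lines in the container's config file under /etc/pve/lxc/. Here's a sketch for an NVIDIA card; 201 is an example container ID, & the device major numbers are the usual ones but worth double-checking on your host with ls -l /dev/nvidia* /dev/dri:

    # /etc/pve/lxc/201.conf (example additions for GPU access)

    # Allow the container to use the NVIDIA & DRM character devices
    # (195 = nvidia, 226 = dri; nvidia-uvm's major is assigned dynamically, so verify it)
    lxc.cgroup2.devices.allow: c 195:* rwm
    lxc.cgroup2.devices.allow: c 226:* rwm
    lxc.cgroup2.devices.allow: c 508:* rwm

    # Bind-mount the host's device files into the container
    lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
    lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
    lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
    lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir

For an Intel or AMD card, the /dev/dri lines are usually all you need. Inside the container you then install only the user-space side of the driver, matching the version on the host, since the kernel module already lives on the host.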
The Pros:
  • Simplicity: This is by far the easiest method to set up. There are no special licenses or hardware features required.
  • Efficiency: LXC containers have very little overhead, so you can run a lot of them without a huge performance penalty.
  • Works with Most GPUs: As long as you can get the drivers working on your Proxmox host, you should be able to share the GPU with your LXC containers.
The Cons:
  • Less Isolation: Because containers share the host's kernel, they're not as isolated as VMs. A problem in one container could potentially affect the host or other containers.
  • Driver Management: You need to make sure that the driver version on the host is compatible with the software you're running inside the containers. This can sometimes be a bit of a pain.
The Gist of the Setup:
  1. Install Drivers on the Host: The first step is to install the appropriate GPU drivers on your Proxmox host.
  2. Identify GPU Device Files: You'll need to find the device files for your GPU, which are usually located in
    1 /dev/dri
    or
    1 /dev/nvidia*
    .
  3. Configure the LXC Container: You'll need to edit the container's configuration file to allow it to access the GPU device files. This involves adding a few lines to the container's
    1 .conf
    file.
  4. Set Permissions: You may need to adjust user & group permissions to ensure that the user inside the container has the right to access the GPU (a quick check is sketched after this list).
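A quick way to handle steps 2 & 4, assuming an NVIDIA card & a Debian/Ubuntu-based container (the ollama user is just an example; use whatever account runs your workload):

    # On the Proxmox host: list the device files & note their major numbers
    ls -l /dev/dri /dev/nvidia*

    # Inside the container: the same files should be visible after the config change
    ls -l /dev/dri /dev/nvidia*

    # Inside the container: for /dev/dri access as a non-root user,
    # add it to the video & render groups
    usermod -aG video,render ollama

    # With the user-space NVIDIA driver installed in the container, this should now work
    nvidia-smi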
Is it for you? For running multiple Ollama instances on a single GPU, the LXC container approach is often the best bet. It's the most straightforward to set up, it's efficient, & it doesn't require any special hardware or licenses.
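To close the loop on the actual goal, each container then runs its own Ollama instance against the shared card. A rough sketch inside one container, using Ollama's standard install script & an example model:

    # Inside each LXC container: install Ollama & pull a model
    curl -fsSL https://ollama.com/install.sh | sh
    ollama pull llama3

    # Run it; repeat in the other containers with whatever models they need
    ollama run llama3 "Say hello from container 201"

Just remember that every instance is drawing from the same pool of VRAM, so size your models to fit alongside each other.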

So, Which One Should You Choose?

Here's a quick rundown to help you decide:
  • Go with vGPU if: You have a supported NVIDIA data center card, you need official support, & you're not afraid of the licensing costs.
  • Go with SR-IOV if: You have a GPU that supports it (like a newer Intel iGPU), you want the performance of hardware-based virtualization, & you're up for a bit of a technical challenge.
  • Go with LXC Containers if: You want the simplest & most cost-effective way to share your GPU with multiple Ollama instances, & you're okay with the slightly lower level of isolation that containers provide.

A Note on Customer Experience

Now, you might be thinking, "What does all this have to do with my business?" Well, if you're using Ollama to power a customer-facing chatbot or an internal knowledge base, the performance & reliability of your setup are key. This is where a platform like Arsturn can come in handy. Arsturn helps businesses create custom AI chatbots trained on their own data. These chatbots can provide instant customer support, answer questions, & engage with website visitors 24/7. A properly configured, GPU-accelerated Proxmox server can provide the robust backend that a service like Arsturn needs to deliver a seamless & responsive user experience. Imagine having multiple, specialized chatbots running on your own hardware, each one fine-tuned for a different aspect of your business. That's the kind of power that this kind of setup can unlock.

Conclusion

So there you have it – a tour of the different ways you can share a single GPU with multiple Ollama VMs or containers in Proxmox. It's not a one-size-fits-all situation, so you'll need to weigh the pros & cons of each method & choose the one that best fits your hardware, your budget, & your technical comfort level.
For most home users, I'd say the LXC container route is the way to go. It's the most accessible & it gets the job done without a lot of fuss. But if you've got the hardware & the need for more advanced features, vGPU & SR-IOV are powerful options to have in your back pocket.
I hope this was helpful! Let me know what you think, & if you have any questions, feel free to drop them in the comments below. Happy virtualizing!

Copyright © Arsturn 2025