8/11/2025

So, you've been hearing the buzz about running large language models locally, right? It’s a pretty exciting frontier. You get more control, potentially more privacy, & the ability to tinker with models that aren't the usual suspects. One of the questions that keeps popping up is how to get these powerful local models, like Qwen & GLM 4.5, to play nicely with sophisticated coding environments like Claude Code.
Honestly, it's a bit of a game-changer when you get it working. You're essentially taking a super-smart, specialized coding assistant & hooking it up to a powerful, agentic command-line interface. The result? A highly customized & efficient development workflow. Let's dive into how this all works.

Why Even Bother Integrating Local LLMs with Claude Code?

First off, let's talk about why this is such a compelling idea. Claude Code is an incredible tool right out of the box. It’s designed for those of us who live in the terminal. It excels at things like file editing, running tests, handling git commands, & even end-to-end debugging, all through natural language. It's got that agentic workflow down, meaning it can reason & take actions on its own to a certain degree.
But here's the thing: what if you could swap out the brain? That's where local LLMs (or, more accurately, models you choose to run, whether on your own hardware or through a flexible API) come in.
  • Specialized Models: Models like GLM 4.5 & the Qwen series are absolute powerhouses when it comes to coding. GLM 4.5, for example, has shown some seriously impressive performance on coding benchmarks, sometimes even rivaling the big names like GPT-4. It’s built with agentic reasoning in mind, which makes it a natural fit for Claude Code.
  • Cost-Effectiveness: Let's be real, constantly hitting the APIs of the big proprietary models can get pricey. Using other models through services like OpenRouter or Novita AI can be significantly cheaper. GLM 4.5's API, for instance, is priced very competitively, making it a great option for extensive use.
  • Experimentation & Flexibility: Maybe you've found that a particular model is just better at your specific kind of work, whether that's frontend development, data science, or backend logic. Being able to plug different models into your workflow lets you find the perfect fit.

The "Local" Misconception: API is King (For Now)

Before we get into the nitty-gritty, let's clear up a common point of confusion. When people talk about integrating "local" LLMs in this context, they're often not talking about running a massive model on their laptop. While that's the ultimate goal for some, the most practical & common method right now is to use an API that provides access to these models.
Think of it as a middle ground. You're not using the default Claude model, but you're also not wrestling with the complexities of hosting a 355-billion-parameter model yourself. You're using a service that hosts the model for you & gives you an API key. This is the approach we'll be focusing on because it's the most accessible & well-documented way to get this done.
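To make this concrete, here's roughly what any tool does behind the scenes when it talks to a hosted model. This is a minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint; the model slug is illustrative, so check OpenRouter's model list for the current ID:

```bash
# Minimal sketch: ask a hosted GLM 4.5 to write code via OpenRouter.
# The endpoint is OpenRouter's OpenAI-compatible chat completions API;
# the model slug below is illustrative & may differ from the live ID.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "z-ai/glm-4.5",
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ]
  }'
```

If a call like that gives you a sensible response, you already have everything this setup needs: an endpoint, an API key, & a model ID.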

The Key Ingredient: Claude Code Router

So, how do you actually make the connection? The magic happens through a tool called the Claude Code Router. This nifty piece of software acts as a middleman. It intercepts the requests from Claude Code & forwards them to the LLM provider of your choice.
The router is brilliant because it handles the messy business of translating between different API formats. Claude Code is obviously built to talk to Anthropic's own models, but the router can make it talk to almost any model that has an API endpoint, as long as it's configured correctly.
This is what allows you to use models from Zhipu AI (the creators of GLM) or access a whole library of models through a service like OpenRouter.
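To give you a feel for what "configured correctly" means, here's a sketch of a router config. Based on the router project's documentation, it lives at ~/.claude-code-router/config.json & maps a named provider to the model Claude Code should actually use. The exact keys can vary between versions, so treat this as illustrative & check the project's README:

```json
{
  "Providers": [
    {
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "YOUR_OPENROUTER_API_KEY",
      "models": ["z-ai/glm-4.5"]
    }
  ],
  "Router": {
    "default": "openrouter,z-ai/glm-4.5"
  }
}
```

The "Router" section is the part that tells Claude Code which provider & model combo to use by default, which is exactly the "swap out the brain" trick we talked about earlier.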

Let's Get Practical: Integrating GLM 4.5 with Claude Code

GLM 4.5 is a fantastic model to start with because it's so well-suited for this kind of work. It's a massive 355-billion-parameter model that uses a Mixture-of-Experts (MoE) architecture, meaning only a fraction of those parameters (about 32 billion) are active for any given token, which makes it far cheaper to run than a dense model of the same size. It also has a huge 128K token context window, which is great for understanding large codebases.
Here’s a step-by-step guide to get it up & running, based on what others have successfully done.

Step 1: Get Your Tools in Order

First, you’ll need to install Claude Code itself & the Claude Code Router. You can typically do both through npm.
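One quick caveat: package names can drift over time, so double-check each project's README before running these. At the time of writing, the commands look like this (the Claude Code package is Anthropic's official one; the router package name is taken from its GitHub project, so treat it as an assumption):

```bash
# Anthropic's official Claude Code CLI
npm install -g @anthropic-ai/claude-code

# The community-maintained Claude Code Router
# (package name assumed from its GitHub project; verify in the README)
npm install -g @musistudio/claude-code-router
```

Once both are installed, the router's docs describe launching Claude Code through it with the router's own command (e.g. `ccr code`) rather than the plain `claude` command, but again, check the README for the current workflow.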
