In this article, we will see how to use Claude Code with our own private models: Gemma, Qwen3, gpt-oss-120b, or even uncensored ones.


Prerequisites

  • LM Studio installed on the machine of your choice (local or elsewhere on your network)
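
Before wiring anything up, it helps to confirm LM Studio's local server is actually reachable. A minimal Python sketch, assuming LM Studio's default port 1234 and its OpenAI-compatible /v1/models endpoint:

```python
# Hedged sketch: check whether LM Studio's local server is up and list the
# models it currently serves. Assumes the default port (1234) and the
# OpenAI-compatible /v1/models endpoint that LM Studio exposes.
import json
import urllib.error
import urllib.request


def list_local_models(base_url: str = "http://127.0.0.1:1234") -> list[str]:
    """Return the ids of models LM Studio currently serves, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            payload = json.load(resp)
        return [model["id"] for model in payload.get("data", [])]
    except (urllib.error.URLError, OSError):
        # Server not running, wrong port, or network issue.
        return []


if __name__ == "__main__":
    models = list_local_models()
    print(models or "LM Studio server not reachable on port 1234")
```

If this prints an empty result, start the server from LM Studio before continuing.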

Installation

Install Claude Code

curl -fsSL https://claude.ai/install.sh | bash

Configuration

Create a new file at ~/.claude/lmstudio.private-ai-server.json and add the following content:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:1234/",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "default_model",
    "ANTHROPIC_SMALL_FAST_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "default_model"
  }
}
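
Since a typo in this file fails silently, a quick sanity check can save some head-scratching. A minimal sketch (the required-key list is my own choice, not an official one) that parses the settings file and flags missing or non-string env values:

```python
# Hedged sketch: validate the settings file created above. Claude Code reads
# these as environment variables, so every value should be a string.
import json
import os

# Assumption: these are the keys the article's setup relies on most.
REQUIRED_KEYS = {
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_MODEL",
}


def check_settings(path: str) -> list[str]:
    """Return a list of problems found in the settings file (empty means OK)."""
    problems = []
    with open(path) as f:
        settings = json.load(f)
    env = settings.get("env", {})
    for key in REQUIRED_KEYS:
        if key not in env:
            problems.append(f"missing env key: {key}")
    for key, value in env.items():
        if not isinstance(value, str):
            problems.append(
                f"env value for {key} should be a string, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    path = os.path.expanduser("~/.claude/lmstudio.private-ai-server.json")
    if os.path.exists(path):
        print(check_settings(path) or "settings look OK")
```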

Test

Load a large open-source reasoning model in LM Studio and raise the context length to its maximum. Then run the following command in your app repository:

claude --settings ~/.claude/lmstudio.private-ai-server.json

Finally, once Claude Code has started, select default_model with the /model command.
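
To avoid retyping the --settings flag every session, you can wrap the command in a small shell function (the name claude_local is my own choice; add it to your ~/.bashrc or ~/.zshrc):

```shell
# Hedged sketch: launch Claude Code with the LM Studio settings file from
# this article, forwarding any extra arguments.
claude_local() {
  claude --settings ~/.claude/lmstudio.private-ai-server.json "$@"
}
```

After reloading your shell, claude_local starts Claude Code already pointed at your private server.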


Voilà, you can now burn millions of tokens without spending a dime. Of course, it is not as fast as using Anthropic's API directly, and not as good as using Opus, but for non-complex tasks it is enough. Also, with a good multi-agent setup, you could let it work autonomously while you sleep. Personally, that idea makes me dream. More to come.