In this article, we will see how to use Claude Code with our own private models: Gemma, Qwen3, gpt-oss-120b, or even uncensored ones.


Prerequisites

  • LM Studio installed on the machine of your choice (local or elsewhere on your network)
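
Before wiring anything up, it helps to confirm LM Studio's local server is actually reachable. A minimal Python sketch, assuming LM Studio's default port 1234 and its OpenAI-compatible /v1/models endpoint:

```python
# Hedged sketch: check whether LM Studio's local server is up and list the
# models it currently serves. Assumes the default port (1234) and the
# OpenAI-compatible /v1/models endpoint that LM Studio exposes.
import json
import urllib.error
import urllib.request


def list_local_models(base_url: str = "http://127.0.0.1:1234") -> list[str]:
    """Return the ids of models LM Studio currently serves, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            payload = json.load(resp)
        return [model["id"] for model in payload.get("data", [])]
    except (urllib.error.URLError, OSError):
        # Server not running, wrong port, or network issue.
        return []


if __name__ == "__main__":
    models = list_local_models()
    print(models or "LM Studio server not reachable on port 1234")
```

If this prints an empty result, start the server from LM Studio before continuing.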

Installation

Install Claude Code

curl -fsSL https://claude.ai/install.sh | bash

Configuration

Create a new file at ~/.claude/lmstudio.private-ai-server.json and add the following content:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:1234/",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "default_model",
    "ANTHROPIC_SMALL_FAST_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "default_model",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "default_model"
  }
}
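
Since a typo in this file fails silently, a quick sanity check can save some head-scratching. A minimal sketch (the required-key list is my own choice, not an official one) that parses the settings file and flags missing or non-string env values:

```python
# Hedged sketch: validate the settings file created above. Claude Code reads
# these as environment variables, so every value should be a string.
import json
import os

# Assumption: these are the keys the article's setup relies on most.
REQUIRED_KEYS = {
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_MODEL",
}


def check_settings(path: str) -> list[str]:
    """Return a list of problems found in the settings file (empty means OK)."""
    problems = []
    with open(path) as f:
        settings = json.load(f)
    env = settings.get("env", {})
    for key in REQUIRED_KEYS:
        if key not in env:
            problems.append(f"missing env key: {key}")
    for key, value in env.items():
        if not isinstance(value, str):
            problems.append(
                f"env value for {key} should be a string, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    path = os.path.expanduser("~/.claude/lmstudio.private-ai-server.json")
    if os.path.exists(path):
        print(check_settings(path) or "settings look OK")
```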

Test

Load a large open-source reasoning model in LM Studio and raise the context length to its maximum. Then run the following command in your app repository:

claude --settings ~/.claude/lmstudio.private-ai-server.json

Finally, once Claude Code has started, select default_model with the /model command.
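
To avoid retyping the --settings flag every session, you can wrap the command in a small shell function (the name claude_local is my own choice; add it to your ~/.bashrc or ~/.zshrc):

```shell
# Hedged sketch: launch Claude Code with the LM Studio settings file from
# this article, forwarding any extra arguments.
claude_local() {
  claude --settings ~/.claude/lmstudio.private-ai-server.json "$@"
}
```

After reloading your shell, claude_local starts Claude Code already pointed at your private server.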


Voilà, you can now burn millions of tokens without spending a dime. Of course, it is not as fast as using Anthropic's API directly, and not as good as using Opus, but for non-complex tasks it is enough. Also, with a good multi-agent setup, you could let it work autonomously while you sleep. Personally, that idea makes me dream. More to come.