Mar 4, 2026

How to Run Qwen3.5-9B Locally with LM Studio for OpenClaw

Run your own AI agent entirely on your machine using LM Studio and OpenClaw — no API costs, no data leaving your computer, just 3 steps to full local inference.

Server Compass TeamMar 4, 2026
How to Run Qwen3.5-9B Locally with LM Studio for OpenClaw

Want to run AI agents without sending your data to external APIs? With LM Studio and OpenClaw, you can run Qwen3.5-9B entirely on your local machine—no API costs, complete privacy, and full control over your inference pipeline.

This guide walks you through the three steps to get a local AI agent running on your machine.

Why Run AI Models Locally?

Running large language models locally offers several advantages:

  • Privacy: Your data never leaves your machine
  • No API costs: Pay once for the hardware, run unlimited inference
  • Low latency: No network round-trips for each request
  • Offline capability: Works without an internet connection
  • Full control: Customize model parameters, context length, and behavior

Step 1: Install LM Studio

LM Studio is a desktop application that makes it easy to download, run, and serve local LLMs. It handles all the complexity of model loading, quantization, and serving an OpenAI-compatible API.

  1. Visit lmstudio.ai
  2. Download the installer for your operating system (Windows, macOS, or Linux)
  3. Run the installer and follow the setup wizard

LM Studio requires a decent GPU for optimal performance, but can also run on CPU-only systems (albeit slower). For Qwen3.5-9B, you'll want at least 16GB of RAM or a GPU with 8GB+ VRAM.

Step 2: Download Qwen3.5-9B

Qwen3.5-9B is a capable 9-billion parameter model that strikes a good balance between speed and quality. It's excellent for coding tasks, drafting, and general assistance.

  1. Open LM Studio
  2. Go to the model browser or visit lmstudio.ai/models/qwen/qwen3.5-9b
  3. Click Download to fetch the model weights

The download size is typically 5-10GB depending on the quantization level. Choose Q4_K_M for a good balance of quality and memory usage, or Q8_0 if you have the VRAM to spare.

Step 3: Load the Model and Start the Server

Once downloaded, you need to load the model and start the local inference server:

  1. Go to the Chat tab in LM Studio
  2. Select Qwen3.5-9B from the model dropdown
  3. Click Load to load the model into memory
  4. Switch to the Developer tab
  5. Click Start Server
LM Studio Developer tab showing how to load Qwen3.5-9B model and start the local server

The server runs at http://127.0.0.1:1234 by default. You can verify it's working by visiting:

http://127.0.0.1:1234/v1/models

You should see a JSON response listing your loaded model. Your local AI is now ready to accept requests.

Using Qwen as a Sub-Model in OpenClaw

Small local models like Qwen3 9B are great for fast drafting and coding tasks, but they come with important security considerations. Smaller models are:

  • More susceptible to prompt injection attacks
  • Less reliable at following tool-safety rules
  • Easier to manipulate into unintended behaviors

For this reason, we recommend using local models as subagents rather than your main agent.

The safest approach is a hybrid architecture:

  • Main agent: Use a top-tier model (Claude Opus, GPT-5) with full tool access
  • Subagent: Use Qwen3 only for contained tasks (summaries, refactors, drafts)
  • Sandboxing: Restrict tools so even if the subagent is manipulated, the blast radius is minimal

Why This Matters

Consider these risk scenarios:

  • Untrusted inbox risk: If your agent reads content from Reddit, Telegram, or the web, that content could contain prompt injection attacks. Smaller models are more likely to comply with malicious instructions.
  • Tool misuse risk: Web, browser, and file tools can leak sensitive data if the model is coerced into using them improperly.
  • Containment: A subagent with no web access and strict sandboxing can't roam your system or exfiltrate data.

Configuration for a Locked-Down Qwen Worker

Here's an example OpenClaw configuration that creates a secure Qwen worker:

{
  "agents": {
    "list": {
      "qwen-worker": {
        "model": "lmstudio/qwen3.5-9b",
        "sandbox": { "mode": "all" },
        "tools": {
          "deny": ["group:web", "browser"]
        }
      }
    }
  }
}

This configuration:

  • Points to your local LM Studio server
  • Enables full sandboxing for all operations
  • Denies access to web and browser tools

You can then spawn this worker for bounded tasks from your main agent, or configure OpenClaw to use it as a subagent automatically.

Best Practices for Local AI Agents

  • Never use local models as your main inbox agent that processes untrusted content
  • Always sandbox subagents with minimal tool permissions
  • Use local models for bounded tasks: code refactoring, summarization, draft generation
  • Keep sensitive operations (file writes, web requests, system commands) on your main agent with a more capable model
  • Monitor resource usage: local models can consume significant RAM and GPU resources

Conclusion

Running Qwen3.5-9B locally with LM Studio and OpenClaw gives you a powerful, private AI assistant. Your data stays on your machine, you pay no API costs, and you have complete control over the inference pipeline.

Just remember: use local models as specialized workers for contained tasks, not as your primary agent handling untrusted inputs. With the right architecture, you get the best of both worlds—the privacy of local inference and the reliability of top-tier cloud models for critical operations.

Your AI agent now runs 100% locally. Happy building.

Related reading