Model Providers

Supported LLM providers and models for OpenClaw

OpenClaw supports a wide range of LLM providers, from subscription-based services like Anthropic and OpenAI to local models you run entirely on your machine. Choose the provider that best fits your needs for privacy, cost, and performance.

Recommended Providers

Anthropic (Claude)

Recommended: Anthropic Pro/Max (100/200) + Opus 4.6

  • Why: Long-context strength and better prompt-injection resistance
  • Models: Claude Opus 4.6, Claude Opus 4.5, Claude Sonnet, Claude Haiku
  • Authentication: OAuth (subscription) or API keys
  • Best For: High-quality responses, long conversations, security-conscious users

OpenAI (GPT)

Supports ChatGPT and Codex models:

  • Models: GPT-4, GPT-3.5, Codex (including gpt-5.3-codex)
  • Authentication: OAuth (subscription) or API keys
  • Best For: Code generation, general purpose tasks

Other Supported Providers

  • Gemini - Google's Gemini models, including Gemini 3.1 (google/gemini-3.1-pro-preview) from 2026.2.21
  • Volcano Engine (Doubao) / BytePlus - Doubao, Kimi, GLM, DeepSeek; coding variants via volcengine-plan; onboard with volcengine-api-key
  • Moonshot - Moonshot AI (Kimi) models
  • Minimax - Minimax models (popular for local-like performance)
  • Vercel AI Gateway - Unified gateway for multiple providers
  • OpenRouter - Access to multiple models through one API
  • Bedrock - AWS Bedrock models
  • GLM - GLM (Zhipu) models via Volcano Engine or Z.AI
  • Zai - Z.AI provider for GLM-5, GLM-4.7, Coding Plan
  • xAI (Grok) - xAI Grok models
  • Hugging Face Inference - Run models via the Hugging Face Inference API (first-class onboarding and API key support)
  • vLLM - Self-hosted, OpenAI-compatible inference for models such as Llama, Mistral, and Qwen; optimized for high throughput

Ollama (Local + Cloud)

Ollama is the fastest way to run OpenClaw on Mac and Linux. With Ollama 0.17+, a single command installs and configures everything:

One command
ollama launch openclaw --model kimi-k2.5:cloud

Ollama supports local models (full privacy, no API costs) and cloud models (e.g. kimi-k2.5, minimax-m2.5, glm-5) with full context. OpenClaw integrates via the native Ollama API for streaming and tool calling. Use baseUrl: "http://host:11434" (not /v1). See Ollama + OpenClaw tutorial and docs.openclaw.ai/providers/ollama.
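Putting the above together, an Ollama provider entry might look like the following sketch. The baseUrl value and model names come from the text above; the providers block shape and other field names are assumptions, so check the linked Ollama docs for the exact schema:

```json
{
  "agent": {
    "model": "ollama/kimi-k2.5:cloud"
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434"
    }
  }
}
```

Note the base URL points at the native Ollama API port without a /v1 suffix, as described above.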

Local Models (General)

Run models entirely on your machine for complete privacy:

  • Complete Privacy - No data leaves your machine
  • No API Costs - Free to run (after initial setup)
  • Offline Capable - Works without internet
  • Customizable - Fine-tune and modify models

Configure local models in your Gateway configuration. Requires sufficient hardware (GPU recommended for best performance). Ollama (above) is the recommended path for local setup. For high-throughput self-hosting, see vLLM.
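For a vLLM-backed setup, vLLM serves an OpenAI-compatible API (by default on port 8000 under /v1), so a Gateway entry could point at it. This is a sketch; the providers block shape and field names are assumptions, and the model ID is illustrative:

```json
{
  "agent": {
    "model": "vllm/llama-3.1-70b"
  },
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1"
    }
  }
}
```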

Hardware Requirements (Local Models)

Rough guidelines for running local models. Actual needs depend on model size and inference server (Ollama, vLLM, etc.):

  • 7B models — 8–16GB VRAM (or CPU with 16GB+ RAM, slower)
  • 13B models — 16–24GB VRAM
  • 34B–70B models — 48–80GB VRAM or multi-GPU

For lighter setups, use Ollama (optimized for consumer hardware), cloud-backed models via Ollama, or API providers.
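The guidelines above follow from a simple rule of thumb: weight memory is parameter count times bytes per parameter (2 for fp16, 1 for 8-bit, 0.5 for 4-bit quantization), plus overhead for KV cache and activations. A minimal sketch (the 1.2x overhead factor is an assumption, not a guarantee):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory times an overhead factor
    covering KV cache and activations (rule of thumb only)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params / 1e9 bytes-per-GB cancel
    return weights_gb * overhead

# Common precisions: fp16 = 2 bytes/param, 8-bit = 1, 4-bit = 0.5
for name, bpp in [("fp16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
    print(f"7B @ {name}: ~{estimate_vram_gb(7, bpp):.1f} GB")
```

This is why a 4-bit quantized 7B model fits comfortably on consumer GPUs while an fp16 70B model needs multi-GPU or 80GB-class hardware.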

Authentication Methods

OAuth (Subscription-Based)

For subscription services like Claude Pro/Max and ChatGPT Plus:

  • Authenticate using your subscription account
  • No API keys needed
  • Uses your subscription limits
  • More convenient for personal use

API Keys

For pay-per-use or API-based access:

  • Configure API keys in credentials
  • Pay per token/request
  • More control over costs
  • Better for production/enterprise use
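A credentials entry might look like the following sketch. The top-level credentials key is mentioned above; the per-provider field names and environment-variable interpolation are assumptions:

```json
{
  "credentials": {
    "anthropic": { "apiKey": "${ANTHROPIC_API_KEY}" },
    "openai": { "apiKey": "${OPENAI_API_KEY}" }
  }
}
```

Keeping keys in environment variables rather than inline avoids committing secrets to version control.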

Model Failover

Configure automatic failover for reliability:

Failover Config
{
  "agent": {
    "model": "anthropic/claude-opus-4-6",
    "fallback": [
      "anthropic/claude-sonnet",
      "openai/gpt-4"
    ]
  }
}

If the primary model fails or is unavailable, OpenClaw automatically tries fallback models in order.

Profile Rotation

Rotate between multiple authentication profiles:

  • Use multiple API keys or OAuth accounts
  • Distribute load across profiles
  • Handle rate limits automatically
  • Increase reliability and throughput
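A rotation setup could be sketched as follows. Every field name here (profiles, auth, rotation) is a hypothetical illustration of the idea, not confirmed OpenClaw schema:

```json
{
  "agent": {
    "model": "anthropic/claude-opus-4-6",
    "profiles": [
      { "auth": "oauth", "account": "personal" },
      { "auth": "apiKey", "key": "${ANTHROPIC_API_KEY_2}" }
    ],
    "rotation": "round-robin"
  }
}
```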

Model Routing & Cost

Running an always-on agent with a single premium model for every task can get expensive. Many users adopt tiered model routing:

  • Main chat: Use a strong model (e.g. Claude Opus) for direct conversations and important decisions.
  • Heartbeats and cron: Use a cheaper, fast model (e.g. Gemini Flash) for scheduled “wake up and check” tasks—see Automation for cron and heartbeats.
  • Subagents / background work: Route background tasks to a capable but cost-effective model so the main session stays responsive.
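The tiers above can be sketched as separate model assignments per task type. The heartbeat and subagents keys are assumptions about where such settings would live; the model IDs follow the provider/model pattern used elsewhere on this page:

```json
{
  "agent": {
    "model": "anthropic/claude-opus-4-6"
  },
  "heartbeat": {
    "model": "google/gemini-flash"
  },
  "subagents": {
    "model": "anthropic/claude-sonnet"
  }
}
```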

Costs vary with usage (from tens to hundreds of dollars per month). For example setups and cost management, see Example Setups & Model Routing.

If you prefer automatic per-request routing, where each query is sent to the cheapest model that can handle it without assigning tiers manually, third-party plugins can do that. For example, ClawRouter (BlockRunAI) runs routing locally, supports 41+ models through one wallet, and bills pay-per-request (USDC on Base). Set your model to blockrun/auto after installing the plugin. See the Skills list for discovery; we don't endorse or guarantee third-party tools.
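With such a plugin installed, the switch is a one-line model change using the blockrun/auto ID named above:

```json
{
  "agent": {
    "model": "blockrun/auto"
  }
}
```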

Usage Tracking

OpenClaw tracks model usage:

  • Monitor token usage per model
  • Track costs (for API-based providers)
  • View usage statistics
  • Set usage limits and alerts

Configuration Examples

Minimal Configuration

Minimal Config
{
  "agent": {
    "model": "anthropic/claude-opus-4-6"
  }
}

With Failover

Failover Config
{
  "agent": {
    "model": "anthropic/claude-opus-4-6",
    "fallback": ["anthropic/claude-sonnet"]
  }
}

See also