vLLM with OpenClaw
Self-host LLMs locally with vLLM—OpenAI-compatible inference
vLLM is a high-throughput inference server for LLMs. It exposes an OpenAI-compatible API, so OpenClaw can connect to it like any OpenAI endpoint. Run Llama, Mistral, Qwen, and other models locally for full privacy and zero API costs.
Prefer Ollama if you want simpler one-command setup. Use vLLM when you need maximum throughput, multi-model serving, or compatibility with vLLM's model format.
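Before pointing OpenClaw anywhere, you need a running vLLM server. A minimal sketch (the model name is an example; vLLM's OpenAI-compatible server listens on port 8000 by default):

```shell
# Install vLLM, then serve a model over its OpenAI-compatible API.
pip install vllm
vllm serve meta-llama/Llama-3.2-8B-Instruct --port 8000
```

The model identifier you pass to `vllm serve` is the same name you will use in OpenClaw's `model` field.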
Point OpenClaw at your vLLM server's OpenAI-compatible endpoint:
{
  "agent": {
    "model": "meta-llama/Llama-3.2-8B-Instruct",
    "provider": "openai",
    "baseUrl": "http://localhost:8000/v1",
    "apiKey": "dummy"
  }
}
By default, vLLM does not validate the API key, so a placeholder like `"dummy"` works for local servers; it only enforces a key if you start the server with `--api-key`. Set `baseUrl` to your vLLM instance (default port 8000), and set `model` to the exact name vLLM is serving (the value passed to `vllm serve` or the `--model` flag).
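With this config, OpenClaw's requests go to the standard OpenAI paths under `baseUrl`. A small sketch showing how the `baseUrl` from the config above maps to the chat-completions endpoint (the helper function is illustrative, not part of OpenClaw):

```python
import json

# Config from this page; "dummy" apiKey is a placeholder for local servers.
config = json.loads("""
{
  "agent": {
    "model": "meta-llama/Llama-3.2-8B-Instruct",
    "provider": "openai",
    "baseUrl": "http://localhost:8000/v1",
    "apiKey": "dummy"
  }
}
""")

def chat_completions_url(base_url: str) -> str:
    """Join the base URL with the standard OpenAI chat-completions path."""
    return base_url.rstrip("/") + "/chat/completions"

agent = config["agent"]
print(chat_completions_url(agent["baseUrl"]))
# -> http://localhost:8000/v1/chat/completions
```

If requests fail, checking that this exact URL responds (with the server running) is a quick way to separate OpenClaw configuration issues from vLLM server issues.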
vLLM typically needs a CUDA-capable GPU with enough VRAM to hold the model weights. See Hardware requirements for local models. For lighter setups, use Ollama or a cloud provider instead.