Voice Features
Voice wake, talk mode, and audio interaction with OpenClaw
This page covers OpenClaw’s voice features: voice wake (say a phrase to activate OpenClaw hands-free), talk mode (speak and hear replies, with ElevenLabs for high-quality speech), how audio and voice notes are handled, and how to have the agent place an outbound call to you. Config snippets show how to turn voice wake and talk mode on; platform notes explain where each feature works (macOS, iOS, Android). If you’re setting up voice for the first time, the voice setup tutorial walks through wake, talk, and ElevenLabs.
Voice wake keeps the app listening for a wake phrase so you can start a conversation without touching the keyboard. You get hands-free activation; processing is local where the platform allows. On macOS it plugs into system speech recognition and the menu bar app, with a configurable wake word and background listening. On iOS and Android, voice wake is available through the companion nodes. Enable it in config with voice.wake.enabled: true and set voice.wake.wakeWord (e.g. "hey openclaw"). See the Configuration section below for a full example.
Talk mode is two-way voice: you speak, the agent replies with synthesized speech. You get natural back-and-forth without typing. OpenClaw uses ElevenLabs for text-to-speech by default; you enable talk mode in config with voice.talk.enabled: true and set voice.talk.provider and voice.talk.voice. Supported on macOS and on iOS/Android via the companion apps. Step-by-step: Voice setup tutorial.
OpenClaw can send and receive voice messages (voice notes) on supported channels. Incoming audio can be transcribed automatically and the text fed into the agent as a normal message; you configure transcription hooks so the pipeline uses your chosen service and the result is treated as user input. That gives you voice notes as first-class messages and consistent handling of audio across channels. Config is per channel; for media and pipelines in general, see Media handling.
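As a sketch, a per-channel transcription hook might look like the following. The channel name and the `transcribeAudio` keys here are illustrative assumptions, not confirmed option names; check Media handling for the exact schema your channel supports.

```json
{
  "channels": {
    "whatsapp": {
      "transcribeAudio": {
        "enabled": true,
        "provider": "whisper"
      }
    }
  }
}
```

With a hook like this in place, incoming voice notes are transcribed by the configured service and the resulting text enters the agent pipeline as an ordinary user message.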
Your agent can place a call to you—for a morning brief, an alert, or a check-in. The flow uses ElevenAgents (ElevenLabs) and Twilio plus a skill so the agent initiates the call. Full walkthrough: Have OpenClaw call you.
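To give a feel for what the pieces are, here is a hypothetical skill configuration for outbound calling. Every key below (`skills.outboundCall`, the Twilio fields, the numbers) is an illustrative placeholder, not a documented option; follow the linked walkthrough for the real setup.

```json
{
  "skills": {
    "outboundCall": {
      "provider": "elevenlabs",
      "twilio": {
        "accountSid": "YOUR_TWILIO_ACCOUNT_SID",
        "fromNumber": "+15550000000"
      },
      "toNumber": "+15551234567"
    }
  }
}
```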
Voice wake and talk mode are toggled and tuned in your main config file (~/.openclaw/openclaw.json, or ~/.clawdbot/moltbot.json on legacy installs). Add a voice block and set wake.enabled / talk.enabled; for talk mode, also set provider (e.g. elevenlabs) and voice. Restart the Gateway after changing config.
Voice wake
```json
{
  "voice": {
    "wake": {
      "enabled": true,
      "wakeWord": "hey openclaw"
    }
  }
}
```
Talk mode
```json
{
  "voice": {
    "talk": {
      "enabled": true,
      "provider": "elevenlabs",
      "voice": "default"
    }
  }
}
```
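Both features live under the same voice block, so the two snippets above can be merged into a single config:

```json
{
  "voice": {
    "wake": {
      "enabled": true,
      "wakeWord": "hey openclaw"
    },
    "talk": {
      "enabled": true,
      "provider": "elevenlabs",
      "voice": "default"
    }
  }
}
```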
More options: Configuration guide.
Voice is useful when you can’t type—driving, cooking, or multitasking—or when you prefer speaking (accessibility, quick questions). You get the same agent logic; the interface is speech in and out instead of text.
Voice wake and talk mode are available on macOS (menu bar app, system speech recognition, background processing), and on iOS and Android through the companion nodes that connect to your Gateway. Install the right app for your device and ensure the node is paired; then voice features follow the same config. Companion apps.