Voice Features

Voice wake, talk mode, and audio interaction with OpenClaw

This page covers OpenClaw’s voice features. You’ll find voice wake (say a phrase to activate OpenClaw hands-free), talk mode (speak and hear replies, with ElevenLabs for high-quality speech), how audio and voice notes are handled, and how to have the agent call you outbound. Config snippets show how to turn voice wake and talk mode on; platform notes explain where each feature works (macOS, iOS, Android). If you’re setting up voice for the first time, the voice setup tutorial walks through wake, talk, and ElevenLabs.

Voice wake

Voice wake keeps the app listening for a wake phrase so you can start a conversation without touching the keyboard. Processing is local where the platform allows. On macOS it plugs into system speech recognition and the menu bar app, with a configurable wake word and background listening. On iOS and Android, voice wake is available through the companion nodes. Enable it in config with voice.wake.enabled: true and set voice.wake.wakeWord (e.g. "hey openclaw"). See the Configuration section below for a full example.

Talk mode

Talk mode is two-way voice: you speak, and the agent replies with synthesized speech, so you get natural back-and-forth without typing. OpenClaw uses ElevenLabs for text-to-speech by default. Enable talk mode in config with voice.talk.enabled: true, and set voice.talk.provider and voice.talk.voice. Talk mode is supported on macOS, and on iOS and Android via the companion apps. Step-by-step: Voice setup tutorial.

Audio handling

OpenClaw can send and receive voice messages (voice notes) on supported channels. Incoming audio can be transcribed automatically and the text fed into the agent as a normal message; you configure transcription hooks so the pipeline uses your chosen service and the result is treated as user input. That gives you voice notes as first-class messages and consistent handling of audio across channels. Config is per channel; for media and pipelines in general, see Media handling.

Outbound phone calls

Your agent can place a call to you—for a morning brief, an alert, or a check-in. The flow uses ElevenAgents (ElevenLabs) and Twilio plus a skill so the agent initiates the call. Full walkthrough: Have OpenClaw call you.

Configuration

Voice wake and talk mode are toggled and tuned in your main config file (~/.openclaw/openclaw.json or ~/.clawdbot/moltbot.json). Add a voice block and set wake.enabled / talk.enabled; for talk mode, set provider (e.g. elevenlabs) and voice. Restart the Gateway after changing config.

Voice wake

{
  "voice": {
    "wake": {
      "enabled": true,
      "wakeWord": "hey openclaw"
    }
  }
}

Talk mode

{
  "voice": {
    "talk": {
      "enabled": true,
      "provider": "elevenlabs",
      "voice": "default"
    }
  }
}
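
Both features can be enabled together. The following is a minimal sketch, assuming the wake and talk fragments above merge as siblings under a single voice block; the keys and placeholder values are the ones already shown, and nothing else is implied about the config schema.

```json
{
  "voice": {
    "wake": {
      "enabled": true,
      "wakeWord": "hey openclaw"
    },
    "talk": {
      "enabled": true,
      "provider": "elevenlabs",
      "voice": "default"
    }
  }
}
```

As with any config change, restart the Gateway for the new voice block to take effect.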

More options: Configuration guide.

When voice helps

Voice is useful when you can’t type—driving, cooking, or multitasking—or when you prefer speaking (accessibility, quick questions). You get the same agent logic; the interface is speech in and out instead of text.

Platform support

Voice wake and talk mode are available on macOS (menu bar app, system speech recognition, background processing), and on iOS and Android through the companion nodes that connect to your Gateway. Install the app for your device and make sure the node is paired; voice features then use the same config. See Companion apps.
