Media Skills

Music, voice, and image generation for OpenClaw

Media skills add music control, text-to-speech, and image generation to OpenClaw. Install from ClawHub with openclaw skills install <skill-name>.

Music and Audio

  • Spotify - Music playback and search
  • Sonos CLI - Control Sonos speakers
  • Speech / TTS - Text-to-speech and related speech skills; see Voice

Image generation

Skills in this category call external image APIs (DALL·E, Stable Diffusion hosts, etc.). The agent can generate or edit images when you ask in chat, which is useful for thumbnails, mockups, and social posts. See Content creation for workflow ideas.

  • Search ClawHub for skills with image, dalle, or flux in the name.
  • Review each skill's API key requirements and per-image cost before enabling it in production.
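Per-image pricing adds up quickly at chat volume. The Python sketch below estimates monthly spend as price × images per day × days and checks it against a budget. The provider names and prices are hypothetical placeholders, not real rates; substitute your provider's current published pricing.

```python
# Hypothetical per-image prices in USD. Replace with your provider's
# current published pricing before relying on the numbers.
PRICE_PER_IMAGE_USD = {
    "example-dalle": 0.04,
    "example-flux": 0.03,
}


def monthly_image_cost(provider: str, images_per_day: int, days: int = 30) -> float:
    """Estimate spend as price x images/day x days, rounded to cents."""
    if images_per_day < 0 or days < 0:
        raise ValueError("counts must be non-negative")
    price = PRICE_PER_IMAGE_USD[provider]
    return round(price * images_per_day * days, 2)


def within_budget(provider: str, images_per_day: int, budget_usd: float, days: int = 30) -> bool:
    """True if the estimated spend for the period fits in the budget."""
    return monthly_image_cost(provider, images_per_day, days) <= budget_usd


if __name__ == "__main__":
    cost = monthly_image_cost("example-dalle", images_per_day=50)
    print(f"Estimated monthly cost: ${cost:.2f}")
```

With the placeholder price of $0.04, 50 images a day over 30 days comes to $60.00.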

Install and enable

  1. Browse clawhub.ai/skills or read the ClawHub guide.
  2. Follow the safe install checklist for community packages.
  3. Run openclaw skills install <skill-name> and confirm with openclaw skills list.
  4. Restart the Gateway if the skill docs require it.
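Step 3 can be scripted. The Python sketch below runs openclaw skills install and then checks the output of openclaw skills list for the skill name. It assumes the list output mentions each installed skill by name as a separate word; that format is an assumption, and the helper names install_and_verify and skill_is_listed are hypothetical. The script does not restart the Gateway (step 4).

```python
import shutil
import subprocess


def skill_is_listed(list_output: str, skill_name: str) -> bool:
    """Return True if skill_name appears as a word in `openclaw skills list` output.

    Assumption: the list output names each installed skill; surrounding
    punctuation such as ':' or '()' is stripped before comparing.
    """
    for line in list_output.splitlines():
        tokens = (tok.strip(",:;()[]") for tok in line.split())
        if skill_name in tokens:
            return True
    return False


def install_and_verify(skill_name: str) -> bool:
    """Install a skill with the OpenClaw CLI, then confirm it shows up in the list."""
    if shutil.which("openclaw") is None:
        raise FileNotFoundError("openclaw CLI not found on PATH")
    # Install; check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(["openclaw", "skills", "install", skill_name], check=True)
    # Capture the installed-skills listing as text for inspection.
    result = subprocess.run(
        ["openclaw", "skills", "list"],
        check=True,
        capture_output=True,
        text=True,
    )
    return skill_is_listed(result.stdout, skill_name)
```

Call install_and_verify("<skill-name>") after completing the safety checklist; a False result means the install command exited cleanly but the skill did not appear in the list.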

Voice-specific setup (wake word, TTS) is covered on the Voice page and in the Voice setup tutorial.

Other categories

  • Productivity - Calendar, Gmail, Todoist
  • Dev and Infrastructure - GitHub, Docker, n8n
  • Research - Web search, deep research
  • Media - Spotify, Sonos, voice
  • Channels - Slack, Discord, WhatsApp
  • All Skills - Full list

Permissions and tokens

Media skills need OAuth or LAN tokens. Store them with Secrets rather than in plain-text config. Test on a private channel before enabling playback in public rooms.
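A skill should read its token at runtime and fail loudly when it is missing, never falling back to a hardcoded value. The Python sketch below assumes the token reaches the skill as an environment variable; how OpenClaw Secrets actually injects values is not covered here, and the variable name SPOTIFY_ACCESS_TOKEN is hypothetical. It also masks the token before it is written to logs.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required token is not available."""


def require_token(env_var: str) -> str:
    """Read a token from the environment (assumed delivery mechanism) or raise."""
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise MissingSecretError(f"{env_var} is not set; store it via Secrets")
    return value


def redact(token: str, visible: int = 4) -> str:
    """Mask a token for logging, keeping only the last `visible` characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

For example, token = require_token("SPOTIFY_ACCESS_TOKEN") followed by logging redact(token) keeps the full value out of logs that might be shared in a public room.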

Latency and LAN

Sonos and LAN TTS skills fail when the Gateway runs on a remote VPS but the speakers are at home. Run those skills on a Gateway that can reach your LAN, or use cloud TTS instead.
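A quick way to tell whether a Gateway host can reach a speaker is a plain TCP connection test run on that host. The Python sketch below uses port 1400, where Sonos players commonly serve their local HTTP interface; the function name can_reach is hypothetical, and for other LAN TTS devices you would substitute their own port.

```python
import socket

# Sonos players commonly serve their local HTTP/UPnP interface on this port.
SONOS_HTTP_PORT = 1400


def can_reach(host: str, port: int = SONOS_HTTP_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Run it on the Gateway host, for example can_reach("192.168.1.50") with a hypothetical speaker address. False from a VPS but True from a machine at home points to the LAN issue described above.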