OpenClaw supports rich media across all channels: you can send and receive images, audio files, videos, and documents. The media pipeline handles processing, transcription, and temporary storage.
OpenClaw can handle images in multiple ways:
- Send Images - Share photos and screenshots
- Receive Images - Process and analyze images
- Image Analysis - Extract text, identify objects, describe scenes
- Camera Access - Take photos via iOS/Android nodes
- Screenshots - Capture and share screenshots
Image Processing
Images are processed to:
- Extract text (OCR)
- Describe content
- Identify objects and scenes
- Analyze for context
Audio handling includes:
- Voice Notes - Send and receive voice messages
- Audio Files - Process audio files
- Transcription - Automatic transcription of audio
- Voice Interaction - Voice wake and talk mode
Transcription Hooks
Configure transcription hooks to automatically transcribe voice notes:
{
  "hooks": {
    "transcription": {
      "enabled": true,
      "provider": "whisper"
    }
  }
}
Transcribed text is processed like any regular text message, so you can interact by voice.
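As a rough illustration of how a hook like this might behave, the TypeScript sketch below turns a voice note into an ordinary text message when transcription is enabled. The names `TranscriptionConfig`, `IncomingMessage`, `Transcriber`, `applyTranscriptionHook`, and `stubTranscriber` are hypothetical and invented for this example; they are not the project's actual API. The stub stands in for a real provider such as Whisper, and the call is kept synchronous for brevity, whereas a real provider call would almost certainly be asynchronous.

```typescript
// Hypothetical shapes for illustration only; not the project's real types.
interface TranscriptionConfig {
  enabled: boolean;
  provider: string;
}

type IncomingMessage =
  | { kind: "text"; text: string }
  | { kind: "voice"; audio: Uint8Array };

// A transcriber turns raw audio bytes into text.
// Synchronous here for brevity; a real provider call would be asynchronous.
type Transcriber = (audio: Uint8Array) => string;

// Convert voice notes to text messages when the hook is enabled;
// pass every other message through unchanged.
function applyTranscriptionHook(
  message: IncomingMessage,
  config: TranscriptionConfig,
  transcribe: Transcriber,
): IncomingMessage {
  if (!config.enabled || message.kind !== "voice") {
    return message;
  }
  return { kind: "text", text: transcribe(message.audio) };
}

// Stub standing in for a real speech-to-text provider.
const stubTranscriber: Transcriber = (audio) =>
  `(transcript of ${audio.length} bytes)`;
```

Because the hook returns a plain text message, whatever handles regular messages downstream does not need to know that the input started as audio.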
Video handling capabilities:
- Send Videos - Share video files
- Receive Videos - Process video content
- Video Analysis - Extract frames, analyze content
- Size Limits - Configurable size caps
The media pipeline handles:
- Upload - Receive media from channels
- Storage - Store files temporarily
- Processing - Transcription, analysis, extraction
- Cleanup - Automatically remove temporary files
- Size Management - Enforce size limits
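The stages listed above can be pictured as a single function. This is a minimal sketch under stated assumptions, not the actual implementation: `IncomingMedia`, `tempStore`, and `runPipeline` are hypothetical names, an in-memory `Map` stands in for temporary file storage, and the ordering (size check, then store, then process, then cleanup) is an assumption about how such a pipeline would typically be arranged.

```typescript
// Hypothetical sketch of the pipeline stages; none of these names come from the project.
interface IncomingMedia {
  id: string;
  kind: "image" | "audio" | "video" | "document";
  bytes: Uint8Array;
}

// In-memory stand-in for temporary file storage.
const tempStore = new Map<string, Uint8Array>();

function runPipeline<T>(
  media: IncomingMedia,
  maxBytes: number,
  handler: (stored: Uint8Array) => T,
): T {
  // Size management: reject oversized media before storing anything.
  if (media.bytes.length > maxBytes) {
    throw new Error(`Media ${media.id} exceeds ${maxBytes} bytes`);
  }
  // Storage: keep the payload only for the duration of processing.
  tempStore.set(media.id, media.bytes);
  try {
    // Processing: transcription, analysis, or extraction would happen here.
    return handler(media.bytes);
  } finally {
    // Cleanup: always remove the temporary entry, even if processing throws.
    tempStore.delete(media.id);
  }
}
```

Putting cleanup in a `finally` block is one way to make sure temporary data does not outlive a failed processing step.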
Size Limits
Configure media size limits:
{
  "media": {
    "maxSize": "10MB",
    "imageMaxSize": "5MB",
    "audioMaxSize": "10MB",
    "videoMaxSize": "50MB"
  }
}
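To make the limits concrete, here is a hedged TypeScript sketch of how size strings like these could be parsed and checked. The helpers `parseSize`, `limitFor`, and `isWithinLimit` are hypothetical. Two assumptions are not confirmed by the configuration itself: that `1MB` means 1024 × 1024 bytes, and that a per-type key such as `videoMaxSize` overrides the general `maxSize`, which then acts as a fallback for types without their own key.

```typescript
// Hypothetical helpers; only the config keys mirror the example above.
type MediaKind = "image" | "audio" | "video" | "document";

interface MediaLimits {
  maxSize: string;
  imageMaxSize?: string;
  audioMaxSize?: string;
  videoMaxSize?: string;
}

// Parse a size string such as "10MB" into bytes.
// Assumption: binary units, so 1 MB = 1024 * 1024 bytes.
function parseSize(value: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)$/i.exec(value.trim());
  if (!match) {
    throw new Error(`Invalid size: ${value}`);
  }
  const units: Record<string, number> = {
    B: 1,
    KB: 1024,
    MB: 1024 * 1024,
    GB: 1024 * 1024 * 1024,
  };
  const amount = Number(match[1]);
  const unit = (match[2] ?? "B").toUpperCase();
  return Math.round(amount * (units[unit] ?? 1));
}

// Pick the per-type limit when present, otherwise fall back to maxSize.
// Assumption: per-type keys override the general cap.
function limitFor(kind: MediaKind, limits: MediaLimits): number {
  const perType: Record<MediaKind, string | undefined> = {
    image: limits.imageMaxSize,
    audio: limits.audioMaxSize,
    video: limits.videoMaxSize,
    document: undefined,
  };
  return parseSize(perType[kind] ?? limits.maxSize);
}

function isWithinLimit(kind: MediaKind, sizeBytes: number, limits: MediaLimits): boolean {
  return sizeBytes <= limitFor(kind, limits);
}

const limits: MediaLimits = {
  maxSize: "10MB",
  imageMaxSize: "5MB",
  audioMaxSize: "10MB",
  videoMaxSize: "50MB",
};
```

Under these assumptions, a 40 MB video passes because `videoMaxSize` applies, while an 11 MB document is rejected because it falls back to `maxSize`.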
OpenClaw provides tools for media handling:
- Camera - Take photos via nodes
- Images - Process and analyze images
- Audio - Handle audio files and transcription
- Location - Send and receive location data