1. Speech Translation Overview
Learn how our speech translation products work—covering in-person meetings, multilingual video calls, and live broadcast translation tools.
Speech Translation is the core feature that powers real-time, multilingual conversations across in-person, video call, and broadcast modes. It enables participants to speak naturally in their own language while the system transcribes, translates, and optionally speaks the translation aloud, all in real time.
This feature supports over 120 language and dialect pairs and is designed to work across a wide range of use cases, from live customer interactions to internal staff meetings, remote consultations, and community engagement sessions. It’s a flexible, AI-powered alternative to live interpreters: fast, cost-effective, and available on demand.
Behind the scenes, Speech Translation combines three services:
- Transcription: Capturing what is said with high accuracy and speaker separation
- Translation: Rendering the message into each participant’s preferred language
- Voiceover (optional): Using neural voices to play translated audio aloud
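To make the flow between the three services concrete, here is a minimal sketch of how one transcribed utterance might be fanned out to listeners in different languages. It is illustrative only: `Segment`, `Delivery`, `run_pipeline`, and the `translate` and `synthesize` callables are hypothetical names invented for this example and are not part of the product's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    """One transcribed utterance (output of the transcription step)."""
    speaker: str
    language: str
    text: str


@dataclass
class Delivery:
    """What a listener in a given language receives."""
    listener_language: str
    text: str
    audio: Optional[bytes] = None  # only set when voiceover is enabled


def run_pipeline(
    segment: Segment,
    listener_languages: List[str],
    translate: Callable[[str, str, str], str],
    synthesize: Optional[Callable[[str, str], bytes]] = None,
) -> List[Delivery]:
    """Translate one segment for each listener language (hypothetical flow).

    `translate(text, source, target)` and `synthesize(text, language)` are
    placeholders for the real translation and voiceover services.
    """
    deliveries = []
    for lang in listener_languages:
        if lang == segment.language:
            # Same language as the speaker: pass the transcript through.
            deliveries.append(Delivery(lang, segment.text))
            continue
        text = translate(segment.text, segment.language, lang)
        # Voiceover is optional, so audio is produced only if a synthesizer is given.
        audio = synthesize(text, lang) if synthesize is not None else None
        deliveries.append(Delivery(lang, text, audio))
    return deliveries


# Stand-in services so the sketch runs without any external dependency.
def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def fake_synthesize(text: str, language: str) -> bytes:
    return text.encode("utf-8")


segment = Segment(speaker="A", language="en", text="hello")
text_only = run_pipeline(segment, ["en", "es"], fake_translate)
with_voice = run_pipeline(segment, ["es"], fake_translate, fake_synthesize)
```

Passing the services in as callables keeps voiceover optional, mirroring the list above: omitting `synthesize` yields text-only translations.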
You can customise how translation behaves using glossaries, prompts, and preferred voice profiles. You can also enable or disable voice playback, choose from different speaker layouts, and download transcripts at the end of each conversation.
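As a rough illustration, the options above could be grouped into a single settings object. This is a sketch under assumptions: the class name, field names, and default values below are hypothetical and do not reflect the product's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TranslationSettings:
    """Hypothetical container for the customisation options described above."""
    glossary: Dict[str, str] = field(default_factory=dict)  # source term -> preferred rendering
    prompt: Optional[str] = None          # free-text guidance for the translator
    voice_profile: Optional[str] = None   # preferred voice, if any
    voice_playback: bool = True           # enable or disable spoken output
    speaker_layout: str = "default"       # placeholder layout name


def effective_voice(settings: TranslationSettings) -> Optional[str]:
    """Return the voice to use, or None when playback is disabled."""
    if not settings.voice_playback:
        return None
    return settings.voice_profile


# Example: a glossary entry is kept, but voice playback is switched off.
settings = TranslationSettings(
    glossary={"GP": "general practitioner"},
    voice_profile="example-voice",
    voice_playback=False,
)
voice = effective_voice(settings)
```

Here `voice` is `None` because playback is disabled, even though a voice profile is set.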
In the sections that follow, you’ll learn how to configure Speech Translation for:
- 🧳 In-person mode
- 🧑‍💻 Video Call mode
- 🎙️ Broadcast mode