Text-to-Speech
Overview
Text-to-speech on the PlayKit platform — capabilities and how it works across SDKs
Text-to-Speech (TTS) turns written text into spoken audio — for NPC dialogue, narration, accessibility readouts, or any in-game voice feedback.
Use the `default-tts-model` alias to stay on the recommended TTS model, or pass a specific model name.
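As a rough illustration of the alias fallback, the sketch below builds a request body that uses `default-tts-model` unless a specific model name is given. Only the alias string comes from this page. The helper `build_tts_request` and the field names `model`, `input`, and `voice` are assumptions made for the example, not the PlayKit request schema; check your SDK's reference for the real names.

```python
from typing import Dict, Optional

# Alias named on this page; it tracks the recommended TTS model.
DEFAULT_TTS_MODEL = "default-tts-model"


def build_tts_request(
    text: str,
    model: Optional[str] = None,
    voice: Optional[str] = None,
) -> Dict[str, str]:
    """Build a JSON-serializable request body.

    The field names ("model", "input", "voice") are hypothetical.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    # Fall back to the alias when no specific model name is passed.
    body = {"model": model or DEFAULT_TTS_MODEL, "input": text}
    if voice is not None:
        body["voice"] = voice
    return body
```

Calling `build_tts_request("Welcome back, traveler.")` leaves the model on the alias, while passing `model="..."` pins a specific one.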
What you can do
- Pick a voice from a large multilingual catalog, or blend several voices into a custom timbre. → Voices
- Get subtitle timestamps alongside the audio — word- or sentence-level timings for captions, karaoke highlighting, and lip-sync. → Subtitle Timestamps
- Shape delivery with inline markup, including pauses and non-verbal sounds like `[laughs]`, plus an emotion setting. → Tone & Markup
Two response shapes
| Request | Returns |
|---|---|
| Synthesize | Audio only (raw bytes, e.g. mp3/pcm/wav). Lowest overhead. |
| Synthesize with timestamps | A JSON envelope with the audio and an alignment of word/sentence timings. |
Timestamps are opt-in: they come from a separate call/endpoint, so plain synthesis stays lightweight. See Subtitle Timestamps for the format.
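The two response shapes could be normalized on the client side roughly as follows. This is a minimal sketch under stated assumptions: the envelope field names (`audio_base64`, `alignment`, `words`, `text`, `start`, `end`), the base64 encoding of the audio, and the use of the content type to tell the shapes apart are all hypothetical. The actual format is described under Subtitle Timestamps.

```python
import base64
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordTiming:
    text: str
    start: float  # seconds from the start of the audio (assumed unit)
    end: float


@dataclass
class SpeechResult:
    audio: bytes
    words: Optional[List[WordTiming]] = None  # None for plain synthesis


def parse_tts_response(body: bytes, content_type: str) -> SpeechResult:
    """Turn either response shape into one result type.

    The envelope layout used here is hypothetical.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        # "Synthesize with timestamps": JSON envelope with audio and alignment.
        envelope = json.loads(body)
        audio = base64.b64decode(envelope["audio_base64"])
        words = [
            WordTiming(w["text"], float(w["start"]), float(w["end"]))
            for w in envelope.get("alignment", {}).get("words", [])
        ]
        return SpeechResult(audio=audio, words=words)
    # "Synthesize": the body is the raw audio bytes.
    return SpeechResult(audio=body)
```

Keeping both shapes behind one result type lets playback code ignore whether timings were requested, while caption code checks `words` for `None`.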
In your SDK
For installation and method names, see: