Text-to-speech on the PlayKit platform — capabilities and how it works across SDKs

Text-to-Speech

Text-to-Speech (TTS) turns written text into spoken audio — for NPC dialogue, narration, accessibility readouts, or any in-game voice feedback.

Use the `default-tts-model` alias to stay on the recommended TTS model, or pass a specific model name.

What you can do

Pick a voice from a large multilingual catalog, or blend several voices into a custom timbre. → Voices
Get subtitle timestamps alongside the audio — word- or sentence-level timings for captions, karaoke highlighting, and lip-sync. → Subtitle Timestamps
Shape delivery with inline markup — pauses and non-verbal sounds like [laughs] — plus an emotion setting. → Tone & Markup
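
Because markup such as [laughs] is meant for the voice and not for the reader, a game that shows the same line as a caption will usually want to strip it first. The following is a minimal sketch, not part of any PlayKit SDK: it assumes cues are always written in square brackets as in the [laughs] example, and the helper names `caption_text` and `cue_names` are hypothetical. Any other markup syntax (pauses, for instance) is documented in Tone & Markup and is not handled here.

```python
import re

# Assumption: a non-verbal cue is any non-empty run of text inside square
# brackets, e.g. "[laughs]". This mirrors the example in the text only.
_CUE = re.compile(r"\[[^\[\]]+\]")


def caption_text(marked_up: str) -> str:
    """Return display text with bracketed cues removed and spacing tidied."""
    without_cues = _CUE.sub(" ", marked_up)
    # Collapse the whitespace runs left behind by removed cues.
    collapsed = re.sub(r"\s+", " ", without_cues).strip()
    # Drop any space that now sits directly before punctuation.
    return re.sub(r"\s+([,.!?;:])", r"\1", collapsed)


def cue_names(marked_up: str) -> list[str]:
    """List the cue names found in the text, in order, without brackets."""
    return [match[1:-1] for match in _CUE.findall(marked_up)]
```

Send the original marked-up string to synthesis and show `caption_text(...)` on screen, so the spoken line keeps its cues while the subtitle stays clean.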

| Request | Returns |
| --- | --- |
| Synthesize | Audio only (raw bytes, e.g. mp3/pcm/wav). Lowest overhead. |
| Synthesize with timestamps | A JSON envelope with the audio and an `alignment` of word/sentence timings. |

Timestamps are opt-in: they come from a separate call/endpoint, so plain synthesis stays lightweight. See Subtitle Timestamps for the format.
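
To make the envelope idea concrete, here is a rough sketch of consuming a timestamped response for karaoke-style highlighting. Only the existence of the audio and an `alignment` object comes from this page. Everything else is an assumption for illustration: that the audio arrives base64-encoded under an `audio` key, that word timings sit under `alignment.words`, and that each entry has `text`, `start`, and `end` in seconds. Check Subtitle Timestamps for the real field names before relying on any of this.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    text: str
    start: float  # seconds from the start of the audio (assumed unit)
    end: float


def parse_envelope(raw: str) -> tuple[bytes, list[Cue]]:
    """Split a timestamped response into audio bytes and word cues.

    All key names below are hypothetical placeholders, not the documented
    PlayKit response format.
    """
    body = json.loads(raw)
    audio = base64.b64decode(body["audio"])
    cues = [
        Cue(word["text"], float(word["start"]), float(word["end"]))
        for word in body["alignment"]["words"]
    ]
    return audio, cues


def active_word(cues: list[Cue], playback_time: float) -> Optional[int]:
    """Return the index of the word being spoken at playback_time, if any."""
    for index, cue in enumerate(cues):
        if cue.start <= playback_time < cue.end:
            return index
    return None
```

Calling `active_word` once per frame with the audio player's current position gives the index of the word to highlight. It returns `None` during silences between words.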

For installation and method names, see: