PlayKit.ai
Text-to-Speech

Overview

Text-to-speech on the PlayKit platform — capabilities and how it works across SDKs

Text-to-Speech (TTS) turns written text into spoken audio — for NPC dialogue, narration, accessibility readouts, or any in-game voice feedback.

Use the default-tts-model alias to stay on the recommended TTS model, or pass a specific model name.
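
As a rough illustration of the alias-or-explicit-name choice, the sketch below assembles a request payload that falls back to default-tts-model when no model is given. Only the alias string comes from this page. The function name build_tts_request and the field names "model", "input", and "voice" are hypothetical placeholders, not the platform's documented schema; check your SDK's reference for the real names.

```python
import json

# The alias named on this page; everything else below is an assumption.
DEFAULT_TTS_MODEL = "default-tts-model"


def build_tts_request(text, model=DEFAULT_TTS_MODEL, voice=None):
    """Assemble a hypothetical synthesis payload (field names are assumed)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"model": model, "input": text}
    if voice is not None:
        payload["voice"] = voice  # hypothetical field name
    return payload


if __name__ == "__main__":
    # Uses the alias, so the request follows whatever model is recommended.
    print(json.dumps(build_tts_request("Welcome back, traveler.")))
```

Defaulting to the alias keeps callers on the recommended model without code changes, while an explicit model argument pins a specific one.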

What you can do

  • Pick a voice from a large multilingual catalog, or blend several voices into a custom timbre. → Voices
  • Get subtitle timestamps alongside the audio — word- or sentence-level timings for captions, karaoke highlighting, and lip-sync. → Subtitle Timestamps
  • Shape delivery with inline markup — pauses and non-verbal sounds like [laughs] — plus an emotion setting. → Tone & Markup
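
When the same line of dialogue feeds both synthesis and on-screen captions, you may want the caption to omit non-verbal tags. The sketch below is one way to do that on the client side. It assumes tags are written in square brackets, as in the [laughs] example above; the helper name caption_text is hypothetical, and any other markup (such as pause syntax) is left untouched because its format is not described on this page.

```python
import re

# Matches a square-bracket tag such as "[laughs]" plus any whitespace before it.
# Assumption: non-verbal tags always use this bracket form.
_TAG = re.compile(r"\s*\[[^\[\]]+\]")


def caption_text(marked_up):
    """Return the text with bracketed tags removed, for display as a caption."""
    stripped = _TAG.sub("", marked_up)
    # Collapse any doubled spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", stripped).strip()


if __name__ == "__main__":
    print(caption_text("Well, well [laughs] look who it is."))
```

The marked-up string is still what gets sent for synthesis; only the caption copy is cleaned.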

Two response shapes

Request                       Returns
Synthesize                    Audio only (raw bytes, e.g. mp3/pcm/wav). Lowest overhead.
Synthesize with timestamps    A JSON envelope with the audio and an alignment of word/sentence timings.

Timestamps are opt-in: they come from a separate call/endpoint, so plain synthesis stays lightweight. See Subtitle Timestamps for the format.
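
To make the second response shape concrete, here is a minimal sketch of consuming such an envelope for caption or karaoke highlighting. The envelope layout is an assumption made for illustration only: a base64 string under "audio" and a list of {"text", "start", "end"} entries (in seconds) under "alignment". The real format is the one documented under Subtitle Timestamps, and the function names here are hypothetical.

```python
import base64
import json
from bisect import bisect_right

# Hypothetical envelope; the real field names and units may differ.
SAMPLE = json.dumps({
    "audio": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
    "alignment": [
        {"text": "Hello", "start": 0.0, "end": 0.4},
        {"text": "there", "start": 0.5, "end": 0.9},
    ],
})


def parse_envelope(raw):
    """Split an (assumed) JSON envelope into audio bytes and sorted timings."""
    doc = json.loads(raw)
    audio = base64.b64decode(doc["audio"])
    alignment = sorted(doc["alignment"], key=lambda seg: seg["start"])
    return audio, alignment


def active_segment(alignment, t):
    """Return the index of the segment being spoken at time t, or None."""
    starts = [seg["start"] for seg in alignment]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < alignment[i]["end"]:
        return i
    return None  # before the first word, in a gap, or after the end


if __name__ == "__main__":
    audio, alignment = parse_envelope(SAMPLE)
    idx = active_segment(alignment, 0.6)
    print(len(audio), None if idx is None else alignment[idx]["text"])
```

Calling active_segment once per frame with the current playback position gives the word or sentence to highlight; a None result means nothing should be highlighted at that moment.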

In your SDK

For installation and method names, see: