PlayKit.ai

Text-to-Speech

Turn text into spoken audio using TTSClient

TTSClient turns text into spoken audio — for NPC dialogue, narration, accessibility readouts, or any in-game voice feedback. You provide text and an optional voice, and you get back playable audio (optionally with word timings).

Create TTSClient

import { PlayKitSDK } from 'playkit-sdk';

const sdk = new PlayKitSDK({
  gameId: 'your-game-id',
  developerToken: 'your-token'
});

await sdk.initialize();

// Uses the default model alias "default-tts-model"
const tts = sdk.createTTSClient();

// Or specify a model explicitly
// const tts = sdk.createTTSClient('default-tts-model');

Synthesize Speech

Provide text and get back audio:

const result = await tts.synthesize({
  text: 'Welcome to the kingdom, traveler.'
});

console.log('Format:', result.format);              // e.g. "mp3"
console.log('Characters billed:', result.usageCharacters);
console.log('Audio length (ms):', result.audioLengthMs);

The result contains the raw audio bytes:

interface TTSResult {
  audio: ArrayBuffer;        // Raw audio data
  format: string;            // Output format / content type (e.g. "mp3")
  usageCharacters: number;   // Characters billed for this request
  audioLengthMs?: number;    // Audio duration in milliseconds (when available)
}

Play in the Browser

synthesizeToObjectURL returns a URL you can hand straight to an Audio element:

const url = await tts.synthesizeToObjectURL({
  text: 'Welcome to the kingdom, traveler.'
});

const audio = new Audio(url);
await audio.play();
audio.onended = () => URL.revokeObjectURL(url);  // release when done

The default output is MP3, which browsers play natively.

Play from the Audio Buffer

If you already have the ArrayBuffer from synthesize, build the object URL yourself:

const result = await tts.synthesize({ text: 'Quest complete!' });

const blob = new Blob([result.audio], { type: `audio/${result.format}` });
const url = URL.createObjectURL(blob);

const audio = new Audio(url);
await audio.play();
audio.onended = () => URL.revokeObjectURL(url);
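The Blob type above is built directly from result.format. As a hedged sketch, the helper below maps a format string to a MIME type first. It assumes result.format may carry the same suffixed values accepted by outputFormat (for example "mp3_44100_128"), which this page does not confirm. mimeTypeFor and its lookup table are hypothetical names, not part of playkit-sdk.

```typescript
// Hypothetical helper, not part of playkit-sdk.
// Assumption: format strings look like "mp3" or "mp3_44100_128",
// where the part before the first underscore names the codec.
const MIME_BY_CODEC = new Map<string, string>([
  ['mp3', 'audio/mpeg'], // registered MIME type for MP3
  ['wav', 'audio/wav'],
  ['ogg', 'audio/ogg'],
]);

function mimeTypeFor(format: string): string {
  // "mp3_44100_128" -> "mp3"
  const codec = format.toLowerCase().split('_')[0] ?? '';
  // Fall back to "audio/<codec>" for anything not in the table.
  return MIME_BY_CODEC.get(codec) ?? `audio/${codec}`;
}
```

It could then be used as new Blob([result.audio], { type: mimeTypeFor(result.format) }). Raw PCM output such as "pcm_24000" generally has no container that an Audio element can decode, so it would likely need the Web Audio API instead.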

Config Options

interface TTSConfig {
  text: string;                 // required
  model?: string;
  voice?: string;               // voice id; mutually exclusive with voiceMix
  voiceMix?: { voice: string; weight: number }[];
  voiceSettings?: {
    speed?: number;             // 0.5–2
    volume?: number;            // 0–10
    pitch?: number;             // -12–12
    emotion?: string;           // e.g. "happy"
  };
  outputFormat?: string;        // e.g. "mp3", "mp3_44100_128", "pcm_24000"
  language?: string;            // pronunciation hint, e.g. "English"
  providerOptions?: Record<string, unknown>;
}

Option           Description
text             The text to speak (required). Supports inline markup.
voice            Voice id. See Voices.
voiceMix         Blend up to 4 voices with weights. See Voices.
voiceSettings    speed, volume, pitch, emotion.
outputFormat     Audio format string. Defaults to MP3.
language         Biases pronunciation toward a language.
providerOptions  Advanced options for capabilities not covered by the fields above.

Choosing a voice

const a = await tts.synthesize({ text: 'Hello, adventurer.' });           // default voice

const b = await tts.synthesize({
  text: 'Hello, adventurer.',
  voice: 'English_Trustworthy_Man'
});

Browse and preview voices on the PlayKit Dashboard. Voice ids are case-sensitive. See Voices for voice mixing.

Tuning delivery

// Slower, deeper delivery for a wise mentor
const mentor = await tts.synthesize({
  text: 'Listen carefully, young one.',
  voiceSettings: { speed: 0.9, pitch: -2 }
});

// Brighter, happier delivery for an excited companion
const companion = await tts.synthesize({
  text: 'We found the treasure!',
  voiceSettings: { speed: 1.15, emotion: 'happy' }
});

You can also place pause and interjection markup directly in text:

await tts.synthesize({
  text: "That's hilarious [laughs] let me catch my breath [pause 1s] okay."
});
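When voiceSettings values come from game data or player sliders, it can help to keep them inside the ranges listed under Config Options (speed 0.5 to 2, volume 0 to 10, pitch -12 to 12). The sketch below clamps them before a request. This page does not say whether the service rejects or silently clamps out-of-range values, so treat this as a defensive assumption. clampVoiceSettings is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper, not part of playkit-sdk.
// Mirrors the voiceSettings shape shown in TTSConfig.
interface VoiceSettings {
  speed?: number;   // 0.5 to 2
  volume?: number;  // 0 to 10
  pitch?: number;   // -12 to 12
  emotion?: string;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

function clampVoiceSettings(settings: VoiceSettings): VoiceSettings {
  // Copy so the caller's object is left untouched.
  const out: VoiceSettings = { ...settings };
  if (out.speed !== undefined) out.speed = clamp(out.speed, 0.5, 2);
  if (out.volume !== undefined) out.volume = clamp(out.volume, 0, 10);
  if (out.pitch !== undefined) out.pitch = clamp(out.pitch, -12, 12);
  return out;
}
```

A call might then look like tts.synthesize({ text, voiceSettings: clampVoiceSettings(sliderValues) }), where sliderValues is whatever your UI produces.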

Synthesize with Timestamps

synthesizeWithTimestamps returns the audio plus an alignment of word (or sentence) timings — for captions, karaoke highlighting, and click-to-seek. See Subtitle Timestamps for the format and use cases.

const result = await tts.synthesizeWithTimestamps({
  text: 'Hello world. This is a timing test.',
  granularity: 'word'   // 'word' (default) or 'sentence'
});

for (const item of result.alignment?.items ?? []) {
  console.log(`${item.startMs}–${item.endMs}ms: ${item.text}`);
}

The result extends TTSResult with an alignment field, which may be null:

interface TTSTimestampsResult extends TTSResult {
  alignment: {
    granularity: string;
    items: {
      text: string;
      startMs: number;
      endMs: number;
      textStart?: number;   // char offset in the input text
      textEnd?: number;
    }[];
  } | null;
}

Highlight the active word as audio plays by comparing the playback clock to each item's startMs/endMs:

const blob = new Blob([result.audio], { type: `audio/${result.format}` });
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
const items = result.alignment?.items ?? [];

audio.ontimeupdate = () => {
  const ms = audio.currentTime * 1000;
  // -1 when the playback position falls between items
  const active = items.findIndex((it) => ms >= it.startMs && ms < it.endMs);
  // render `active` highlight…
};
audio.onended = () => URL.revokeObjectURL(url);  // release when done

await audio.play();
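For long passages with many alignment items, the linear findIndex lookup can be swapped for a binary search. The sketch below assumes the items are sorted by startMs and do not overlap, which this page does not state explicitly. findActiveIndex and AlignmentItem are hypothetical names used for illustration.

```typescript
// Hypothetical helper, not part of playkit-sdk.
// Assumption: items are sorted by startMs and do not overlap.
interface AlignmentItem {
  text: string;
  startMs: number;
  endMs: number;
}

function findActiveIndex(items: AlignmentItem[], ms: number): number {
  let lo = 0;
  let hi = items.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const item = items[mid];
    if (ms < item.startMs) {
      hi = mid - 1;        // playback position is before this item
    } else if (ms >= item.endMs) {
      lo = mid + 1;        // playback position is after this item
    } else {
      return mid;          // startMs <= ms < endMs
    }
  }
  return -1;               // in a gap between items, or out of range
}
```

Because the function takes only the items and a time in milliseconds, it can be called from ontimeupdate or from a requestAnimationFrame loop if smoother highlighting is wanted.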

Limits and Notes

  • Maximum 10000 characters per request. Split longer text into multiple calls.
  • The default output is MP3. Pass outputFormat (e.g. "pcm_24000") for a different format.
  • Billing is per character — usageCharacters reports how many characters were billed.
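Splitting longer text into multiple calls could be sketched as follows. The helper breaks on sentence boundaries where possible and falls back to a hard cut for a single oversized sentence. It counts JavaScript string length (UTF-16 code units), which is an assumption, since this page does not say how the service counts characters. splitForTTS is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper, not part of playkit-sdk.
// Assumption: the per-request limit can be approximated by string length.
function splitForTTS(text: string, maxChars = 10000): string[] {
  // A "sentence" is text up to and including its terminators and trailing
  // whitespace; the second alternative catches a final unterminated fragment.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    let rest = sentence;
    // A single sentence longer than the limit is cut at the limit.
    // Note: a hard cut can split a word or a surrogate pair.
    while (rest.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    // Start a new chunk when this sentence would overflow the current one.
    if (current.length + rest.length > maxChars) {
      chunks.push(current);
      current = '';
    }
    current += rest;
  }
  if (current) chunks.push(current);

  return chunks.map((c) => c.trim()).filter((c) => c.length > 0);
}
```

Each chunk can then be passed to tts.synthesize in turn and played back in order. The small limits in a call like splitForTTS(script, 10) are only useful for testing; the default matches the documented 10000-character maximum.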

Voice Chat Integration

Combine text generation with speech synthesis to give your NPC a voice:

const chat = sdk.createChatClient();
const tts = sdk.createTTSClient();

async function respondWithVoice(playerMessage: string) {
  const reply = await chat.chat(playerMessage, 'You are a friendly game NPC.');

  const url = await tts.synthesizeToObjectURL({ text: reply });
  const audio = new Audio(url);
  await audio.play();
  audio.onended = () => URL.revokeObjectURL(url);

  return reply;
}

Error Handling

import { PlayKitError } from 'playkit-sdk';

try {
  const result = await tts.synthesize({ text: 'Hello' });
  // use result.audio…
} catch (error) {
  if (error instanceof PlayKitError) {
    switch (error.code) {
      case 'NOT_AUTHENTICATED':
        console.log('Need to log in');
        break;
      default:
        console.log('Synthesis failed:', error.message);
    }
  } else {
    throw error;  // not an SDK error; let it propagate
  }
}

Best Practices

  1. Stay under the limit: keep each request at or below 10000 characters; split longer scripts.
  2. Reuse the client: create the TTSClient once and reuse it for many requests.
  3. Release object URLs: call URL.revokeObjectURL once playback finishes to free memory.
  4. Cache repeated lines: synthesize fixed dialogue once and reuse the audio.
  5. Match the character: use voice, voiceSettings, and markup to fit each character.
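The caching advice in item 4 could look something like the sketch below. It wraps any text-to-audio function in an in-memory cache keyed by the text, so a repeated line triggers only one request. withCache and the Synthesize type are hypothetical names, and keying by text alone assumes the voice and settings are fixed for that wrapper.

```typescript
// Hypothetical helper, not part of playkit-sdk.
type Synthesize = (text: string) => Promise<ArrayBuffer>;

function withCache(synthesize: Synthesize): Synthesize {
  // Store the promise, not the bytes, so concurrent requests for the
  // same line share a single in-flight call.
  const cache = new Map<string, Promise<ArrayBuffer>>();

  return (text) => {
    let pending = cache.get(text);
    if (!pending) {
      pending = synthesize(text);
      cache.set(text, pending);
      // Drop failed entries so a later call can try again.
      pending.catch(() => cache.delete(text));
    }
    return pending;
  };
}

// Possible usage, assuming `tts` is the TTSClient created earlier:
// const speak = withCache(async (text) => (await tts.synthesize({ text })).audio);
// const audioBytes = await speak('Quest complete!');
```

If the same line is spoken with different voices or settings, the cache key would need to include those values as well.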

Next Steps