Text-to-Speech
Turn text into spoken audio using TTSClient
TTSClient turns text into spoken audio — for NPC dialogue, narration, accessibility readouts, or any in-game voice feedback. You provide text and an optional voice, and you get back playable audio (optionally with word timings).
Create TTSClient
import { PlayKitSDK } from 'playkit-sdk';

const sdk = new PlayKitSDK({
  gameId: 'your-game-id',
  developerToken: 'your-token'
});
await sdk.initialize();

// Uses the default model alias "default-tts-model"
const tts = sdk.createTTSClient();

// Or specify a model explicitly
// const tts = sdk.createTTSClient('default-tts-model');

Synthesize Speech
Provide text and get back audio:
const result = await tts.synthesize({
  text: 'Welcome to the kingdom, traveler.'
});
console.log('Format:', result.format); // e.g. "mp3"
console.log('Characters billed:', result.usageCharacters);
console.log('Audio length (ms):', result.audioLengthMs);

The result contains the raw audio bytes:
interface TTSResult {
  audio: ArrayBuffer; // Raw audio data
  format: string; // Output format / content type (e.g. "mp3")
  usageCharacters: number; // Characters billed for this request
  audioLengthMs?: number; // Audio duration in milliseconds (when available)
}

Play in the Browser
synthesizeToObjectURL returns a URL you can hand straight to an Audio element:
const url = await tts.synthesizeToObjectURL({
  text: 'Welcome to the kingdom, traveler.'
});
const audio = new Audio(url);
await audio.play();
audio.onended = () => URL.revokeObjectURL(url); // release when done

The default output is MP3, which browsers play natively.
Play from the Audio Buffer
If you already have the ArrayBuffer from synthesize, build the object URL yourself:
const result = await tts.synthesize({ text: 'Quest complete!' });
const blob = new Blob([result.audio], { type: `audio/${result.format}` });
const url = URL.createObjectURL(blob);
const audio = new Audio(url);
await audio.play();
audio.onended = () => URL.revokeObjectURL(url);

Config Options
interface TTSConfig {
  text: string; // required
  model?: string;
  voice?: string; // voice id; mutually exclusive with voiceMix
  voiceMix?: { voice: string; weight: number }[];
  voiceSettings?: {
    speed?: number; // 0.5–2
    volume?: number; // 0–10
    pitch?: number; // -12–12
    emotion?: string; // e.g. "happy"
  };
  outputFormat?: string; // e.g. "mp3", "mp3_44100_128", "pcm_24000"
  language?: string; // pronunciation hint, e.g. "English"
  providerOptions?: Record<string, unknown>;
}

| Option | Description |
|---|---|
| text | The text to speak (required). Supports inline markup. |
| voice | Voice id. See Voices. |
| voiceMix | Blend up to 4 voices with weights. See Voices. |
| voiceSettings | speed, volume, pitch, emotion. |
| outputFormat | Audio format string. Defaults to MP3. |
| language | Biases pronunciation toward a language. |
| providerOptions | Advanced options for capabilities not covered by the fields above. |
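The constraints noted in the TTSConfig comments can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: validateTTSConfig and the local TTSConfigLike type are hypothetical names, and the ranges are copied from the comments above. Whether the service rejects or clamps out-of-range values, and whether it counts characters the same way as a JavaScript string length, are assumptions.

```typescript
// Local copy of the documented fields so the sketch is self-contained.
interface VoiceSettings {
  speed?: number;
  volume?: number;
  pitch?: number;
  emotion?: string;
}

interface TTSConfigLike {
  text: string;
  voice?: string;
  voiceMix?: { voice: string; weight: number }[];
  voiceSettings?: VoiceSettings;
}

// Hypothetical helper: returns a list of problems. An empty list means the
// config satisfies the documented constraints.
function validateTTSConfig(config: TTSConfigLike): string[] {
  const problems: string[] = [];

  // Assumption: the character limit is measured like String.length.
  if (config.text.length === 0) problems.push('text is required');
  if (config.text.length > 10000) problems.push('text exceeds 10000 characters');

  if (config.voice !== undefined && config.voiceMix !== undefined) {
    problems.push('voice and voiceMix are mutually exclusive');
  }
  if (config.voiceMix !== undefined && config.voiceMix.length > 4) {
    problems.push('voiceMix supports at most 4 voices');
  }

  // Numeric ranges as documented in the TTSConfig comments.
  const ranges: [keyof VoiceSettings, number, number][] = [
    ['speed', 0.5, 2],
    ['volume', 0, 10],
    ['pitch', -12, 12],
  ];
  for (const [name, min, max] of ranges) {
    const value = config.voiceSettings?.[name];
    if (typeof value === 'number' && (value < min || value > max)) {
      problems.push(`${name} must be between ${min} and ${max}`);
    }
  }
  return problems;
}
```

Calling validateTTSConfig(config) before tts.synthesize(config) lets a game report a bad setting locally instead of spending a request on it.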
Choosing a voice
const a = await tts.synthesize({ text: 'Hello, adventurer.' }); // default voice
const b = await tts.synthesize({
  text: 'Hello, adventurer.',
  voice: 'English_Trustworthy_Man'
});

Browse and preview voices on the PlayKit Dashboard. Voice ids are case-sensitive. See Voices for voice mixing.
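Because voice ids are case-sensitive, it can help to keep them in one place rather than retyping them at each call site. The sketch below shows one possible way to do that with a preset table. The character names, the PRESETS table, and the configFor helper are all hypothetical, and pairing English_Trustworthy_Man with a "mentor" character is purely illustrative.

```typescript
// Hypothetical per-character presets; names and pairings are illustrative.
interface VoicePreset {
  voice?: string;
  voiceSettings?: { speed?: number; volume?: number; pitch?: number; emotion?: string };
}

const PRESETS = new Map<string, VoicePreset>([
  ['mentor', { voice: 'English_Trustworthy_Man' }],
  ['narrator', {}], // no voice set, so the default voice is used
]);

// Builds a config object for one line of dialogue. Unknown characters fall
// back to an empty preset, which leaves the voice unset.
function configFor(character: string, text: string): VoicePreset & { text: string } {
  const preset: VoicePreset = PRESETS.get(character) ?? {};
  return { text, ...preset };
}
```

The returned object has the same field names as TTSConfig, so it is intended to be passed to tts.synthesize directly.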
Tuning delivery
// Slower, deeper delivery for a wise mentor
const mentor = await tts.synthesize({
  text: 'Listen carefully, young one.',
  voiceSettings: { speed: 0.9, pitch: -2 }
});

// Brighter, happier delivery for an excited companion
const companion = await tts.synthesize({
  text: 'We found the treasure!',
  voiceSettings: { speed: 1.15, emotion: 'happy' }
});

You can also place pause and interjection markup directly in text:
await tts.synthesize({
  text: "That's hilarious [laughs] let me catch my breath [pause 1s] okay."
});

Synthesize with Timestamps
synthesizeWithTimestamps returns the audio plus an alignment of word (or sentence) timings — for captions, karaoke highlighting, and click-to-seek. See Subtitle Timestamps for the format and use cases.
const result = await tts.synthesizeWithTimestamps({
  text: 'Hello world. This is a timing test.',
  granularity: 'word' // 'word' (default) or 'sentence'
});
for (const item of result.alignment?.items ?? []) {
  console.log(`${item.startMs}–${item.endMs}ms: ${item.text}`);
}

interface TTSTimestampsResult extends TTSResult {
  alignment: {
    granularity: string;
    items: {
      text: string;
      startMs: number;
      endMs: number;
      textStart?: number; // char offset in the input text
      textEnd?: number;
    }[];
  } | null;
}

Highlight the active word as audio plays by comparing the playback clock to each item's startMs/endMs:
const blob = new Blob([result.audio], { type: `audio/${result.format}` });
const audio = new Audio(URL.createObjectURL(blob));
const items = result.alignment?.items ?? [];
audio.ontimeupdate = () => {
  const ms = audio.currentTime * 1000;
  const active = items.findIndex((it) => ms >= it.startMs && ms < it.endMs);
  // render `active` highlight…
};
await audio.play();

Limits and Notes
- Maximum 10000 characters per request. Split longer text into multiple calls.
- The default output is MP3. Pass outputFormat (e.g. "pcm_24000") for a different format.
- Billing is per character: usageCharacters reports how many characters were billed.
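Splitting longer text to fit the per-request limit can be sketched as follows. This is one possible approach, not an SDK feature: splitForTTS is a hypothetical helper that prefers to break after sentence-ending punctuation and only cuts mid-sentence when a single sentence exceeds the limit. It measures length with String.length, which is an assumption about how the service counts characters.

```typescript
// Hypothetical helper: split text into chunks of at most `maxChars`
// characters, preferring to break after sentence-ending punctuation.
function splitForTTS(text: string, maxChars = 10000): string[] {
  if (maxChars <= 0) throw new RangeError('maxChars must be positive');

  // Each match is a sentence plus its trailing punctuation and whitespace.
  const sentences = text.match(/[^.!?]+[.!?]*\s*|[.!?]+\s*/g) ?? [];
  const chunks: string[] = [];
  let current = '';

  for (const sentence of sentences) {
    // Keep appending sentences while the chunk stays within the limit.
    if (current.length + sentence.length <= maxChars) {
      current += sentence;
      continue;
    }
    if (current.trim().length > 0) chunks.push(current.trim());

    // A single sentence longer than the limit is cut at the limit.
    // Note: a hard cut like this can split a word or a surrogate pair.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    current = rest;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Each chunk would then go to its own tts.synthesize call, with the resulting clips played back in order.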
Voice Chat Integration
Combine text generation with speech synthesis to give your NPC a voice:
const chat = sdk.createChatClient();
const tts = sdk.createTTSClient();

async function respondWithVoice(playerMessage: string) {
  const reply = await chat.chat(playerMessage, 'You are a friendly game NPC.');
  const url = await tts.synthesizeToObjectURL({ text: reply });
  const audio = new Audio(url);
  await audio.play();
  audio.onended = () => URL.revokeObjectURL(url);
  return reply;
}

Error Handling
import { PlayKitError } from 'playkit-sdk';

try {
  const result = await tts.synthesize({ text: 'Hello' });
} catch (error) {
  if (error instanceof PlayKitError) {
    switch (error.code) {
      case 'NOT_AUTHENTICATED':
        console.log('Need to login');
        break;
      default:
        console.log('Synthesis failed:', error.message);
    }
  }
}

Best Practices
- Stay under the limit: keep each request at or below 10000 characters; split longer scripts.
- Reuse the client: create the TTSClient once and reuse it for many requests.
- Release object URLs: call URL.revokeObjectURL once playback finishes to free memory.
- Cache repeated lines: synthesize fixed dialogue once and reuse the audio.
- Match the character: use voice, voiceSettings, and markup to fit each character.
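The caching advice above could be implemented along these lines. This is a minimal sketch under stated assumptions: withSpeechCache, SpeechRequest, and Synthesize are hypothetical names, and the wrapper assumes the client's synthesize method accepts a request object and returns a promise. It keys only on text, voice, and outputFormat, so extend the key if you vary other settings.

```typescript
// Minimal shape of a synthesize-like function. The real TTSClient method is
// assumed to be compatible with this signature.
type SpeechRequest = { text: string; voice?: string; outputFormat?: string };
type Synthesize<R> = (request: SpeechRequest) => Promise<R>;

// Hypothetical helper: wraps a synthesize function so identical requests
// share one in-flight or completed result.
function withSpeechCache<R>(synthesize: Synthesize<R>): Synthesize<R> {
  const cache = new Map<string, Promise<R>>();

  return (request) => {
    // Key on the fields that change the audio in this sketch.
    const key = JSON.stringify([
      request.text,
      request.voice ?? null,
      request.outputFormat ?? null,
    ]);

    let pending = cache.get(key);
    if (pending === undefined) {
      pending = synthesize(request);
      // Drop failed requests so a later call can retry.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return pending;
  };
}
```

A typical use would be wrapping the client once, for example withSpeechCache((request) => tts.synthesize(request)), and calling the wrapper for fixed dialogue. The cache in this sketch is unbounded and holds audio in memory, so a long-running game may want to cap or clear it.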
Next Steps
- Combine with NPC Conversations to give characters a voice
- Explore Text Generation for generating the lines to speak