Text-to-Speech
Turn text into spoken audio and play it through an AudioSource
PlayKit_TextToSpeechClient turns text into spoken audio you can play in your game — to voice NPC dialogue, narration, tutorials, or any in-game voice feedback. Unity requests PCM audio and decodes it into an AudioClip, so the result plays directly through any AudioSource.
Before You Begin
- Make sure you've completed SDK initialization
- Have an AudioSource in your scene to play the synthesized audio
Create the Client
using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;
public class TextToSpeechSetup : MonoBehaviour
{
    private PlayKit_TextToSpeechClient ttsClient;
    void Start()
    {
        // Uses the default model alias "default-tts-model"
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();
        // Or specify a model explicitly
        // ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient("default-tts-model");
    }
}
Synthesize and Play
Play Through an AudioSource
SynthesizeToAudioClipAsync returns a Unity AudioClip you can hand straight to an AudioSource:
using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;
public class SpeakLine : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;
    private PlayKit_TextToSpeechClient ttsClient;
    void Start()
    {
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();
    }
    public async UniTask Speak(string text)
    {
        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            text,
            voice: null, // Use default voice
            speed: null, // Use default speed
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );
        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}
Inspect the Raw Result
If you need the raw audio data and usage details, use SynthesizeAsync:
public async UniTask Synthesize(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeAsync(
        text,
        voice: null,
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );
    if (result.Success)
    {
        Debug.Log($"Format: {result.Format}");
        Debug.Log($"Characters billed: {result.UsageCharacters}");
        Debug.Log($"Audio length: {result.AudioLengthMs} ms");
        // result.AudioData holds the raw audio bytes
    }
    else
    {
        Debug.LogError($"Synthesis failed: {result.Error}");
    }
}
Voice and Speed Options
Choose a Voice
AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
    "Greetings, traveler.",
    voice: "English_Trustworthy_Man",
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);
If you pass null for voice, the default voice is used.
Browse and preview voices on the PlayKit Dashboard. Voice ids are case-sensitive. See Voices for the catalog and voice mixing.
Adjust Speed
// Slower delivery for a wise mentor
var mentorClip = await ttsClient.SynthesizeToAudioClipAsync(
    "Listen carefully, young one.",
    voice: null,
    speed: 0.9f,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);
Inline Markup
Place pause and interjection markup directly in the text:
await ttsClient.SynthesizeToAudioClipAsync(
    "That's hilarious [laughs] let me catch my breath [pause 1s] okay.",
    voice: null,
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);
Synthesize with Timestamps
SynthesizeWithTimestampsAsync returns the audio plus an Alignment of word (or sentence) timings — for captions, karaoke highlighting, and click-to-seek. See Subtitle Timestamps for the format and use cases.
public async UniTask SpeakWithCaptions(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeWithTimestampsAsync(
        text,
        voice: null,
        granularity: "word", // "word" (default) or "sentence"
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );
    if (!result.Success) return;
    audioSource.clip = result.ToAudioClip();
    audioSource.Play();
    if (result.Alignment != null)
    {
        foreach (var item in result.Alignment.Items)
        {
            Debug.Log($"{item.StartMs}-{item.EndMs}ms: {item.Text}");
        }
    }
}
Each PlayKit_SpeechAlignmentItem has Text, StartMs, EndMs, and optional TextStart / TextEnd (character offsets in the input text). Highlight the active word by comparing audioSource.time * 1000f against each item's StartMs/EndMs.
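The highlight comparison described above can be sketched as a small lookup. This is only an illustration: TimedSpan and CaptionLookup are hypothetical names, not SDK types, and the float millisecond fields are an assumption about how StartMs and EndMs are represented. The sketch also assumes the items are sorted by start time and do not overlap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the timing fields of PlayKit_SpeechAlignmentItem.
public readonly struct TimedSpan
{
    public readonly string Text;
    public readonly float StartMs;
    public readonly float EndMs;

    public TimedSpan(string text, float startMs, float endMs)
    {
        Text = text;
        StartMs = startMs;
        EndMs = endMs;
    }
}

public static class CaptionLookup
{
    // Binary search for the span whose [StartMs, EndMs) range contains playbackMs.
    // Returns -1 when the playhead is before the first span, in a gap between
    // spans, or past the last span.
    public static int FindActiveIndex(IReadOnlyList<TimedSpan> spans, float playbackMs)
    {
        int lo = 0;
        int hi = spans.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (playbackMs < spans[mid].StartMs) hi = mid - 1;
            else if (playbackMs >= spans[mid].EndMs) lo = mid + 1;
            else return mid;
        }
        return -1;
    }
}
```

In a MonoBehaviour, one option is to copy the alignment items into a list of TimedSpan once after synthesis, then call CaptionLookup.FindActiveIndex(spans, audioSource.time * 1000f) from Update and re-render the caption only when the returned index changes.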
Limits and Notes
- Maximum 10000 characters per request. Split longer text into multiple calls.
- Unity requests PCM audio and decodes it into an AudioClip, so playback works through any AudioSource.
- Billing is per character: UsageCharacters reports how many characters were billed.
- Always check result.Success before using the result.
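One way to split longer text is sketched below. SpeechTextSplitter is a hypothetical helper, not part of the SDK. It prefers to cut after sentence punctuation, falls back to whitespace, and only then cuts at the hard limit. It counts C# string length (UTF-16 code units), which is an assumption about how the service counts characters, and the punctuation rule is a simple heuristic that can also cut after a period inside a number or abbreviation.

```csharp
using System;
using System.Collections.Generic;

public static class SpeechTextSplitter
{
    // Splits text into trimmed pieces of at most maxChars characters each.
    public static List<string> Split(string text, int maxChars = 10000)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));
        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return chunks;

        int start = 0;
        while (start < text.Length)
        {
            if (text.Length - start <= maxChars)
            {
                AddTrimmed(chunks, text.Substring(start));
                break;
            }
            int cut = FindCut(text, start, maxChars);
            AddTrimmed(chunks, text.Substring(start, cut - start));
            start = cut;
        }
        return chunks;
    }

    // Returns the exclusive end index of the next piece (always > start).
    // Scans backward from the limit: the last '.', '!' or '?' in the window wins,
    // otherwise the last whitespace, otherwise the hard limit.
    private static int FindCut(string text, int start, int maxChars)
    {
        int limit = start + maxChars;
        int lastSpace = -1;
        for (int i = limit - 1; i > start; i--)
        {
            char c = text[i];
            if (c == '.' || c == '!' || c == '?') return i + 1;
            if (lastSpace < 0 && char.IsWhiteSpace(c)) lastSpace = i;
        }
        return lastSpace > start ? lastSpace : limit;
    }

    private static void AddTrimmed(List<string> chunks, string piece)
    {
        string trimmed = piece.Trim();
        if (trimmed.Length > 0) chunks.Add(trimmed);
    }
}
```

Each returned piece can then be passed to SynthesizeToAudioClipAsync in turn, and the resulting clips played back to back.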
Voice Chat Integration
Combine AI conversation with speech synthesis to give your NPC a voice:
using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using System.Collections.Generic;
using UnityEngine;
public class TalkingNpc : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;
    private PlayKit_AIChatClient chatClient;
    private PlayKit_TextToSpeechClient ttsClient;
    private List<PlayKit_ChatMessage> chatHistory = new List<PlayKit_ChatMessage>();
    void Start()
    {
        chatClient = PlayKitSDK.Factory.CreateChatClient();
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();
        chatHistory.Add(new PlayKit_ChatMessage
        {
            Role = "system",
            Content = "You are a friendly game NPC."
        });
    }
    public async UniTask RespondWithVoice(string playerMessage)
    {
        chatHistory.Add(new PlayKit_ChatMessage { Role = "user", Content = playerMessage });
        var config = new PlayKit_ChatConfig(chatHistory) { Temperature = 0.7f };
        var aiResponse = await chatClient.TextGenerationAsync(
            config,
            this.GetCancellationTokenOnDestroy()
        );
        if (!aiResponse.Success) return;
        chatHistory.Add(new PlayKit_ChatMessage { Role = "assistant", Content = aiResponse.Response });
        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            aiResponse.Response,
            voice: null,
            speed: null,
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );
        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}
Best Practices
- Stay under the limit: keep each request at or below 10000 characters; split longer scripts.
- Reuse the client: create the PlayKit_TextToSpeechClient once and reuse it for many requests.
- Cache repeated lines: synthesize fixed dialogue once and reuse the AudioClip.
- Match the character: use voice, speed, and markup to fit each character.
- Error handling: check result.Success and handle failure cases.
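The caching advice above could look roughly like the sketch below. SpeechCache is a hypothetical helper, not part of the SDK. It is written against a generic synthesize delegate so it does not depend on Unity, and it stores Task<T> rather than UniTask because a UniTask is not meant to be awaited more than once. It is not thread-safe and assumes all calls come from the main thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: caches one synthesis result per (text, voice, speed).
// In a Unity project, T would typically be AudioClip.
public sealed class SpeechCache<T> where T : class
{
    private readonly Dictionary<(string, string, float?), Task<T>> entries
        = new Dictionary<(string, string, float?), Task<T>>();
    private readonly Func<string, string, float?, Task<T>> synthesize;

    public SpeechCache(Func<string, string, float?, Task<T>> synthesize)
    {
        this.synthesize = synthesize ?? throw new ArgumentNullException(nameof(synthesize));
    }

    public async Task<T> GetOrCreateAsync(string text, string voice = null, float? speed = null)
    {
        var key = (text, voice, speed);
        if (!entries.TryGetValue(key, out Task<T> pending))
        {
            // Store the in-flight task so concurrent requests share one call.
            pending = synthesize(text, voice, speed);
            entries[key] = pending;
        }

        T result = null;
        try
        {
            result = await pending;
            return result;
        }
        finally
        {
            // Evict failures (exception or null result) so a later call can retry.
            if (result == null) entries.Remove(key);
        }
    }
}
```

In a Unity project the delegate would wrap SynthesizeToAudioClipAsync and convert its UniTask to a Task (UniTask provides AsTask() for this). Cached clips stay in memory until the cache is dropped, so this suits a fixed set of lines better than free-form AI responses.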
Next Steps
- Combine with NPC Conversations to give characters a voice
- Read the API Reference for complete API documentation