Text-to-Speech

PlayKit_TextToSpeechClient turns text into spoken audio you can play in your game: NPC dialogue, narration, tutorials, or any other in-game voice feedback. In Unity, the SDK requests PCM audio and decodes it into an AudioClip, so the result plays directly through any AudioSource.

Before You Begin

Make sure you've completed SDK initialization
Have an AudioSource in your scene to play the synthesized audio

Create the Client

using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;

public class TextToSpeechSetup : MonoBehaviour
{
    private PlayKit_TextToSpeechClient ttsClient;

    void Start()
    {
        // Uses the default model alias "default-tts-model"
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();

        // Or specify a model explicitly
        // ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient("default-tts-model");
    }
}

Synthesize and Play

Play Through an AudioSource

SynthesizeToAudioClipAsync returns a Unity AudioClip you can hand straight to an AudioSource:

using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;

public class SpeakLine : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;

    private PlayKit_TextToSpeechClient ttsClient;

    void Start()
    {
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();
    }

    public async UniTask Speak(string text)
    {
        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            text,
            voice: null,   // Use default voice
            speed: null,   // Use default speed
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );

        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}

Inspect the Raw Result

If you need the raw audio data and usage details, use SynthesizeAsync:

public async UniTask Synthesize(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeAsync(
        text,
        voice: null,
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );

    if (result.Success)
    {
        Debug.Log($"Format: {result.Format}");
        Debug.Log($"Characters billed: {result.UsageCharacters}");
        Debug.Log($"Audio length: {result.AudioLengthMs} ms");
        // result.AudioData holds the raw audio bytes
    }
    else
    {
        Debug.LogError($"Synthesis failed: {result.Error}");
    }
}
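
If you want to process the raw bytes yourself (for example, to drive a volume meter or simple lip-sync), the following is a minimal sketch of converting PCM bytes to float samples. It assumes the audio is 16-bit signed little-endian PCM, which this page does not confirm; check result.Format before relying on it. Pcm16 and ToFloatSamples are hypothetical names, not part of the SDK.

```csharp
using System;

// Hypothetical helper, not part of the PlayKit SDK.
// ASSUMPTION: input is 16-bit signed little-endian PCM. Verify against result.Format.
public static class Pcm16
{
    // Converts raw PCM16 bytes into float samples in the range [-1, 1).
    public static float[] ToFloatSamples(byte[] pcm)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));

        int count = pcm.Length / 2; // a trailing odd byte is ignored
        var samples = new float[count];
        for (int i = 0; i < count; i++)
        {
            // Low byte first, then high byte (little-endian).
            short value = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
            samples[i] = value / 32768f;
        }
        return samples;
    }
}
```

The resulting array has the layout that Unity's AudioClip.SetData expects for a mono clip, but for ordinary playback SynthesizeToAudioClipAsync already does the decoding for you.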

Voice and Speed Options

Choose a Voice

AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
    "Greetings, traveler.",
    voice: "English_Trustworthy_Man",
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);

If you pass null for voice, the default voice is used.

Browse and preview voices on the PlayKit Dashboard. Voice ids are case-sensitive. See Voices for the catalog and voice mixing.

Adjust Speed

// Slower delivery for a wise mentor
var mentorClip = await ttsClient.SynthesizeToAudioClipAsync(
    "Listen carefully, young one.",
    voice: null,
    speed: 0.9f,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);
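
When several characters each need their own voice and speed, one option is to keep the settings in a small lookup table and pass them to the synthesis call. The sketch below is only an illustration: VoiceProfile, VoiceProfiles, and the character ids "mentor" and "guard" are hypothetical names, and the voice id and speed are the ones used in the examples above.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration; not part of the PlayKit SDK.
public readonly struct VoiceProfile
{
    public readonly string Voice;  // null means "use the default voice"
    public readonly float? Speed;  // null means "use the default speed"

    public VoiceProfile(string voice, float? speed)
    {
        Voice = voice;
        Speed = speed;
    }
}

public static class VoiceProfiles
{
    private static readonly Dictionary<string, VoiceProfile> ByCharacter =
        new Dictionary<string, VoiceProfile>
        {
            { "mentor", new VoiceProfile(null, 0.9f) },
            { "guard", new VoiceProfile("English_Trustworthy_Man", null) },
        };

    // Returns the profile for a character, or defaults (null voice, null speed)
    // when the character id is unknown.
    public static VoiceProfile For(string characterId)
    {
        if (characterId != null && ByCharacter.TryGetValue(characterId, out VoiceProfile profile))
        {
            return profile;
        }
        return default;
    }
}
```

You would then pass profile.Voice and profile.Speed as the voice and speed arguments, so an unknown character falls back to the defaults.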

Inline Markup

Place pause and interjection markup directly in the text:

AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
    "That's hilarious [laughs] let me catch my breath [pause 1s] okay.",
    voice: null,
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);

Synthesize with Timestamps

SynthesizeWithTimestampsAsync returns the audio plus an Alignment of word (or sentence) timings — for captions, karaoke highlighting, and click-to-seek. See Subtitle Timestamps for the format and use cases.

public async UniTask SpeakWithCaptions(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeWithTimestampsAsync(
        text,
        voice: null,
        granularity: "word",   // "word" (default) or "sentence"
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );

    if (!result.Success) return;

    audioSource.clip = result.ToAudioClip();
    audioSource.Play();

    if (result.Alignment != null)
    {
        foreach (var item in result.Alignment.Items)
        {
            Debug.Log($"{item.StartMs}-{item.EndMs}ms: {item.Text}");
        }
    }
}

Each PlayKit_SpeechAlignmentItem has Text, StartMs, EndMs, and optional TextStart / TextEnd (character offsets in the input text). Highlight the active word by comparing audioSource.time * 1000f against each item's StartMs/EndMs.
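
The comparison described above can be sketched as a small lookup that returns the index of the item covering the current playback time. To keep the example self-contained it uses a stand-in Cue struct rather than PlayKit_SpeechAlignmentItem; Cue, CaptionLookup, and FindActiveIndex are hypothetical names, and the integer millisecond fields are an assumption about the timing types.

```csharp
using System.Collections.Generic;

// Stand-in for an alignment item; the field types are an assumption.
public readonly struct Cue
{
    public readonly string Text;
    public readonly int StartMs;
    public readonly int EndMs;

    public Cue(string text, int startMs, int endMs)
    {
        Text = text;
        StartMs = startMs;
        EndMs = endMs;
    }
}

public static class CaptionLookup
{
    // Returns the index of the cue whose [StartMs, EndMs) interval contains
    // playbackMs, or -1 if playback is in a gap or outside the audio.
    public static int FindActiveIndex(IReadOnlyList<Cue> cues, float playbackMs)
    {
        for (int i = 0; i < cues.Count; i++)
        {
            if (playbackMs >= cues[i].StartMs && playbackMs < cues[i].EndMs)
            {
                return i;
            }
        }
        return -1;
    }
}
```

In Update you would call it with audioSource.time * 1000f and highlight the returned index. A linear scan is enough for a single line of dialogue; for long narration, remembering the last index and scanning forward from it avoids re-checking earlier items.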

Limits and Notes

Maximum 10000 characters per request. Split longer text into multiple calls.
In Unity, the SDK requests PCM audio and decodes it into an AudioClip, so playback works through any AudioSource.
Billing is per character — UsageCharacters reports how many characters were billed.
Always check result.Success before using the result.
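
One way to split longer text is sketched below. It prefers to break just after sentence-ending punctuation, then at whitespace, and only cuts mid-word as a last resort. TtsTextChunker and Split are hypothetical names, not SDK calls, and the punctuation check is deliberately simple (it would also break after the period in "3.14").

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the PlayKit SDK.
public static class TtsTextChunker
{
    // Limit taken from the note above.
    public const int MaxCharsPerRequest = 10000;

    // Splits text into trimmed pieces of at most maxChars characters each.
    public static List<string> Split(string text, int maxChars = MaxCharsPerRequest)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return chunks;

        int start = 0;
        while (start < text.Length)
        {
            if (text.Length - start <= maxChars)
            {
                AddTrimmed(chunks, text.Substring(start));
                break;
            }

            int end = FindBreak(text, start, maxChars);
            AddTrimmed(chunks, text.Substring(start, end - start));
            start = end;
        }
        return chunks;
    }

    // Returns the exclusive end index of the next chunk, scanning backwards
    // from the size limit for the last sentence end, then the last whitespace.
    private static int FindBreak(string text, int start, int maxChars)
    {
        int limit = start + maxChars;
        int lastSpace = -1;
        for (int i = limit - 1; i > start; i--)
        {
            char c = text[i];
            if (c == '.' || c == '!' || c == '?') return i + 1;
            if (lastSpace < 0 && char.IsWhiteSpace(c)) lastSpace = i;
        }
        return lastSpace >= 0 ? lastSpace + 1 : limit;
    }

    private static void AddTrimmed(List<string> chunks, string piece)
    {
        string trimmed = piece.Trim();
        if (trimmed.Length > 0) chunks.Add(trimmed);
    }
}
```

Each returned chunk can be passed to a separate synthesis call and the resulting clips played back to back. Breaking at sentence boundaries keeps the intonation of each clip natural, which a fixed-size cut would not.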

Voice Chat Integration

Combine AI conversation with speech synthesis to give your NPC a voice:

using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using System.Collections.Generic;
using UnityEngine;

public class TalkingNpc : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;

    private PlayKit_AIChatClient chatClient;
    private PlayKit_TextToSpeechClient ttsClient;
    private List<PlayKit_ChatMessage> chatHistory = new List<PlayKit_ChatMessage>();

    void Start()
    {
        chatClient = PlayKitSDK.Factory.CreateChatClient();
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();

        chatHistory.Add(new PlayKit_ChatMessage
        {
            Role = "system",
            Content = "You are a friendly game NPC."
        });
    }

    public async UniTask RespondWithVoice(string playerMessage)
    {
        chatHistory.Add(new PlayKit_ChatMessage { Role = "user", Content = playerMessage });

        var config = new PlayKit_ChatConfig(chatHistory) { Temperature = 0.7f };
        var aiResponse = await chatClient.TextGenerationAsync(
            config,
            this.GetCancellationTokenOnDestroy()
        );

        if (!aiResponse.Success) return;

        chatHistory.Add(new PlayKit_ChatMessage { Role = "assistant", Content = aiResponse.Response });

        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            aiResponse.Response,
            voice: null,
            speed: null,
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );

        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}

Best Practices

Stay under the limit: keep each request at or below 10000 characters; split longer scripts.
Reuse the client: create the PlayKit_TextToSpeechClient once and reuse it for many requests.
Cache repeated lines: synthesize fixed dialogue once and reuse the AudioClip.
Match the character: use voice, speed, and markup to fit each character.
Handle errors: check result.Success (or a null AudioClip) before playing audio, and log result.Error when synthesis fails.
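
The "cache repeated lines" advice could be implemented along these lines. This is a sketch under stated assumptions rather than an SDK feature: SpeechCache and SpeechCacheKey are hypothetical names, the cache is generic so the example runs outside Unity, and it uses Task where a Unity project would more likely use UniTask.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical helpers, not part of the PlayKit SDK.
public static class SpeechCacheKey
{
    // Builds a key from everything that changes the audio: voice, speed, text.
    public static string Make(string text, string voice, float? speed)
    {
        string voicePart = voice ?? "default";
        string speedPart = speed.HasValue
            ? speed.Value.ToString("R", CultureInfo.InvariantCulture)
            : "default";
        return voicePart + "|" + speedPart + "|" + text;
    }
}

public sealed class SpeechCache<TClip> where TClip : class
{
    private readonly Dictionary<string, TClip> entries = new Dictionary<string, TClip>();

    // Returns the cached clip for key, or runs synthesize once and stores the
    // result. Null results (failed synthesis) are not cached, so they retry.
    public async Task<TClip> GetOrCreateAsync(string key, Func<Task<TClip>> synthesize)
    {
        if (entries.TryGetValue(key, out TClip cached)) return cached;

        TClip clip = await synthesize();
        if (clip != null) entries[key] = clip;
        return clip;
    }

    // Drops all cached clips, for example when unloading a scene.
    public void Clear()
    {
        entries.Clear();
    }
}
```

In a game, TClip would be AudioClip and the synthesize delegate would wrap the SynthesizeToAudioClipAsync call. Two requests for the same key that overlap in time will both synthesize; caching the pending task instead of the result would avoid that at the cost of more careful failure handling.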

Next Steps

Combine with NPC Conversations to give characters a voice
Read the API Reference for complete API documentation
