PlayKit.ai

Text-to-Speech

Turn text into spoken audio and play it through an AudioSource

PlayKit_TextToSpeechClient turns text into spoken audio you can play in your game: NPC dialogue, narration, tutorials, or any other in-game voice feedback. The Unity SDK requests PCM audio and decodes it into an AudioClip, so the result plays directly through any AudioSource.

Before You Begin

  • Make sure you've completed SDK initialization
  • Have an AudioSource in your scene to play the synthesized audio

Create the Client

using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;

public class TextToSpeechSetup : MonoBehaviour
{
    private PlayKit_TextToSpeechClient ttsClient;

    void Start()
    {
        // Uses the default model alias "default-tts-model"
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();

        // Or specify a model explicitly
        // ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient("default-tts-model");
    }
}

Synthesize and Play

Play Through an AudioSource

SynthesizeToAudioClipAsync returns a Unity AudioClip you can hand straight to an AudioSource:

using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using UnityEngine;

public class SpeakLine : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;

    private PlayKit_TextToSpeechClient ttsClient;

    void Start()
    {
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();
    }

    public async UniTask Speak(string text)
    {
        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            text,
            voice: null,   // Use default voice
            speed: null,   // Use default speed
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );

        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}

Inspect the Raw Result

If you need the raw audio data and usage details, use SynthesizeAsync:

public async UniTask Synthesize(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeAsync(
        text,
        voice: null,
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );

    if (result.Success)
    {
        Debug.Log($"Format: {result.Format}");
        Debug.Log($"Characters billed: {result.UsageCharacters}");
        Debug.Log($"Audio length: {result.AudioLengthMs} ms");
        // result.AudioData holds the raw audio bytes
    }
    else
    {
        Debug.LogError($"Synthesis failed: {result.Error}");
    }
}

Voice and Speed Options

Choose a Voice

AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
    "Greetings, traveler.",
    voice: "English_Trustworthy_Man",
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);

If you pass null for voice, the default voice is used.

Browse and preview voices on the PlayKit Dashboard. Voice IDs are case-sensitive, so pass them exactly as listed. See Voices for the catalog and voice mixing.

Adjust Speed

// Slower delivery for a wise mentor
var mentorClip = await ttsClient.SynthesizeToAudioClipAsync(
    "Listen carefully, young one.",
    voice: null,
    speed: 0.9f,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);

Inline Markup

Place pause and interjection markup directly in the text:

await ttsClient.SynthesizeToAudioClipAsync(
    "That's hilarious [laughs] let me catch my breath [pause 1s] okay.",
    voice: null,
    speed: null,
    cancellationToken: this.GetCancellationTokenOnDestroy()
);
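If the same line is also shown as on-screen text, you may want to remove the markup before displaying it. The sketch below is a plain C# helper, not part of the SDK: CaptionText and StripMarkup are hypothetical names, and it assumes markup always appears as square-bracketed tags, as in the example above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the PlayKit SDK).
public static class CaptionText
{
    // Assumption: markup is any run of text enclosed in square brackets,
    // such as "[laughs]" or "[pause 1s]".
    private static readonly Regex Markup = new Regex(@"\[[^\[\]]*\]");
    private static readonly Regex ExtraSpaces = new Regex(@"\s{2,}");

    // Returns the text with bracketed tags removed and whitespace collapsed.
    public static string StripMarkup(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Replace each tag with a space so neighbouring words stay separated,
        // then collapse the doubled spaces that this leaves behind.
        string withoutTags = Markup.Replace(text, " ");
        return ExtraSpaces.Replace(withoutTags, " ").Trim();
    }
}
```

Send the original string to the TTS client and show the stripped string in your dialogue UI. Text that legitimately contains square brackets would also be removed, so adjust the pattern if your scripts use them.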

Synthesize with Timestamps

SynthesizeWithTimestampsAsync returns the audio plus an Alignment of word (or sentence) timings — for captions, karaoke highlighting, and click-to-seek. See Subtitle Timestamps for the format and use cases.

public async UniTask SpeakWithCaptions(string text)
{
    PlayKit_SpeechResult result = await ttsClient.SynthesizeWithTimestampsAsync(
        text,
        voice: null,
        granularity: "word",   // "word" (default) or "sentence"
        speed: null,
        cancellationToken: this.GetCancellationTokenOnDestroy()
    );

    if (!result.Success) return;

    audioSource.clip = result.ToAudioClip();
    audioSource.Play();

    if (result.Alignment != null)
    {
        foreach (var item in result.Alignment.Items)
        {
            Debug.Log($"{item.StartMs}-{item.EndMs}ms: {item.Text}");
        }
    }
}

Each PlayKit_SpeechAlignmentItem has Text, StartMs, EndMs, and optional TextStart / TextEnd (character offsets in the input text). Highlight the active word by comparing audioSource.time * 1000f against each item's StartMs/EndMs.
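The highlight lookup described above can be sketched as a small binary search. This is a minimal sketch under stated assumptions: TimedSpan and CaptionCursor are hypothetical names, TimedSpan is a stand-in that copies only StartMs and EndMs from each alignment item, the items are assumed to be sorted by start time without overlapping, and the numeric type of the SDK's fields is not confirmed here.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the timing fields of an alignment item.
public readonly struct TimedSpan
{
    public readonly double StartMs;
    public readonly double EndMs;

    public TimedSpan(double startMs, double endMs)
    {
        StartMs = startMs;
        EndMs = endMs;
    }
}

// Hypothetical helper (not part of the PlayKit SDK).
public static class CaptionCursor
{
    // Returns the index of the span that contains playbackMs, or -1 when
    // playback is in a gap between spans or outside all of them.
    // Assumes spans are sorted by StartMs and do not overlap.
    public static int FindActive(IReadOnlyList<TimedSpan> spans, double playbackMs)
    {
        int lo = 0;
        int hi = spans.Count - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (playbackMs < spans[mid].StartMs) hi = mid - 1;
            else if (playbackMs >= spans[mid].EndMs) lo = mid + 1;
            else return mid; // StartMs <= playbackMs < EndMs
        }
        return -1;
    }
}
```

In Update, call FindActive with audioSource.time * 1000f and re-render the caption only when the returned index changes. For short lines a linear scan is just as good; the binary search mainly helps with long narration.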

Limits and Notes

  • Maximum 10,000 characters per request. Split longer text into multiple calls.
  • The Unity SDK requests PCM audio and decodes it into an AudioClip, so playback works through any AudioSource.
  • Billing is per character — UsageCharacters reports how many characters were billed.
  • Always check result.Success before using the result. With SynthesizeToAudioClipAsync, check that the returned AudioClip is not null.
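One way to stay under the per-request character limit is to split long scripts at sentence boundaries before synthesizing. The helper below is a minimal sketch in plain C#, not part of the SDK: TtsTextChunker and Split are hypothetical names, and the break rules (sentence punctuation first, then whitespace, then a hard cut) are a design choice rather than anything the service requires.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the PlayKit SDK).
public static class TtsTextChunker
{
    // Splits text into chunks of at most maxChars characters, preferring to
    // break right after sentence punctuation, then at whitespace.
    public static List<string> Split(string text, int maxChars = 10000)
    {
        if (maxChars <= 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        var chunks = new List<string>();
        if (string.IsNullOrWhiteSpace(text)) return chunks;

        int start = 0;
        while (start < text.Length)
        {
            if (text.Length - start <= maxChars)
            {
                AddTrimmed(chunks, text.Substring(start));
                break;
            }

            // Look for a break inside the window [start, start + maxChars).
            int cut = FindBreak(text, start, start + maxChars);
            AddTrimmed(chunks, text.Substring(start, cut - start));
            start = cut;
        }
        return chunks;
    }

    // Returns an exclusive cut index greater than start and at most end.
    private static int FindBreak(string text, int start, int end)
    {
        // Prefer the last sentence terminator (or newline) in the window.
        for (int i = end - 1; i > start; i--)
        {
            char c = text[i];
            if (c == '.' || c == '!' || c == '?' || c == '\n') return i + 1;
        }
        // Otherwise fall back to the last whitespace character.
        for (int i = end - 1; i > start; i--)
        {
            if (char.IsWhiteSpace(text[i])) return i + 1;
        }
        // No natural break found: cut exactly at the limit.
        return end;
    }

    private static void AddTrimmed(List<string> chunks, string piece)
    {
        string trimmed = piece.Trim();
        if (trimmed.Length > 0) chunks.Add(trimmed);
    }
}
```

Synthesize the chunks in order and play the resulting clips back to back. The punctuation rule is deliberately simple, so a period inside a number or abbreviation can also become a break point.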

Voice Chat Integration

Combine AI conversation with speech synthesis to give your NPC a voice:

using Cysharp.Threading.Tasks;
using PlayKit_SDK;
using PlayKit_SDK.Public;
using System.Collections.Generic;
using UnityEngine;

public class TalkingNpc : MonoBehaviour
{
    [SerializeField] private AudioSource audioSource;

    private PlayKit_AIChatClient chatClient;
    private PlayKit_TextToSpeechClient ttsClient;
    private List<PlayKit_ChatMessage> chatHistory = new List<PlayKit_ChatMessage>();

    void Start()
    {
        chatClient = PlayKitSDK.Factory.CreateChatClient();
        ttsClient = PlayKitSDK.Factory.CreateTextToSpeechClient();

        chatHistory.Add(new PlayKit_ChatMessage
        {
            Role = "system",
            Content = "You are a friendly game NPC."
        });
    }

    public async UniTask RespondWithVoice(string playerMessage)
    {
        chatHistory.Add(new PlayKit_ChatMessage { Role = "user", Content = playerMessage });

        var config = new PlayKit_ChatConfig(chatHistory) { Temperature = 0.7f };
        var aiResponse = await chatClient.TextGenerationAsync(
            config,
            this.GetCancellationTokenOnDestroy()
        );

        if (!aiResponse.Success) return;

        chatHistory.Add(new PlayKit_ChatMessage { Role = "assistant", Content = aiResponse.Response });

        AudioClip clip = await ttsClient.SynthesizeToAudioClipAsync(
            aiResponse.Response,
            voice: null,
            speed: null,
            cancellationToken: this.GetCancellationTokenOnDestroy()
        );

        if (clip != null)
        {
            audioSource.clip = clip;
            audioSource.Play();
        }
    }
}

Best Practices

  1. Stay under the limit: keep each request at or below 10,000 characters; split longer scripts.
  2. Reuse the client: create the PlayKit_TextToSpeechClient once and reuse it for many requests.
  3. Cache repeated lines: synthesize fixed dialogue once and reuse the AudioClip.
  4. Match the character: use voice, speed, and markup to fit each character.
  5. Handle errors: check result.Success (or a null AudioClip from SynthesizeToAudioClipAsync) and handle failure cases.
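The caching advice above can be sketched as a small keyed cache. This is a minimal sketch, not part of the SDK: SpeechCache, MakeKey, and GetOrCreateAsync are hypothetical names, the cache is generic so it does not depend on Unity types, it uses Task rather than UniTask to stay self-contained, and it assumes speed is a nullable float as the examples on this page suggest.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical helper (not part of the PlayKit SDK).
// Not thread-safe: intended to be used from the Unity main thread.
public sealed class SpeechCache<TClip>
{
    private readonly Dictionary<string, Task<TClip>> entries =
        new Dictionary<string, Task<TClip>>();

    // Builds a key from everything that changes the audio.
    // Assumption: speed is a nullable float, with null meaning "default".
    public static string MakeKey(string text, string voice, float? speed)
    {
        string speedPart = speed.HasValue
            ? speed.Value.ToString(CultureInfo.InvariantCulture)
            : "";
        return (voice ?? "") + "|" + speedPart + "|" + text;
    }

    // Returns the cached result for key, or runs synthesize once and stores
    // the pending task so concurrent callers share a single request.
    public async Task<TClip> GetOrCreateAsync(string key, Func<Task<TClip>> synthesize)
    {
        if (!entries.TryGetValue(key, out Task<TClip> pending))
        {
            pending = synthesize();
            entries[key] = pending;
        }

        try
        {
            return await pending;
        }
        catch
        {
            // Do not keep failed or cancelled requests in the cache.
            entries.Remove(key);
            throw;
        }
    }
}
```

With TClip set to AudioClip, the synthesize delegate would wrap the SynthesizeToAudioClipAsync call, adapted to return a Task or with the delegate type changed to UniTask. A null clip would still be cached by this sketch, so check for null inside the delegate and throw if you want failed requests to be retried.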