PlayKit.ai
Text-to-Speech

Tone & Markup

Inline pause and interjection markup, plus the emotion setting

You can shape delivery in two ways: inline markup placed directly in the text you send, and an emotion setting. Markup lives inside the text, so it works the same in every SDK.

Pauses

Insert a timed pause with either syntax:

Wait for it [pause 1.5s] surprise!
Wait for it <break time="1.5s"/> surprise!

The duration is in seconds, from 0.01 to 99.99.
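Because the pause is plain text, you can build it with a small helper before sending. The following is a minimal sketch, not part of any PlayKit SDK: the function name `pause` and its `style` parameter are hypothetical, and it assumes durations are written with at most two decimal places (matching the documented bounds) and that a whole number such as `2s` is accepted.

```python
def pause(seconds, style="bracket"):
    """Return pause markup for the given duration in seconds.

    Hypothetical helper: only the two output syntaxes and the
    0.01 to 99.99 range come from the documentation.
    """
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be between 0.01 and 99.99 seconds")
    # Assumption: at most two decimals, trailing zeros dropped (1.50 -> 1.5).
    value = f"{seconds:.2f}".rstrip("0").rstrip(".")
    if style == "bracket":
        return f"[pause {value}s]"
    if style == "break":
        return f'<break time="{value}s"/>'
    raise ValueError("style must be 'bracket' or 'break'")


text = "Wait for it " + pause(1.5) + " surprise!"
```

Here `text` ends up identical to the first example above, and `pause(1.5, style="break")` produces the second syntax.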

Interjections (non-verbal sounds)

Wrap the name of a non-verbal sound in square brackets and it is rendered as that sound rather than read out as a word:

That's hilarious [laughs] let me catch my breath [breath] okay.

Interjections require a model that supports them (the default TTS model does). On a model without interjection support, a request that contains an interjection tag is rejected, so you find out early.

Available tags

Tag               Sound
[laughs]          Laughter
[chuckle]         Light laugh
[coughs]          Cough
[clears throat]   Throat clear
[groans]          Groan
[breath]          Audible breath
[pant]            Panting
[inhale]          Breathe in
[exhale]          Breathe out
[gasps]           Gasp
[sniffs]          Sniff
[sighs]           Sigh
[snorts]          Snort
[burps]           Burp
[lip-smacking]    Lip smack
[humming]         Humming
[hissing]         Hiss
[emm]             Filler "emm"
[sneezes]         Sneeze

Common singular/plural variants are accepted (e.g. [laugh], [gasp]).

Unknown tags

Bracketed text that isn't a known tag is removed by default, because brackets are treated as markup, not spoken content. To keep such text and have it read literally instead, set strip_unknown_tags to false.
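Since unknown tags are dropped silently by default, it can help to check your text before sending it. The sketch below is a client-side approximation, not the service's actual parser: `find_unknown_tags` is a hypothetical name, matching is exact and case-sensitive (an assumption), and the only variants it knows are the two named above, so other accepted variants would be reported as unknown.

```python
import re

# Tags from the table above, plus the two variants the docs name explicitly.
KNOWN_TAGS = {
    "laughs", "chuckle", "coughs", "clears throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "sneezes",
    "laugh", "gasp",
}

BRACKETED = re.compile(r"\[([^\[\]]+)\]")
# Bracket-style pause markup, e.g. "pause 1.5s".
PAUSE = re.compile(r"pause \d+(?:\.\d+)?s")


def find_unknown_tags(text):
    """Return bracketed contents that are neither a pause nor a listed tag."""
    unknown = []
    for match in BRACKETED.finditer(text):
        inner = match.group(1).strip()
        if PAUSE.fullmatch(inner):
            continue
        if inner not in KNOWN_TAGS:
            unknown.append(inner)
    return unknown


leftovers = find_unknown_tags("Nice [laughs] wait [pause 1.5s] [applause] oh [gasp]")
```

In this example `leftovers` contains only `applause`, which is the text that would be stripped unless strip_unknown_tags is set to false.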

Emotion

Independently of markup, you can set an emotion for the whole utterance (a voice setting, not inline markup):

happy, sad, angry, fearful, disgusted, surprised, calm, fluent

The model also infers a fitting emotion from the text, so setting it is optional.
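If you do set an emotion, a quick check against the list above catches typos before the request is made. This is a minimal sketch under assumptions: `check_emotion` is a hypothetical helper, and how the value is actually passed depends on your SDK.

```python
# Emotion names as listed above.
EMOTIONS = (
    "happy", "sad", "angry", "fearful",
    "disgusted", "surprised", "calm", "fluent",
)


def check_emotion(emotion=None):
    """Return the emotion if it is listed, or None to let the model infer one."""
    if emotion is None:
        return None  # optional: the model infers a fitting emotion
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}; expected one of {', '.join(EMOTIONS)}")
    return emotion
```

Passing nothing returns None, which mirrors the optional nature of the setting.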

For the exact call in your language, see the JavaScript or Unity TTS guide.