Tone & Markup
Inline pause and interjection markup, plus the emotion setting
You can shape delivery in two ways: inline markup placed directly in the text you send, and an emotion setting. Markup lives inside the text, so it works the same in every SDK.
Pauses
Insert a timed pause with either syntax:
Wait for it [pause 1.5s] surprise!
Wait for it <break time="1.5s"/> surprise!
The duration is in seconds, from 0.01 to 99.99.
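If you build text programmatically, a small helper can keep pause durations inside the documented range before the text is sent. The following is a minimal sketch, not part of any SDK: the function names `pause` and `break_tag` are hypothetical, and rounding to two decimals is an assumption based on the precision of the documented bounds.

```python
MIN_PAUSE_S = 0.01
MAX_PAUSE_S = 99.99


def _format_seconds(seconds: float) -> str:
    """Check the documented range and render the value, e.g. 1.5 -> '1.5'."""
    if not MIN_PAUSE_S <= seconds <= MAX_PAUSE_S:
        raise ValueError(
            f"pause must be between {MIN_PAUSE_S} and {MAX_PAUSE_S} seconds, "
            f"got {seconds}"
        )
    # Assumption: two decimals is enough precision. Trailing zeros are
    # trimmed so 1.5 renders as "1.5" rather than "1.50".
    return f"{seconds:.2f}".rstrip("0").rstrip(".")


def pause(seconds: float) -> str:
    """Bracket form of the pause markup."""
    return f"[pause {_format_seconds(seconds)}s]"


def break_tag(seconds: float) -> str:
    """XML-style form of the pause markup."""
    return f'<break time="{_format_seconds(seconds)}s"/>'


text = f"Wait for it {pause(1.5)} surprise!"
```

Both helpers produce the same pause; which form you use is a matter of preference.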
Interjections (non-verbal sounds)
Wrap a non-verbal sound in square brackets and it is spoken as that sound rather than read aloud:
That's hilarious [laughs] let me catch my breath [breath] okay.
Interjections require a model that supports them (the default TTS model does). On models without interjection support, any use of a tag is rejected, so you find out early.
Available tags
| Tag | Sound |
|---|---|
| [laughs] | Laughter |
| [chuckle] | Light laugh |
| [coughs] | Cough |
| [clears throat] | Throat clear |
| [groans] | Groan |
| [breath] | Audible breath |
| [pant] | Panting |
| [inhale] | Breathe in |
| [exhale] | Breathe out |
| [gasps] | Gasp |
| [sniffs] | Sniff |
| [sighs] | Sigh |
| [snorts] | Snort |
| [burps] | Burp |
| [lip-smacking] | Lip smack |
| [humming] | Humming |
| [hissing] | Hiss |
| [emm] | Filler "emm" |
| [sneezes] | Sneeze |
Common singular/plural variants are accepted (e.g. [laugh], [gasp]).
Unknown tags
Bracketed text that isn't a known tag is removed by default — brackets are markup, not spoken content. To keep such text and read it literally instead, set strip_unknown_tags to false.
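Because unknown bracketed text disappears silently by default, it can help to check your text locally before sending it. The sketch below is a rough client-side approximation, not the service's actual parser: the tag set is copied from the table on this page plus the two variants named above, other accepted variants are not covered, and the exact matching and whitespace handling are assumptions. The names `unknown_tags` and `preview_stripped` are hypothetical.

```python
import re
from typing import List

# Tags from the table on this page, plus the two variants the page names.
# Assumption: the service may accept further variants not listed here.
KNOWN_TAGS = {
    "laughs", "chuckle", "coughs", "clears throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "sneezes",
    "laugh", "gasp",
}

BRACKETED = re.compile(r"\[([^\[\]]*)\]")
# The bracket form of a pause, e.g. "pause 1.5s", also counts as known markup.
PAUSE = re.compile(r"pause \d+(?:\.\d+)?s")


def is_known(tag: str) -> bool:
    """True if the bracket content is a listed tag or a pause."""
    return tag in KNOWN_TAGS or PAUSE.fullmatch(tag) is not None


def unknown_tags(text: str) -> List[str]:
    """List bracketed contents that would be treated as unknown."""
    return [m.group(1) for m in BRACKETED.finditer(text) if not is_known(m.group(1))]


def preview_stripped(text: str) -> str:
    """Approximate the default behaviour: drop unknown bracketed text."""
    kept = BRACKETED.sub(
        lambda m: m.group(0) if is_known(m.group(1)) else "", text
    )
    # Assumption: collapse the double spaces left behind by removed tags.
    return re.sub(r" {2,}", " ", kept).strip()
```

Running `unknown_tags` over your text before a request shows which bracketed spans would be dropped unless `strip_unknown_tags` is set to false.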
Emotion
Independently of markup, you can set an emotion for the whole utterance (a voice setting, not inline markup):
happy, sad, angry, fearful, disgusted, surprised, calm, fluent
The model also infers a fitting emotion from the text, so setting it is optional.
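If the emotion comes from user input or configuration, you may want to validate it against the listed values before passing it on. This is a minimal sketch under the assumption that the value is sent as one of the lowercase strings above; the helper name `check_emotion` is hypothetical, and how the value is attached to a request depends on your SDK.

```python
from typing import Optional

# The eight values listed on this page.
EMOTIONS = (
    "happy", "sad", "angry", "fearful",
    "disgusted", "surprised", "calm", "fluent",
)


def check_emotion(emotion: Optional[str]) -> Optional[str]:
    """Return the emotion if it is a listed value.

    None is allowed and means "do not set it", leaving the model to
    infer an emotion from the text.
    """
    if emotion is None:
        return None
    if emotion not in EMOTIONS:
        raise ValueError(
            f"unknown emotion {emotion!r}; expected one of {', '.join(EMOTIONS)}"
        )
    return emotion
```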
For the exact call in your language, see the JavaScript or Unity TTS guide.