Tone & Markup

You can shape delivery in two ways: inline markup placed directly in the text you send, and an emotion setting. Markup lives inside the text, so it works the same in every SDK.

Pauses

Insert a timed pause with either syntax:

Wait for it [pause 1.5s] surprise!
Wait for it <break time="1.5s"/> surprise!

The duration is in seconds, from 0.01 to 99.99.
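Because the markup is plain text, a pause can be generated with a small helper before the text is sent. The following is a minimal sketch, not part of any SDK: the function name `pause_tag` and its `style` parameter are hypothetical, and trimming trailing zeros (so `2.0` becomes `2s`) is an assumption about what the service accepts.

```python
def pause_tag(seconds: float, style: str = "bracket") -> str:
    """Return inline pause markup for a duration given in seconds.

    Hypothetical helper: only the two output syntaxes and the
    0.01 to 99.99 range come from the documentation.
    """
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause duration must be between 0.01 and 99.99 seconds")

    # Format to two decimals (the precision of the documented range),
    # then drop trailing zeros so 1.5 is written as "1.5", not "1.50".
    value = f"{seconds:.2f}".rstrip("0").rstrip(".")

    if style == "bracket":
        return f"[pause {value}s]"
    if style == "break":
        return f'<break time="{value}s"/>'
    raise ValueError("style must be 'bracket' or 'break'")


text = f"Wait for it {pause_tag(1.5)} surprise!"
```

Validating the range locally means an out-of-range duration fails in your own code rather than in the request.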

Interjections (non-verbal sounds)

Wrap a non-verbal sound in square brackets and it is rendered as that sound rather than read aloud as words:

That's hilarious [laughs] let me catch my breath [breath] okay.

Interjections require a model that supports them (the default TTS model does). On a model without interjection support, a request that contains an interjection tag is rejected, so you find out early.

Available tags

Tag	Sound
`[laughs]`	Laughter
`[chuckle]`	Light laugh
`[coughs]`	Cough
`[clears throat]`	Throat clear
`[groans]`	Groan
`[breath]`	Audible breath
`[pant]`	Panting
`[inhale]`	Breathe in
`[exhale]`	Breathe out
`[gasps]`	Gasp
`[sniffs]`	Sniff
`[sighs]`	Sigh
`[snorts]`	Snort
`[burps]`	Burp
`[lip-smacking]`	Lip smack
`[humming]`	Humming
`[hissing]`	Hiss
`[emm]`	Filler "emm"
`[sneezes]`	Sneeze

Common singular/plural variants are accepted (e.g. [laugh], [gasp]).
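If you want to catch typos in tags before sending text, one option is a local check against the table above. This is a sketch under stated assumptions: `find_unlisted_tags` is a hypothetical helper, it matches the listed spellings exactly, and it does not know the full set of accepted singular/plural variants, so it may flag a variant such as [laugh] that the service would accept.

```python
import re

# Tag names copied from the "Available tags" table.
KNOWN_TAGS = {
    "laughs", "chuckle", "coughs", "clears throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "sneezes",
}

# Any bracketed span that contains no nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")
# The bracket form of a timed pause, e.g. "pause 1.5s".
PAUSE = re.compile(r"pause \d{1,2}(\.\d{1,2})?s")


def find_unlisted_tags(text: str) -> list[str]:
    """Return bracketed spans that are neither a listed tag nor a pause."""
    unlisted = []
    for match in BRACKETED.finditer(text):
        name = match.group(1).strip()
        if name in KNOWN_TAGS or PAUSE.fullmatch(name):
            continue
        unlisted.append(match.group(0))
    return unlisted


print(find_unlisted_tags("Hi [wink] there [pause 1.5s] okay [sighs]"))
```

Treat the result as a warning list for review rather than a hard error, since the service is the authority on which variants it accepts.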

Unknown tags

Bracketed text that isn't a known tag is removed by default, because brackets are treated as markup, not spoken content. To keep such text and have it read literally instead, set strip_unknown_tags to false.
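To see what the default does to a given string, the behaviour can be approximated locally. This is only an illustration of the rule described above, not the service's implementation: `preview_text` is a hypothetical function, the caller supplies the set of known tags, and collapsing the leftover double spaces is an assumption.

```python
import re

# Any bracketed span that contains no nested brackets.
BRACKETED = re.compile(r"\[[^\[\]]*\]")


def preview_text(text: str, known_tags: set[str],
                 strip_unknown_tags: bool = True) -> str:
    """Approximate which text remains after unknown tags are handled."""
    if not strip_unknown_tags:
        # With stripping disabled, unknown bracketed text stays in place.
        return text

    def keep_or_drop(match: re.Match) -> str:
        tag = match.group(0)
        # Keep known tags and bracket-style pauses, drop everything else.
        if tag in known_tags or tag.startswith("[pause "):
            return tag
        return ""

    stripped = BRACKETED.sub(keep_or_drop, text)
    # Assumption: tidy the double spaces left behind by removed spans.
    return re.sub(r" {2,}", " ", stripped).strip()


sample = "Note [sic] this [laughs] works"
print(preview_text(sample, {"[laughs]"}))
print(preview_text(sample, {"[laughs]"}, strip_unknown_tags=False))
```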

Emotion

Independently of markup, you can set an emotion for the whole utterance (a voice setting, not inline markup):

happy, sad, angry, fearful, disgusted, surprised, calm, fluent

The model also infers a fitting emotion from the text, so setting it is optional.
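Since the emotion is one of a fixed set of values and is optional, a small guard can reject misspellings before a request is made. The sketch below is illustrative only: `voice_settings` and the `"emotion"` key are hypothetical names, and the real parameter name is defined by the SDK guide for your language.

```python
from typing import Optional

# The emotion values listed above.
EMOTIONS = ("happy", "sad", "angry", "fearful",
            "disgusted", "surprised", "calm", "fluent")


def voice_settings(emotion: Optional[str] = None) -> dict:
    """Build a settings mapping; the "emotion" key name is an assumption."""
    if emotion is None:
        # Omit the setting entirely and let the model infer the emotion.
        return {}
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion {emotion!r}; expected one of {EMOTIONS}")
    return {"emotion": emotion}


print(voice_settings("calm"))
print(voice_settings())
```

Returning an empty mapping when no emotion is given keeps the default behaviour, where the model picks an emotion from the text.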

For the exact call in your language, see the JavaScript or Unity TTS guide.
