Converts text to speech audio using the specified voice configuration. Returns MP3 audio ready for playback or download. Useful for previewing voices during agent configuration and for testing voice quality before deployment.
Text synthesis configuration
Text-to-speech synthesis request configuration
Text content to convert to speech. Length: 1 - 5000 characters.
Voice identifier for synthesis. Use a cloned voice ID or public voice ID from the available voices. Minimum length: 1.
TTS model to use for synthesis. Model availability depends on the provider.
Language code for synthesis. Format depends on provider (e.g., "en", "en-US"). Minimum length: 1.
TTS provider name to use for synthesis. Minimum length: 1.
Speech speed multiplier. A value of 1.0 is normal speed; values below 1.0 are slower and values above 1.0 are faster.
Provider-specific configuration options. Each provider supports different options. Consult provider documentation for available settings.
Inline pronunciation rules for preview support. These rules are applied during synthesis without being stored in a dictionary.
Base class for pronunciation rules that control how specific text is spoken during TTS synthesis. Rules use polymorphic serialization with a type discriminator to support different pronunciation methods (alias or phonetic).
Pronunciation dictionary ID to use for synthesis. When provided, the dictionary rules will be applied during synthesis.
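The request fields described above could be assembled into a JSON body roughly as follows. This is a minimal sketch, not the documented schema: every key name (`text`, `voice_id`, `provider`, `language`, `model`, `speed`, `pronunciation_rules`, `pronunciation_dictionary_id`) and the rule fields (`text`, `alias`) are hypothetical, since the field names are not given here. Only the 1 - 5000 length limit, the speed semantics, and the `alias`/`phonetic` rule types come from the descriptions.

```python
import json
from typing import Any, Dict, List, Optional

MAX_TEXT_LENGTH = 5000  # upper bound given for the text field above
RULE_TYPES = {"alias", "phonetic"}  # the two pronunciation methods named above


def build_synthesis_request(
    text: str,
    voice_id: str,
    provider: str,
    language: str,
    model: Optional[str] = None,
    speed: float = 1.0,
    pronunciation_rules: Optional[List[Dict[str, Any]]] = None,
    pronunciation_dictionary_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the inputs and return a JSON-serializable request body.

    All key names are hypothetical; substitute the names from the real schema.
    """
    if not 1 <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError(f"text must be 1 to {MAX_TEXT_LENGTH} characters long")
    if not voice_id:
        raise ValueError("voice_id must not be empty")
    if speed <= 0:
        # Assumption: the description only says 1.0 is normal speed, so a
        # positive multiplier is the weakest sensible requirement.
        raise ValueError("speed must be a positive multiplier")
    for rule in pronunciation_rules or []:
        # Each rule carries a type discriminator selecting its method.
        if rule.get("type") not in RULE_TYPES:
            raise ValueError(f"unknown pronunciation rule type: {rule.get('type')!r}")

    payload = {
        "text": text,
        "voice_id": voice_id,
        "provider": provider,
        "language": language,
        "model": model,
        "speed": speed,
        "pronunciation_rules": pronunciation_rules,
        "pronunciation_dictionary_id": pronunciation_dictionary_id,
    }
    # Leave out optional fields that were not supplied.
    return {key: value for key, value in payload.items() if value is not None}


if __name__ == "__main__":
    body = build_synthesis_request(
        "Run the SQL migration.",
        voice_id="example-voice-id",  # placeholder, not a real voice ID
        provider="example-provider",  # placeholder provider name
        language="en-US",
        speed=0.9,
        pronunciation_rules=[{"type": "alias", "text": "SQL", "alias": "sequel"}],
    )
    print(json.dumps(body, indent=2))
```

Validating on the client side before sending keeps a preview UI from issuing requests that the length constraints would reject anyway.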
Success: the synthesized audio is returned as MP3.
The response is of type file.
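Because the response is a binary file rather than JSON, a client may want a quick sanity check before saving it. The sketch below assumes the response body is already available as `bytes`; the helper names `looks_like_mp3` and `save_audio` are illustrative and not part of this API. The check relies only on general MP3 conventions: a file starts either with an ID3v2 tag (`ID3`) or with an MPEG frame sync (eleven set bits).

```python
from pathlib import Path


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check: ID3v2 tag or MPEG audio frame sync at the start."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: first byte 0xFF and the top three bits of the second byte set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_audio(data: bytes, path: str) -> Path:
    """Write the response body to disk, refusing bodies that are not MP3-like."""
    if not looks_like_mp3(data):
        # For example, an error payload returned instead of audio.
        raise ValueError("response body does not look like MP3 audio")
    out = Path(path)
    out.write_bytes(data)
    return out
```

A call such as `save_audio(response_bytes, "preview.mp3")` would then write the preview clip, where `response_bytes` stands for whatever the HTTP client returned.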