Synthesize text to speech - Dasha BlackBox Documentation

const options = {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    text: '<string>',
    voiceId: '<string>',
    model: '<string>',
    language: '<string>',
    provider: '<string>',
    speed: 1.0,
    vendorSpecific: {},
    inlinePronunciationRules: [{text: '<string>', alias: '<string>'}],
    pronunciationDictionaryId: '<string>'
  })
};

// The endpoint returns an MP3 file, so read the body as binary data rather than JSON.
fetch('https://blackbox.dasha.ai/api/v1/voice/synthesize', options)
  .then(res => res.blob())
  .then(audio => console.log(audio))
  .catch(err => console.error(err));

Body

Text-to-speech synthesis request configuration

text

string

required

Text content to convert to speech

Required string length: 1 - 5000
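Input longer than the 5000-character limit has to be sent as several requests. The following is a minimal sketch of one way to split text before synthesis. The helper name `splitForSynthesis` is hypothetical, and the sketch assumes the limit is measured the same way as JavaScript's `String.length` (UTF-16 code units), which this page does not state.

```javascript
// Hypothetical helper, not part of the BlackBox API.
// Splits text into chunks of at most maxLen characters, preferring to break
// after sentence-ending punctuation, then at a space, then with a hard cut.
function splitForSynthesis(text, maxLen = 5000) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Find the last sentence boundary inside the allowed window.
    let cut = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? ')
    );
    if (cut !== -1) {
      cut += 1; // keep the punctuation mark with the preceding chunk
    } else {
      cut = head.lastIndexOf(' ');
    }
    if (cut <= 0) cut = maxLen; // no natural break point: hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned chunk can then be used as the `text` of a separate request. A hard cut may split a word or a surrogate pair, so real input may need a more careful boundary rule.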

voiceId

string

required

Voice identifier for synthesis. Use a cloned voice ID or public voice ID from the available voices.

Minimum string length: 1

model

string | null

required

TTS model to use for synthesis. Model availability depends on the provider.

language

string

required

Language code for synthesis. Format depends on provider (e.g., "en", "en-US").

Minimum string length: 1

provider

string

required

TTS provider name to use for synthesis

Minimum string length: 1
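The constraints on the required fields above can be checked on the client before a request is sent. This is a sketch only: `validateSynthesizeBody` is a hypothetical helper, and the server remains the authority on what it accepts. Because `model` is marked required but typed `string | null`, the sketch assumes the key must be present and that `null` is an allowed value.

```javascript
// Hypothetical client-side check, not part of the BlackBox API.
const REQUIRED_NON_EMPTY = ['voiceId', 'language', 'provider'];

function validateSynthesizeBody(body) {
  const errors = [];
  // text: required string length 1 - 5000
  if (typeof body.text !== 'string' || body.text.length < 1 || body.text.length > 5000) {
    errors.push('text must be a string of 1 to 5000 characters');
  }
  // voiceId, language, provider: minimum string length 1
  for (const key of REQUIRED_NON_EMPTY) {
    if (typeof body[key] !== 'string' || body[key].length < 1) {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  // model: assumed to be a required key whose value may be null
  if (!('model' in body) || (body.model !== null && typeof body.model !== 'string')) {
    errors.push('model must be present and be a string or null');
  }
  return errors; // an empty array means no problems were found
}
```

For example, `validateSynthesizeBody({ text: 'Hi', voiceId: 'v1', model: null, language: 'en', provider: 'p1' })` returns an empty array. The values `'v1'` and `'p1'` are placeholders, not real voice or provider names.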

speed

number<double>

Speech speed multiplier. Value of 1.0 is normal speed, values less than 1.0 are slower, values greater than 1.0 are faster.

vendorSpecific

object

Provider-specific configuration options. Each provider supports different options. Consult provider documentation for available settings.

inlinePronunciationRules

object[] | null

Inline pronunciation rules for preview support. These rules are applied during synthesis without being stored in a dictionary.

Base class for pronunciation rules that control how specific text is spoken during TTS synthesis. Rules use polymorphic serialization with a type discriminator to support different pronunciation methods (alias or phonetic).

pronunciationDictionaryId

string | null

Pronunciation dictionary ID to use for synthesis. When provided, the dictionary rules will be applied during synthesis.
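The two pronunciation options can be attached to a request body as sketched below. `withPronunciation` is a hypothetical helper. The rule shape (`text` plus `alias`) mirrors the request sample on this page; the name of the type discriminator property is not shown here, so the sketch omits it, and a phonetic rule would need fields that this page does not list. Whether inline rules and a dictionary ID may be sent together is also not stated.

```javascript
// Hypothetical helper, not part of the BlackBox API.
// Returns a copy of the body with pronunciation settings added.
function withPronunciation(body, { aliasRules = [], dictionaryId = null } = {}) {
  const result = { ...body };
  if (aliasRules.length > 0) {
    // Keep only the two properties shown in the request sample.
    result.inlinePronunciationRules = aliasRules.map(({ text, alias }) => ({ text, alias }));
  }
  if (dictionaryId !== null) {
    result.pronunciationDictionaryId = dictionaryId;
  }
  return result;
}

// Example: speak "SQL" as "sequel" for this request only.
const previewBody = withPronunciation(
  { text: 'SQL is fun', voiceId: '<string>', model: null, language: 'en', provider: '<string>' },
  { aliasRules: [{ text: 'SQL', alias: 'sequel' }] }
);
```

Inline rules suit previews because nothing is stored; a dictionary ID suits rules that should be reused across requests.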

Response

On success, returns the synthesized audio as an MP3 file.

The response is of type file.
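Because the response is a binary file rather than JSON, a client should read it as bytes. The following sketch shows one way to do that. `synthesizeToBytes` and its `fetchImpl` parameter are hypothetical names; the error handling assumes only that a failed request has a non-2xx status, since the error response format is not described on this page. The request sample on this page sends no authentication header, so none is shown here.

```javascript
const SYNTHESIZE_URL = 'https://blackbox.dasha.ai/api/v1/voice/synthesize';

// Hypothetical wrapper: posts the body and resolves with the MP3 bytes.
// fetchImpl can be swapped for a stub in tests; it defaults to the global fetch.
async function synthesizeToBytes(body, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(SYNTHESIZE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Synthesis failed with HTTP ${res.status}`);
  }
  // Read the file response as raw bytes instead of parsing it as JSON.
  return new Uint8Array(await res.arrayBuffer());
}
```

In Node.js the returned bytes can be written to disk with `fs.promises.writeFile('speech.mp3', bytes)`; in a browser they can be wrapped in a `Blob` with type `audio/mpeg` and played through an `Audio` element.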