POST /api/v1/voice/synthesize
Synthesize text to speech
const options = {
  method: 'POST',
  headers: {Authorization: 'Bearer <token>', 'Content-Type': 'application/json'},
  body: JSON.stringify({
    text: '<string>',
    voiceId: '<string>',
    model: '<string>',
    language: '<string>',
    provider: '<string>',
    speed: 1.0, // 1.0 is normal speed
    vendorSpecific: {},
    inlinePronunciationRules: [{text: '<string>', alias: '<string>'}],
    pronunciationDictionaryId: '<string>'
  })
};

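// The endpoint returns an MP3 file, so read the body as binary, not JSON.
fetch('https://blackbox.dasha.ai/api/v1/voice/synthesize', options)
  .then(res => {
    if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
    return res.blob();
  })
  .then(audio => console.log(`Received ${audio.size} bytes of audio`))
  .catch(err => console.error(err));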

Body

Text-to-speech synthesis request configuration.

text (string, required)

Text content to convert to speech

Required string length: 1 - 5000
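
Input longer than 5000 characters has to be split across several requests. The following is a minimal sketch of one way to do that, not part of the API: splitForSynthesis is a hypothetical helper that prefers to break after sentence punctuation and falls back to whitespace or a hard cut. It assumes the limit is counted in JavaScript string length (UTF-16 code units); the page does not say how the service counts characters.

```javascript
// Hypothetical helper (not part of the API): splits long text into pieces
// that fit the documented 1 - 5000 character limit of the `text` field.
const MAX_TEXT_LENGTH = 5000;

function splitForSynthesis(text, maxLength = MAX_TEXT_LENGTH) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    const head = rest.slice(0, maxLength);
    // Longest prefix that ends in ., ! or ? followed by whitespace.
    const sentenceEnd = head.match(/^[\s\S]*[.!?](?=\s)/);
    // Otherwise break at the last space; otherwise cut at the limit.
    let cut = sentenceEnd ? sentenceEnd[0].length : head.lastIndexOf(' ');
    if (cut <= 0) cut = maxLength;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each returned piece can then be sent as the text of its own request, with the resulting audio files played or concatenated in order.
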
voiceId (string, required)

Voice identifier for synthesis. Use a cloned voice ID or public voice ID from the available voices.

Minimum string length: 1
model (string | null, required)

TTS model to use for synthesis. Model availability depends on the provider.

language (string, required)

Language code for synthesis. Format depends on provider (e.g., "en", "en-US").

Minimum string length: 1
provider (string, required)

TTS provider name to use for synthesis

Minimum string length: 1
speed (number<double>)

Speech speed multiplier. Value of 1.0 is normal speed, values less than 1.0 are slower, values greater than 1.0 are faster.

vendorSpecific (object)

Provider-specific configuration options. Each provider supports different options. Consult provider documentation for available settings.

inlinePronunciationRules (object[] | null)

Inline pronunciation rules, intended for previewing. These rules are applied during synthesis without being stored in a dictionary.

Each rule controls how a specific piece of text is spoken during synthesis. Rules are serialized polymorphically, with a type discriminator that selects the pronunciation method (alias or phonetic).
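
As an illustration, alias rules can be generated from a simple lookup table. This sketch uses only the { text, alias } shape that appears in the request sample; toAliasRules and the example words are hypothetical. The description above mentions a type discriminator, but this page does not name that field, so it is left out here and may need to be added.

```javascript
// Hypothetical helper: turns a { written: spoken } map into the
// { text, alias } rule shape shown in the request sample.
// The type discriminator field is not named on this page, so it is omitted.
function toAliasRules(aliases) {
  return Object.entries(aliases).map(([text, alias]) => ({ text, alias }));
}

// Example values are placeholders, not real voices or providers.
const previewBody = {
  text: 'Run the SQL query on the nginx host.',
  voiceId: '<string>',
  model: null,
  language: 'en',
  provider: '<string>',
  inlinePronunciationRules: toAliasRules({ SQL: 'sequel', nginx: 'engine x' }),
};
```
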

pronunciationDictionaryId (string | null)

Pronunciation dictionary ID to use for synthesis. When provided, the dictionary rules will be applied during synthesis.
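
The constraints listed for the body can be checked on the client before a request is sent. The sketch below is one possible check, not the server's validation logic: validateSynthesisBody is a hypothetical name, and treating model as "key must be present, value may be null" is an assumption drawn from the field being marked both required and string | null.

```javascript
// Hypothetical client-side check mirroring the constraints documented above.
// Returns a list of problems; an empty list means the body looks valid.
function validateSynthesisBody(body) {
  const errors = [];
  const isNonEmptyString = (v) => typeof v === 'string' && v.length >= 1;

  if (!isNonEmptyString(body.text) || body.text.length > 5000) {
    errors.push('text must be a string of 1 to 5000 characters');
  }
  for (const field of ['voiceId', 'language', 'provider']) {
    if (!isNonEmptyString(body[field])) {
      errors.push(`${field} must be a non-empty string`);
    }
  }
  // Assumption: "required" plus "string | null" means the key must exist,
  // but its value may be null.
  if (!('model' in body) || (body.model !== null && typeof body.model !== 'string')) {
    errors.push('model must be present (string or null)');
  }
  if (body.speed !== undefined &&
      !(typeof body.speed === 'number' && Number.isFinite(body.speed))) {
    errors.push('speed must be a finite number');
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a caller report every issue at once.
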

Response

Returns the synthesized audio as an MP3 file.

The response is of type file.
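
Because the response is a file, the body has to be read as binary data. The following sketch shows one way to save it to disk, assuming Node.js 18 or later (for the global fetch). synthesizeToFile and looksLikeMp3 are hypothetical helpers, and the format of error responses is not documented on this page, so the error body is read as plain text.

```javascript
// True when the bytes start like an MP3 file: an ID3v2 tag ("ID3") or an
// MPEG audio frame sync (0xFF followed by a byte whose top three bits are set).
function looksLikeMp3(bytes) {
  if (bytes.length < 3) return false;
  const hasId3 = bytes[0] === 0x49 && bytes[1] === 0x44 && bytes[2] === 0x33;
  const hasFrameSync = bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0;
  return hasId3 || hasFrameSync;
}

// Hypothetical helper: posts the body, then writes the audio to outPath.
// Resolves with the number of bytes written.
async function synthesizeToFile(body, token, outPath) {
  const res = await fetch('https://blackbox.dasha.ai/api/v1/voice/synthesize', {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    // Error format is undocumented here, so read it as text.
    throw new Error(`Synthesis failed: ${res.status} ${await res.text()}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  if (!looksLikeMp3(bytes)) {
    throw new Error('Response does not look like MP3 audio');
  }
  const { writeFile } = await import('node:fs/promises');
  await writeFile(outPath, bytes);
  return bytes.length;
}
```

A call such as synthesizeToFile(body, token, 'speech.mp3') would write the audio to the working directory. The header check is only a sanity test that catches an error page saved under an .mp3 name; it does not validate the whole file.
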