TTS Providers Reference

Detailed configuration for each TTS provider. For basic setup, see Voice & Speech.

ElevenLabs

Industry-leading voice quality with extensive customization.

Models:

eleven_multilingual_v2 — Best quality, 29+ languages
eleven_turbo_v2_5 — Balanced speed and quality
eleven_flash_v2_5 — Fastest inference

Speed range: 0.70x to 1.20x

Configuration options

Option	Range	Default	Effect
`similarity_boost`	0.0–1.0	0.75	Voice consistency with original
`stability`	0.0–1.0	0.5	Output consistency across generations
`style`	0.0–1.0	0.3	Speaker style exaggeration
`use_speaker_boost`	boolean	true	Enhanced clarity
`optimize_streaming_latency`	0–4	4	Trade quality for speed

Example

ElevenLabs configuration example

ttsConfig: {
  vendor: "ElevenLabs",
  voiceId: "21m00Tcm4TlvDq8ikWAM",
  model: "eleven_turbo_v2_5",
  speed: 1.0,
  vendorSpecificOptions: {
    similarity_boost: 0.8,
    stability: 0.6,
    style: 0.4,
    use_speaker_boost: true,
    optimize_streaming_latency: 3
  }
}
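The ranges in the options table can be checked locally before a config is submitted. The following is a minimal sketch under stated assumptions: `ELEVENLABS_RANGES` and `validateElevenLabsOptions` are hypothetical names that are not part of any SDK, and reporting keys missing from the table as errors is an assumption rather than documented platform behavior.

```javascript
// Numeric ranges copied from the ElevenLabs options table.
// The names below are hypothetical; this helper is not part of any SDK.
const ELEVENLABS_RANGES = new Map([
  ["similarity_boost", [0.0, 1.0]],
  ["stability", [0.0, 1.0]],
  ["style", [0.0, 1.0]],
  ["optimize_streaming_latency", [0, 4]],
]);

// Returns a list of readable problems; an empty list means the options look valid.
function validateElevenLabsOptions(options) {
  const errors = [];
  for (const [key, value] of Object.entries(options)) {
    if (key === "use_speaker_boost") {
      if (typeof value !== "boolean") errors.push(`${key} must be a boolean`);
      continue;
    }
    const range = ELEVENLABS_RANGES.get(key);
    if (!range) {
      // Assumption: keys missing from the table are reported, not passed through.
      errors.push(`${key} is not a documented option`);
      continue;
    }
    const [min, max] = range;
    if (typeof value !== "number" || !(value >= min && value <= max)) {
      errors.push(`${key} must be a number from ${min} to ${max}`);
    }
  }
  return errors;
}

// Reports one problem, because style is above its documented maximum.
console.log(validateElevenLabsOptions({ stability: 0.6, style: 1.4 }));
```

Passing the `vendorSpecificOptions` object from the example above returns an empty list.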

Cartesia

Granular emotion control with wide speed range.

Models:

sonic-3.5 — Latest generation (recommended)

Speed range: 0x to 2.0x (values below 0.25x are sent as 0.25x, so the effective minimum is 0.25x)

Emotion system

Combine emotions for nuanced delivery:

Dimension	Intensities
anger	lowest, low, high, highest
positivity	lowest, low, high, highest
surprise	lowest, low, high, highest
sadness	lowest, low, high, highest
curiosity	lowest, low, high, highest

Start with subtle emotions (low levels). Combining too many high-intensity emotions sounds unnatural.

Example

Cartesia configuration example

ttsConfig: {
  vendor: "Cartesia",
  voiceId: "cartesia-voice-id",
  model: "sonic-3.5",
  speed: 1.2,
  vendorSpecificOptions: {
    emotions: [
      "positivity:high",
      "curiosity:low"
    ]
  }
}
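The `emotions` array in the example uses `dimension:intensity` strings built from the table above. As a rough sketch, a small helper can assemble those strings and catch typos early; `buildEmotions` is a hypothetical name, and throwing on unknown values is a choice made for this illustration only.

```javascript
// Dimensions and intensities copied from the emotion table.
const CARTESIA_DIMENSIONS = ["anger", "positivity", "surprise", "sadness", "curiosity"];
const CARTESIA_INTENSITIES = ["lowest", "low", "high", "highest"];

// Hypothetical helper: turns { positivity: "high" } into ["positivity:high"].
function buildEmotions(levels) {
  return Object.entries(levels).map(([dimension, intensity]) => {
    if (!CARTESIA_DIMENSIONS.includes(dimension)) {
      throw new Error(`Unknown emotion dimension: ${dimension}`);
    }
    if (!CARTESIA_INTENSITIES.includes(intensity)) {
      throw new Error(`Unknown intensity for ${dimension}: ${intensity}`);
    }
    return `${dimension}:${intensity}`;
  });
}

const emotions = buildEmotions({ positivity: "high", curiosity: "low" });
console.log(emotions); // the same two strings as in the example above
```

The result can be assigned to `vendorSpecificOptions.emotions` as in the Cartesia example.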

Inworld

Character-focused voices for gaming and interactive media.

Models:

inworld-tts-1.5-max — Highest quality
inworld-tts-1.5-mini — Balanced speed and quality (recommended)
inworld-tts-1 — Original model

Speed range: 0.80x to 1.50x

Configuration options

Option	Default	Effect
`temperature`	0.8	Voice expressiveness
`pitch`	0.0	Pitch adjustment (+/-)

Example

Inworld configuration example

ttsConfig: {
  vendor: "Inworld",
  voiceId: "inworld-voice-id",
  model: "inworld-tts-1.5-mini",
  speed: 1.0,
  vendorSpecificOptions: {
    temperature: 0.9,
    pitch: 0.2
  }
}

LMNT

Consistent, lightweight synthesis.

Models:

blizzard — Standard model

Speed range: Fixed at 1.0x (no adjustment)

Example

LMNT configuration example

ttsConfig: {
  vendor: "Lmnt",
  voiceId: "lmnt-voice-id",
  model: "blizzard"
}

LMNT does not support speed adjustment; the voice always plays at 1.0x, and a `speed` value has no effect.

Advanced settings

Responsiveness

Controls how quickly the agent begins speaking after the user finishes.

Value	Behavior
1.0	Most responsive — minimal delay (recommended)
0.7	Slight delay added
0.5	Moderate delay
0.0	Maximum delay

Responsiveness configuration example

ttsConfig: {
  responsiveness: 1.0
}

Interruption sensitivity

Controls how easily users can interrupt the agent while it is speaking. Lower values require more user speech before the agent stops; higher values make it easier to cut in.

Value	Behavior
1.0	Easiest to interrupt — agent yields quickly (default)
0.5	Moderate — more user speech required before the agent stops
0.0	Non-interruptible — the agent is not interrupted by normal speech

When left unset, the agent uses its default interruptible behavior (equivalent to 1.0).

Interruption sensitivity configuration example

ttsConfig: {
  interruptionSensitivity: 1.0
}

Dynamic speed adjustment

Allow agents to adapt speech pace when users request it (“Can you speak more slowly?”).

Speed adjustment configuration example

ttsConfig: {
  speed: 1.0,
  speedAdjustment: {
    version: "v1",
    strategy: "OnRequest"  // or "Disabled"
  }
}

Strategy	Behavior
`OnRequest`	Agent adjusts speed when user requests (default)
`Disabled`	Speed remains fixed

Speed ranges by provider

Provider	Min	Max	Default	Recommended
ElevenLabs	0.70x	1.20x	1.0x	0.9x–1.2x
Cartesia	0x (sent as 0.25x)	2.0x	1.0x	0.8x–1.3x
Inworld	0.80x	1.50x	1.0x	0.9x–1.1x
LMNT	1.0x	1.0x	1.0x	1.0x (fixed)
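The limits in this table can be applied in client code so that a requested speed never falls outside what the selected provider supports. This is a minimal sketch: `SPEED_RANGES` and `clampSpeed` are hypothetical names, the keys are the `vendor` strings from the config examples, and clamping (rather than rejecting) out-of-range values is an assumption, since only the Cartesia lower bound is described as being adjusted automatically.

```javascript
// Speed limits from the table, keyed by the vendor strings used in the config examples.
const SPEED_RANGES = new Map([
  ["ElevenLabs", { min: 0.7, max: 1.2 }],
  ["Cartesia", { min: 0.25, max: 2.0 }], // 0x is accepted but sent as 0.25x
  ["Inworld", { min: 0.8, max: 1.5 }],
  ["Lmnt", { min: 1.0, max: 1.0 }], // fixed speed, no adjustment
]);

// Hypothetical helper: returns the nearest speed the vendor can use.
function clampSpeed(vendor, speed) {
  const range = SPEED_RANGES.get(vendor);
  if (!range) throw new Error(`Unknown TTS vendor: ${vendor}`);
  if (!Number.isFinite(speed)) throw new Error("speed must be a finite number");
  return Math.min(range.max, Math.max(range.min, speed));
}

console.log(clampSpeed("ElevenLabs", 2.0)); // 1.2
console.log(clampSpeed("Cartesia", 0));     // 0.25
console.log(clampSpeed("Lmnt", 1.3));       // 1
```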

Provider decision guide

Choose ElevenLabs if:

You need the highest voice quality
Brand-specific voice cloning is important
Advanced customization is required

Choose Cartesia if:

Emotional expression is important
You need a wide speed range (0x–2.0x)

Choose Inworld if:

You’re building character-driven experiences
Gaming or interactive media is your use case

Choose LMNT if:

You need consistent, predictable output
Minimal configuration is desired

Pronunciation dictionary

Customize how your agent pronounces brand names, acronyms, and technical terms.

Provider support

Provider	Alias rules	Phoneme rules
ElevenLabs	Yes	No
Cartesia	Yes	Yes
Inworld	No	No
LMNT	No	No

Alias rules replace one word with another spelling. Phoneme rules specify exact pronunciation using IPA notation.
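The support table can be encoded so that rules a provider cannot use are caught before a dictionary is created. This sketch assumes each rule object carries a `type` of `"alias"` or `"phoneme"`, as in the dictionary request on this page; `RULE_SUPPORT` and `findUnsupportedRules` are hypothetical names, and the provider keys are the display names from the table.

```javascript
// Rule support copied from the provider support table.
const RULE_SUPPORT = new Map([
  ["ElevenLabs", { alias: true, phoneme: false }],
  ["Cartesia", { alias: true, phoneme: true }],
  ["Inworld", { alias: false, phoneme: false }],
  ["LMNT", { alias: false, phoneme: false }],
]);

// Hypothetical helper: returns the rules the given provider cannot use.
function findUnsupportedRules(provider, rules) {
  const support = RULE_SUPPORT.get(provider);
  if (!support) throw new Error(`Unknown provider: ${provider}`);
  return rules.filter((rule) => support[rule.type] !== true);
}

const rules = [
  { type: "alias", stringToReplace: "API", replacement: "A P I" },
  { type: "phoneme", stringToReplace: "Dasha", phoneme: "ˈdɑːʃə", alphabet: "ipa" },
];

console.log(findUnsupportedRules("Cartesia", rules).length);   // 0
console.log(findUnsupportedRules("ElevenLabs", rules).length); // 1 (the phoneme rule)
```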

Create a dictionary

API example: Create pronunciation dictionary

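// Create a pronunciation dictionary with two alias rules and one phoneme rule.
const response = await fetch('https://blackbox.dasha.ai/api/v1/pronunciation-dictionaries', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: "Company Terms",
    provider: "Cartesia",  // Cartesia supports both alias and phoneme rules
    rules: [
      {
        type: "alias",
        stringToReplace: "API",
        replacement: "A P I"
      },
      {
        type: "alias",
        stringToReplace: "SQL",
        replacement: "sequel"
      },
      {
        type: "phoneme",
        stringToReplace: "Dasha",
        phoneme: "ˈdɑːʃə",
        alphabet: "ipa"
      }
    ]
  })
});

// Stop on a non-2xx response instead of parsing an error body as a dictionary.
if (!response.ok) {
  throw new Error(`Dictionary creation failed with status ${response.status}`);
}

const dictionary = await response.json();
console.log('Dictionary ID:', dictionary.id);
// Example output: Dictionary ID: pd_abc123def456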

Reference in agent config

Pronunciation dictionary reference example

ttsConfig: {
  vendor: "Cartesia",
  voiceId: "your-voice-id",
  pronunciationDictionary: {
    id: "pd_abc123def456",
    hash: "a1b2c3d4e5f6"
  }
}

Create a single, comprehensive dictionary for your organization and reference it across all agents.
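One way to reuse a single dictionary is to attach the same reference to every agent's `ttsConfig`. The sketch below is illustrative only: `attachDictionary` is a hypothetical helper, the voice IDs are placeholders, and it assumes every agent uses the provider the dictionary was created for.

```javascript
// Hypothetical helper: returns copies of the given ttsConfig objects that all
// reference the same pronunciation dictionary. The inputs are not modified.
function attachDictionary(ttsConfigs, dictionaryRef) {
  return ttsConfigs.map((ttsConfig) => ({
    ...ttsConfig,
    pronunciationDictionary: { id: dictionaryRef.id, hash: dictionaryRef.hash },
  }));
}

// Same id and hash as in the reference example above.
const sharedDictionary = { id: "pd_abc123def456", hash: "a1b2c3d4e5f6" };

const agentTtsConfigs = attachDictionary(
  [
    { vendor: "Cartesia", voiceId: "support-voice-id" }, // placeholder voice IDs
    { vendor: "Cartesia", voiceId: "sales-voice-id" },
  ],
  sharedDictionary
);

console.log(agentTtsConfigs.map((c) => c.pronunciationDictionary.id));
```

Keeping the reference in one place means a dictionary update only requires changing `sharedDictionary`.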

Multilingual configuration

For agents that switch languages mid-conversation:

Multilingual configuration example

config: {
  primaryLanguage: "en-US",
  ttsConfig: {
    version: "v1",
    vendor: "ElevenLabs",
    voiceId: "multilingual-voice-id",
    model: "eleven_multilingual_v2",
    speed: 1.0
  },
  features: {
    languageSwitching: {
      isEnabled: true
    }
  }
}

Related

Voice & Speech: Basic voice configuration
Dashboard Testing: Test voice quality in the browser
