
Detailed configuration for each TTS provider. For basic setup, see Voice & Speech.

ElevenLabs

Industry-leading voice quality with extensive customization.

Models:
  • eleven_multilingual_v2 — Best quality, 29+ languages
  • eleven_turbo_v2_5 — Balanced speed and quality
  • eleven_flash_v2_5 — Fastest inference
Speed range: 0.70x to 1.20x

Configuration options

| Option | Range | Default | Effect |
| --- | --- | --- | --- |
| similarity_boost | 0.0–1.0 | 0.75 | Voice consistency with original |
| stability | 0.0–1.0 | 0.5 | Output consistency across generations |
| style | 0.0–1.0 | 0.3 | Speaker style exaggeration |
| use_speaker_boost | boolean | true | Enhanced clarity |
| optimize_streaming_latency | 0–4 | 4 | Trade quality for speed |

Example

ttsConfig: {
  vendor: "ElevenLabs",
  voiceId: "21m00Tcm4TlvDq8ikWAM",
  model: "eleven_turbo_v2_5",
  speed: 1.0,
  vendorSpecificOptions: {
    similarity_boost: 0.8,
    stability: 0.6,
    style: 0.4,
    use_speaker_boost: true,
    optimize_streaming_latency: 3
  }
}
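
The documented ranges can be checked before a config is sent. The following is a minimal sketch, not part of the platform: `validateElevenLabsOptions` and `ELEVENLABS_OPTION_RANGES` are hypothetical names, and only the ranges themselves come from the options table.

```javascript
// Ranges copied from the configuration options table.
// The helper and constant names are illustrative assumptions.
const ELEVENLABS_OPTION_RANGES = {
  similarity_boost: [0.0, 1.0],
  stability: [0.0, 1.0],
  style: [0.0, 1.0],
  optimize_streaming_latency: [0, 4],
};

// Returns a list of problems; an empty list means every option is in range.
function validateElevenLabsOptions(options) {
  const problems = [];
  for (const [name, value] of Object.entries(options)) {
    if (name === "use_speaker_boost") {
      if (typeof value !== "boolean") {
        problems.push(`${name} must be a boolean`);
      }
      continue;
    }
    const range = ELEVENLABS_OPTION_RANGES[name];
    if (!range) {
      problems.push(`${name} is not a documented option`);
      continue;
    }
    const [min, max] = range;
    if (typeof value !== "number" || value < min || value > max) {
      problems.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  return problems;
}

// An out-of-range style value is reported; stability passes.
console.log(validateElevenLabsOptions({ stability: 0.6, style: 1.4 }));
```

Running a check like this at build time may catch typos in option names earlier than a failed call would.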

Cartesia

Granular emotion control with wide speed range.

Models:
  • sonic-3 — Latest generation (recommended)
  • sonic-2 — Previous generation
  • sonic — Original model
Speed range: 0x to 2.0x (values below 0.25x sent as 0.25x)

Emotion system

Combine emotions for nuanced delivery:
| Dimension | Intensities |
| --- | --- |
| anger | lowest, low, high, highest |
| positivity | lowest, low, high, highest |
| surprise | lowest, low, high, highest |
| sadness | lowest, low, high, highest |
| curiosity | lowest, low, high, highest |
Start with subtle emotions (low levels). Combining too many high-intensity emotions sounds unnatural.

Example

ttsConfig: {
  vendor: "Cartesia",
  voiceId: "cartesia-voice-id",
  model: "sonic-3",
  speed: 1.2,
  vendorSpecificOptions: {
    emotions: [
      "positivity:high",
      "curiosity:low"
    ]
  }
}
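
The `dimension:intensity` strings in the `emotions` array can be generated from a plain object, which makes typos easier to catch. This is a sketch under assumptions: `buildCartesiaEmotions` is a hypothetical helper, and the allowed values are taken from the emotion table in this section.

```javascript
// Allowed values copied from the emotion table; the helper is illustrative.
const CARTESIA_DIMENSIONS = ["anger", "positivity", "surprise", "sadness", "curiosity"];
const CARTESIA_INTENSITIES = ["lowest", "low", "high", "highest"];

// Converts { dimension: intensity } pairs into "dimension:intensity" strings.
function buildCartesiaEmotions(levels) {
  return Object.entries(levels).map(([dimension, intensity]) => {
    if (!CARTESIA_DIMENSIONS.includes(dimension)) {
      throw new Error(`Unknown emotion dimension: ${dimension}`);
    }
    if (!CARTESIA_INTENSITIES.includes(intensity)) {
      throw new Error(`Unknown intensity for ${dimension}: ${intensity}`);
    }
    return `${dimension}:${intensity}`;
  });
}

const emotions = buildCartesiaEmotions({ positivity: "high", curiosity: "low" });
console.log(emotions); // [ 'positivity:high', 'curiosity:low' ]
```

The resulting array can be assigned to `vendorSpecificOptions.emotions` as in the example above.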

Inworld

Character-focused voices for gaming and interactive media.

Models:
  • inworld-tts-1.5-max — Highest quality
  • inworld-tts-1.5-mini — Balanced speed and quality (recommended)
  • inworld-tts-1 — Original model
Speed range: 0.80x to 1.50x

Configuration options

| Option | Default | Effect |
| --- | --- | --- |
| temperature | 0.8 | Voice expressiveness |
| pitch | 0.0 | Pitch adjustment (+/-) |

Example

ttsConfig: {
  vendor: "Inworld",
  voiceId: "inworld-voice-id",
  model: "inworld-tts-1.5-mini",
  speed: 1.0,
  vendorSpecificOptions: {
    temperature: 0.9,
    pitch: 0.2
  }
}

LMNT

Consistent, lightweight synthesis.

Models:
  • blizzard — Standard model
Speed range: Fixed at 1.0x (no adjustment)

Example

ttsConfig: {
  vendor: "Lmnt",
  voiceId: "lmnt-voice-id",
  model: "blizzard"
}
LMNT does not support speed adjustment. Voice always plays at 1.0x.

Advanced settings

Responsiveness

Controls how quickly the agent begins speaking after the user finishes.
| Value | Behavior |
| --- | --- |
| 1.0 | Most responsive, minimal delay (recommended) |
| 0.7 | Slight delay added |
| 0.5 | Moderate delay |
| 0.0 | Maximum delay |
ttsConfig: {
  responsiveness: 1.0
}

Dynamic speed adjustment

Allow agents to adapt speech pace when users request it (“Can you speak more slowly?”).
ttsConfig: {
  speed: 1.0,
  speedAdjustment: {
    version: "v1",
    strategy: "OnRequest"  // or "Disabled"
  }
}
| Strategy | Behavior |
| --- | --- |
| OnRequest | Agent adjusts speed when user requests (default) |
| Disabled | Speed remains fixed |

Speed ranges by provider

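| Provider | Min | Max | Default | Recommended |
| --- | --- | --- | --- | --- |
| ElevenLabs | 0.70x | 1.20x | 1.0x | 0.9x–1.2x |
| Cartesia | 0x (sent as 0.25x) | 2.0x | 1.0x | 0.8x–1.3x |
| Inworld | 0.80x | 1.50x | 1.0x | 0.9x–1.1x |
| LMNT | 1.0x | 1.0x | 1.0x | 1.0x (fixed) |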
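
The per-provider speed limits can be applied client-side before a config is saved. The sketch below assumes a hypothetical `clampSpeed` helper keyed by the `vendor` strings used in this page's examples. Cartesia's floor is set to 0.25 because lower values are sent as 0.25x. How the platform itself handles out-of-range values for the other providers is not stated here.

```javascript
// Limits copied from the speed ranges table; keys follow the vendor
// strings used in the ttsConfig examples. The helper is illustrative.
const SPEED_LIMITS = {
  ElevenLabs: { min: 0.7, max: 1.2 },
  Cartesia: { min: 0.25, max: 2.0 }, // values below 0.25x are sent as 0.25x
  Inworld: { min: 0.8, max: 1.5 },
  Lmnt: { min: 1.0, max: 1.0 }, // fixed speed
};

// Returns the requested speed limited to the vendor's supported range.
function clampSpeed(vendor, speed) {
  const limits = SPEED_LIMITS[vendor];
  if (!limits) {
    throw new Error(`Unknown vendor: ${vendor}`);
  }
  return Math.min(limits.max, Math.max(limits.min, speed));
}

console.log(clampSpeed("ElevenLabs", 1.5)); // 1.2
console.log(clampSpeed("Cartesia", 0.1)); // 0.25
console.log(clampSpeed("Lmnt", 1.3)); // 1
```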

Provider decision guide

Choose ElevenLabs if:
  • You need the highest voice quality
  • Brand-specific voice cloning is important
  • Advanced customization is required
Choose Cartesia if:
  • Emotional expression is important
  • You need a wide speed range (up to 2.0x, with an effective minimum of 0.25x)
Choose Inworld if:
  • You’re building character-driven experiences
  • Gaming or interactive media is your use case
Choose LMNT if:
  • You need consistent, predictable output
  • Minimal configuration is desired

Pronunciation dictionary

Customize how your agent pronounces brand names, acronyms, and technical terms.

Provider support

| Provider | Alias rules | Phoneme rules |
| --- | --- | --- |
| ElevenLabs | Yes | No |
| Cartesia | Yes | Yes |
| Inworld | No | No |
| LMNT | No | No |
Alias rules replace one word with another spelling. Phoneme rules specify exact pronunciation using IPA notation.
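
Because support differs by provider, it can help to check a rule list before creating a dictionary. This is a sketch only: `findUnsupportedRules` is a hypothetical helper, the support matrix is copied from the table above, and the `Lmnt` key is an assumption based on the vendor string in the LMNT example.

```javascript
// Support matrix copied from the provider support table.
// The "Lmnt" key spelling is an assumption; the helper is illustrative.
const RULE_SUPPORT = {
  ElevenLabs: { alias: true, phoneme: false },
  Cartesia: { alias: true, phoneme: true },
  Inworld: { alias: false, phoneme: false },
  Lmnt: { alias: false, phoneme: false },
};

// Returns the rules whose type the given provider does not support.
function findUnsupportedRules(provider, rules) {
  const support = RULE_SUPPORT[provider];
  if (!support) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return rules.filter((rule) => support[rule.type] !== true);
}

const rules = [
  { type: "alias", stringToReplace: "API", replacement: "A P I" },
  { type: "phoneme", stringToReplace: "Dasha", phoneme: "ˈdɑːʃə", alphabet: "ipa" },
];

// ElevenLabs supports alias rules only, so the phoneme rule is flagged.
console.log(findUnsupportedRules("ElevenLabs", rules).map((rule) => rule.type));
```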

Create a dictionary

const response = await fetch('https://blackbox.dasha.ai/api/v1/pronunciation-dictionaries', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    name: "Company Terms",
    provider: "Cartesia",
    rules: [
      {
        type: "alias",
        stringToReplace: "API",
        replacement: "A P I"
      },
      {
        type: "alias",
        stringToReplace: "SQL",
        replacement: "sequel"
      },
      {
        type: "phoneme",
        stringToReplace: "Dasha",
        phoneme: "ˈdɑːʃə",
        alphabet: "ipa"
      }
    ]
  })
});

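// Stop early if the request failed instead of parsing an error body as a dictionary.
if (!response.ok) {
  throw new Error(`Failed to create dictionary: HTTP ${response.status}`);
}

const dictionary = await response.json();
console.log('Dictionary ID:', dictionary.id);
// Example output: Dictionary ID: pd_abc123def456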
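
When a dictionary has many alias entries, the `rules` array can be generated from a simple lookup object. A minimal sketch, assuming a hypothetical `buildAliasRules` helper; the rule shape (`type`, `stringToReplace`, `replacement`) follows the request body shown above.

```javascript
// Converts { term: spokenForm } pairs into alias rule objects.
// The helper name is an illustrative assumption.
function buildAliasRules(aliases) {
  return Object.entries(aliases).map(([stringToReplace, replacement]) => ({
    type: "alias",
    stringToReplace,
    replacement,
  }));
}

const requestBody = {
  name: "Company Terms",
  provider: "Cartesia",
  rules: buildAliasRules({ API: "A P I", SQL: "sequel" }),
};

console.log(JSON.stringify(requestBody, null, 2));
```

The resulting object can be passed to `JSON.stringify` as the `body` of the POST request.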

Reference in agent config

ttsConfig: {
  vendor: "Cartesia",
  voiceId: "your-voice-id",
  pronunciationDictionary: {
    id: "pd_abc123def456",
    hash: "a1b2c3d4e5f6"
  }
}
Create a single, comprehensive dictionary for your organization and reference it across all agents.

Multilingual configuration

For agents that switch languages mid-conversation:
config: {
  primaryLanguage: "en-US",
  ttsConfig: {
    version: "v1",
    vendor: "ElevenLabs",
    voiceId: "multilingual-voice-id",
    model: "eleven_multilingual_v2",
    speed: 1.0
  },
  features: {
    languageSwitching: {
      isEnabled: true
    }
  }
}
