Create a custom cloned voice from audio samples

// Placeholder: replace with a real recording, e.g. a File from <input type="file">.
const audioFile = new Blob([], {type: 'audio/wav'});

const form = new FormData();
form.append('Name', '<string>');
form.append('Description', '<string>');
form.append('Language', '<string>');
form.append('Provider', 'ElevenLabs');
form.append('ProviderSpecific.ElevenLabs.RemoveBackgroundNoise', 'true');
form.append('ProviderSpecific.Cartesia.Mode', '<string>');
form.append('ProviderSpecific.Cartesia.Enhance', 'true');
form.append('ProviderSpecific.Cartesia.Transcript', '<string>');
form.append('Labels', '{}');
// Append one audioFiles part per audio sample.
form.append('audioFiles', audioFile, 'example-file.wav');

// Do not set Content-Type by hand; fetch adds the multipart boundary itself.
const options = {method: 'POST', headers: {Authorization: 'Bearer <token>'}, body: form};

fetch('https://blackbox.dasha.ai/api/v1/voice/clone', options)
  .then(res => res.json())
  .then(res => console.log(res))
  .catch(err => console.error(err));

{
  "id": "<string>",
  "provider": "ElevenLabs",
  "category": "Public",
  "name": "<string>",
  "voiceId": "<string>",
  "description": "<string>",
  "language": "<string>",
  "labels": {},
  "previewUrl": "<string>",
  "createdTime": "2023-11-07T05:31:56Z",
  "lastUpdatedTime": "2023-11-07T05:31:56Z"
}

Creates a new custom voice by cloning from provided audio samples using multipart form data. This process analyzes the audio characteristics and generates a synthetic voice that mimics the original speaker. The cloned voice can then be used for text-to-speech synthesis in agent configurations.

Voice Cloning Process:

Audio Analysis: Extracts vocal characteristics, tone, and speech patterns
Model Training: Creates a custom voice model using provider-specific algorithms
Quality Validation: Ensures the cloned voice meets quality standards
Organization Storage: Stores the cloned voice for organization-specific use

Provider-Specific Features:

ElevenLabs: Advanced voice cloning with emotion and style preservation
Cartesia: Fast cloning optimized for conversational AI applications
Dasha: Platform-native cloning with consistent quality and integration

Audio Requirements:

Format: WAV, MP3, or FLAC audio files
Quality: Minimum 16kHz sample rate, preferably 44.1kHz
Duration: 30 seconds to 10 minutes of clear speech
Content: Clean speech without background noise or music
Speaker: Single speaker with consistent tone and volume
File Count: At least 1 audio file is required
Total Size: Maximum 15MB combined size for all audio files
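
The requirements that can be checked before upload (file count, combined size, and file extension) can be sketched as a small pre-flight check. This is only a sketch and not part of the API: `validateAudioFiles` and its `{name, size}` input shape are hypothetical, 15MB is assumed to mean 15 * 1024 * 1024 bytes, and the extension is only a proxy for the real audio format. Sample rate, duration, and speech content cannot be verified this way.

```javascript
// Hypothetical pre-flight check; the limits come from the Audio Requirements list.
const ALLOWED_EXTENSIONS = ['wav', 'mp3', 'flac'];
const MAX_TOTAL_BYTES = 15 * 1024 * 1024; // assumption: 15MB means 15 MiB

// `files` is an array of objects with `name` and `size` (bytes), e.g. File objects.
// Returns a list of human-readable problems; an empty list means no problem was found.
function validateAudioFiles(files) {
  const errors = [];
  if (!Array.isArray(files) || files.length === 0) {
    errors.push('At least 1 audio file is required');
    return errors;
  }
  for (const file of files) {
    const ext = file.name.includes('.') ? file.name.split('.').pop().toLowerCase() : '';
    if (!ALLOWED_EXTENSIONS.includes(ext)) {
      errors.push(`Unsupported format: ${file.name}`);
    }
  }
  const totalBytes = files.reduce((sum, file) => sum + file.size, 0);
  if (totalBytes > MAX_TOTAL_BYTES) {
    errors.push(`Combined size ${totalBytes} bytes exceeds ${MAX_TOTAL_BYTES} bytes`);
  }
  return errors;
}
```

Returning a list of messages rather than throwing lets a caller show every problem at once before the upload starts.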

Form Fields:

Name: Unique identifier for the cloned voice
Description: Detailed description of voice characteristics
Language: Primary language for the voice model
Provider: Voice cloning service provider (ElevenLabs, Cartesia, Dasha, Inworld, Lmnt)
ProviderSpecific.*: Provider-specific options, sent as dotted fields such as ProviderSpecific.ElevenLabs.RemoveBackgroundNoise
Labels: JSON string with custom metadata tags
audioFiles: One or more audio files for voice cloning
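
Assembling these fields into a multipart body might look like the sketch below. `buildCloneForm` and its parameter shape are hypothetical. The field names follow the request sample and Body section of this page, and the global `FormData` and `Blob` are assumed to be available (browsers, or Node.js 18 and later).

```javascript
// Hypothetical helper: builds the multipart body from a plain object.
function buildCloneForm({name, description, language, provider, providerSpecific = {}, labels = {}, audioFiles}) {
  const form = new FormData();
  form.append('Name', name);
  form.append('Description', description);
  form.append('Language', language);
  form.append('Provider', provider);
  // Provider-specific options use dotted field names,
  // e.g. ProviderSpecific.ElevenLabs.RemoveBackgroundNoise.
  for (const [key, value] of Object.entries(providerSpecific)) {
    form.append(`ProviderSpecific.${provider}.${key}`, String(value));
  }
  // Labels are sent as a JSON string.
  form.append('Labels', JSON.stringify(labels));
  // One audioFiles part per sample; each entry is {blob, fileName}.
  for (const {blob, fileName} of audioFiles) {
    form.append('audioFiles', blob, fileName);
  }
  return form;
}
```

The returned form can be passed as the `body` of the `fetch` call. Scoping the provider-specific keys to the selected provider keeps options for other providers out of the request.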

Processing Time:

ElevenLabs: 1-3 minutes for standard cloning
Cartesia: 30 seconds to 2 minutes for fast cloning
Dasha: 1-2 minutes for platform-native processing

Quality Considerations:

Audio Quality: Higher quality input produces better cloned voices
Speaker Consistency: Consistent tone and speaking style improves results
Language Matching: Voice performance is optimized for specified language
Content Variety: Diverse speech patterns create more versatile voices

Common Use Cases:

Creating branded voices for customer service agents
Personalizing voice assistants with familiar voices
Multilingual agent deployment with consistent voice identity
Executive or spokesperson voice replication for announcements
Accessibility applications with personalized speech synthesis

Post-Processing:

Cloned voices appear in the organization’s voice library
Quality preview available immediately after processing
Voice can be tested using the synthesize endpoint
Voice settings can be fine-tuned based on initial results

POST https://blackbox.dasha.ai/api/v1/voice/clone

Body

multipart/form-data

Name (string, required)
Name identifier for the cloned voice. Required string length: 1 - 100.

Description (string, required)
Description of voice characteristics. Required string length: 1 - 1000.

Language (string, required)
Primary language for the voice model.

Provider (enum<string>, required)
Voice cloning service provider. Available options: ElevenLabs, Cartesia, Dasha, Inworld, Lmnt.

ProviderSpecific.ElevenLabs.RemoveBackgroundNoise (boolean)
Whether to remove background noise from audio.

ProviderSpecific.Cartesia.Mode (string)
Cloning mode: Stability or Similarity.

ProviderSpecific.Cartesia.Enhance (boolean)
Whether to enhance audio quality.

ProviderSpecific.Cartesia.Transcript (string)
Transcript for voice cloning.

Labels (object)
Custom metadata labels.

audioFiles (file[])
Audio files for voice cloning.
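
The constraints in the Body section can be checked on the client before the request is sent. A minimal sketch: `validateCloneFields` is a hypothetical helper, string length is assumed to be counted the way JavaScript's `length` counts it, and the server remains the authority on what is accepted.

```javascript
const PROVIDERS = ['ElevenLabs', 'Cartesia', 'Dasha', 'Inworld', 'Lmnt'];
const CARTESIA_MODES = ['Stability', 'Similarity'];

// Returns a list of problems; an empty list means the fields look acceptable.
function validateCloneFields({name, description, language, provider, cartesiaMode}) {
  const errors = [];
  if (typeof name !== 'string' || name.length < 1 || name.length > 100) {
    errors.push('Name must be 1 - 100 characters');
  }
  if (typeof description !== 'string' || description.length < 1 || description.length > 1000) {
    errors.push('Description must be 1 - 1000 characters');
  }
  if (typeof language !== 'string' || language.length === 0) {
    errors.push('Language is required');
  }
  if (!PROVIDERS.includes(provider)) {
    errors.push(`Provider must be one of: ${PROVIDERS.join(', ')}`);
  }
  // Mode is optional; when present it should be one of the two documented values.
  if (cartesiaMode !== undefined && !CARTESIA_MODES.includes(cartesiaMode)) {
    errors.push('ProviderSpecific.Cartesia.Mode must be Stability or Similarity');
  }
  return errors;
}
```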

Response

Returns the created cloned voice details

Response DTO for TTS voice cloning operations

id (string, required)
Unique identifier for the voice. Minimum string length: 1.

provider (enum<string>, required)
TTS provider. Available options: ElevenLabs, Cartesia, Dasha, Inworld, Lmnt.
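
Before using the parsed response, a client may want to confirm that the required fields are present. A minimal sketch: `isClonedVoice` is a hypothetical helper that checks only the `id` and `provider` fields from the sample response, since those are the required fields this page documents.

```javascript
const TTS_PROVIDERS = ['ElevenLabs', 'Cartesia', 'Dasha', 'Inworld', 'Lmnt'];

// Returns true when `body` has a non-empty string id and a known provider.
function isClonedVoice(body) {
  return body !== null
    && typeof body === 'object'
    && typeof body.id === 'string'
    && body.id.length >= 1
    && TTS_PROVIDERS.includes(body.provider);
}
```

A caller could run this on the result of `res.json()` and treat a `false` result as an error payload rather than a created voice.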
