
Voice AI: Model Configuration

When setting up a voice bot, you can configure several of the bot’s underlying systems. To do so, navigate to the Model Configuration section of your voice bot’s edit page.

Voice architecture

Your voice bot can utilise one of the two available voice architectures:

  • Speech-to-Text-to-Speech (S4TS): This architecture offers the most flexibility in languages, models and voices. The main trade-off is that it responds more slowly than Speech-to-Speech.
  • Speech-to-Speech (STS): This architecture offers faster response times than S4TS. However, it is much less configurable.

Speech-to-Speech (STS)

STS is the simpler of the two architectures to configure. You have the following options:

  • Realtime Model: the model that will be used to handle all aspects of your voice bot.
  • Voice ID: the voice the bot will use for responses.

Speech-to-Text-to-Speech (S4TS)

S4TS offers a wealth of configuration options for more specific use-cases.

Speech to Text

You can choose from two providers when deciding how the voice bot will listen to a customer.

Deepgram

Deepgram offers the following configuration options:

  • Language: Deepgram offers a number of languages as well as multilingual support.
  • Model: The Deepgram models available for selection are based on which language you’ve chosen.
  • Smart Formatting: Enable this feature to apply additional formatting to transcripts, for better readability.

Visit Deepgram’s documentation page to learn more about their language and model options.

Google Speech Speech-to-Text

Google Speech offers the following configuration options:

  • Language: Over 140 languages are available to choose from.
  • Model: The models available for selection are based on which language you’ve chosen.
💡 We currently only support Google Speech-to-Text V1.

Visit Google’s documentation page to learn more about their language and model options.

LLM Model

The LLM interprets what the customer has said and generates an appropriate response. It offers the following options:

  • Model: The OpenAI model to use.
  • Temperature: A value between 0 and 1. The lower the value, the more likely the LLM is to produce the same output when given the same input. We advise leaving this at its default rather than setting it manually. Note: temperature can’t be set for models later than GPT-4.1.
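
To illustrate why a lower temperature gives more repeatable output, here is a minimal Python sketch of temperature scaling as it is commonly described for language models in general. It is not a description of how OpenAI or the voice bot applies the setting internally. The function name `softmax_with_temperature` and the scores are made up for this example, and the sketch only accepts values above 0.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Illustrative only: hypothetical helper, not part of any product API.
    """
    if temperature <= 0:
        raise ValueError("this sketch needs a temperature above 0")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() stays numerically stable
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.2)   # low temperature
high = softmax_with_temperature(logits, 1.0)  # high temperature

print("low temperature: ", [round(p, 3) for p in low])
print("high temperature:", [round(p, 3) for p in high])
```

At the lower temperature almost all of the probability lands on the top-scoring word, so the same input tends to give the same output. At the higher temperature the probability is spread more evenly, so responses vary more.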

Text to Speech

You can choose from two providers when deciding how your bot should sound when responding to a customer.

ElevenLabs

We offer a selection of high-quality voices provided by ElevenLabs. All ElevenLabs voices can speak any of their supported languages. However, each voice has a subset of preferred languages in which it sounds most natural.

To edit your voice:

  • Click Change Voice to open the voice selection modal.
  • From here, you are able to filter and preview voices in their various preferred languages.
  • There are additional advanced ElevenLabs configuration options. We advise sticking with the defaults unless you have a specific need to change them.
💡 Reach out to your Talkative CSM if you’d like to use an ElevenLabs voice that isn’t currently in our voice library.

Google Speech Text-to-Speech

Google Speech offers over 1000 voice options, covering a number of languages.

To edit your voice:

  • Click Change Voice to open the voice selection modal.
  • From here, you can filter and preview the various voice options.

Additional S4TS options

  • Allow Interrupts: enable this setting to allow customers to provide a response to the voice bot before the bot has finished speaking.
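
To see how the S4TS options described in this article fit together, it can help to picture them as one structure. The Python sketch below is purely illustrative: the class names, field names, provider strings and placeholder values are all assumptions made for this example and are not part of any Talkative API. The actual settings are chosen in the Model Configuration section of the edit page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechToTextConfig:
    provider: str           # hypothetical values: "deepgram" or "google"
    language: str
    model: str
    smart_formatting: bool  # listed in this article for Deepgram only


@dataclass
class LLMConfig:
    model: str
    temperature: Optional[float] = None  # None means "leave at default"


@dataclass
class TextToSpeechConfig:
    provider: str  # hypothetical values: "elevenlabs" or "google"
    voice: str


@dataclass
class S4TSConfig:
    speech_to_text: SpeechToTextConfig
    llm: LLMConfig
    text_to_speech: TextToSpeechConfig
    allow_interrupts: bool


def validate(config: S4TSConfig) -> List[str]:
    """Return a list of problems found in a hypothetical S4TS config."""
    problems = []
    stt = config.speech_to_text
    if stt.provider not in ("deepgram", "google"):
        problems.append("unknown speech-to-text provider")
    if stt.smart_formatting and stt.provider != "deepgram":
        problems.append("smart formatting is only listed for Deepgram")
    temperature = config.llm.temperature
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    if config.text_to_speech.provider not in ("elevenlabs", "google"):
        problems.append("unknown text-to-speech provider")
    return problems


# Placeholder values only; these are not real model or voice names.
config = S4TSConfig(
    speech_to_text=SpeechToTextConfig(
        provider="deepgram",
        language="en",
        model="example-stt-model",
        smart_formatting=True,
    ),
    llm=LLMConfig(model="example-llm-model"),
    text_to_speech=TextToSpeechConfig(provider="elevenlabs", voice="example-voice"),
    allow_interrupts=True,
)

print(validate(config))
```

The checks mirror the constraints stated in this article: two providers for each of speech-to-text and text-to-speech, Smart Formatting under Deepgram, and a temperature between 0 and 1 when one is set at all.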
 