Voice AI: Bot Development Guidance

Integrations

Tool Response Security Guidance

Tools should return only the minimum amount of data required to complete a task. The response should contain only information that the user is allowed to see or is already aware of.

LLMs should not be trusted to safely ignore sensitive fields in an API response. Any data returned by a tool could potentially be exposed in the model’s output, so responses must be carefully restricted.
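One way to enforce this is an explicit allow-list applied before the response ever reaches the model. The following is a minimal Python sketch; the field names are hypothetical.

```python
# Minimal sketch: whitelist only the fields the user is allowed to see.
# Anything not explicitly listed is dropped before the tool responds.
ALLOWED_FIELDS = {"order_status", "delivery_date"}  # hypothetical fields

def sanitise_tool_response(api_response: dict) -> dict:
    """Return only explicitly allowed fields from a raw API response."""
    return {k: v for k, v in api_response.items() if k in ALLOWED_FIELDS}

raw = {
    "order_status": "shipped",
    "delivery_date": "2024-06-01",
    "internal_margin": 0.42,       # must never reach the model
    "customer_email": "a@b.com",   # not needed for this task
}
print(sanitise_tool_response(raw))
# {'order_status': 'shipped', 'delivery_date': '2024-06-01'}
```

An allow-list is deliberately preferred over a block-list here: a new sensitive field added to the API later is excluded by default rather than leaked by default.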


Tool Response Structure Guidance

Responses should be structured in a way that is simple and easy for the model to interpret. If structured data must be returned, ensure the prompt clearly instructs the model how to interpret it.

Where possible, provide clear, human-readable responses rather than raw structured data.

For example, instead of returning:

{
	"status":"paid",
	"paid":"£0.00"
}

return something clearer such as:

{
	"status":"There was no balance to pay for this item"
}

This reduces ambiguity and helps the model produce more accurate responses.
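This translation can live in a small layer between the raw API response and the tool output. A minimal sketch, assuming the hypothetical billing fields from the example above:

```python
# Minimal sketch: convert a raw billing record into a single
# human-readable status string before returning it to the model.
def describe_payment(record: dict) -> dict:
    """Turn {"status": ..., "paid": ...} into one plain-English sentence."""
    if record.get("status") == "paid" and record.get("paid") == "£0.00":
        return {"status": "There was no balance to pay for this item"}
    if record.get("status") == "paid":
        return {"status": f"The balance of {record['paid']} has been paid"}
    return {"status": "This item has an outstanding balance"}

print(describe_payment({"status": "paid", "paid": "£0.00"}))
# {'status': 'There was no balance to pay for this item'}
```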


Tool Response Size Guidance

LLMs have limited context windows, meaning they can only retain a certain amount of the conversation at any given time.

Tool responses are added to this context. Returning large amounts of unnecessary data will:

  • Consume available context space
  • Increase API costs
  • Reduce the model’s ability to reference earlier parts of the conversation
  • Increase latency, as more data must be processed on each request

For best results, responses should be kept concise and relevant.
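A simple way to enforce this for list-style responses is to cap the number of entries returned and summarise what was omitted. A minimal sketch with a hypothetical results structure:

```python
# Minimal sketch: cap a list-style tool response at a few entries and
# summarise the remainder, keeping the model's context window small.
MAX_ITEMS = 3  # hypothetical limit; tune per use case

def summarise_results(items: list) -> dict:
    """Return at most MAX_ITEMS results plus a note about omissions."""
    shown = items[:MAX_ITEMS]
    hidden = len(items) - len(shown)
    response = {"results": shown}
    if hidden > 0:
        response["note"] = (
            f"{hidden} further results omitted; "
            "ask the user to narrow the search."
        )
    return response
```

The note tells the model *why* the list is short, so it can prompt the user to refine the query instead of assuming the data does not exist.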


Handling “Not Found” Responses

Empty results can confuse the model. APIs should provide a clear indicator explaining why no data was returned.

For example, instead of returning a plain 404, return a 404 with a response body such as:

No entries were found for these search parameters.

This helps the model distinguish between:

  • A valid request that returned no results
  • An incorrect endpoint or malformed request
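In the API layer, this means pairing the 404 status with an explanatory body. A minimal sketch; `lookup_member` and its parameters are hypothetical:

```python
# Minimal sketch: make "no results" explicit in the response body so the
# model can distinguish it from a malformed request.
NOT_FOUND_MESSAGE = "No entries were found for these search parameters."

def lookup_member(records: dict, member_id: str, dob: str):
    """Return an (http_status, body) pair for a member lookup."""
    entry = records.get((member_id, dob))
    if entry is None:
        # A bare 404 with no body leaves the model guessing; the message
        # states clearly that the search itself succeeded but matched nothing.
        return 404, NOT_FOUND_MESSAGE
    return 200, entry
```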

You should also provide prompt guidance on how the model should communicate failed searches to users.

For example, if a lookup requires both a member ID and date of birth, the model may incorrectly respond with:

“I could not find a record with that date of birth.”

In reality, the model cannot know which value was incorrect. This can mislead the customer and potentially introduce a data leakage risk, as a malicious user could attempt to brute-force valid data combinations.


Tool Security and Rate Limiting Guidance

APIs should include a unique identifier for each request, ideally using the interaction UUID. This allows requests to be traced back to their originating interaction.

Where possible, APIs should also implement rate limiting based on the interaction ID. This helps prevent brute-force attacks against backend services.
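As a sketch of the idea, a per-interaction request counter might look like the following. A production service would use a shared store with expiry (e.g. Redis) rather than in-process state, and the limit value here is hypothetical.

```python
# Minimal sketch: reject further tool calls once a single interaction
# exceeds its request quota, limiting brute-force attempts.
from collections import Counter

MAX_REQUESTS_PER_INTERACTION = 5  # hypothetical quota
_request_counts = Counter()

def allow_request(interaction_uuid: str) -> bool:
    """Return True while the interaction is within its quota."""
    _request_counts[interaction_uuid] += 1
    return _request_counts[interaction_uuid] <= MAX_REQUESTS_PER_INTERACTION
```

Keying the limit on the interaction UUID (rather than, say, IP address) ties abuse back to a specific conversation, which also aids auditing.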


Business Hours

Business hours defined within prompts are often unreliable. LLMs are not dependable when performing tasks such as:

  • Converting time zones
  • Calculating relative times
  • Interpreting business hours logic

Where possible, provide a dedicated tool to handle time calculations and availability checks rather than relying on the model to infer them.
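Such a tool can do the calendar arithmetic deterministically and hand the model a ready-made answer. A minimal sketch, with hypothetical opening hours:

```python
# Minimal sketch: a dedicated availability check so the model never has
# to reason about time zones or business-hours logic itself.
from datetime import datetime
from zoneinfo import ZoneInfo

OPEN_HOUR, CLOSE_HOUR = 9, 17  # hypothetical: 09:00-17:00, Mon-Fri

def is_open(now=None, tz: str = "Europe/London") -> dict:
    """Return whether the business is currently open in the given zone."""
    now = now or datetime.now(ZoneInfo(tz))
    local = now.astimezone(ZoneInfo(tz))
    open_now = local.weekday() < 5 and OPEN_HOUR <= local.hour < CLOSE_HOUR
    return {"open": open_now, "local_time": local.strftime("%A %H:%M")}
```

The tool returns both the verdict and the formatted local time, so the model can report "We're closed right now; it's Sunday 10:00" without doing any conversion of its own.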


Prompt Writing

How the LLM Finds Tools

Whenever the model generates a response, it evaluates the tools available to determine whether any are relevant to the user’s request.

For example, if a user asks to check their bank balance, the model recognises that it cannot access that information directly. It will then examine the available tools to determine whether a suitable tool exists to retrieve the data.

The model evaluates the tool description and parameter definitions each time it decides how to respond.

⚠️

The model cannot see the tool’s internal name. Referring to the tool by its name will not produce reliable results.

 

Tools in Prompts

If you are writing a prompt for a bot that will eventually use tools, but those tools are not yet available, it is generally best to delay writing the prompt.

Tools can significantly influence how the bot behaves.

For example, consider a tool that retrieves a customer’s bank balance:

{
	"method":"POST",
	"url":"https://someapi.com/customer/balance",
	"body": {
		"customerId":"123456",
		"telephoneBankingId":"987654"
	}
}

The prompt would describe the parameters:

customerId:
	The customer’s ID, provided by the customer when requested.

telephoneBankingId:
	A telephone banking ID used as a secure authentication code.

The tool description might then say:

Use this tool to retrieve the customer's current bank balance.
Both parameters must be provided to complete the request.

💡 Important:

You do not need to explicitly instruct the model in the main prompt to collect these values. The model can infer the required information based on the tool parameters.

If a customer asks:

“What’s my bank balance?”

the model will identify the tool, review the required parameters, and ask the user for the necessary details.


When Additional Prompt Guidance Is Needed

Additional prompt instructions are useful when you want the model to follow a specific workflow, especially when multiple tools depend on one another.

For example, if a bank transfer tool exists, the prompt might include instructions such as:

If a customer wants to initiate a bank transfer, you should first check their bank balance. This confirms whether sufficient funds are available and whether an overdraft facility applies. Inform the customer if the transfer would result in an overdraft.

Why You Should Wait for Tools Before Finalising Prompts

If you design and test conversation flows before tools are implemented, you will likely need to simulate the workflow manually in the prompt.

Once tools are later added, the bot’s behaviour can change significantly, which may invalidate previously tested flows.

For this reason, it is generally best to finalise prompts after tools have been implemented, whenever possible.


Avoid Referring to Tools by Name

The tool label in Talkative exists only for internal reference. The model does not see this label.

Instead of referring to a specific tool name, prompts should describe the action required.

For example:

Use a tool to look up the customer's bank balance.

or simply:

Look up the customer's bank balance.

If stronger instruction is required, you might write:

Always check the customer's bank balance using a tool, as the value may have changed since the last check.

Referencing the Prompt as the Source of Truth

Prompts often refer to a “knowledge base”, but the model has no inherent understanding of what this means.

Every time a tool is called, its output is added to the prompt context. For this reason, it is more effective to instruct the model to:

  • Refer only to the information provided in the prompt
  • Generate answers based solely on the information contained in the prompt

Internationalisation Guidance

Default prompts often assume UK date and time formats. Ensure these match the expectations of the customer’s region.

The way the prompt describes dates and times influences how the model:

  • Interprets user input
  • Formats its responses during data collection

S2S vs S4TS

Prompt Variations

When using S2S, the prompt affects both how the model interprets information and how it interacts during speech.

Although prompting accents rarely works reliably, S2S prompts can guide behaviour during tool usage. For example:

  • “Bear with me while I check that for you.”
  • “Speak slowly when repeating numbers.”

This works because S2S can execute tools while speaking and adapt the response dynamically.

With S4TS, this is not possible. Speech is generated only after the tool response has been processed, so the bot cannot speak while a tool is executing.

For clarity:

  • With S4TS, the prompt affects only the LLM component
  • The STT and TTS systems are unaffected

Instructions such as “speak more slowly” will therefore have no effect.


Influencing Speech with S4TS

While you cannot directly control the TTS voice through the prompt, you can influence how the LLM formats the text that will be spoken.

For example, when returning codes such as:

SA-1234

you might instruct the model:

The text you generate will be read aloud by a TTS model. When returning long numbers or codes, write them in word form separated by commas so the TTS can read them clearly.

This may produce output such as:

S, A, One, Two, Three, Four

This approach may reduce transcript readability, but it often improves the clarity of spoken responses.

Interaction data can still be post-processed into a more readable format for storage.
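As an illustration of the expected transformation, the spelling-out can also be reproduced deterministically in code, for example to pre-format codes in tool responses rather than relying on the model. A minimal sketch:

```python
# Minimal sketch: expand an alphanumeric code into comma-separated words
# so a TTS engine reads each character out clearly.
DIGIT_WORDS = {
    "0": "Zero", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}

def spell_for_tts(code: str) -> str:
    """Spell out each letter and digit; punctuation like '-' is dropped."""
    words = [DIGIT_WORDS.get(ch, ch.upper()) for ch in code if ch.isalnum()]
    return ", ".join(words)

print(spell_for_tts("SA-1234"))
# S, A, One, Two, Three, Four
```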


Speech-to-Text Smart Formatting

Smart formatting can improve transcript readability for human review. However, when using Voice AI, it is often better to disable it.

Smart formatting may:

  • Alter what the caller actually said
  • Remove important characters
  • Change the meaning of phrases
  • Interfere with tool parameters

Disabling it may make transcripts slightly less readable, but it ensures the LLM receives the full and accurate input, improving reliability.
