Voice Agents

AgentPress supports real-time voice conversations with your agents. Users can speak directly to your agent and receive spoken responses, creating a natural conversational experience.

Overview

Voice agents combine:

Speech-to-text: User speech is transcribed in real time
AI processing: Your agent processes the message using its configured prompt and tools
Text-to-speech: The agent’s response is spoken back to the user
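The three stages above can be pictured as a simple pipeline. The sketch below is illustrative only: `transcribe`, `run_agent`, and `synthesize` are hypothetical stand-ins rather than AgentPress APIs, and a real voice session streams audio continuously instead of handling one complete turn at a time.

```python
from typing import Callable

# Hypothetical stage signatures; none of these names come from AgentPress.
Transcribe = Callable[[bytes], str]   # speech-to-text
RunAgent = Callable[[str], str]       # agent prompt and tools
Synthesize = Callable[[str], bytes]   # text-to-speech


def voice_turn(audio_in: bytes, transcribe: Transcribe,
               run_agent: RunAgent, synthesize: Synthesize) -> bytes:
    """Run one user turn through the three stages in order."""
    user_text = transcribe(audio_in)
    reply_text = run_agent(user_text)
    return synthesize(reply_text)


# Toy stand-ins so the sketch runs without any audio stack.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = voice_turn(b"hello", fake_transcribe, fake_agent, fake_synthesize)
```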

Voice conversations support everything regular text conversations do, including tool usage, knowledge base access, and persona-based journeys.

Voice Providers

AgentPress supports two voice providers:

OpenAI Realtime

The default voice provider, using OpenAI’s GPT-4o Realtime API.

Characteristics:

High intelligence (8/10)
Fast response times (8/10)
11 distinct voice options
WebRTC-based for low latency

Ultravox

An alternative voice provider with customizable voices.

Characteristics:

Good intelligence (7/10)
Fast response times (8/10)
Custom voice library
Adjustable temperature settings

Configuring Voice

To enable voice for an agent:

Step 1: Open Agent Settings

Edit your agent and scroll to the Voice Configuration section.

Step 2: Enable Voice

Toggle Enable Voice to activate voice capabilities.

Step 3: Select a Model

Choose between:

OpenAI Realtime - Best for most use cases
Ultravox - For custom voice requirements
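If it helps to see the choice as data, the two providers can be modeled as records and selected with a one-line rule. This is a minimal sketch: the `Provider` class, its field names, and `pick_provider` are made up for illustration, and the ratings are copied from the provider descriptions earlier on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    intelligence: int             # rating out of 10
    speed: int                    # rating out of 10
    custom_voices: bool           # supports a custom voice library
    adjustable_temperature: bool  # exposes a temperature setting


OPENAI_REALTIME = Provider("OpenAI Realtime", 8, 8, False, False)
ULTRAVOX = Provider("Ultravox", 7, 8, True, True)


def pick_provider(needs_custom_voice: bool) -> Provider:
    """Default to OpenAI Realtime unless a custom voice is required."""
    return ULTRAVOX if needs_custom_voice else OPENAI_REALTIME
```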

Step 4: Choose a Voice

For OpenAI: Select from 11 built-in voices:

Voice	Character
Alloy	Neutral, professional
Ash	Clear, articulate
Ballad	Warm, narrative
Coral	Friendly, approachable
Echo	Calm, measured
Fable	Engaging, storytelling
Nova	Energetic, dynamic
Onyx	Deep, authoritative
Sage	Calm, professional
Shimmer	Light, cheerful
Verse	Distinctive, clear

For Ultravox: Enter a voice ID or browse the Ultravox voice library to find additional options.
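A voice choice can be checked before it is saved. The sketch below assumes hypothetical provider keys (`"openai"`, `"ultravox"`) and lowercase voice identifiers; only the 11 voice names themselves come from the table above.

```python
# The 11 built-in OpenAI voices listed above, lowercased for comparison.
OPENAI_VOICES = (
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer", "verse",
)


def validate_voice(provider: str, voice: str) -> str:
    """Return a cleaned voice value or raise ValueError."""
    voice = voice.strip()
    if not voice:
        raise ValueError("A voice is required")
    if provider == "openai":
        normalized = voice.lower()
        if normalized not in OPENAI_VOICES:
            raise ValueError(f"Unknown OpenAI voice: {voice!r}")
        return normalized
    # Assumption: Ultravox voice IDs are free-form, so they pass through as-is.
    return voice
```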

Step 5: Set Temperature (Ultravox only)

Adjust the temperature slider to control response variation:

Lower (0-0.5): More consistent, focused responses
Higher (1.5-2): More creative, varied responses
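The two ranges above can be turned into a small helper that labels a slider value. This is a sketch under assumptions: the overall 0 to 2 range is inferred from the endpoints listed above, and the "balanced" label for values in between is invented here for illustration.

```python
def describe_temperature(value: float) -> str:
    """Label an Ultravox temperature value using the ranges above."""
    if not 0.0 <= value <= 2.0:
        raise ValueError("Temperature must be between 0 and 2")
    if value <= 0.5:
        return "consistent"  # lower range: focused responses
    if value >= 1.5:
        return "creative"    # higher range: varied responses
    return "balanced"        # assumed label for the middle of the slider
```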

Step 6: Configure Voice Prompt

Optionally, provide a voice-specific system prompt. This prompt is used instead of your main agent prompt during voice conversations.

Why use a separate voice prompt?

Spoken exchanges tend to be shorter and more informal than written ones
Written responses may not sound natural when spoken
You may want different instructions for spoken interactions

If you leave this blank, your main agent prompt is used.
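The fallback rule can be written as a single function. The name `effective_prompt` and its parameters are hypothetical, and treating a whitespace-only voice prompt as blank is an assumption.

```python
from typing import Optional


def effective_prompt(main_prompt: str, voice_prompt: Optional[str],
                     is_voice: bool) -> str:
    """Pick the system prompt for a conversation turn."""
    # The voice prompt replaces the main prompt only during voice
    # conversations, and only when it has been filled in.
    if is_voice and voice_prompt and voice_prompt.strip():
        return voice_prompt
    return main_prompt
```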

Using Voice in Chat

Starting a Voice Conversation

Open a conversation with a voice-enabled agent
Click the microphone button in the chat input
Grant microphone permission when prompted
Start speaking

During the Conversation

A waveform visualization shows when audio is being processed
The agent’s responses are spoken aloud
Transcriptions appear in the chat for reference
The agent can use tools just as it does in text conversations

Ending the Conversation

Click the stop button (square icon) to end the voice session. You can continue the conversation via text or start a new voice session.
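The start and stop steps above amount to a small state machine. The sketch below is one way to model it; the state names, event names, and the `permission_denied` path are assumptions, not part of the AgentPress interface.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"                                # text chat, no voice session
    AWAITING_PERMISSION = "awaiting_permission"  # browser microphone prompt
    ACTIVE = "active"                            # voice session in progress


# (current state, event) -> next state
TRANSITIONS = {
    (VoiceState.IDLE, "click_microphone"): VoiceState.AWAITING_PERMISSION,
    (VoiceState.AWAITING_PERMISSION, "permission_granted"): VoiceState.ACTIVE,
    (VoiceState.AWAITING_PERMISSION, "permission_denied"): VoiceState.IDLE,
    # Stopping returns to IDLE, so a new voice session can be started later.
    (VoiceState.ACTIVE, "click_stop"): VoiceState.IDLE,
}


def next_state(state: VoiceState, event: str) -> VoiceState:
    """Apply an event, rejecting events that make no sense in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.value}") from None
```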

Voice and Tools

Voice agents can use all the same tools as text agents. When a tool is called:

The agent says that it is performing an action
The tool executes in the background
The agent speaks the result

This creates a natural flow where the agent can look up information, perform calculations, or take actions while maintaining the conversation.
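The announce, execute, report sequence can be sketched with `asyncio`. Everything here is hypothetical: `lookup_order` stands in for a real tool, and appending to the `spoken` list stands in for text-to-speech output.

```python
import asyncio
from typing import List


async def lookup_order(order_id: str) -> str:
    """Hypothetical tool; the short sleep mimics a network call."""
    await asyncio.sleep(0.01)
    return f"Order {order_id} has shipped."


async def handle_tool_call(order_id: str, spoken: List[str]) -> None:
    # 1. The agent says that it is performing an action.
    spoken.append("Let me check that for you.")
    # 2. The tool runs as a background task.
    task = asyncio.create_task(lookup_order(order_id))
    # 3. Once the task finishes, the agent speaks the result.
    spoken.append(await task)


spoken: List[str] = []
asyncio.run(handle_tool_call("A123", spoken))
```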

Voice and Context

Voice conversations maintain full context:

Conversation history: Previous messages are remembered
Tool results: Outputs from tools are available for follow-up
Personas: Persona-based guidance works the same as in text conversations
Knowledge bases: RAG queries work during voice conversations

You can seamlessly switch between text and voice within the same conversation thread.
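One way to picture the shared context is a single thread whose messages record how they arrived. The `Message` and `Thread` classes below are illustrative assumptions, not AgentPress data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str      # "user", "agent", or "tool"
    content: str
    modality: str  # "text" or "voice"


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, content: str, modality: str) -> None:
        self.messages.append(Message(role, content, modality))

    def context(self) -> List[str]:
        """History for the next turn, regardless of how each message arrived."""
        return [f"{m.role}: {m.content}" for m in self.messages]


thread = Thread()
thread.add("user", "What's my order status?", "text")
thread.add("tool", "Order A123 has shipped.", "text")
thread.add("user", "When will it arrive?", "voice")  # switched to voice
```

Because the voice message lands in the same list, the follow-up question can still refer to the earlier tool result.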

Voice Prompt Best Practices

Keep It Conversational

Write prompts that sound natural when spoken:

Instead of:

Provide a comprehensive response covering all aspects of the user’s query with appropriate detail and context.

Write:

Have a natural conversation. Keep responses concise and clear. Ask follow-up questions when helpful.

Account for Spoken Interaction

Voice users can’t see formatting, links, or code blocks. Adjust your instructions:

Instead of:

Format your response with bullet points and include relevant links.

Write:

Explain things step by step. Mention when you could share a link or document if they switch to text.
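A rough check can flag visual-formatting terms in a draft voice prompt. This is a deliberately crude sketch: the term list and the function name are assumptions, and a match only means the wording deserves a second look.

```python
import re
from typing import List

# Hypothetical list of terms that refer to things a listener cannot see.
VISUAL_TERMS = ("bullet point", "link", "table", "code block", "bold", "heading")


def find_visual_terms(prompt: str) -> List[str]:
    """Return the visual-formatting terms mentioned in a prompt."""
    lowered = prompt.lower()
    return [
        term for term in VISUAL_TERMS
        # \b anchors the start of the term; plurals such as "links" still match.
        if re.search(rf"\b{re.escape(term)}", lowered)
    ]


hits = find_visual_terms(
    "Format your response with bullet points and include relevant links."
)
```

Some hits are intentional. A prompt that tells the agent to offer a link once the user switches to text will also match, which is why the result is a list to review and not an error.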

Handle Interruptions Gracefully

Users may interrupt or change topics mid-sentence. Include guidance like:

If the user changes topics, smoothly acknowledge and move to the new topic.

Consider Pacing

Long spoken responses are hard to follow. Include pacing guidance like:

Keep responses brief enough to feel conversational. Pause between topics to give the user a chance to respond.

Example Voice Configuration

Here’s an example configuration for a customer service voice agent:

Voice Model: OpenAI Realtime
Voice: Nova
Voice Prompt:


You are a helpful customer service agent. Speak naturally and conversationally.

Keep responses concise—aim for 2-3 sentences when possible. If you need to explain something complex, break it into steps and check in with the customer between steps.

When looking up information, let the customer know what you're doing: "Let me check that for you."

If you can't help with something, clearly explain what they should do next.

Always confirm you've understood their question before answering if there's any ambiguity.
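For teams that keep agent settings in version control, the configuration above could be captured as a plain dictionary. The field names and the `"openai-realtime"` identifier are hypothetical; AgentPress may store these settings differently.

```python
import json

# Hypothetical field names mirroring the settings described in this guide.
voice_config = {
    "voice_enabled": True,
    "voice_model": "openai-realtime",
    "voice": "nova",
    # First paragraph only; the full prompt is shown above.
    "voice_prompt": (
        "You are a helpful customer service agent. "
        "Speak naturally and conversationally."
    ),
    # No temperature field: that setting applies to Ultravox only.
}

serialized = json.dumps(voice_config, indent=2)
```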

Troubleshooting

Voice Not Working

Check that Enable Voice is toggled on in agent settings
Verify your browser has microphone permissions
Ensure you have a stable internet connection
Try refreshing the page

Audio Quality Issues

Use a headset to prevent echo
Reduce background noise
Speak clearly at a moderate pace
Check your microphone settings

Agent Not Responding

The agent may still be processing—wait a moment
If using tools, the agent might be waiting for results
Try ending and restarting the voice session

Transcription Errors

Speak more slowly and clearly
Reduce background noise
Spell out unusual terms or names

Summary

Feature	Description
Providers	OpenAI Realtime and Ultravox
Voices	11 OpenAI voices + custom Ultravox library
Temperature	Adjustable for Ultravox
Voice Prompt	Separate prompt for voice conversations
Tool Support	Full tool access during voice
Context	Maintains conversation history
Switching	Move between text and voice within the same conversation thread

Voice agents bring your AI assistants to life, enabling hands-free interactions that feel natural and engaging.