Voice Agents
AgentPress supports real-time voice conversations with your agents. Users can speak directly to your agent and receive spoken responses, creating a natural conversational experience.
Overview
Voice agents combine:
- Speech-to-text: User speech is transcribed in real-time
- AI processing: Your agent processes the message using its configured prompt and tools
- Text-to-speech: The agent’s response is spoken back to the user
Voice conversations support everything regular text conversations do, including tool usage, knowledge base access, and persona-based journeys.
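The three stages above can be pictured as a small pipeline. The following is a conceptual sketch only, not AgentPress's implementation: `run_voice_turn` and the three stand-in stage functions are hypothetical names, and a real provider may fuse these stages internally.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> tuple[str, str, bytes]:
    """Run one voice turn: speech-to-text, agent processing, text-to-speech."""
    user_text = transcribe(audio)         # speech-to-text
    reply_text = respond(user_text)       # agent prompt and tools
    reply_audio = synthesize(reply_text)  # text-to-speech
    return user_text, reply_text, reply_audio


# Stand-in stages (assumptions) so the sketch runs without any audio service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(b"hello", fake_transcribe, fake_respond, fake_synthesize)
```

Passing the stages in as functions keeps the sketch independent of any particular voice provider.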
Voice Providers
AgentPress supports two voice providers:
OpenAI Realtime
The default voice provider, using OpenAI’s GPT-4 Realtime API.
Characteristics:
- High intelligence (8/10)
- Fast response times (8/10)
- 11 distinct voice options
- WebRTC-based for low latency
Ultravox
An alternative voice provider with customizable voices.
Characteristics:
- Good intelligence (7/10)
- Fast response times (8/10)
- Custom voice library
- Adjustable temperature settings
Configuring Voice
To enable voice for an agent:
Step 1: Open Agent Settings
Edit your agent and scroll to the Voice Configuration section.
Step 2: Enable Voice
Toggle Enable Voice to activate voice capabilities.
Step 3: Select a Model
Choose between:
- OpenAI Realtime - Best for most use cases
- Ultravox - For custom voice requirements
Step 4: Choose a Voice
For OpenAI: Select from 11 built-in voices:
| Voice | Character |
|---|---|
| Alloy | Neutral, professional |
| Ash | Clear, articulate |
| Ballad | Warm, narrative |
| Coral | Friendly, approachable |
| Echo | Calm, measured |
| Fable | Engaging, storytelling |
| Nova | Energetic, dynamic |
| Onyx | Deep, authoritative |
| Sage | Calm, professional |
| Shimmer | Light, cheerful |
| Verse | Distinctive, clear |
For Ultravox: Enter a voice ID or browse the Ultravox voice library to find additional options.
Step 5: Set Temperature (Ultravox only)
Adjust the temperature slider to control response variation:
- Lower (0-0.5): More consistent, focused responses
- Higher (1.5-2): More creative, varied responses
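The settings chosen in the steps above could be checked in code along these lines. This is a minimal sketch under stated assumptions: AgentPress configures voice through its settings UI, so the `VoiceConfig` class, the provider identifiers, and the 0 to 2 slider bounds (inferred from the ranges listed above) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 11 built-in OpenAI voices listed in Step 4.
OPENAI_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer", "verse",
}


@dataclass
class VoiceConfig:
    enabled: bool
    provider: str  # "openai-realtime" or "ultravox" (hypothetical identifiers)
    voice: str     # built-in voice name, or an Ultravox voice ID
    temperature: Optional[float] = None  # Ultravox only


def validate(config: VoiceConfig) -> List[str]:
    """Return a list of human-readable problems; empty means the config looks valid."""
    problems: List[str] = []
    if config.provider == "openai-realtime":
        if config.voice.lower() not in OPENAI_VOICES:
            problems.append(f"unknown OpenAI voice: {config.voice}")
        if config.temperature is not None:
            problems.append("temperature applies to Ultravox only")
    elif config.provider == "ultravox":
        if not config.voice:
            problems.append("Ultravox needs a voice ID")
        # Assumed slider bounds of 0 to 2.
        if config.temperature is not None and not 0.0 <= config.temperature <= 2.0:
            problems.append("temperature must be between 0 and 2")
    else:
        problems.append(f"unknown provider: {config.provider}")
    return problems
```

For example, `validate(VoiceConfig(True, "openai-realtime", "Nova"))` returns an empty list, while an Ultravox config with a temperature of 2.5 reports the out-of-range value.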
Step 6: Configure Voice Prompt
Optionally, provide a voice-specific system prompt. This prompt is used instead of your main agent prompt during voice conversations.
Why use a separate voice prompt?
- Spoken exchanges are more informal than text chat
- Written responses may not sound natural when spoken
- You may want different instructions for spoken interactions
If you leave this blank, your main agent prompt is used.
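The fallback rule described here can be sketched as a small function. The name `effective_prompt` and its parameters are hypothetical, and treating a whitespace-only voice prompt as blank is an assumption.

```python
from typing import Optional


def effective_prompt(main_prompt: str, voice_prompt: Optional[str], is_voice: bool) -> str:
    """Pick the system prompt for a conversation.

    The voice prompt replaces the main prompt only during voice
    conversations, and only when it is not blank.
    """
    if is_voice and voice_prompt and voice_prompt.strip():
        return voice_prompt
    return main_prompt
```

Note that the voice prompt replaces the main prompt rather than extending it, so any instructions the agent still needs during voice sessions have to be repeated in the voice prompt.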
Using Voice in Chat
Starting a Voice Conversation
- Open a conversation with a voice-enabled agent
- Click the microphone button in the chat input
- Grant microphone permission when prompted
- Start speaking
During the Conversation
- A waveform visualization shows when audio is being processed
- The agent’s responses are spoken aloud
- Transcriptions appear in the chat for reference
- You can use tools just like in text conversations
Ending the Conversation
Click the stop button (square icon) to end the voice session. You can continue the conversation via text or start a new voice session.
Voice and Tools
Voice agents can use all the same tools as text agents. When a tool is called:
- The agent says that it is performing an action
- The tool executes in the background
- The agent speaks the result
This creates a natural flow where the agent can look up information, perform calculations, or take actions while maintaining the conversation.
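The announce, execute, report sequence above might look like the following. This is an illustrative sketch, not AgentPress's internals: `handle_tool_call`, the `speak` callback, and the `add` tool are hypothetical, and the tool runs synchronously here rather than in the background to keep the example short.

```python
from typing import Any, Callable, Dict, List


def handle_tool_call(
    tool_name: str,
    args: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    speak: Callable[[str], None],
) -> None:
    """Announce the action, run the tool, then speak the result."""
    speak(f"One moment while I run {tool_name}.")  # 1. announce the action
    result = tools[tool_name](**args)              # 2. execute the tool
    speak(f"The result is {result}.")              # 3. speak the result


# Collect spoken lines in a list instead of producing audio.
spoken: List[str] = []
tools = {"add": lambda a, b: a + b}
handle_tool_call("add", {"a": 2, "b": 3}, tools, spoken.append)
```

After this runs, `spoken` holds the announcement followed by the spoken result.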
Voice and Context
Voice conversations maintain full context:
- Conversation history: Previous messages are remembered
- Tool results: Outputs from tools are available for follow-up
- Personas: Persona-based guidance works the same as in text conversations
- Knowledge bases: RAG queries work during voice conversations
You can seamlessly switch between text and voice within the same conversation thread.
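One way to picture a shared thread is a single message list in which each entry records how it arrived. The `Message` and `Thread` classes below are hypothetical and only illustrate the idea; they do not describe how AgentPress stores conversations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str      # "user" or "agent"
    text: str      # typed text, or the transcription of spoken audio
    modality: str  # "text" or "voice"


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, text: str, modality: str) -> None:
        self.messages.append(Message(role, text, modality))

    def history(self) -> List[str]:
        """History is the same regardless of how each message arrived."""
        return [f"{m.role}: {m.text}" for m in self.messages]


thread = Thread()
thread.add("user", "What's my order status?", "text")
thread.add("agent", "It shipped yesterday.", "text")
thread.add("user", "When will it arrive?", "voice")  # switch to voice mid-thread
```

Because voice turns are stored as transcriptions alongside typed messages, a follow-up spoken question can refer to earlier text messages.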
Voice Prompt Best Practices
Keep It Conversational
Write prompts that sound natural when spoken:
Instead of:
Provide a comprehensive response covering all aspects of the user’s query with appropriate detail and context.
Write:
Have a natural conversation. Keep responses concise and clear. Ask follow-up questions when helpful.
Account for Spoken Interaction
Voice users can’t see formatting, links, or code blocks. Adjust your instructions:
Instead of:
Format your response with bullet points and include relevant links.
Write:
Explain things step by step. Mention when you could share a link or document if they switch to text.
Handle Interruptions Gracefully
Users may interrupt or change topics mid-sentence. Include guidance like:
If the user changes topics, smoothly acknowledge and move to the new topic.
Consider Pacing
Spoken responses should be appropriately paced:
Keep responses brief enough to feel conversational. Pause between topics to give the user a chance to respond.
Example Voice Configuration
Here’s an example configuration for a customer service voice agent:
- Voice Model: OpenAI Realtime
- Voice: Nova
- Voice Prompt:
Troubleshooting
Voice Not Working
- Check that Enable Voice is toggled on in agent settings
- Verify your browser has microphone permissions
- Ensure you have a stable internet connection
- Try refreshing the page
Audio Quality Issues
- Use a headset to prevent echo
- Reduce background noise
- Speak clearly at a moderate pace
- Check your microphone settings
Agent Not Responding
- The agent may still be processing; wait a moment
- If using tools, the agent might be waiting for results
- Try ending and restarting the voice session
Transcription Errors
- Speak more slowly and clearly
- Reduce background noise
- Unusual terms or names may need spelling out
Summary
| Feature | Description |
|---|---|
| Providers | OpenAI Realtime and Ultravox |
| Voices | 11 OpenAI voices + custom Ultravox library |
| Temperature | Adjustable for Ultravox |
| Voice Prompt | Separate prompt for voice conversations |
| Tool Support | Full tool access during voice |
| Context | Maintains conversation history |
| Switching | Seamless text-to-voice transitions |
Voice agents bring your AI assistants to life, enabling hands-free interactions that feel natural and engaging.