Voice Agents
AgentPress supports real-time voice conversations with your agents. Users can speak directly to your agent and receive spoken responses, creating a natural conversational experience.
Overview
Voice agents combine:
- Speech-to-text: User speech is transcribed in real-time
- AI processing: Your agent processes the message using its configured prompt and tools
- Text-to-speech: The agent’s response is spoken back to the user
Voice conversations support everything regular text conversations do, including tool usage, knowledge base access, and persona-based journeys.
Voice Providers
AgentPress supports two voice providers:
OpenAI Realtime
The default voice provider, built on OpenAI’s Realtime API.
Characteristics:
- High intelligence (8/10)
- Fast response times (8/10)
- 11 distinct voice options
- WebRTC-based for low latency
Ultravox
An alternative voice provider with customizable voices.
Characteristics:
- Good intelligence (7/10)
- Fast response times (8/10)
- Custom voice library
- Adjustable temperature settings
Configuring Voice
To enable voice for an agent:
Step 1: Open Agent Settings
Edit your agent and scroll to the Voice Configuration section.
Step 2: Enable Voice
Toggle Enable Voice to activate voice capabilities.
Step 3: Select a Model
Choose between:
- OpenAI Realtime: best for most use cases
- Ultravox: for custom voice requirements
Step 4: Choose a Voice
For OpenAI: Select from 11 built-in voices:
| Voice | Character |
|---|---|
| Alloy | Neutral, professional |
| Ash | Clear, articulate |
| Ballad | Warm, narrative |
| Coral | Friendly, approachable |
| Echo | Calm, measured |
| Fable | Engaging, storytelling |
| Nova | Energetic, dynamic |
| Onyx | Deep, authoritative |
| Sage | Calm, professional |
| Shimmer | Light, cheerful |
| Verse | Distinctive, clear |
For Ultravox: Enter a voice ID or browse the Ultravox voice library to find additional options.
Step 5: Set Temperature (Ultravox only)
Adjust the temperature slider to control response variation:
- Lower (0-0.5): More consistent, focused responses
- Higher (1.5-2): More creative, varied responses
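The rules in Steps 3 to 5 can be summarized as a small consistency check. This is a minimal TypeScript sketch, not AgentPress's API: the `VoiceConfig` shape, its field names, and the provider identifiers are hypothetical, and the 0 to 2 temperature range is an assumption inferred from the slider bands above.

```typescript
// Hypothetical settings shape: field names are illustrative, not AgentPress's API.
type VoiceProvider = "openai-realtime" | "ultravox";

interface VoiceConfig {
  enabled: boolean;        // the "Enable Voice" toggle
  provider: VoiceProvider; // the selected model
  voice: string;           // OpenAI voice name or Ultravox voice ID
  temperature?: number;    // Ultravox only
}

// The 11 built-in OpenAI voices from the table above, lowercased.
const OPENAI_VOICES = new Set<string>([
  "alloy", "ash", "ballad", "coral", "echo", "fable",
  "nova", "onyx", "sage", "shimmer", "verse",
]);

// Assumed slider bounds, inferred from the 0-0.5 and 1.5-2 bands.
const MIN_TEMPERATURE = 0;
const MAX_TEMPERATURE = 2;

// Returns human-readable problems; an empty array means the settings are consistent.
function validateVoiceConfig(cfg: VoiceConfig): string[] {
  const problems: string[] = [];
  if (!cfg.enabled) return problems; // voice is off, so nothing else is checked

  if (cfg.provider === "openai-realtime") {
    if (!OPENAI_VOICES.has(cfg.voice.toLowerCase())) {
      problems.push(`"${cfg.voice}" is not a built-in OpenAI voice`);
    }
    if (cfg.temperature !== undefined) {
      problems.push("temperature only applies to Ultravox");
    }
    return problems;
  }

  // Ultravox: any non-empty voice ID is accepted; the library itself is not queried.
  if (cfg.voice.trim() === "") {
    problems.push("Ultravox requires a voice ID");
  }
  const t = cfg.temperature;
  // The negated range check also rejects NaN.
  if (t !== undefined && !(t >= MIN_TEMPERATURE && t <= MAX_TEMPERATURE)) {
    problems.push(`temperature must be between ${MIN_TEMPERATURE} and ${MAX_TEMPERATURE}`);
  }
  return problems;
}

console.log(validateVoiceConfig({ enabled: true, provider: "openai-realtime", voice: "Nova" }));
// → []
console.log(validateVoiceConfig({ enabled: true, provider: "ultravox", voice: "", temperature: 2.5 }));
// reports a missing voice ID and an out-of-range temperature
```

Flagging a temperature on an OpenAI configuration, rather than silently ignoring it, is a design choice in this sketch; it surfaces a setting that would have no effect.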
Step 6: Configure Voice Prompt
Optionally, provide a voice-specific system prompt. This prompt is used instead of your main agent prompt during voice conversations.
Why use a separate voice prompt?
- Spoken exchanges are shorter and more back-and-forth than written ones
- Written responses may not sound natural when spoken
- You may want different instructions for spoken interactions
If you leave this blank, your main agent prompt is used.
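The fallback rule above reduces to a single decision per session. A minimal sketch, assuming hypothetical field names `systemPrompt` and `voicePrompt`, and treating a whitespace-only voice prompt as blank (also an assumption):

```typescript
// Hypothetical agent settings; field names are illustrative only.
interface AgentPrompts {
  systemPrompt: string; // the main agent prompt
  voicePrompt?: string; // optional voice-specific system prompt
}

type Mode = "text" | "voice";

// Voice sessions use the voice prompt when one is set; text sessions,
// and voice sessions with a blank voice prompt, use the main prompt.
function resolveSystemPrompt(agent: AgentPrompts, mode: Mode): string {
  const voicePrompt = agent.voicePrompt?.trim();
  if (mode === "voice" && voicePrompt) {
    return voicePrompt;
  }
  return agent.systemPrompt;
}

const support: AgentPrompts = {
  systemPrompt: "You are a support agent. Use headings and links where useful.",
  voicePrompt: "You are a support agent. Speak naturally and keep answers short.",
};
console.log(resolveSystemPrompt(support, "voice")); // the voice prompt
console.log(resolveSystemPrompt(support, "text"));  // the main prompt
```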
Using Voice in Chat
Starting a Voice Conversation
- Open a conversation with a voice-enabled agent
- Click the microphone button in the chat input
- Grant microphone permission when prompted
- Start speaking
During the Conversation
- A waveform visualization shows when audio is being processed
- The agent’s responses are spoken aloud
- Transcriptions appear in the chat for reference
- You can use tools just like in text conversations
Ending the Conversation
Click the stop button (square icon) to end the voice session. You can continue the conversation via text or start a new voice session.
Voice and Tools
Voice agents can use all the same tools as text agents. When a tool is called:
- The agent says that it’s performing an action
- The tool executes in the background
- The agent speaks the result
This creates a natural flow where the agent can look up information, perform calculations, or take actions while maintaining the conversation.
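The announce, run, report sequence above can be sketched as follows. This is a minimal TypeScript illustration, not AgentPress's implementation: `speak` and `runTool` are hypothetical callbacks standing in for text-to-speech output and tool execution, and the spoken phrases are placeholders.

```typescript
// Hypothetical callbacks: speech output and tool execution.
type Speak = (text: string) => Promise<void>;
type RunTool = (name: string, args: Record<string, unknown>) => Promise<string>;

type ToolOutcome = { ok: true; value: string } | { ok: false; error: unknown };

async function handleToolCall(
  speak: Speak,
  runTool: RunTool,
  toolName: string,
  args: Record<string, unknown>,
): Promise<string> {
  // Start the tool immediately so it runs in the background while the agent
  // is still talking. Mapping the result to an outcome object attaches a
  // rejection handler right away, so a fast failure is never left unhandled.
  const outcome: Promise<ToolOutcome> = runTool(toolName, args).then(
    (value): ToolOutcome => ({ ok: true, value }),
    (error: unknown): ToolOutcome => ({ ok: false, error }),
  );

  // 1. Tell the user what is happening.
  await speak("Let me check that for you.");

  // 2. Wait for the tool, then 3. speak the result (or an apology).
  const result = await outcome;
  if (result.ok) {
    await speak(result.value);
    return result.value;
  }
  await speak("Sorry, I couldn't complete that.");
  throw result.error;
}
```

Starting the tool before the announcement finishes hides some tool latency; if a tool has side effects that should only begin after the user hears about them, awaiting the announcement first is the simpler choice.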
Voice and Context
Voice conversations maintain full context:
- Conversation history: Previous messages are remembered
- Tool results: Outputs from tools are available for follow-up
- Personas: Persona-based guidance works the same as in text conversations
- Knowledge bases: RAG queries work during voice conversations
You can seamlessly switch between text and voice within the same conversation thread.
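One way to picture this is a single message list that both modes append to. The sketch below is hypothetical (the `Message` shape, the `modality` field, and the `ConversationThread` class are not AgentPress APIs); it only illustrates why switching modes loses no context: the model is always given the whole thread.

```typescript
// Hypothetical message shape: one thread holds both typed and spoken turns.
type Modality = "text" | "voice";

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;   // for spoken turns, the transcription
  modality: Modality; // how the turn entered the thread
}

class ConversationThread {
  private readonly messages: Message[] = [];

  add(role: Message["role"], content: string, modality: Modality): void {
    this.messages.push({ role, content, modality });
  }

  // The model sees every prior turn regardless of modality, so a voice
  // session can pick up where a text exchange left off (and vice versa).
  contextForModel(): { role: Message["role"]; content: string }[] {
    return this.messages.map(({ role, content }) => ({ role, content }));
  }
}

const thread = new ConversationThread();
thread.add("user", "What's the status of order A1?", "text");
thread.add("tool", "Order A1 shipped on Monday.", "text");
thread.add("assistant", "Your order shipped on Monday.", "text");
thread.add("user", "When will it arrive?", "voice"); // spoken follow-up
console.log(thread.contextForModel().length); // → 4
```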
Voice Prompt Best Practices
Keep It Conversational
Write prompts that sound natural when spoken:
Instead of:
Provide a comprehensive response covering all aspects of the user’s query with appropriate detail and context.
Write:
Have a natural conversation. Keep responses concise and clear. Ask follow-up questions when helpful.
Account for Spoken Interaction
Voice users can’t see formatting, links, or code blocks. Adjust your instructions:
Instead of:
Format your response with bullet points and include relevant links.
Write:
Explain things step by step. Mention when you could share a link or document if they switch to text.
Handle Interruptions Gracefully
Users may interrupt or change topics mid-sentence. Include guidance like:
If the user changes topics, smoothly acknowledge and move to the new topic.
Consider Pacing
Spoken responses should be appropriately paced:
Keep responses brief enough to feel conversational. Pause between topics to give the user a chance to respond.
Example Voice Configuration
Here’s an example configuration for a customer service voice agent:
Voice Model: OpenAI Realtime
Voice: Nova
Voice Prompt:
You are a helpful customer service agent. Speak naturally and conversationally.
Keep responses concise—aim for 2-3 sentences when possible. If you need to explain something complex, break it into steps and check in with the customer between steps.
When looking up information, let the customer know what you're doing: "Let me check that for you."
If you can't help with something, clearly explain what they should do next.
Always confirm you've understood their question before answering if there's any ambiguity.
Troubleshooting
Voice Not Working
- Check that Enable Voice is toggled on in agent settings
- Verify your browser has microphone permissions
- Ensure you have a stable internet connection
- Try refreshing the page
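To confirm microphone access outside the chat, you can call the standard browser API behind the permission prompt, `navigator.mediaDevices.getUserMedia({ audio: true })`. The TypeScript sketch below is a generic diagnostic, not part of AgentPress: it requests access once, releases the device right away, and maps the standard `DOMException` names to a status. `MediaDevicesLike` and `checkMicrophone` are illustrative names. Note that `getUserMedia` is only available on secure pages (HTTPS or localhost).

```typescript
// Minimal structural type so the sketch compiles without DOM typings;
// in a browser, pass `navigator.mediaDevices`.
interface MediaDevicesLike {
  getUserMedia(constraints: { audio: boolean }): Promise<{
    getTracks(): { stop(): void }[];
  }>;
}

type MicStatus = "granted" | "denied" | "no-device" | "unreadable" | "error";

// Requests microphone access once, then immediately releases the device.
async function checkMicrophone(devices: MediaDevicesLike): Promise<MicStatus> {
  try {
    const stream = await devices.getUserMedia({ audio: true });
    stream.getTracks().forEach((track) => track.stop()); // release the mic
    return "granted";
  } catch (err) {
    const name = (err as { name?: string } | null)?.name;
    if (name === "NotAllowedError") return "denied";    // blocked by the user or browser
    if (name === "NotFoundError") return "no-device";   // no microphone available
    if (name === "NotReadableError") return "unreadable"; // OS or hardware error, e.g. another app holds the device
    return "error";
  }
}
```

In a browser you would call `checkMicrophone(navigator.mediaDevices)`; a `"denied"` result usually means the site's microphone permission is blocked in the browser's site settings.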
Audio Quality Issues
- Use a headset to prevent echo
- Reduce background noise
- Speak clearly at a moderate pace
- Check your microphone settings
Agent Not Responding
- The agent may still be processing—wait a moment
- If using tools, the agent might be waiting for results
- Try ending and restarting the voice session
Transcription Errors
- Speak more slowly and clearly
- Reduce background noise
- Unusual terms or names may need spelling out
Summary
| Feature | Description |
|---|---|
| Providers | OpenAI Realtime and Ultravox |
| Voices | 11 OpenAI voices + custom Ultravox library |
| Temperature | Adjustable for Ultravox |
| Voice Prompt | Separate prompt for voice conversations |
| Tool Support | Full tool access during voice |
| Context | Maintains conversation history |
| Switching | Seamless switching between text and voice |
Voice agents bring your AI assistants to life, enabling hands-free interactions that feel natural and engaging.