Voice Agents

AgentPress supports real-time voice conversations with your agents. Users can speak directly to your agent and receive spoken responses, creating a natural conversational experience.

Overview

Voice agents combine:

  • Speech-to-text: User speech is transcribed in real-time
  • AI processing: Your agent processes the message using its configured prompt and tools
  • Text-to-speech: The agent’s response is spoken back to the user

Voice conversations support everything regular text conversations do, including tool usage, knowledge base access, and persona-based journeys.
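
As a rough mental model, one voice turn can be pictured as the three stages above chained together. The Python sketch below is illustrative only: `voice_turn`, `transcribe`, `run_agent`, and `synthesize` are hypothetical names rather than AgentPress functions, and real providers stream audio continuously instead of handling a whole turn at once.

```python
from typing import Callable


def voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    run_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one voice turn: speech-to-text, agent processing, text-to-speech."""
    user_text = transcribe(audio_in)    # speech-to-text
    reply_text = run_agent(user_text)   # agent applies its prompt and tools
    return synthesize(reply_text)       # text-to-speech


# Stub stages (assumptions) so the sketch runs without any audio stack.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = voice_turn(b"hello", fake_transcribe, fake_agent, fake_synthesize)
```

Passing the stages in as callables keeps the sketch independent of any particular provider.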

Voice Providers

AgentPress supports two voice providers:

OpenAI Realtime

The default voice provider, built on OpenAI’s Realtime API.

Characteristics:

  • High intelligence (8/10)
  • Fast response times (8/10)
  • 11 distinct voice options
  • WebRTC-based for low latency

Ultravox

An alternative voice provider with customizable voices.

Characteristics:

  • Good intelligence (7/10)
  • Fast response times (8/10)
  • Custom voice library
  • Adjustable temperature settings

Configuring Voice

To enable voice for an agent:

Step 1: Open Agent Settings

Edit your agent and scroll to the Voice Configuration section.

Step 2: Enable Voice

Toggle Enable Voice to activate voice capabilities.

Step 3: Select a Model

Choose between:

  • OpenAI Realtime - Best for most use cases
  • Ultravox - For custom voice requirements

Step 4: Choose a Voice

For OpenAI: Select from 11 built-in voices:

Voice      Character
Alloy      Neutral, professional
Ash        Clear, articulate
Ballad     Warm, narrative
Coral      Friendly, approachable
Echo       Calm, measured
Fable      Engaging, storytelling
Nova       Energetic, dynamic
Onyx       Deep, authoritative
Sage       Calm, professional
Shimmer    Light, cheerful
Verse      Distinctive, clear

For Ultravox: Enter a voice ID or browse the Ultravox voice library to find additional options.
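
If you manage agent settings from a script, a check like the following could catch a mistyped voice before it is saved. This is a sketch under assumptions: the provider identifiers `"openai_realtime"` and `"ultravox"` and the function `validate_voice` are hypothetical, and only the eleven OpenAI voice names come from this guide.

```python
# The eleven built-in OpenAI voices listed in this guide, lowercased.
OPENAI_VOICES = frozenset({
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer", "verse",
})


def validate_voice(provider: str, voice: str) -> str:
    """Return a normalized voice value, or raise ValueError if it is invalid.

    Provider identifiers here are hypothetical, not AgentPress constants.
    """
    if provider == "openai_realtime":
        name = voice.strip().lower()
        if name not in OPENAI_VOICES:
            raise ValueError(f"unknown OpenAI voice: {voice!r}")
        return name
    if provider == "ultravox":
        # Ultravox voices are free-form IDs, so only emptiness is checked.
        voice_id = voice.strip()
        if not voice_id:
            raise ValueError("Ultravox requires a non-empty voice ID")
        return voice_id
    raise ValueError(f"unknown provider: {provider!r}")
```

For example, `validate_voice("openai_realtime", "Nova")` returns `"nova"`, while an unlisted name raises `ValueError`.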

Step 5: Set Temperature (Ultravox only)

Adjust the temperature slider to control response variation:

  • Lower (0-0.5): More consistent, focused responses
  • Higher (1.5-2): More creative, varied responses
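
The bands above can be expressed as a small helper. This sketch assumes the slider spans 0 to 2, as the two ranges suggest; the function name and the "balanced" label for the middle band are illustrative assumptions, not AgentPress terminology.

```python
def describe_temperature(value: float) -> str:
    """Classify an Ultravox temperature using the bands from this guide."""
    if not 0.0 <= value <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    if value <= 0.5:
        return "consistent"   # lower band: more consistent, focused responses
    if value >= 1.5:
        return "creative"     # higher band: more creative, varied responses
    return "balanced"         # middle band: label is an assumption
```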

Step 6: Configure Voice Prompt

Optionally, provide a voice-specific system prompt. This prompt is used instead of your main agent prompt during voice conversations.

Why use a separate voice prompt?

  • Spoken exchanges are more informal and back-and-forth than written ones
  • Written responses may not sound natural when spoken
  • You may want different instructions for spoken interactions

If you leave this blank, your main agent prompt is used.
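
The fallback rule can be summarized in a few lines. The function below is a hypothetical illustration of the behavior described here, not AgentPress code; treating a whitespace-only voice prompt as blank is an assumption.

```python
from typing import Optional


def effective_prompt(main_prompt: str, voice_prompt: Optional[str], is_voice: bool) -> str:
    """Pick the system prompt for a conversation.

    In a voice session a non-blank voice prompt replaces the main prompt;
    otherwise the main agent prompt is used.
    """
    if is_voice and voice_prompt and voice_prompt.strip():
        return voice_prompt
    return main_prompt
```

Because the voice prompt replaces the main prompt rather than extending it, any instructions the agent still needs during voice sessions should be repeated in the voice prompt.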

Using Voice in Chat

Starting a Voice Conversation

  1. Open a conversation with a voice-enabled agent
  2. Click the microphone button in the chat input
  3. Grant microphone permission when prompted
  4. Start speaking

During the Conversation

  • A waveform visualization shows when audio is being processed
  • The agent’s responses are spoken aloud
  • Transcriptions appear in the chat for reference
  • The agent can use tools just as it does in text conversations

Ending the Conversation

Click the stop button (square icon) to end the voice session. You can continue the conversation via text or start a new voice session.

Voice and Tools

Voice agents can use all the same tools as text agents. When a tool is called:

  1. The agent says that it is performing an action
  2. The tool executes in the background
  3. The agent speaks the result

This creates a natural flow where the agent can look up information, perform calculations, or take actions while maintaining the conversation.
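
The three steps can be sketched as follows. All names (`handle_tool_call`, `speak`, the `add` tool) are hypothetical, the spoken phrases are placeholders, and the tool runs synchronously here for simplicity, whereas the guide describes it executing in the background.

```python
from typing import Callable, Dict, List


def handle_tool_call(
    tool_name: str,
    args: Dict[str, object],
    tools: Dict[str, Callable[..., object]],
    speak: Callable[[str], None],
) -> object:
    """Announce the action, run the tool, then speak the result."""
    speak("Let me check that for you.")        # 1. announce the action
    result = tools[tool_name](**args)          # 2. execute the tool
    speak(f"Here is what I found: {result}")   # 3. speak the result
    return result


# Usage with a stub tool; `spoken` records what the agent would say aloud.
spoken: List[str] = []
tools = {"add": lambda a, b: a + b}
result = handle_tool_call("add", {"a": 2, "b": 3}, tools, spoken.append)
```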

Voice and Context

Voice conversations maintain full context:

  • Conversation history: Previous messages are remembered
  • Tool results: Outputs from tools are available for follow-up
  • Personas: Persona-based guidance works the same as in text conversations
  • Knowledge bases: RAG queries work during voice conversations

You can seamlessly switch between text and voice within the same conversation thread.
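
One way to picture this is a single thread whose messages each record how they arrived, while the agent's context ignores that distinction. The data model below is a hypothetical sketch, not the AgentPress schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    role: str   # "user" or "agent"
    text: str   # typed text, or the transcription of speech
    mode: str   # "text" or "voice"


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, text: str, mode: str) -> None:
        self.messages.append(Message(role, text, mode))

    def context(self) -> List[str]:
        """History the agent sees, regardless of how each message arrived."""
        return [f"{m.role}: {m.text}" for m in self.messages]


thread = Thread()
thread.add("user", "What is my order status?", "text")
thread.add("agent", "It ships tomorrow.", "text")
thread.add("user", "Can you change the address?", "voice")  # switched to voice
```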

Voice Prompt Best Practices

Keep It Conversational

Write prompts that sound natural when spoken:

Instead of:

Provide a comprehensive response covering all aspects of the user’s query with appropriate detail and context.

Write:

Have a natural conversation. Keep responses concise and clear. Ask follow-up questions when helpful.

Account for Spoken Interaction

Voice users can’t see formatting, links, or code blocks. Adjust your instructions:

Instead of:

Format your response with bullet points and include relevant links.

Write:

Explain things step by step. Mention when you could share a link or document if they switch to text.

Handle Interruptions Gracefully

Users may interrupt or change topics mid-sentence. Include guidance like:

If the user changes topics, smoothly acknowledge and move to the new topic.

Consider Pacing

Spoken responses should be appropriately paced:

Keep responses brief enough to feel conversational. Pause between topics to give the user a chance to respond.

Example Voice Configuration

Here’s an example configuration for a customer service voice agent:

Voice Model: OpenAI Realtime
Voice: Nova
Voice Prompt:

You are a helpful customer service agent. Speak naturally and conversationally.
Keep responses concise: aim for 2-3 sentences when possible.
If you need to explain something complex, break it into steps and check in with the customer between steps.
When looking up information, let the customer know what you're doing: "Let me check that for you."
If you can't help with something, clearly explain what they should do next.
Always confirm you've understood their question before answering if there's any ambiguity.
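
If you keep agent settings in code or version control, the same example might be captured as a small data structure. The class and field names below are assumptions made for illustration; AgentPress exposes these settings through the agent editor, and this guide does not document a configuration schema. The voice prompt is shortened here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceConfig:
    """Hypothetical mirror of the Voice Configuration section."""
    enabled: bool
    provider: str                        # "OpenAI Realtime" or "Ultravox"
    voice: str                           # built-in voice name or Ultravox voice ID
    voice_prompt: str = ""               # blank means the main agent prompt is used
    temperature: Optional[float] = None  # only meaningful for Ultravox

    def __post_init__(self) -> None:
        if self.provider != "Ultravox" and self.temperature is not None:
            raise ValueError("temperature applies to Ultravox only")


customer_service = VoiceConfig(
    enabled=True,
    provider="OpenAI Realtime",
    voice="Nova",
    voice_prompt=(
        "You are a helpful customer service agent. "
        "Speak naturally and conversationally."
    ),
)
```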

Troubleshooting

Voice Not Working

  • Check that Enable Voice is toggled on in agent settings
  • Verify your browser has microphone permissions
  • Ensure you have a stable internet connection
  • Try refreshing the page

Audio Quality Issues

  • Use a headset to prevent echo
  • Reduce background noise
  • Speak clearly at a moderate pace
  • Check your microphone settings

Agent Not Responding

  • The agent may still be processing—wait a moment
  • If using tools, the agent might be waiting for results
  • Try ending and restarting the voice session

Transcription Errors

  • Speak more slowly and clearly
  • Reduce background noise
  • Unusual terms or names may need spelling out

Summary

Feature         Description
Providers       OpenAI Realtime and Ultravox
Voices          11 OpenAI voices + custom Ultravox library
Temperature     Adjustable for Ultravox
Voice Prompt    Separate prompt for voice conversations
Tool Support    Full tool access during voice
Context         Maintains conversation history
Switching       Seamless text-to-voice transitions

Voice agents bring your AI assistants to life, enabling hands-free interactions that feel natural and engaging.
