Voice Agents

AgentPress supports real-time voice conversations with your agents. Users can speak directly to your agent and receive spoken responses, creating a natural conversational experience.

Voice agents combine:

  • Speech-to-text: User speech is transcribed in real-time
  • AI processing: Your agent processes the message using its configured prompt and tools
  • Text-to-speech: The agent’s response is spoken back to the user
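
The three stages above can be sketched as a simple function pipeline. This is only an illustration: `VoiceTurn`, `run_voice_turn`, and the stand-in stage functions are hypothetical names, not part of AgentPress, and real providers stream audio rather than running the stages strictly one after another.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One user utterance and the agent's spoken reply (illustrative type)."""
    transcript: str     # speech-to-text output
    reply_text: str     # agent response as text
    reply_audio: bytes  # text-to-speech output


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Run the three stages in order for a single utterance."""
    transcript = transcribe(audio)        # speech-to-text
    reply_text = respond(transcript)      # AI processing (prompt + tools)
    reply_audio = synthesize(reply_text)  # text-to-speech
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages for illustration only; real stages would call a voice provider.
turn = run_voice_turn(
    b"hello",
    transcribe=lambda a: a.decode("utf-8"),
    respond=lambda t: f"You said: {t}",
    synthesize=lambda t: t.encode("utf-8"),
)
print(turn.reply_text)
```

Passing the stages in as functions keeps the sketch independent of any particular provider.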

Voice conversations support everything regular text conversations do, including tool usage, knowledge base access, and persona-based journeys.

AgentPress supports two voice providers:

OpenAI Realtime is the default voice provider, built on OpenAI’s GPT-4 Realtime API.

Characteristics:

  • High intelligence (8/10)
  • Fast response times (8/10)
  • 11 distinct voice options
  • WebRTC-based for low latency

Ultravox is an alternative voice provider with customizable voices.

Characteristics:

  • Good intelligence (7/10)
  • Fast response times (8/10)
  • Custom voice library
  • Adjustable temperature settings

To enable voice for an agent:

Edit your agent and scroll to the Voice Configuration section.

Toggle Enable Voice to activate voice capabilities.

Choose a voice model:

  • OpenAI Realtime - Best for most use cases
  • Ultravox - For custom voice requirements

For OpenAI: Select from 11 built-in voices:

  • Alloy: Neutral, professional
  • Ash: Clear, articulate
  • Ballad: Warm, narrative
  • Coral: Friendly, approachable
  • Echo: Calm, measured
  • Fable: Engaging, storytelling
  • Nova: Energetic, dynamic
  • Onyx: Deep, authoritative
  • Sage: Calm, professional
  • Shimmer: Light, cheerful
  • Verse: Distinctive, clear

For Ultravox: Enter a voice ID or browse the Ultravox voice library to find additional options.

With Ultravox, adjust the temperature slider to control response variation:

  • Lower (0-0.5): More consistent, focused responses
  • Higher (1.5-2): More creative, varied responses
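
As a rough sketch, the settings described so far could be checked like this before saving. The `VoiceConfig` type, its field names, and the provider identifiers are hypothetical, and the 0 to 2 temperature range is an assumption inferred from the slider ranges above, not a documented limit.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 11 built-in OpenAI voices listed above, lowercased for comparison.
OPENAI_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "nova", "onyx", "sage", "shimmer", "verse",
}


@dataclass
class VoiceConfig:
    """Hypothetical settings object; field names are illustrative."""
    enabled: bool
    provider: str                        # "openai_realtime" or "ultravox" (assumed identifiers)
    voice: str                           # built-in voice name or Ultravox voice ID
    temperature: Optional[float] = None  # only checked for Ultravox in this sketch
    voice_prompt: str = ""


def validate(config: VoiceConfig) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    problems: List[str] = []
    if config.provider == "openai_realtime":
        if config.voice.lower() not in OPENAI_VOICES:
            problems.append(f"unknown OpenAI voice: {config.voice}")
    elif config.provider == "ultravox":
        # Assumption: a non-empty voice ID is required for Ultravox.
        if not config.voice.strip():
            problems.append("Ultravox requires a voice ID")
        # Assumed slider range of 0 to 2.
        if config.temperature is not None and not 0.0 <= config.temperature <= 2.0:
            problems.append(f"temperature out of range: {config.temperature}")
    else:
        problems.append(f"unknown provider: {config.provider}")
    return problems


print(validate(VoiceConfig(True, "openai_realtime", "Nova")))
print(validate(VoiceConfig(True, "ultravox", "my-voice", temperature=2.5)))
```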

Optionally, provide a voice-specific system prompt. This prompt is used instead of your main agent prompt during voice conversations.

Why use a separate voice prompt?

  • Voice conversations call for a more conversational tone
  • Written responses may not sound natural when spoken
  • You may want different instructions for spoken interactions

If you leave this blank, your main agent prompt is used.
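
The fallback rule can be expressed in a few lines. `select_prompt` is a hypothetical helper, and treating a whitespace-only voice prompt as blank is an assumption of this sketch.

```python
from typing import Optional


def select_prompt(main_prompt: str, voice_prompt: Optional[str], is_voice: bool) -> str:
    """Pick the system prompt for a conversation.

    The voice prompt replaces the main prompt only during voice
    conversations, and only when it is not blank.
    """
    if is_voice and voice_prompt and voice_prompt.strip():
        return voice_prompt
    return main_prompt


main = "Provide detailed written answers."
voice = "Speak naturally and keep answers short."

print(select_prompt(main, voice, is_voice=True))
print(select_prompt(main, "", is_voice=True))
print(select_prompt(main, voice, is_voice=False))
```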

  1. Open a conversation with a voice-enabled agent
  2. Click the microphone button in the chat input
  3. Grant microphone permission when prompted
  4. Start speaking

During a voice session:

  • A waveform visualization shows when audio is being processed
  • The agent’s responses are spoken aloud
  • Transcriptions appear in the chat for reference
  • The agent can use tools just like in text conversations

Click the stop button (square icon) to end the voice session. You can continue the conversation via text or start a new voice session.

Voice agents can use all the same tools as text agents. When a tool is called:

  1. The agent announces that it’s performing an action
  2. The tool executes in the background
  3. The agent speaks the result

This creates a natural flow where the agent can look up information, perform calculations, or take actions while maintaining the conversation.
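
A minimal sketch of that three-step flow follows. `handle_tool_call`, the `order_status` tool, and the spoken phrases are illustrative assumptions. The tool runs synchronously here for simplicity, whereas the flow described above runs it in the background.

```python
from typing import Any, Callable, Dict, List


def handle_tool_call(
    tool_name: str,
    arguments: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
    speak: Callable[[str], None],
) -> Any:
    """Announce the action, run the tool, then speak the result."""
    speak("Let me check that for you.")      # step 1: announce the action
    result = tools[tool_name](**arguments)   # step 2: execute the tool
    speak(f"Here's what I found: {result}")  # step 3: speak the result
    return result


# Illustrative tool, plus a speak() stand-in that records what would be spoken.
spoken: List[str] = []
tools = {"order_status": lambda order_id: f"order {order_id} has shipped"}
handle_tool_call("order_status", {"order_id": "A123"}, tools, spoken.append)
print(spoken)
```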

Voice conversations maintain full context:

  • Conversation history: Previous messages are remembered
  • Tool results: Outputs from tools are available for follow-up
  • Personas: Persona-based guidance works the same as in text conversations
  • Knowledge bases: RAG queries work during voice conversations

You can seamlessly switch between text and voice within the same conversation thread.
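
One way to picture this is a single thread whose history is shared by both modalities. The `Message` and `Thread` types below are hypothetical and only illustrate the idea that voice transcriptions and typed messages end up in the same context.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str   # typed text, or the transcription of spoken audio
    modality: str  # "text" or "voice"


@dataclass
class Thread:
    messages: List[Message] = field(default_factory=list)

    def add(self, role: str, content: str, modality: str) -> None:
        self.messages.append(Message(role, content, modality))

    def context(self) -> List[Dict[str, str]]:
        """History passed to the agent; modality does not filter anything out."""
        return [{"role": m.role, "content": m.content} for m in self.messages]


thread = Thread()
thread.add("user", "What's my order status?", "voice")
thread.add("assistant", "Order A123 has shipped.", "voice")
thread.add("user", "Can you send me the tracking link?", "text")
print(len(thread.context()))
```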

Write prompts that sound natural when spoken:

Instead of:

Provide a comprehensive response covering all aspects of the user’s query with appropriate detail and context.

Write:

Have a natural conversation. Keep responses concise and clear. Ask follow-up questions when helpful.

Voice users can’t see formatting, links, or code blocks. Adjust your instructions:

Instead of:

Format your response with bullet points and include relevant links.

Write:

Explain things step by step. Mention when you could share a link or document if they switch to text.

Users may interrupt or change topics mid-sentence. Include guidance like:

If the user changes topics, smoothly acknowledge and move to the new topic.

Spoken responses should be appropriately paced:

Keep responses brief enough to feel conversational. Pause between topics to give the user a chance to respond.

Here’s an example configuration for a customer service voice agent:

Voice Model: OpenAI Realtime
Voice: Nova
Voice Prompt:

You are a helpful customer service agent. Speak naturally and conversationally.

Keep responses concise—aim for 2-3 sentences when possible. If you need to explain something complex, break it into steps and check in with the customer between steps.

When looking up information, let the customer know what you're doing: "Let me check that for you."

If you can't help with something, clearly explain what they should do next.

Always confirm you've understood their question before answering if there's any ambiguity.

If voice doesn’t start:

  • Check that Enable Voice is toggled on in agent settings
  • Verify your browser has microphone permissions
  • Ensure you have a stable internet connection
  • Try refreshing the page

If audio quality is poor:

  • Use a headset to prevent echo
  • Reduce background noise
  • Speak clearly at a moderate pace
  • Check your microphone settings

If the agent doesn’t respond:

  • The agent may still be processing; wait a moment
  • If using tools, the agent might be waiting for results
  • Try ending and restarting the voice session

If your speech is transcribed incorrectly:

  • Speak more slowly and clearly
  • Reduce background noise
  • Unusual terms or names may need spelling out

Feature summary:

  • Providers: OpenAI Realtime and Ultravox
  • Voices: 11 OpenAI voices + custom Ultravox library
  • Temperature: Adjustable for Ultravox
  • Voice Prompt: Separate prompt for voice conversations
  • Tool Support: Full tool access during voice
  • Context: Maintains conversation history
  • Switching: Seamless text-to-voice transitions

Voice agents bring your AI assistants to life, enabling hands-free interactions that feel natural and engaging.