March 30, 2026
Inputs: API, Text, Audio
Outputs: Audio, API, Text
Speech-native voice AI agents

Overview

Ultravox is a real-time voice AI platform for building fast, fluent conversational experiences.

With developer-friendly APIs, agentic-ready primitives, and a speech-native model, Ultravox makes it easy to build voice agents that follow instructions reliably, interact with third-party systems effectively, and communicate naturally.

Legacy voice AI systems chain independent component services into a single pipeline, which is subject to unpredictable latency. Ultravox controls and manages its entire inference stack and infrastructure, so it can guarantee reliability and availability at scale.

Each voice AI call through Ultravox Realtime is assigned dedicated GPU resources for the entire lifespan of the call, ensuring a consistent low-latency experience regardless of demand on the system, even for users with thousands of concurrent calls.


Join thousands of teams building natural, conversational voice AI agents with Ultravox.


Releases

Ultravox.ai v0.7
Mar 30, 2026
In the v0.7 series, the Ultravox model is trained on GLM 4.6, taking the lead on audio reasoning tasks over closed-source models such as gpt4o-audio while retaining the speech-understanding advantages of previous versions.


Ultravox is a multimodal model that can consume both speech and text as input (e.g., a text system prompt and voice user message). The input to the model is given as a text prompt with a special pseudo-token, and the model processor will replace this magic token with embeddings derived from the input audio. Using the merged embeddings as input, the model will then generate output text as usual.
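
The merge step described above can be illustrated with a small toy sketch. This is not Ultravox's actual processor: the placeholder string `<|audio|>`, the helper names `embed_text_token` and `merge_embeddings`, and the 4-dimensional vectors are all assumptions made for illustration. It only shows the idea of swapping one placeholder position for a run of audio-derived embeddings before the sequence is passed on for text generation.

```python
from typing import List

Vector = List[float]

# Assumed placeholder string for illustration; the page does not name the token.
AUDIO_TOKEN = "<|audio|>"


def embed_text_token(token: str, dim: int = 4) -> Vector:
    """Toy stand-in for a text embedding lookup (hypothetical, not the real model)."""
    seed = sum(ord(c) for c in token)
    return [((seed * (i + 1)) % 97) / 97.0 for i in range(dim)]


def merge_embeddings(tokens: List[str], audio_frames: List[Vector], dim: int = 4) -> List[Vector]:
    """Replace the single audio placeholder with the audio-derived embeddings."""
    if tokens.count(AUDIO_TOKEN) != 1:
        raise ValueError("expected exactly one audio placeholder in the prompt")
    merged: List[Vector] = []
    for token in tokens:
        if token == AUDIO_TOKEN:
            # One placeholder expands into as many positions as there are audio frames.
            merged.extend(audio_frames)
        else:
            merged.append(embed_text_token(token, dim))
    return merged


# Text system prompt followed by a voice user message (three made-up audio frames).
tokens = ["You", "are", "helpful", AUDIO_TOKEN]
audio_frames = [[0.1] * 4, [0.2] * 4, [0.3] * 4]
merged = merge_embeddings(tokens, audio_frames)
print(len(merged))  # 3 text positions + 3 audio frames = 6
```

In this sketch the merged sequence is longer than the token list, because the placeholder stands in for however many embedding frames the audio produces.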

Pricing

Pricing model: Free Trial
Paid options from: $0.07/unit
Billing frequency: Pay-as-you-go

Other tools by Fixie.ai
