LongCat AudioDiT 3.5B
Overview
LongCat-AudioDiT-3.5B is Meituan LongCat’s diffusion-based text-to-speech model built directly in waveform latent space rather than mel-spectrogram space. It is designed for high-fidelity speech generation and zero-shot voice cloning, supports Chinese and English, and is positioned as a top-performing open model on the Seed benchmark for speaker similarity and intelligibility.
About Meituan
Meituan is a technology-driven retail company based in Beijing, founded in March 2010. It operates a platform that digitises local goods and services, from food delivery to travel bookings, with the mission “We help people eat better, live better.”
Tools using LongCat AudioDiT 3.5B
No tools found for this model yet.