TAAFT

WaveNet

By Google
WaveNet treats a waveform as a sequence and models the joint probability p(x) as a product of conditional distributions, one per audio sample, each conditioned on all previous samples. A stack of dilated causal convolutional layers with gated activations gives the network a very large receptive field without recurrence, so it can capture long-range dependencies in pitch, formants, and rhythm. Training uses teacher forcing: the original model predicts a 256-way categorical distribution over mu-law-quantized samples, and later versions use continuous likelihoods such as mixtures of logistics. For TTS, conditioning inputs supply linguistic features, F0, and speaker identity. At inference the network samples one value at a time, which yields high quality but is slow. This cost motivated later work aimed at real-time use, including Parallel WaveNet, which distills the autoregressive model into a parallel one, and WaveRNN, a compact recurrent model. Compared with HMM-based TTS, WaveNet produces far more natural timbre and prosody, handles coarticulation gracefully, and adapts well to different speakers and styles. It also generalizes to music and sound effects because the architecture models raw audio directly rather than vocoder parameters.
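The mu-law quantization and the receptive-field growth described above can be illustrated with a small sketch. It is not taken from any WaveNet implementation: the function names, the kernel size of 2, and the dilation schedule (three blocks of 1, 2, 4, ..., 512) are assumptions chosen for illustration.

```python
import math

MU = 255  # 8-bit mu-law: 256 discrete classes (assumed setting)

def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1.0) / 2.0 * mu))

def mu_law_decode(k, mu=MU):
    """Invert mu_law_encode, up to quantization error."""
    compressed = 2.0 * k / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)

def receptive_field(dilations, kernel_size=2):
    """Number of input samples visible to a stack of dilated causal convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Hypothetical schedule: 3 blocks, each with dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)] * 3
print(receptive_field(dilations))
```

Under these assumed settings the stack covers 3,070 samples, which is roughly 0.19 s of audio at an assumed 16 kHz sampling rate. Doubling the dilation at each layer is what lets the context grow exponentially with depth within a block.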
New Audio Gen 4
Released: September 8, 2016

Overview

WaveNet is a neural, sample-level generative model for raw audio. It uses dilated causal convolutions to predict a probability distribution over the next sample, yielding highly natural speech and expressive prosody, although sample-by-sample generation is computationally expensive without acceleration.
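As a rough illustration of what "dilated causal convolution" means, the sketch below implements a single-channel, kernel-size-2 version in plain Python. It is a simplification under stated assumptions: the function names and fixed scalar weights are hypothetical, and the gated activations, residual and skip connections, and learned multi-channel filters of the real model are omitted. In a next-sample predictor the input would additionally be shifted by one step so that the output at time t never sees the sample it predicts.

```python
def causal_dilated_conv(x, w_past, w_now, dilation):
    """Kernel-size-2 causal convolution.

    y[t] = w_past * x[t - dilation] + w_now * x[t], with positions before
    the start of the signal treated as zero (left padding).
    """
    y = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        y.append(w_past * past + w_now * x[t])
    return y

def conv_stack(x, dilations, w_past=1.0, w_now=1.0):
    """Apply one causal layer per dilation, feeding each output to the next layer."""
    for d in dilations:
        x = causal_dilated_conv(x, w_past, w_now, d)
    return x

# An impulse at t=0 shows how far a single input sample reaches forward in time.
impulse = [1.0] + [0.0] * 9
response = conv_stack(impulse, dilations=[1, 2, 4])
print(response)
```

With dilations 1, 2, and 4 the impulse influences exactly the first 8 outputs, matching a receptive field of 8 samples, and no output depends on inputs from its future.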

About Google

At Google, we think that AI can meaningfully improve people's lives and that the biggest impact will come when everyone can access it.

Industry: Technology, Information and Internet
Company Size: 182,000-190,000
Location: Mountain View, CA, US
Website: ai.google

Tools using WaveNet

No tools found for this model yet.

Last updated: February 18, 2026