WaveNet


WaveNet treats a waveform as a sequence of samples and models its joint probability p(x) as a product of conditional distributions, one per audio sample, each conditioned on all earlier samples. A stack of dilated causal convolutional layers with gated activations gives the model a receptive field that grows exponentially with depth, without recurrence, so it can capture long-range dependencies in pitch, formants, and rhythm. Training uses teacher forcing, which lets all time steps be computed in parallel. The output is either a categorical distribution over mu-law-quantized sample values or a continuous likelihood such as a mixture of logistics. For TTS, conditioning vectors supply linguistic features, F0, and speaker identity. At inference the network samples one value at a time, which gives high quality but is slow. This motivated faster successors for real-time use: Parallel WaveNet, which distills the autoregressive model into a parallel feed-forward one, and WaveRNN, which stays autoregressive but uses a compact recurrent network. Compared with HMM-based TTS, WaveNet produces far more natural timbre and prosody, handles coarticulation gracefully, and adapts well to different speakers and styles. It also generalizes to music and sound effects because the architecture models raw audio directly rather than vocoder parameters.
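As a rough illustration of the dilated causal convolution and gated activation described above, here is a minimal single-channel sketch in plain Python. The function names (`causal_dilated_conv`, `gated_unit`, `receptive_field`) and the toy weights are assumptions made for this example. The sketch omits residual and skip connections, multiple channels, and conditioning, so it should not be read as the published implementation.

```python
import math


def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: y[t] depends only on x[t], x[t - d], x[t - 2d], ...

    Taps that would reach before the start of the signal are treated as zero.
    """
    k = len(w)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            src = t - (k - 1 - i) * dilation  # the last tap (i = k - 1) reads x[t]
            if src >= 0:
                acc += w[i] * x[src]
        y.append(acc)
    return y


def gated_unit(x, w_filter, w_gate, dilation):
    """Gated activation: tanh(filter branch) * sigmoid(gate branch)."""
    f = causal_dilated_conv(x, w_filter, dilation)
    g = causal_dilated_conv(x, w_gate, dilation)
    return [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of input samples visible to one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Hypothetical stack: kernel size 2, dilations doubling from 1 to 512.
    dilations = [2 ** i for i in range(10)]
    print(receptive_field(2, dilations))  # 1024
    print(gated_unit([0.1, 0.2, 0.3, 0.4], [0.5, 0.5], [1.0, -1.0], dilation=1))
```

Doubling the dilation at each layer is what makes the receptive field grow exponentially with depth while the cost per layer stays the same.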

Overview

WaveNet is a neural, sample-level generative model for raw audio. It uses dilated causal convolutions to predict the next sample distribution, yielding highly natural speech and expressive prosody, although generation is expensive without acceleration.
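Predicting the next-sample distribution corresponds to the standard autoregressive factorization, written out here for reference. The symbols are notational assumptions for this sketch: T is the number of samples and h stands for any conditioning input, such as linguistic features or speaker identity.

```latex
p(\mathbf{x}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}\right)
\qquad
p(\mathbf{x} \mid \mathbf{h}) = \prod_{t=1}^{T} p\left(x_t \mid x_1, \ldots, x_{t-1}, \mathbf{h}\right)
```

Because each factor depends on all previous samples, generation has to proceed one sample at a time, which is where the inference cost comes from.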

About Google

At Google, we think that AI can meaningfully improve people's lives and that the biggest impact will come when everyone can access it.

Industry: Technology, Information and Internet

Company Size: 182,000-190,000

Location: Mountain View, CA, US

Website: ai.google

Last updated: February 18, 2026
