WaveNet | AI Model

Overview

WaveNet is a neural generative model that produces raw audio one sample at a time. It uses dilated causal convolutions to predict a probability distribution over the next sample, yielding highly natural speech and expressive prosody, although generation is computationally expensive without acceleration.
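To make "dilated causal convolutions" concrete, here is a minimal pure-Python sketch, not the model's actual implementation. The helper names `causal_dilated_conv` and `receptive_field` are hypothetical, and the kernel size of 2, the dilation schedule (doubling from 1 to 512, repeated three times), and the 16 kHz sample rate are assumptions chosen for illustration.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    any future input: that is what makes the convolution causal.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(dilations, kernel_size=2):
    """Count how many input samples can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Causality check: changing a future sample leaves earlier outputs unchanged.
a = causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2)
b = causal_dilated_conv([1, 2, 3, 99], [1, 1], dilation=2)
assert a[:3] == b[:3]

# Assumed schedule: dilations 1, 2, 4, ..., 512, with the block repeated 3 times.
dilations = [2 ** i for i in range(10)] * 3
samples = receptive_field(dilations)
print(samples)  # 3070
print(samples / 16000)  # roughly 0.19 s of context at the assumed 16 kHz
```

Because the dilation doubles at each layer, thirty layers already cover about three thousand samples, whereas undilated layers with the same kernel size would cover only 31.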

Description

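WaveNet treats a waveform as a sequence x = (x_1, ..., x_T) and factorizes the joint probability p(x) into a product of conditional distributions p(x_t | x_1, ..., x_{t-1}), one per audio sample. A stack of dilated causal convolutional layers with gated activations gives the network a receptive field that grows exponentially with depth, without recurrence, so the model can capture dependencies in pitch, formants, and rhythm that span thousands of samples. Training uses teacher forcing: because the ground-truth past samples are known, all timesteps are predicted in parallel. The output is either a categorical distribution over mu-law quantized values (256 classes in the original model) or a continuous likelihood such as a mixture of logistics. For TTS, conditioning inputs supply linguistic features, F0, and speaker identity. At inference the network must sample one value at a time and feed it back as input, which preserves quality but is slow. This cost motivated faster successors such as Parallel WaveNet, which distills the autoregressive model into a parallel feed-forward one, and WaveRNN, a compact recurrent model that remains autoregressive but is cheap enough for real-time use. Compared with HMM-based TTS, WaveNet produces far more natural timbre and prosody, handles coarticulation gracefully, and adapts well to different speakers and styles. It can also model music and other non-speech audio, because the architecture operates on raw audio directly rather than on vocoder parameters.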
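The discrete mu-law option mentioned above can be illustrated with a short sketch of standard mu-law companding. This is a minimal pure-Python version under stated assumptions, not code from DeepMind: the function names `mu_law_encode` and `mu_law_decode` are hypothetical, inputs are assumed to be floats in [-1, 1], and mu = 255 (giving 256 classes) is assumed.

```python
import math


def mu_law_encode(x, mu=255):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: finer resolution near zero, coarser near +/-1.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class index.
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(q, mu=255):
    """Map a class index in [0, mu] back to an approximate sample in [-1, 1]."""
    compressed = 2 * q / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


print(mu_law_encode(0.0))   # 128
print(mu_law_encode(1.0))   # 255
print(mu_law_encode(-1.0))  # 0

# The round trip is lossy, but the error stays small relative to the signal range.
x = 0.5
print(abs(mu_law_decode(mu_law_encode(x)) - x) < 0.02)  # True
```

Quantizing this way turns next-sample prediction into a 256-way classification problem, so a softmax output can be trained with cross-entropy against the encoded class of each ground-truth sample.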

About DeepMind

DeepMind is a technology company that specializes in artificial intelligence and machine learning.

Industry: Research Services

Company Size: 501-1000

Location: London, GB

Website: deepmind.com

Related Models

Last updated: October 8, 2025

Gemma 3n

Veo 3.1 Fast

Doubao Realtime Voice
