Definition
The artificial production of human speech from written text.
Detailed Explanation
Speech synthesis, or text-to-speech (TTS), converts written text into spoken audio. Two common approaches are concatenative synthesis, which joins pre-recorded speech units, and neural synthesis, which uses deep learning to generate speech, either as waveforms directly or as acoustic features (such as mel spectrograms) that a vocoder converts to audio. Modern neural systems often use sequence-to-sequence models with attention mechanisms, which help produce natural-sounding speech with appropriate prosody and intonation.
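As a rough illustration of the concatenative approach described above, the following Python sketch joins stored units with a short linear crossfade at each joint. Everything in it is an assumption made for illustration: the sample rate, the unit names "HH" and "AY", the sine tones standing in for recorded speech, and the helper names make_tone and concatenate_units. A real system would select recorded units (for example diphones) from a large database and smooth the joins far more carefully.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)


def make_tone(freq_hz, n_samples):
    """Stand-in for a pre-recorded speech unit: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


# Hypothetical unit inventory; a real system would hold recorded speech units.
UNIT_INVENTORY = {
    "HH": make_tone(220.0, 800),   # 50 ms at 16 kHz
    "AY": make_tone(330.0, 1600),  # 100 ms at 16 kHz
}


def concatenate_units(unit_names, inventory, overlap=80):
    """Join units in order, crossfading `overlap` samples at each joint."""
    output = []
    for name in unit_names:
        unit = inventory[name]
        if not output:
            output = list(unit)
            continue
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(output), len(unit))
        start = len(output) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises toward 1
            output[start + i] = (1 - w) * output[start + i] + w * unit[i]
        # Append the part of the incoming unit that was not blended.
        output.extend(unit[n:])
    return output


waveform = concatenate_units(["HH", "AY"], UNIT_INVENTORY)
```

The crossfade is there because abrupt joins between units tend to produce audible clicks. The resulting list of samples could be written to a WAV file for listening, but that step is left out to keep the sketch short.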
Use Cases
Screen readers, virtual assistants, navigation systems, and audiobook production.