My Journey into Learning Audio ML

Deep-unlearning / Learning-Audio-ML

6 0 Language: null Updated: 10mo ago

About

This repository documents my journey into learning Audio ML, bridging the gap in my knowledge as I transition from a more general deep learning background. The exploration is not strictly linear and may branch into broader deep learning topics. It serves as a space for experimentation, implementation of research papers, writing blog posts, and creating tutorials on related subjects.

What I am currently working on

ASR:
- Fast-Conformer: Implementing the Fast-Conformer paper (https://arxiv.org/pdf/2305.05084), and probably other variants and optimizations.
- Finetuning script for the Voxtral model
- [WIP] Study Slam
- [WIP] Blog post about Audio LMs
- [WIP] Blog post about distil-whisper + tutorial
TTS:
- [WIP] Blog post on Speech LLMs: SpeechLLMs Playbook
- Finetuning Llasa: Blog Post Repo
- Finetuning Dia-TTS: Repo
Audio Codecs:
- Transformers integration of XCodec2: PR

Useful Resources

List of Awesome AudioLM Datasets
Torchaudio tutorials: https://pytorch.org/audio/main/index.html
WAVLab Lectures on Speech Recognition and Understanding: https://www.youtube.com/@wavlab3016/videos
Training recipe for Speech LMs: https://github.com/slp-rl/slamkit
Conversational AI Reading Group: https://poonehmousavi.github.io/rg
