2023-01-27
Music labeling dataset with aspects and moods.
Generated by ChatGPT

MusicCaps is a dataset of 5,521 music clips of 10 seconds each, labeled with an aspect list and a free-text caption written by musicians. An aspect list is a list of adjectives that describe how the music sounds, such as “pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead”.

The free-text caption is a description of how the music sounds, including details like instruments and mood. MusicCaps is sourced from the AudioSet dataset and is divided into an eval and train split.

The dataset is released under a Creative Commons BY-SA 4.0 license. Each clip carries the following metadata: the YT ID (pointing to the YouTube video in which the labeled music segment appears), the start and end positions of the segment in that video, labels from the AudioSet dataset, the aspect list, the caption, an author ID (for grouping samples by who wrote them), a flag marking whether the clip belongs to the balanced subset, and a flag marking whether it belongs to the AudioSet eval split.
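
To make the per-clip metadata concrete, here is a minimal Python sketch of how one record might be represented and parsed. The column names (`ytid`, `start_s`, `end_s`, `audioset_positive_labels`, `aspect_list`, `author_id`, `is_balanced_subset`, `is_audioset_eval`), the encoding of the aspect list as a Python-style list literal, and the two sample rows are all assumptions made for illustration; check them against the actual file before relying on this.

```python
from __future__ import annotations

import ast
import csv
import io
from dataclasses import dataclass


@dataclass
class Clip:
    """One labeled clip; field names are assumed, not confirmed by the text."""
    ytid: str                    # YouTube video ID
    start_s: int                 # segment start in the video, in seconds
    end_s: int                   # segment end in the video, in seconds
    audioset_labels: list[str]   # AudioSet label IDs
    aspect_list: list[str]       # short descriptive phrases
    caption: str                 # free-text description
    author_id: int               # groups samples by who wrote them
    is_balanced_subset: bool
    is_audioset_eval: bool       # True -> eval split, False -> train split

    def youtube_url(self) -> str:
        # Only a pointer into the video; the dataset ships no audio.
        return f"https://www.youtube.com/watch?v={self.ytid}&t={self.start_s}s"


def _to_bool(text: str) -> bool:
    return text.strip().lower() == "true"


def parse_clips(csv_text: str) -> list[Clip]:
    """Parse CSV text with the assumed header into Clip records."""
    clips = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clips.append(Clip(
            ytid=row["ytid"],
            start_s=int(row["start_s"]),
            end_s=int(row["end_s"]),
            audioset_labels=row["audioset_positive_labels"].split(","),
            # Assumes the aspect list is stored as a Python list literal.
            aspect_list=ast.literal_eval(row["aspect_list"]),
            caption=row["caption"],
            author_id=int(row["author_id"]),
            is_balanced_subset=_to_bool(row["is_balanced_subset"]),
            is_audioset_eval=_to_bool(row["is_audioset_eval"]),
        ))
    return clips


# Made-up rows for illustration only; these are not real dataset entries.
SAMPLE_CSV = """ytid,start_s,end_s,audioset_positive_labels,aspect_list,caption,author_id,is_balanced_subset,is_audioset_eval
abc123XYZ_0,30,40,"/m/04rlf,/m/0342h","['pop', 'mellow piano melody']","A mellow pop song with piano.",4,False,True
def456UVW_1,10,20,/m/04rlf,"['rock', 'distorted guitar']","An energetic rock track.",7,True,False
"""

clips = parse_clips(SAMPLE_CSV)
eval_clips = [c for c in clips if c.is_audioset_eval]
train_clips = [c for c in clips if not c.is_audioset_eval]
```

Filtering on `is_audioset_eval` is one way to recover the eval and train splits described above; `youtube_url()` only builds a link to the segment start, so fetching the audio itself is left to the user.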

The dataset is intended to be used for music description tasks.


MusicLM by Google was manually vetted by our editorial team and was first featured on January 28th 2023.
Pros and Cons


Pros

Music labeling dataset
Aspects and moods identified
Descriptive free-text caption
Labeled by musicians
Eval and train split
Licensed under Creative Commons
YouTube video metadata
Dataset for describing music
High quality musical clips
Comprehensive aspect list
Designed for music description tasks
Includes caption author ID
Balanced subset
Dataset linked to YouTube
Well-structured music captions
Points to start and end in video
Includes AudioSet eval split flag


Cons

Limited to 10-second clips
Low quality sound
No active events
Only divisions are 'eval' and 'train'
Requires manual noise filtering
Limited music genres
Depends on YouTube availability
Only English labels
No waveform data
No ongoing support (never updated)


