Million Song Dataset

The Million Song Dataset is a freely available collection of audio features and metadata for one million contemporary popular music tracks.

Its goals are:

- To encourage research on algorithms that scale to commercial sizes.
- To provide a benchmark dataset for evaluating research.
- To serve as a shortcut alternative to building a large dataset with APIs (e.g., The Echo Nest).
- To help new researchers get started in the music information retrieval (MIR) field.

The core of the dataset is the feature analysis and metadata of one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that audio samples can be obtained from services like 7digital, using the code we provide.

The Million Song Dataset is also a cluster of complementary datasets contributed by the community:

- SecondHandSongs dataset -> cover songs
- musiXmatch dataset -> lyrics
- Last.fm dataset -> tags and similarity at the song level
- Taste Profile subset -> user data
- thisismyjam-to-MSD mapping -> more user data
- tagtraum genre annotations -> genre labels
- MAGD main datasets -> more genre labels

The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. It was supported in part by the NSF.

Organization

Million Song Dataset

Temporal coverage

Not provided

Spatial coverage

