Common Voice dataset
IT Infrastructure / Machine Learning Data Catalog
The largest open-source voice dataset in the world.
About the Common Voice dataset
The Common Voice dataset is a large collection of transcribed speech contributed by volunteers around the world. Each voice clip consists of a unique MP3 file and a corresponding text file. The dataset also includes demographic metadata such as age, gender, and accent, which helps improve the accuracy and inclusivity of speech recognition engines. The project's goal is to create an open, diverse dataset that mitigates bias in AI and democratizes voice technology.
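As a rough illustration of how the clip-plus-metadata layout described above might be used, the following Python sketch parses a small tab-separated table and counts clips per demographic value. The column names (`path`, `sentence`, `age`, `gender`, `accent`), the tab-separated layout, and the sample rows are assumptions made up for this example; check the files in the release you download for the actual field names.

```python
import csv
import io
from collections import Counter

# Made-up sample rows. The column names and the tab-separated layout are
# assumptions for illustration, not taken from an actual release.
SAMPLE_TSV = (
    "path\tsentence\tage\tgender\taccent\n"
    "clip_0001.mp3\tThe quick brown fox.\ttwenties\tfemale\t\n"
    "clip_0002.mp3\tHello world.\t\tmale\t\n"
    "clip_0003.mp3\tGood morning.\tfifties\t\t\n"
)


def load_clips(tsv_text):
    """Parse tab-separated clip metadata into a list of dicts, one per clip."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def count_by(clips, field):
    """Count clips per value of a metadata field; blank values count as 'unreported'."""
    return Counter((clip.get(field) or "unreported") for clip in clips)


clips = load_clips(SAMPLE_TSV)
gender_counts = count_by(clips, "gender")
age_counts = count_by(clips, "age")

for value, count in sorted(gender_counts.items()):
    print(f"gender={value}: {count} clip(s)")
```

Counting blank fields separately matters because volunteers may leave demographic fields empty, so a summary like this can show how much of a sample actually carries the metadata before it is used to check a model for bias.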