Resources

Discover what matters to you

NVIDIA NeMo™

NVIDIA NeMo™ is an open-source toolkit for researchers developing state-of-the-art conversational AI models in automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). NeMo's primary objective is to help researchers from industry and academia reuse prior work (code and pretrained models) and to make it easier to create new conversational AI models.


DeepSpeech

DeepSpeech is an open-source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-powered GPU servers. Project DeepSpeech uses Google's TensorFlow to make the implementation easier. Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.


Coqui

Coqui is dedicated to open speech technology. Its projects include deep learning-based STT and TTS engines. With text-to-speech, experience the immediacy of script-to-performance: cast from a wide selection of high-quality, directable, emotive voices, or clone a voice to suit your needs. With Coqui text-to-speech, production times go from months to minutes, and training and deploying STT models has never been easier.


Community Playbook

Common Voice is a publicly available voice dataset, powered by the voices of volunteer contributors around the world. People who want to build voice applications can use the dataset to train machine learning models. Common Voice has a variety of communities that support the project in different important areas; they are usually grouped by language. The playbook offers helpful guidance on the entire Common Voice journey, from localization to dataset usage, as well as how to connect with our community.
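
As a sketch of what working with the dataset can look like: Common Voice releases ship audio clips alongside tab-separated metadata files (such as validated.tsv) with columns including client_id, path, sentence, up_votes, and down_votes. The sample rows and helper function below are illustrative only, not part of the dataset itself.

```python
import csv
import io

# Illustrative sample in the shape of Common Voice's validated.tsv metadata.
# The column names match the real dataset; the rows here are made up.
SAMPLE_TSV = """client_id\tpath\tsentence\tup_votes\tdown_votes
abc123\tclip_0001.mp3\tHello world.\t3\t0
def456\tclip_0002.mp3\tGood morning.\t1\t2
"""

def well_voted_clips(tsv_text):
    """Return (path, sentence) pairs whose up_votes exceed down_votes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["path"], row["sentence"])
        for row in reader
        if int(row["up_votes"]) > int(row["down_votes"])
    ]

print(well_voted_clips(SAMPLE_TSV))
```

A filter like this is a common first step before feeding clip/transcript pairs into an STT training pipeline.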


Why Common Voice?

At present, most voice datasets are owned by companies, which stifles innovation. Voice datasets also underrepresent non-English speakers. This means that voice-enabled technology doesn’t work at all for many languages, and where it does work, it may not perform equally well for everyone. We want to change that by mobilizing people everywhere to share their voices.
