
Commit e38b705

Vaibhavs10 and pcuenca authored
doc: update audio related tasks page. (#721)
--------- Co-authored-by: Pedro Cuenca <[email protected]>
1 parent d54d39f commit e38b705


4 files changed: 19 additions, 21 deletions


packages/tasks/src/tasks/automatic-speech-recognition/about.md

Lines changed: 5 additions & 5 deletions
@@ -18,7 +18,7 @@ The use of Multilingual ASR has become popular, the idea of maintaining just a s
 
 ## Inference
 
-The Hub contains over [~9,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+The Hub contains over [17,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can test right away in your browser using the model page widgets. You can also use any model as a service using the Inference API. Here is a simple code snippet to do exactly this:
 
 ```python
 import json
@@ -39,12 +39,12 @@ data = query("sample1.flac")
 You can also use libraries such as [transformers](https://huggingface.co/models?library=transformers&pipeline_tag=automatic-speech-recognition&sort=downloads), [speechbrain](https://huggingface.co/models?library=speechbrain&pipeline_tag=automatic-speech-recognition&sort=downloads), [NeMo](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=nemo&sort=downloads) and [espnet](https://huggingface.co/models?library=espnet&pipeline_tag=automatic-speech-recognition&sort=downloads) if you want one-click managed Inference without any hassle.
 
 ```python
+# pip install --upgrade transformers
+
 from transformers import pipeline
 
-with open("sample.flac", "rb") as f:
-    data = f.read()
+pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v3")
 
-pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v2")
 pipe("sample.flac")
 # {'text': "GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHOOL ROOMS DAY AFTER DAY FOR A FORTNIGHT HE'LL HAVE TO PUT IN AN APPEARANCE AT SOME PLACE OF WORSHIP ON SUNDAY MORNING AND HE CAN COME TO US IMMEDIATELY AFTERWARDS"}
 ```
@@ -57,7 +57,7 @@ import { HfInference } from "@huggingface/inference";
 const inference = new HfInference(HF_TOKEN);
 await inference.automaticSpeechRecognition({
 	data: await (await fetch("sample.flac")).blob(),
-	model: "openai/whisper-large-v2",
+	model: "openai/whisper-large-v3",
 });
 ```
 
packages/tasks/src/tasks/automatic-speech-recognition/data.ts

Lines changed: 7 additions & 7 deletions
@@ -3,16 +3,16 @@ import type { TaskDataCustom } from "..";
 const taskData: TaskDataCustom = {
 	datasets: [
 		{
-			description: "18,000 hours of multilingual audio-text dataset in 108 languages.",
-			id: "mozilla-foundation/common_voice_13_0",
+			description: "31,175 hours of multilingual audio-text dataset in 108 languages.",
+			id: "mozilla-foundation/common_voice_17_0",
 		},
 		{
 			description: "An English dataset with 1,000 hours of data.",
 			id: "librispeech_asr",
 		},
 		{
-			description: "High quality, multi-speaker audio data and their transcriptions in various languages.",
-			id: "openslr",
+			description: "A multi-lingual audio dataset with 370K hours of audio.",
+			id: "espnet/yodas",
 		},
 	],
 	demo: {
@@ -47,12 +47,12 @@ const taskData: TaskDataCustom = {
 			id: "openai/whisper-large-v3",
 		},
 		{
-			description: "A good generic ASR model by MetaAI.",
-			id: "facebook/wav2vec2-base-960h",
+			description: "A good generic speech model by MetaAI for fine-tuning.",
+			id: "facebook/w2v-bert-2.0",
 		},
 		{
 			description: "An end-to-end model that performs ASR and Speech Translation by MetaAI.",
-			id: "facebook/s2t-small-mustc-en-fr-st",
+			id: "facebook/seamless-m4t-v2-large",
 		},
 	],
 	spaces: [
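
Each entry in the `datasets` and `models` lists above is a plain object with a `description` and an `id`. As a rough sanity check on edits like these, the sketch below validates such a list for empty fields and duplicate ids. It assumes a hypothetical local `TaskEntry` interface rather than the real `TaskDataCustom` type (imported from `..` and not shown in this diff), and `validateEntries` is an illustrative helper, not part of the package.

```typescript
// Hypothetical stand-in for the { description, id } entries in data.ts;
// the real TaskDataCustom type lives in ".." and is not shown in this diff.
interface TaskEntry {
  description: string;
  id: string;
}

// Returns human-readable problems; an empty array means the list looks consistent.
function validateEntries(entries: TaskEntry[]): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();
  entries.forEach((entry, index) => {
    if (entry.description.trim() === "") {
      problems.push(`entry ${index}: empty description`);
    }
    if (entry.id.trim() === "") {
      problems.push(`entry ${index}: empty id`);
    }
    if (seenIds.has(entry.id)) {
      problems.push(`entry ${index}: duplicate id "${entry.id}"`);
    }
    seenIds.add(entry.id);
  });
  return problems;
}

// The ASR `datasets` list as it reads after this commit.
const asrDatasets: TaskEntry[] = [
  {
    description: "31,175 hours of multilingual audio-text dataset in 108 languages.",
    id: "mozilla-foundation/common_voice_17_0",
  },
  { description: "An English dataset with 1,000 hours of data.", id: "librispeech_asr" },
  { description: "A multi-lingual audio dataset with 370K hours of audio.", id: "espnet/yodas" },
];

console.log(validateEntries(asrDatasets)); // []
```

The check deliberately does not enforce an `owner/name` id format, since `librispeech_asr` is a canonical dataset id without an owner prefix.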

packages/tasks/src/tasks/text-to-speech/about.md

Lines changed: 1 addition & 3 deletions
@@ -58,8 +58,6 @@ await inference.textToSpeech({
 
 - [Hugging Face Audio Course](https://huggingface.co/learn/audio-course/chapter6/introduction)
 - [ML for Audio Study Group - Text to Speech Deep Dive](https://www.youtube.com/watch?v=aLBedWj-5CQ)
-- [An introduction to SpeechT5, a multi-purpose speech recognition and synthesis model](https://huggingface.co/blog/speecht5).
-- [A guide on Fine-tuning Whisper For Multilingual ASR with 🤗Transformers](https://huggingface.co/blog/fine-tune-whisper)
 - [Speech Synthesis, Recognition, and More With SpeechT5](https://huggingface.co/blog/speecht5)
 - [Optimizing a Text-To-Speech model using 🤗 Transformers](https://huggingface.co/blog/optimizing-bark)
--
+- [Train your own TTS models with Parler-TTS](https://github.com/huggingface/parler-tts)

packages/tasks/src/tasks/text-to-speech/data.ts

Lines changed: 6 additions & 6 deletions
@@ -4,8 +4,8 @@ const taskData: TaskDataCustom = {
 	canonicalId: "text-to-audio",
 	datasets: [
 		{
-			description: "Thousands of short audio clips of a single speaker.",
-			id: "lj_speech",
+			description: "10K hours of multi-speaker English dataset.",
+			id: "parler-tts/mls_eng_10k",
 		},
 		{
 			description: "Multi-speaker English dataset.",
@@ -43,8 +43,8 @@ const taskData: TaskDataCustom = {
 			id: "facebook/mms-tts",
 		},
 		{
-			description: "An end-to-end speech synthesis model.",
-			id: "microsoft/speecht5_tts",
+			description: "A prompt based, powerful TTS model.",
+			id: "parler-tts/parler_tts_mini_v0.1",
 		},
 	],
 	spaces: [
@@ -57,8 +57,8 @@ const taskData: TaskDataCustom = {
 			id: "coqui/xtts",
 		},
 		{
-			description: "An application that synthesizes speech for various speaker types.",
-			id: "Matthijs/speecht5-tts-demo",
+			description: "An application that synthesizes speech for diverse speaker prompts.",
+			id: "parler-tts/parler_tts_mini",
 		},
 	],
 	summary:
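
The dataset, model, and Space changes in this file each swap one entry id for another. A minimal sketch for listing which ids a change removes and adds follows. It compares by `id` only; `diffEntryIds` is a hypothetical helper, and the inputs are only the `spaces` entries visible in the last hunk, since the rest of the list is not shown here.

```typescript
// Hypothetical helper: compares two versions of an entry list by `id` only.
// Description-only edits are not reported.
function diffEntryIds(
  before: { id: string }[],
  after: { id: string }[],
): { removed: string[]; added: string[] } {
  const beforeIds = new Set(before.map((entry) => entry.id));
  const afterIds = new Set(after.map((entry) => entry.id));
  return {
    // Ids present before the change but missing afterwards.
    removed: before.map((entry) => entry.id).filter((id) => !afterIds.has(id)),
    // Ids that appear only after the change.
    added: after.map((entry) => entry.id).filter((id) => !beforeIds.has(id)),
  };
}

// Only the `spaces` entries visible in the hunk above; the full list is longer.
const spacesBefore = [{ id: "coqui/xtts" }, { id: "Matthijs/speecht5-tts-demo" }];
const spacesAfter = [{ id: "coqui/xtts" }, { id: "parler-tts/parler_tts_mini" }];

console.log(diffEntryIds(spacesBefore, spacesAfter));
```

For this input, `removed` holds `Matthijs/speecht5-tts-demo` and `added` holds `parler-tts/parler_tts_mini`, while the unchanged `coqui/xtts` entry appears in neither.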
