doc: update audio related tasks page. #721


Merged: 5 commits into main on May 30, 2024

Conversation

Vaibhavs10 (Member):

@@ -18,7 +18,7 @@ The use of Multilingual ASR has become popular, the idea of maintaining just a s

## Inference

- The Hub contains over [~9,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+ The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
Member:

Suggested change:
- The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+ The Hub contains over [17,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can test right away in your browser using the model page widgets. You can also use any model as a service using the Inference API. Here is a simple code snippet to do exactly this:

I believe it's the Inference API, no? The paragraph after the snippet talks about "one-click managed inference", but I don't think this is shown here.

Contributor:

Yes, this is the Serverless Inference API. And I agree that the sentence afterwards ("you can use libraries...") is misleading.
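For context, a minimal sketch of what the Serverless Inference API call being discussed looks like at the HTTP level, using only the standard library. The endpoint URL follows the public `api-inference.huggingface.co` pattern; `API_TOKEN` and `sample.flac` are placeholders, not values from this PR.

```python
import urllib.request

# Sketch of a raw Serverless Inference API request for ASR.
# Assumes a valid Hugging Face token; "sample.flac" is a placeholder file.
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"

def build_asr_request(audio_path: str, token: str) -> urllib.request.Request:
    """Build the POST request that sends raw audio bytes to the API."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )

# Actually sending it requires network access and a real token:
# import json
# with urllib.request.urlopen(build_asr_request("sample.flac", API_TOKEN)) as resp:
#     print(json.load(resp)["text"])
```

This is the "models as a service" path the comment refers to, as opposed to the one-click managed Inference Endpoints product mentioned in the paragraph after the snippet.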

Member Author:

Makes sense! And thanks for the suggestion!

Contributor:

Can you also change the paragraph after the code snippet please?

@sanchit-gandhi left a comment:

Thanks for the updates @Vaibhavs10! A few suggestions for the ASR task page:

1. In the Inference section, before giving the Transformers code snippet, should we also include instructions to pip install the library? Given the task page is targeted at beginners, IMO it's good to be more verbose here:

       pip install --upgrade transformers

2. There's an unnecessary op that we do in the Transformers code snippet, which we can remove:

       from transformers import pipeline

       - with open("sample.flac", "rb") as f:
       -     data = f.read()

       pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v2")
       pipe("sample.flac")
       # {'text': "GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHOOL ROOMS DAY AFTER DAY FOR A FORTNIGHT HE'LL HAVE TO PUT IN AN APPEARANCE AT SOME PLACE OF WORSHIP ON SUNDAY MORNING AND HE CAN COME TO US IMMEDIATELY AFTERWARDS"}

3. For the datasets, instead of replacing librispeech_asr with espnet/yodas, should we replace openslr instead? librispeech_asr is quite well known among the ASR community, so it would be nice to keep it displayed there to make users aware it's present on the Hub (and can be used as an alternative to Kaldi LibriSpeech). Whereas openslr is not really used as a multilingual dataset anymore, and espnet/yodas can be viewed as a replacement.

4. For the models, should we replace facebook/s2t-small-mustc-en-fr-st with distil-whisper/distil-large-v3? The former is again quite outdated (most users would be better off using openai/whisper-large-v3, which we already promote as the first model on the list).

@Vaibhavs10 (Member Author):

Nice @sanchit-gandhi, addressed your review. Instead of distil-whisper I chose seamless-m4t, just for diversity reasons. Let me know if that works.

Vaibhavs10 merged commit e38b705 into main on May 30, 2024
4 checks passed
Vaibhavs10 deleted the update-audio-tasks branch on May 30, 2024 at 12:17
5 participants