doc: update audio related tasks page. #721


Merged: 5 commits into main on May 30, 2024

Conversation

Vaibhavs10 (Member):

@@ -18,7 +18,7 @@ The use of Multilingual ASR has become popular, the idea of maintaining just a s

## Inference

- The Hub contains over [~9,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+ The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
Member:

Suggested change:
- The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+ The Hub contains over [17,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can test right away in your browser using the model page widgets. You can also use any model as a service using the Inference API. Here is a simple code snippet to do exactly this:

I believe it's the Inference API, no? The paragraph after the snippet talks about "one-click managed inference", but I don't think this is shown here.

Contributor:

Yes, this is the Serverless Inference API. And I agree that the sentence afterwards ("you can use libraries...") is misleading.
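For context, a minimal sketch of what the Serverless Inference API call being discussed looks like at the HTTP level, using only the standard library. The endpoint URL follows the public `api-inference.huggingface.co` pattern; `API_TOKEN` and `sample.flac` are placeholders, not values from this PR.

```python
import urllib.request

# Sketch of a raw Serverless Inference API request for ASR.
# Assumes a valid Hugging Face token; "sample.flac" is a placeholder file.
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"

def build_asr_request(audio_path: str, token: str) -> urllib.request.Request:
    """Build the POST request that sends raw audio bytes to the API."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    return urllib.request.Request(
        API_URL,
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )

# Actually sending it requires network access and a real token:
# import json
# with urllib.request.urlopen(build_asr_request("sample.flac", API_TOKEN)) as resp:
#     print(json.load(resp)["text"])
```

This is the "models as a service" path the comment refers to, as opposed to the one-click managed Inference Endpoints product mentioned in the paragraph after the snippet.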

Member Author:

Makes sense! And thanks for the suggestion!

Contributor:

Can you also change the paragraph after the code snippet please?

@sanchit-gandhi left a comment:

Thanks for the updates @Vaibhavs10! A few suggestions for the ASR task page:

1. In the Inference section, before giving the Transformers code snippet, should we also include instructions to pip install the library? Given the task page is targeted at beginners, IMO it's good to be more verbose here:

       pip install --upgrade transformers

2. There's an unnecessary op that we do in the Transformers code snippet, which we can remove:

       from transformers import pipeline

       - with open("sample.flac", "rb") as f:
       -     data = f.read()

       pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v2")
       pipe("sample.flac")
       # {'text': "GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHOOL ROOMS DAY AFTER DAY FOR A FORTNIGHT HE'LL HAVE TO PUT IN AN APPEARANCE AT SOME PLACE OF WORSHIP ON SUNDAY MORNING AND HE CAN COME TO US IMMEDIATELY AFTERWARDS"}

3. For the datasets, instead of replacing librispeech_asr with espnet/yodas, should we replace openslr instead? librispeech_asr is quite well known among the ASR community, so it would be nice to keep it displayed there to make users aware it's present on the Hub (and can be used as an alternative to Kaldi LibriSpeech). Whereas openslr is not really used as a multilingual dataset anymore, and espnet/yodas can be viewed as a replacement.

4. For the models, should we replace facebook/s2t-small-mustc-en-fr-st with distil-whisper/distil-large-v3? The former is again quite outdated (most users would be better off using openai/whisper-large-v3, which we already promote as the first model on the list).

@Vaibhavs10 (Member Author):

Nice @sanchit-gandhi, addressed your review. Instead of distil-whisper I chose seamless-m4t, just for diversity reasons. Let me know if that works.

Vaibhavs10 merged commit e38b705 into main on May 30, 2024
4 checks passed
Vaibhavs10 deleted the update-audio-tasks branch on May 30, 2024 at 12:17
5 participants