doc: update audio related tasks page. #721
Conversation
```diff
@@ -18,7 +18,7 @@ The use of Multilingual ASR has become popular, the idea of maintaining just a s
 
 ## Inference
 
-The Hub contains over [~9,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
```
Suggested change:

```diff
-The Hub contains over [~17,000+ ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can use right away by trying out the widgets directly in the browser or calling the models as a service using Inference Endpoints. Here is a simple code snippet to do exactly this:
+The Hub contains over [17,000 ASR models](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads) that you can test right away in your browser using the model page widgets. You can also use any model as a service using the Inference API. Here is a simple code snippet to do exactly this:
```
I believe it's the Inference API, no? The paragraph after the snippet talks about "one-click managed inference", but I don't think this is shown here.
Yes, this is the Serverless Inference API. And I agree the sentence afterwards ("you can use libraries...") is misleading.
Makes sense! And thanks for the suggestion!
Can you also change the paragraph after the code snippet please?
Thanks for the updates @Vaibhavs10! A few suggestions for the ASR task page:

- In the Inference section, before giving the Transformers code snippet, should we also include instructions to pip install the library? Given the task page is targeted at beginners, IMO it's good to be more verbose here:

  ```
  pip install --upgrade transformers
  ```

- There's an unnecessary op that we do in the Transformers code snippet, which we can remove:

  ```diff
   from transformers import pipeline
  
  -with open("sample.flac", "rb") as f:
  -    data = f.read()
  
   pipe = pipeline("automatic-speech-recognition", "openai/whisper-large-v2")
   pipe("sample.flac")
   # {'text': "GOING ALONG SLUSHY COUNTRY ROADS AND SPEAKING TO DAMP AUDIENCES IN DRAUGHTY SCHOOL ROOMS DAY AFTER DAY FOR A FORTNIGHT HE'LL HAVE TO PUT IN AN APPEARANCE AT SOME PLACE OF WORSHIP ON SUNDAY MORNING AND HE CAN COME TO US IMMEDIATELY AFTERWARDS"}
  ```
- For the datasets, instead of replacing `librispeech_asr` with `espnet/yodas`, should we replace `openslr` instead? `librispeech_asr` is quite well known among the ASR community, so it would be nice to keep it displayed there to make users aware it's present on the Hub (and can be used as an alternative to Kaldi LibriSpeech). Whereas `openslr` is not really used as a multilingual dataset anymore, and `espnet/yodas` can be viewed as a replacement.
- For the models, should we replace `facebook/s2t-small-mustc-en-fr-st` with `distil-whisper/distil-large-v3`? The former is again quite outdated (most users would be better off using `openai/whisper-large-v3`, which we already promote as the first model on the list).
Nice @sanchit-gandhi, addressed your review. Instead of distil-whisper I chose seamless-m4t, just for diversity reasons. Let me know if that works.
cc: @sanchit-gandhi @ylacombe
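The thread above settles on pointing readers to the Serverless Inference API rather than Inference Endpoints. As a rough illustration only (not part of the PR's snippet), the shape of such a request could be sketched as below; the endpoint URL pattern and the content-type header are assumptions based on common usage, and nothing is actually sent over the network:

```python
import os

# Rough sketch (not from this PR) of what a Serverless Inference API request
# for an ASR model looks like. The URL pattern and headers are assumptions;
# no network request is made here.
MODEL_ID = "openai/whisper-large-v2"
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"

def build_asr_request(audio_bytes, token=None):
    """Return (url, headers, body) for a POST carrying raw audio bytes."""
    headers = {"Content-Type": "audio/flac"}  # assumed content type for a .flac file
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return API_URL, headers, audio_bytes

# Placeholder bytes stand in for the contents of sample.flac.
url, headers, body = build_asr_request(b"fake-audio-bytes", token=os.environ.get("HF_TOKEN"))
print(url)
```

In practice the audio bytes would be POSTed to `url` with `headers`, and the JSON response would carry the transcription, as in the Transformers snippet discussed above.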