Skip to content

Add support for EDS-NLP library #696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 17, 2024

Conversation

percevalw
Copy link
Contributor

@percevalw percevalw commented May 22, 2024

Hi,

This PR adds support for the edsnlp library (related to huggingface/api-inference-community#424), an NLP library developed at the Greater Paris University Hospitals (AP-HP) with a specific focus on hybrid models (rule-based + ML) and multi-tasking, widely used in the structuring of clinical language at scale (NER, entity linking, document classification, event extraction, ...).
I saw that the file https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/library-to-tasks.ts was automatically generated from a GitHub CI as mentioned on the docs, but didn't see how the action would edit it so I took the liberty of modifying it myself.
Let me know if you need more information or if I need to change anything about this PR, thanks !

@Wauplin
Copy link
Contributor

Wauplin commented May 22, 2024

Hi @percevalw! Nice to see you in a Github PR! 🤗

There are actually several definitions to "add support for xxx library" on the Hub, which correspond to your 2 PRs:

  1. Have your library recognized officially on the Hub (i.e. what this PR do). It enables the download count, code snippets, automatic tagging, a link to the lib' repo, better discoverability of the models, etc. This is open to "any" community library without much supervision on our side.
  2. Make your library compatible with the serverless inference API. It allows users to test models directly from the model page and we (as HF) run the model on our infra. This is trickier and mostly reserved to a subset of very popular libraries. Since it means running external code on our infra, we first need to make sure we want to do it (security-wise, compute-wise and maintenance-wise). In this regard, I don't think we'll support EDS-NLP in the community inference API in the near future. cc @osanseviero please confirm.

Note that regardless of these integrations, it is already possible to upload your models to the Hub, adapt your library to upload to/download from the Hub, adapt your library to tag the uploaded models, enable download counts with a config.json file, etc.

@Wauplin
Copy link
Contributor

Wauplin commented May 22, 2024

About this PR itself (and the dummy NER repo), do you think it'll be possible to move the artifacts/ folder at the root of the repo? This way all EDS-NLP repos will have the same structure which usually makes it easier to retrieve files programmatically. With the current version, you have to know the model_id (dummy-ner) to know in which subfolder the artifacts are located. Also having config.cfg at the root of the repo would make it easier for users to look into the config of the model.

Those are only recommendations, nothing wrong per-se with the current structure.

@percevalw percevalw force-pushed the add-library-edsnlp branch from 732cd1f to 1bc8319 Compare May 23, 2024 12:18
@percevalw
Copy link
Contributor Author

Thank you for the thorough review!

About this PR itself (and the dummy NER repo), do you think it'll be possible to move the artifacts/ folder at the root of the repo? ... Also having config.cfg at the root of the repo would make it easier for users to look into the config of the model.

This structure makes it easier for us to install the model but indeed also makes it more difficult to download the files in a standardised way. I'll try to look into this again.

This is trickier and mostly reserved to a subset of very popular libraries. Since it means running external code on our infra, we first need to make sure we want to do it (security-wise, compute-wise and maintenance-wise). In this regard, I don't think we'll support EDS-NLP in the community inference API in the near future.

Fair enough, I'll close the other PR and check back when we're amongst the big ones :)

@percevalw percevalw force-pushed the add-library-edsnlp branch from 6f43bb1 to 25f3c74 Compare June 13, 2024 18:11
@percevalw percevalw requested a review from pcuenca as a code owner June 13, 2024 18:11
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on it @percevalw! Looks good to me now :) I left a minor comment as you can provide a documentation url as well.

@percevalw
Copy link
Contributor Author

Thank you! The suggestion to move the artifacts folder to the root of the repo was very relevant, and I was able to proceed in that direction. That being said, adding the library to huggingface.js should not be impacted by the architecture of the models. In particular, it should not affect the download count (tracked by */config.cfg since there should be exactly 1 config.cfg file per model) or the installation snippets. So, everything looks good to me!

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to know! If it ever change in the future, you can always come back for a second PR in huggingface.js anyway. But agree with you that wildcard: { path: "*/config.cfg" }, will work in any case (EDIT: except if config.cfg is moved at the root of the repo at some point)

Copy link
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Wauplin Wauplin merged commit 1f4eb84 into huggingface:main Jun 17, 2024
4 checks passed
@Wauplin
Copy link
Contributor

Wauplin commented Jun 17, 2024

Merged! :)

@julien-c
Copy link
Member

very cool – let's make sure future models like https://huggingface.co/AP-HP/eds-pseudo-public are as visible as possible 🔥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants