feat🚀: Add the ability to download from HF by repo_id #65
The `clip` model object in python_bindings can now take a `model_path_or_repo_id` parameter to download a model from Hugging Face by repo_id.
⚠️ breaking changes:
- add `huggingface_hub` dependency to support downloading models from HF by repo_id
- the `model_path` parameter was renamed to `model_path_or_repo_id` to support downloading by repo id
- you can pass `model_file` if you pass a **HF repo_id** that has more than one `.bin` file, to specify the exact model file to download from that repo
- if `model_path_or_repo_id` is a HF repo id and `model_file` is not specified, it will download the default model file (usually the smallest file ending with `.bin`)
📝 files changed:
- python_bindings/clip_cpp/clip.py
- python_bindings/example_main.py
- pyproject.toml -> add huggingface_hub dependency
- update the python_bindings/README.md
- update the ./README.md
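A rough sketch of how a single `model_path_or_repo_id` argument could be dispatched, assuming an existing local file takes priority and anything else is treated as a repo id. `resolve_model_source` is a hypothetical helper for illustration only; the actual logic in `clip.py` may differ.

```python
import os


def resolve_model_source(model_path_or_repo_id, model_file=None):
    """Classify the argument as a local file or a Hugging Face repo id.

    Hypothetical helper, not part of clip_cpp.
    """
    if os.path.isfile(model_path_or_repo_id):
        # An existing file on disk is used directly; model_file is ignored.
        return {"kind": "local", "path": model_path_or_repo_id}
    # Assumption: anything else is a repo id of the form "<owner>/<name>".
    # model_file=None means "pick the default .bin file from the repo".
    return {
        "kind": "repo",
        "repo_id": model_path_or_repo_id,
        "model_file": model_file,
    }
```

Checking the local path first keeps existing `model_path`-style calls working after the rename.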
Please test it before you merge. For some reason I can't get it to build correctly on my WSL; I keep getting #55, but the PyPI install works fine. Can you tell me your workflow for building the Python binding package?
Thanks for taking this! In fact, we don't need huggingface_hub as a dependency. We can resolve the actual file URL from the file-page URL by replacing part of its path.
For example, this URL is for browsing the file page; clicking it takes you to that page in your browser:
https://huggingface.co/Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k/blob/main/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin
However, once that part of the path is replaced, the URL serves the file itself, so the download URL can be built from what we already know about the file.
We can then use urllib.request.urlretrieve to download it with a progress bar, as in the following example:
import urllib.request
import sys


def download_with_progress_bar(url, destination_path):
    def reporthook(count, block_size, total_size):
        # total_size is not positive when the server sends no Content-Length
        if total_size <= 0:
            return
        # Calculate the progress, capped at 100% because the last block may overshoot
        progress = min(count * block_size / total_size, 1.0)
        progress_percent = int(progress * 100)
        # Create a simple progress bar
        bar_length = 50
        progress_bar = '=' * int(progress * bar_length)
        spaces = ' ' * (bar_length - len(progress_bar))
        sys.stdout.write(f"\r[{progress_bar}{spaces}] {progress_percent}%")
        sys.stdout.flush()

    try:
        urllib.request.urlretrieve(url, destination_path, reporthook=reporthook)
        sys.stdout.write("\n")  # Move to the next line after download is complete
        print(f"File downloaded to {destination_path}")
        return True
    except Exception as e:
        print(f"\nError downloading file: {e}")
        return False


if __name__ == "__main__":
    # `url` is not defined in the snippet as posted; as an assumption,
    # read the file's direct-download URL from the command line.
    url = sys.argv[1]
    destination_path = 'laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin'
    download_with_progress_bar(url, destination_path)
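The exact substitution did not survive in the comment above. A minimal sketch of building the download URL, assuming the usual Hugging Face convention that the file page lives under `/blob/` and the raw file under `/resolve/`; `hf_file_url` is a hypothetical helper name:

```python
from urllib.parse import quote


def hf_file_url(repo_id, filename, revision="main"):
    """Build a direct-download URL for a file in a Hugging Face model repo.

    Assumption: raw files are served under /resolve/<revision>/.
    """
    return (
        "https://huggingface.co/"
        # Keep the "/" between owner and repo name, but escape any "/"
        # inside the revision so a branch like "refs/pr/1" stays one segment.
        f"{quote(repo_id)}/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )
```

The result can be passed as `url` to a downloader such as `download_with_progress_bar`.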
Here's the workflow to build the shared lib and the Python package end-to-end. Of course, it would be better to have a build-python.sh file for it. You can save it in the repo root:
#!/bin/bash
rm -rf ./build
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=ON -DCLIP_NATIVE=OFF ..
make
cp ./libclip.so ../examples/python_bindings/clip_cpp/
cp ./ggml/src/libggml.so ../examples/python_bindings/clip_cpp/
cd ../examples/python_bindings
poetry build
With #66, now you can simply run
Interesting, I will try to work with it and update the PR today. I should also handle checking whether the file already exists locally.
This is nice; it should be much easier to build the bindings now, thanks.
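A minimal sketch of such an existence check, assuming the expected file size in bytes is already known (for example from repo metadata). `needs_download` is a hypothetical name, not a function from this PR.

```python
import os


def needs_download(path, expected_size):
    """Return True when the file is missing or its size differs from expected_size."""
    if not os.path.isfile(path):
        return True
    # A size mismatch usually indicates an interrupted or stale download.
    return os.path.getsize(path) != expected_size
```

Comparing sizes is cheap but does not detect corruption; a stricter check could compare a SHA-256 digest instead.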
I already finished implementing the download functionality. It checks whether the file already exists and whether its size matches the one on the network, with proper error handling, just like the huggingface_api. One last thing is missing: the case where the user provides only the repo id, for example:
model = Clip(
    model_path_or_repo_id=repo_id,
    ## ⚠️⚠️ Here, with the huggingface lib, I was able to check all available files
    ## and choose the smallest .bin file
    model_file=None,
    verbosity=2
)
Maybe there is a request you know of that I can make to list the files in the Hugging Face repo, and ideally their sizes?
Great! Some digging into the code revealed the endpoint we can use to retrieve the metadata. Hm, you are using
For example, below is the response to
https://huggingface.co/api/models/Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k?blobs=true
We can parse that JSON with standard
{
  "_id": "6491777a87ae7236b212148f",
  "id": "Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k",
  "modelId": "Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k",
  "author": "Green-Sky",
  "sha": "02595089876c11995d92348a84466d26f6038b52",
  "lastModified": "2023-06-25T16:04:20.000Z",
  "private": false,
  "disabled": false,
  "gated": false,
  "tags": [
    "clip",
    "vision",
    "ggml",
    "clip.cpp",
    "license:mit",
    "region:us"
  ],
  "downloads": 0,
  "likes": 2,
  "model-index": null,
  "config": {
  },
  "cardData": {
    "license": "mit",
    "tags": [
      "clip",
      "vision",
      "ggml",
      "clip.cpp"
    ]
  },
  "spaces": [],
  "siblings": [
    {
      "rfilename": ".gitattributes",
      "blobId": "a6344aac8c09253b3b630fb776ae94478aa0275b",
      "size": 1519
    },
    {
      "rfilename": "README.md",
      "blobId": "6a67f6909611556bb67735869b6812e3d3731c29",
      "size": 391
    },
    {
      "rfilename": "laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin",
      "blobId": "e9bcf0bc818e3588d507e0439cf30b0b05d237f1",
      "size": 303606311,
      "lfs": {
        "sha256": "a916d7b54205f5237fe26361f5f259e2d0eab0cb609436d9f9f52a786553c0c5",
        "size": 303606311,
        "pointerSize": 134
      }
    },
    {
      "rfilename": "laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_0.bin",
      "blobId": "132a765c5dbc5367a3c336b619741fecd358ded6",
      "size": 89830695,
      "lfs": {
        "sha256": "47cad6cbbe4c311ecd5ad3c32915b0838ce395c41e68b479919d313605c375da",
        "size": 89830695,
        "pointerSize": 133
      }
    },
    {
      "rfilename": "laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.q4_1.bin",
      "blobId": "14419b7636fb441cdb53fea95e9d61fbb5728352",
      "size": 99125287,
      "lfs": {
        "sha256": "acf3c61c06c16630369acc31387b27096aa85780861a6ed5cb675e7284ea2a94",
        "size": 99125287,
        "pointerSize": 133
      }
    }
  ]
}
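One way this response could be used to pick a default model file, sketched with only the standard library. The function names are hypothetical, and "default" is assumed here to mean the smallest file by `size` whose name ends with `.bin`.

```python
import json
import urllib.request


def fetch_model_info(repo_id):
    """Fetch repo metadata, including per-file sizes, from the endpoint shown above."""
    url = f"https://huggingface.co/api/models/{repo_id}?blobs=true"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def pick_default_model_file(model_info, suffix=".bin"):
    """Return the name of the smallest file ending with suffix, or None."""
    candidates = [
        sibling
        for sibling in model_info.get("siblings", [])
        if sibling["rfilename"].endswith(suffix)
    ]
    if not candidates:
        return None
    # Entries without a reported size sort last.
    smallest = min(candidates, key=lambda s: s.get("size", float("inf")))
    return smallest["rfilename"]
```

For the response above, `pick_default_model_file` would select the `q4_0` file, since it has the smallest `size` of the three `.bin` entries.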
Very good job. Bump the version in pyproject.toml and it's good to merge.
…f-Dawoad/clip.cpp into add_colab_notebook_example
I think it's ready. I will try to write tests and maybe set up automated GitHub Actions.