Commit 8b667eb

feat🚀: Add the ability to download from HF by repo_id (#65)
Python Bindings Version **0.4.0**: the `Clip` model object in python_bindings can now take a `model_path_or_repo_id` parameter to download a model from Hugging Face by repo_id.

⚠️ Breaking changes:

- The `model_path` parameter was renamed to `model_path_or_repo_id` to add support for downloading by repo id.
- `model_file` is an optional parameter that specifies the exact model file to download if you pass an **HF repo_id** that has more than one `.bin` file. If `model_path_or_repo_id` is an HF repo id and `model_file` is not specified, the default model file is downloaded (usually the smallest file ending with `.bin`).

✨ Features added:

- Ability to download from Hugging Face (no dependencies).
- If the file already exists (downloaded by the package), it is loaded instead of being redownloaded.
- Added docstrings to the main methods.

📝 Files changed:

- python_bindings/clip_cpp/clip.py
- python_bindings/example_main.py
- pyproject.toml -> bump version to 0.4.0
- update python_bindings/README.md
- update ./README.md
1 parent 50bcfb4 commit 8b667eb
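
A minimal sketch of the two download modes described above (the repo id and file name are taken from the README example in this commit):

```python
from clip_cpp import Clip

# Repo id only: with `model_file` unspecified, the default model file
# (usually the smallest file ending with `.bin`) is downloaded.
model = Clip(
    model_path_or_repo_id='Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k'
)

# Repo id plus `model_file`: pins the exact file to download when the
# repo contains more than one `.bin` file.
model = Clip(
    model_path_or_repo_id='Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k',
    model_file='laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin',
)
```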

6 files changed: +381, -54 lines changed

examples/python_bindings/README.md

Lines changed: 82 additions & 37 deletions
@@ -5,12 +5,16 @@ This package provides basic Python bindings for [clip.cpp](https://github.com/mo
It requires no third-party libraries and no big dependencies such as PyTorch, TensorFlow, NumPy, ONNX, etc.

## Install

If you are on an x64 Linux distribution, you can simply pip-install it:

```sh
pip install clip_cpp
```

> A Colab notebook is available for quick experiments:
>
> <a href="https://colab.research.google.com/github/Yossef-Dawoad/clip.cpp/blob/add_colab_notebook_example/examples/python_bindings/notebooks/clipcpp_demo.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you are on another operating system or architecture,
or if you want to make use of support for instruction sets other than AVX2 (e.g., AVX512),

@@ -20,19 +24,59 @@ See [clip.cpp](https://github.com/monatis/clip.cpp) for more info.
All you need to do is to compile with the `-DBUILD_SHARED_LIBS=ON` option and copy `libclip.so` to `examples/python_bindings/clip_cpp`.

## Usage

```python
from clip_cpp import Clip

## you can pass either a repo_id or a path to a .bin file
## in case you pass a repo_id that has more than one .bin file,
## it's recommended to specify which file to download with `model_file`
repo_id = 'Green-Sky/ggml_laion_clip-vit-b-32-laion2b-s34b-b79k'
model_file = 'laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin'

model = Clip(
    model_path_or_repo_id=repo_id,
    model_file=model_file,
    verbosity=2
)

text_2encode = 'cat on a Turtle'

tokens = model.tokenize(text_2encode)
text_embed = model.encode_text(tokens)

## load an image from disk and extract its embedding
image_2encode = '/path/to/cat.jpg'
image_embed = model.load_preprocess_encode_image(image_2encode)

## compute the similarity between the text and the image
score = model.calculate_similarity(text_embed, image_embed)

# Alternatively, you can just do:
# score = model.compare_text_and_image(text, image_path)

print(f"Similarity score: {score}")
```

## Clip Class

The `Clip` class provides a Python interface to clip.cpp, allowing you to perform various tasks such as text and image encoding, similarity scoring, and text-image comparison. Below are the constructor and public methods of the `Clip` class:

### Constructor

```python
def __init__(
    self, model_path_or_repo_id: str,
    model_file: Optional[str] = None,
    revision: Optional[str] = None,
    verbosity: int = 0):
```

- **Description**: Initializes a `Clip` instance with the specified CLIP model file and optional verbosity level.
    - `model_path_or_repo_id` (str): The path to a local CLIP model file, or an HF `repo_id`.
    - `model_file` (str, optional): If `model_path_or_repo_id` is a **repo_id** that has multiple `.bin` files, specifies which `.bin` file to download.
    - `verbosity` (int, optional): An integer specifying the verbosity level (default is 0).
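
Since `model_path_or_repo_id` also accepts a plain file path, a local model loads directly; a minimal sketch (the path below is a hypothetical placeholder):

```python
from clip_cpp import Clip

# Load a local GGML file directly; no Hugging Face download is attempted.
model = Clip(model_path_or_repo_id='./models/clip-model.ggmlv0.f16.bin', verbosity=1)
```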

### Public Methods

@@ -43,7 +87,7 @@ def __init__(self, model_file: str, verbosity: int = 0):
#### 1. `vision_config`

```python
def vision_config(self) -> Dict[str, Any]:
```

- **Description**: Retrieves the configuration parameters related to the vision component of the CLIP model.

#### 2. `text_config`

@@ -52,16 +96,16 @@ def vision_config(self) -> Dict[str, Any]:
```python
def text_config(self) -> Dict[str, Any]:
```

- **Description**: Retrieves the configuration parameters related to the text component of the CLIP model.

#### 3. `tokenize`

```python
def tokenize(self, text: str) -> List[int]:
```

- **Description**: Tokenizes a text input into a list of token IDs.
    - `text` (str): The input text to be tokenized.

#### 4. `encode_text`

@@ -71,10 +115,10 @@ def encode_text(
) -> List[float]:
```

- **Description**: Encodes a list of token IDs into a text embedding.
    - `tokens` (List[int]): A list of token IDs obtained through tokenization.
    - `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
    - `normalize` (bool, optional): Whether or not to normalize the output vector (default is `True`).
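
Putting the two text methods together, a minimal sketch (assumes `model` is a `Clip` instance as constructed in the Usage section; passing the optional parameters by keyword is assumed from the parameter list above):

```python
# Tokenize first, then encode the token IDs into a text embedding.
tokens = model.tokenize('a photo of a cat')  # -> List[int]
text_embed = model.encode_text(tokens, n_threads=4, normalize=True)  # -> List[float]
```
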
#### 5. `load_preprocess_encode_image`

@@ -84,10 +128,10 @@ def load_preprocess_encode_image(
) -> List[float]:
```

- **Description**: Loads an image, preprocesses it, and encodes it into an image embedding.
    - `image_path` (str): The path to the image file to be encoded.
    - `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
    - `normalize` (bool, optional): Whether or not to normalize the output vector (default is `True`).
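
A matching sketch for images, under the same assumptions (the image path is a placeholder):

```python
# Load, preprocess, and encode an image from disk in one call.
image_embed = model.load_preprocess_encode_image('./cat.jpg', n_threads=4)
```
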
#### 6. `calculate_similarity`

@@ -97,9 +141,9 @@ def calculate_similarity(
) -> float:
```

- **Description**: Calculates the similarity score between a text embedding and an image embedding.
    - `text_embedding` (List[float]): The text embedding obtained from `encode_text`.
    - `image_embedding` (List[float]): The image embedding obtained from `load_preprocess_encode_image`.

#### 7. `compare_text_and_image`

@@ -109,37 +153,38 @@ def compare_text_and_image(
) -> float:
```

- **Description**: Compares a text input and an image file, returning a similarity score.
    - `text` (str): The input text.
    - `image_path` (str): The path to the image file for comparison.
    - `n_threads` (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
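
As noted in the Usage section, this wraps the whole pipeline in one call; a minimal sketch (paths are placeholders, `model` as above):

```python
# One call instead of tokenize + encode_text +
# load_preprocess_encode_image + calculate_similarity.
score = model.compare_text_and_image('cat on a Turtle', './cat.jpg')
print(f"Similarity score: {score}")
```
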
#### 8. `__del__`

```python
def __del__(self):
```

- **Description**: Destructor that frees resources associated with the `Clip` instance.

With the `Clip` class, you can easily work with the CLIP model for various natural language understanding and computer vision tasks.

## Example

A basic example can be found in the [clip.cpp examples](https://github.com/monatis/clip.cpp/blob/main/examples/python_bindings/example_main.py).

```
python example_main.py --help
usage: clip [-h] -m MODEL [-v VERBOSITY] -t TEXT -i IMAGE

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to GGML file
  -v VERBOSITY, --verbosity VERBOSITY
                        Level of verbosity. 0 = minimum, 2 = maximum
  -t TEXT, --text TEXT  text to encode
  -i IMAGE, --image IMAGE
                        path to an image file
```
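
A typical invocation might look like this (the model and image paths below are placeholders):

```sh
python example_main.py \
    -m ./models/laion_clip-vit-b-32-laion2b-s34b-b79k.ggmlv0.f16.bin \
    -t "cat on a Turtle" \
    -i ./cat.jpg
```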

Bindings to the DLL are implemented in `clip_cpp/clip.py` and
