
Commit b46feea

docs✨: Add colab notebook example (#64)

* Created using Colaboratory
* docs✨: Add Colab notebook for basic experiments
1 parent 8c92e21 commit b46feea

File tree

2 files changed: +3414 −30 lines changed


README.md

Lines changed: 44 additions & 30 deletions
@@ -1,13 +1,16 @@
# clip.cpp

CLIP inference in plain C/C++ with no extra dependencies

## Description

This is a dependency-free implementation of the well-known [CLIP](https://github.com/openai/clip) by OpenAI,
thanks to the great work in [GGML](https://github.com/ggerganov/ggml).
You can use it to work with CLIP models from both OpenAI and LAION
in Transformers format.

## Motivation

CLIP is deployed for several tasks, from semantic image search to zero-shot image labeling.
It's also a part of Stable Diffusion and the recently emerging field of large multimodal models (LMM).
This repo is aimed at powering useful applications based on such models on computation- or memory-constrained devices.
@@ -16,21 +19,24 @@ This repo is aimed at powering useful applications based on such models on compu
clip.cpp also has a short startup time compared to large ML frameworks, which makes it suitable for serverless deployments where cold start is an issue.

## Hot topics

- 09/14/2023: All functions are C-compatible now. The `zsl` example is updated to match Huggingface's behavior in the zero-shot pipeline.
- 09/11/2023: Introduce Python bindings.
- 07/12/2023: Batch inference support for image encoding.
- 07/11/2023: Semantic image search [example](examples/image-search/README.md) directly in C++.

## Note about image preprocessing

PIL resizes with a two-pass, convolution-based bicubic interpolation and applies antialiasing. In PyTorch, antialiasing is optional. Implementing this preprocessing logic so that it matches their results numerically takes some extra attention. However, I found that linear interpolation is good enough, both for comparing different embeddings produced by this implementation and for comparing an embedding from this implementation with one from Transformers. So let's use it until we craft a proper bicubic interpolation.

## Preconverted Models

Preconverted models can be found in [HuggingFace repositories tagged with `clip.cpp`](https://huggingface.co/models?other=clip.cpp).
If you want to do the conversion yourself for some reason, see below for how.
Otherwise, download a model of your choice from the link above and feel free to jump to the building section.
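
For instance, a preconverted checkpoint can be fetched by cloning its HuggingFace repository with git. The repository name below is a placeholder; pick a real one from the tag page linked above (and note that `git lfs` is typically needed for the large model files):

```sh
git lfs install
git clone https://huggingface.co/<user>/<preconverted-clip.cpp-model>
```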

## Model conversion

You can convert CLIP models from OpenAI and LAION in Transformers format. Apparently, LAION's models outperform OpenAI models in several benchmarks, so they are recommended.

1. Clone the model repo from HF Hub:
@@ -64,6 +70,7 @@ python convert_hf_to_ggml.py ../../CLIP-ViT-B-32-laion2B-s34B-b79K 1
The output `ggml-model-f16.bin` file is in the model directory specified in the command above.
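
As a quick sanity check that the conversion succeeded, you can look for the file in that directory. The path below simply reuses the example model directory from the conversion command and should be adjusted to your own layout:

```sh
ls -lh ../../CLIP-ViT-B-32-laion2B-s34B-b79K/ggml-model-f16.bin
```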

## Building

```shell
git clone --recurse-submodules https://github.com/monatis/clip.cpp.git

@@ -84,13 +91,14 @@ And the binaries are in the `./bin` directory.
I couldn't reproduce it on my Macbook M2 Pro, so I cannot help further. If you know a solution that I can include in `CMakeLists.txt`, please ping me [here](https://github.com/monatis/clip.cpp/issues/24).

## Quantization

`clip.cpp` currently supports the q4_0 and q4_1 quantization types.
You can quantize a model in F16 to one of these types by using the `./bin/quantize` binary.

```
usage: ./bin/quantize /path/to/ggml-model-f16.bin /path/to/ggml-model-quantized.bin type
    type = 2 - q4_0
    type = 3 - q4_1
```

For example, you can run the following to convert the model to q4_0:
@@ -102,27 +110,28 @@ For example, you can run the following to convert the model to q4_0:
Now you can use `ggml-model-q4_0.bin` just like the model in F16.

## Usage

Currently we have three examples: `main`, `zsl` and `image-search`.

1. `main` is just for demonstrating the usage of the API and optionally printing out some verbose timings. It simply calculates the similarity between one image and one text passed as CLI args (see the illustrative invocations at the end of this section).

```
Usage: ./bin/main [options]

Options: -h, --help: Show this message and exit
    -m <path>, --model <path>: path to model. Default: models/ggml-model-f16.bin
    -t N, --threads N: Number of threads to use for inference. Default: 4
    --text <text>: Text to encode. At least one text should be specified
    --image <path>: Path to an image file. At least one image path should be specified
    -v <level>, --verbose <level>: Control the level of verbosity. 0 = minimum, 2 = maximum. Default: 1
```

2. `zsl` is a zero-shot image labeling example. It labels an image with one of the given labels.
   The CLI args are the same as in `main`,
   but you must specify multiple `--text` arguments to provide the labels.

3. `image-search` is an example of semantic image search with [USearch](https://github.com/unum-cloud/usearch/).
   You must enable the `CLIP_BUILD_IMAGE_SEARCH` option to compile it, and the dependency will be automatically fetched by cmake:

```sh
mkdir build
@@ -137,6 +146,7 @@ make
See [examples/image-search/README.md](examples/image-search/README.md) for more info and usage.
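
To make the flags for `main` and `zsl` above concrete, here are two illustrative invocations. The model path, image path, and prompts are placeholders rather than files shipped with this repo, and the `zsl` binary is assumed to land in `./bin` alongside `main`:

```sh
# similarity between one image and one text with `main`
./bin/main -m ./models/ggml-model-f16.bin --image ./my_photo.jpg --text "a photo of a dog"

# zero-shot labeling with `zsl`: pass several --text arguments as candidate labels
./bin/zsl -m ./models/ggml-model-f16.bin --image ./my_photo.jpg \
    --text "a photo of a dog" --text "a photo of a cat" --text "a photo of a car"
```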
## Python bindings

You can use clip.cpp in Python with no third-party libraries (no dependencies other than standard Python libraries).
It uses `ctypes` to load a dynamically linked library (DLL) to interface with the implementation in C/C++.

@@ -146,6 +156,10 @@ If you are on an X64 Linux distribution, you can simply Pip-install it with AVX2
```sh
pip install clip_cpp
```

> Colab Notebook available for quick experiments:
>
> <a href="https://colab.research.google.com/github/Yossef-Dawoad/clip.cpp/blob/add_colab_notebook_example/examples/python_bindings/notebooks/clipcpp_demo.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you are on another operating system or architecture,
or if you want to make use of support for instruction sets other than AVX2 (e.g., AVX512),
you can build it from source.
@@ -165,23 +179,23 @@ make
And find the `libclip.so` binary in the `build` directory.
See [examples/python_bindings/README.md](examples/python_bindings/README.md) for more info and usage.

## Benchmarking

You can use the benchmarking utility to compare the performance of different checkpoints and quantization types.

```
usage: ./bin/benchmark <model_path> <images_dir> <num_images_per_dir> [output_file]

    model_path: path to a CLIP model in GGML format
    images_dir: path to a directory of images, where images are organized into subdirectories named after classes
    num_images_per_dir: maximum number of images to read from each of the subdirectories. If 0, read all files
    output_file: optional. If specified, dump the output to this file instead of stdout
```
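
Purely as an illustration (the checkpoint, dataset directory, and class layout below are placeholders, not files provided by this repo), a run over 25 images per class subdirectory could look like:

```sh
# ./data/labeled is expected to contain one subdirectory per class, e.g. ./data/labeled/cats, ./data/labeled/dogs
./bin/benchmark ./models/ggml-model-q4_0.bin ./data/labeled 25 benchmark-results.txt
```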

TODO: share benchmarking results for a common dataset later on.

## Future Work

- [ ] Support `text-only`, `image-only` and `both` (current) options when exporting, and modify the model-loading logic accordingly. Using a single modality might be relevant in certain cases, e.g., in large multimodal models, or when building and/or searching an index for semantic image search.
- [ ] Separate memory buffers for the text and image models, as their memory requirements are different.
- [ ] Implement proper bicubic interpolation (PIL uses a convolutions-based algorithm, and it's more stable than affine transformations).

examples/python_bindings/notebooks/clipcpp_demo.ipynb

Lines changed: 3370 additions & 0 deletions
Large diffs are not rendered by default.
