
Imagenet1k benchmark #57


Closed · wants to merge 1 commit

Conversation
Conversation

ellonde
Contributor

@ellonde ellonde commented Sep 13, 2023

A small script to set up ImageNet1k for the benchmark.

Minor changes are made to the downloaded dataset: there are duplicated classes, and some class names contain "/", which is replaced with "or".
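For illustration, a minimal sketch of that kind of cleanup (the exact replacement string and dedup rule are assumptions, not necessarily the script's actual logic):

```python
# Hypothetical sketch of the class-name cleanup described above: replace "/"
# (which would otherwise break per-class paths) and drop duplicated classes.
def clean_classnames(raw_names):
    seen, cleaned = set(), []
    for name in raw_names:
        name = name.replace("/", "or")  # exact replacement string is an assumption
        if name not in seen:            # keep only the first occurrence of duplicates
            seen.add(name)
            cleaned.append(name)
    return cleaned
```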

I looked at the clip-benchmark package; they average the vectors of a set of template strings into which the class name is inserted. I wrote some C++ code for that as well but did not include it here. Let me know if you want to have a look at it, perhaps in another PR.
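For reference, a minimal, implementation-agnostic sketch of that template-averaging scheme; `encode_text` is a placeholder for whichever text encoder is used, and the two templates are just examples (OpenAI's ImageNet prompt set has 80 of them):

```python
import numpy as np

# Placeholder for the real text encoder (open_clip, clip.cpp bindings, ...).
def encode_text(prompt: str) -> np.ndarray:
    raise NotImplementedError

TEMPLATES = ["a photo of a {}.", "a bad photo of a {}."]  # example templates only

def class_embedding(classname: str) -> np.ndarray:
    # Embed each templated prompt, L2-normalize, average, then re-normalize.
    vecs = [encode_text(t.format(classname)) for t in TEMPLATES]
    vecs = [v / np.linalg.norm(v) for v in vecs]
    mean = np.mean(vecs, axis=0)
    return mean / np.linalg.norm(mean)
```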

I have run this on the default setup described in the README, and the performance is not looking as expected.
As mentioned in another issue, the performance difference could be due to a different tokenization strategy. As a quick test I changed the benchmark script to call the Python tokenizer instead, and I only get slightly better results.
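One way to isolate the tokenizer as a variable is to dump token IDs from the Python side and diff them against what the C++ tokenizer produces; a small sketch using open_clip (the prompt is arbitrary):

```python
import open_clip

# Dump token IDs from the reference Python tokenizer so they can be diffed
# against the IDs the clip.cpp tokenizer produces for the same prompt.
tokenizer = open_clip.get_tokenizer("ViT-B-32")
ids = tokenizer(["a photo of a goldfish."])  # (1, context_length) tensor, zero-padded
print(ids[0].tolist())
```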

Here are the results (top-1 / top-5 accuracy):

With the Python tokenizer:
| total | 0.1102 | 0.1600 |

With the C++ tokenizer:
| total | 0.0800 | 0.1182 |

As mentioned, the model I ran this with was the CLIP-ViT-B-32-laion2B-s34B-b79K fp16 model.

Regardless, I think the benchmark dataset is useful to have for validating the performance hit taken by different quantization strategies.

@monatis
Owner

monatis commented Sep 14, 2023

Wow! Do you have any info about those scores for the original PyTorch implementation? And how long does it take to run inference over the entire ImageNet1k dataset with clip.cpp?

Regardless of that, I was not aware that tokenization makes such a big difference. I should definitely have a look at it.

@ellonde
Contributor Author

ellonde commented Sep 14, 2023

I have not run the original implementation myself, but this model is reported to have 66.6% top-1 performance in the open-clip repo. That is, I'm assuming, an fp32 version of the model.

I only have the timings from the hacky Python-tokenizer version, which is very slow at tokenization. Processing the images is of course most of the work if you use the C++ tokenizer.

- 998 texts encoded in 3211832.00 ms ( 3218.27 ms per text)
- 47904 images encoded in 5594410.00 ms (  116.78 ms per image)

@monatis
Owner

monatis commented Sep 14, 2023

> this model is reported to have 66.6% top-1 performance in the open-clip repo.

I think this is a big margin that cannot be attributed to fp16 vs. fp32 models. Could you post the code for calculating the metrics you reported?
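For comparison, a minimal sketch of how top-1/top-5 accuracy is usually computed from image-to-class similarity scores (plain numpy, independent of either implementation):

```python
import numpy as np

def topk_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    # logits: (n_images, n_classes) similarity scores; labels: (n_images,) class indices.
    topk = np.argsort(-logits, axis=1)[:, :k]     # indices of the k highest-scoring classes
    hits = (topk == labels[:, None]).any(axis=1)  # True if the true label is among them
    return float(hits.mean())

# acc1 = topk_accuracy(logits, labels, 1)
# acc5 = topk_accuracy(logits, labels, 5)
```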

@ellonde
Contributor Author

ellonde commented Sep 14, 2023

I completely agree!

I used the benchmark script that you provide, which I trusted blindly 😅 I haven't looked through it to verify that it's computing the metrics correctly.

I also don't think averaging the class templates as text targets can account for this big a difference either.

@monatis
Owner

monatis commented Sep 14, 2023

There should be a difference between the test protocols, because vectors from this implementation are comparable to those from the original implementation: I verified it during development, and it's also confirmed by community feedback. I'd like to hold this PR until I better understand the difference between the metrics.
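A quick, protocol-independent sanity check here is to compare embeddings from the two implementations directly; a sketch assuming two hypothetical wrappers `embed_cpp` and `embed_torch` that return a vector for the same input:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# embed_cpp / embed_torch are hypothetical wrappers around clip.cpp and the
# original PyTorch model; a cosine similarity close to 1.0 means the two
# implementations produce effectively the same vector for the same input.
# sim = cosine(embed_cpp(image), embed_torch(image))
```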

@monatis monatis marked this pull request as draft September 14, 2023 18:47
@skirodev

I found issues with the batch inference implementation in ggml_conv_2d, so I submitted PR #528 to fix it.

Here are the benchmark results using the fixed code and the ggml_laion_clip-vit-b-32-laion2b-s34b-b79k f16 model on the ImageNet1k dataset.

Original code:
| total | 0.0805 | 0.1185 |

Fixed code:
| total | 0.3137 | 0.4582 |

@monatis
Owner

monatis commented Sep 18, 2023

@skirodev I'll track your branch until it's merged into master. I was not aware that there was a regression there. Thanks.

@ellonde Now that we know the reason for that large margin, I think we can merge this. There's still a margin between clip.cpp and the scores reported by OpenCLIP, but this can be attributed to their different test protocol plus tokenization and image preprocessing divergences in this repo. I will fix the second and third items soon, but the first one is just for research-grade purposes, and I believe we can skip it for now.
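On the preprocessing point, the reference OpenCLIP pipeline for this model resizes with bicubic interpolation, center-crops to 224, and normalizes with the CLIP mean/std; a torchvision sketch of that reference pipeline, useful as a baseline when checking for divergences (exact parameters assumed from the standard CLIP preprocessing):

```python
from torchvision import transforms

# Reference CLIP-style preprocessing for a ViT-B/32 model: bicubic resize,
# 224x224 center crop, and the standard CLIP normalization constants.
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])
```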

I'll handle this PR in #72 because I couldn't push to your branch.

@monatis monatis closed this Sep 18, 2023