-
Notifications
You must be signed in to change notification settings - Fork 714
AVX runtime check #357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVX runtime check #357
Conversation
Thank you! Will review it within the week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pull request!
I wonder if you've looked at the performance compared to the base version? It seems that the avx check is performed on every distance calculation. I would expect it to significantly hurt the performance for small vectors (e.g. d=4, though I have not checked that).
I think an alternative here is to move the avx check to the Space constructor (https://github.com/nmslib/hnswlib/blob/master/hnswlib/space_ip.h#L286), so it will be done only once.
hnswlib/space_ip.h
Outdated
InnerProductSIMD16Ext(const void *pVect1v, const void *pVect2v, const void *qty_ptr) { | ||
DISTFUNC<float> simdfunc_; | ||
#if defined(USE_AVX512) | ||
if (AVX512Capable()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it does a test for the AVX every distance calculation, which I guess would damage the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take another look to see if I can only perform the check only once in the constructor. The difficulty in doing that is that the *ExtResiduals functions use the *Ext functions and are outside the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this can be solved by adding a template parameter with avx support inside so it (so compiler would optimize the unused branches away)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the decision branch to the constructor so it shouldn't add extra compute during the distance calculations. Also I noticed the CI tests on windows failed. I don't have a windows framework for testing. Any chance I could get help with that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a VM for windows and fixed windows build now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll check it. Hopefully, this weekend.
Hi @yurymalkov . Tony is helping us work on the AVX runtime check for hnswlib. And we just wonder if you could allow running the CI test workflows after his recent modifications. Thanks! Sincerely, |
Hi @tony-kuo. Sure! Thanks! |
setup.py
Outdated
} | ||
if not os.environ.get("HNSWLIB_NO_NATIVE"): | ||
if "conda" not in sys.version.lower(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick question here - why not anaconda?
It seems like it cancels the use of AVX for all anaconda python builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. Originally I was trying to catch when it was building the binary in the conda-forge systems, but with the flag now, that isn't necessary. I'll remove the "conda" check and resubmit for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one more compatibility concern and question about a check for cpuid logic.
In other places looks good to me!
I also (somewhat strangely) do not see much changes between AVX and AVX512 speed in my tests (and I've checked that different implementations are running) - I wonder what is your experience?
Thank you so much for the contribution! |
Regarding issue #292
AVX runtime check: I've made a wrapper function for the simd functions and runtime check which to use. Not the cleanest implementation but I think of a simpler way.
In addition, for issue conda-forge/hnswlib-feedstock#11
I've tested building on one machine and installing in another with different AVX capabilities with both conda and pip. The '-march=native' in setup.py is a factor in illegal instruction errors. Taking it out solves this issue. So I propose a environment flag 'HNSWLIB_NO_NATIVE', which is checked for in setup.py. This will allow conda build recipes to choose not to use '-march=native'.