Skip to content

AVX runtime check #357

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 30, 2022
Merged

AVX runtime check #357

merged 11 commits into from
Jan 30, 2022

Conversation

tony-kuo
Copy link

Regarding issue #292
AVX runtime check: I've made a wrapper function for the simd functions and runtime check which to use. Not the cleanest implementation but I think of a simpler way.

In addition, for issue conda-forge/hnswlib-feedstock#11
I've tested building on one machine and installing in another with different AVX capabilities with both conda and pip. The '-march=native' in setup.py is a factor in illegal instruction errors. Taking it out solves this issue. So I propose a environment flag 'HNSWLIB_NO_NATIVE', which is checked for in setup.py. This will allow conda build recipes to choose not to use '-march=native'.

@yurymalkov
Copy link
Member

Thank you! Will review it within the week.

Copy link
Member

@yurymalkov yurymalkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pull request!
I wonder if you've looked at the performance compared to the base version? It seems that the avx check is performed on every distance calculation. I would expect it to significantly hurt the performance for small vectors (e.g. d=4, though I have not checked that).

I think an alternative here is to move the avx check to the Space constructor (https://github.com/nmslib/hnswlib/blob/master/hnswlib/space_ip.h#L286), so it will be done only once.

InnerProductSIMD16Ext(const void *pVect1v, const void *pVect2v, const void *qty_ptr) {
DISTFUNC<float> simdfunc_;
#if defined(USE_AVX512)
if (AVX512Capable())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it does a test for the AVX every distance calculation, which I guess would damage the performance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take another look to see if I can only perform the check only once in the constructor. The difficulty in doing that is that the *ExtResiduals functions use the *Ext functions and are outside the class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can be solved by adding a template parameter with avx support inside so it (so compiler would optimize the unused branches away)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the decision branch to the constructor so it shouldn't add extra compute during the distance calculations. Also I noticed the CI tests on windows failed. I don't have a windows framework for testing. Any chance I could get help with that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a VM for windows and fixed windows build now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll check it. Hopefully, this weekend.

@yihming
Copy link

yihming commented Jan 8, 2022

Hi @yurymalkov . Tony is helping us work on the AVX runtime check for hnswlib. And we just wonder if you could allow running the CI test workflows after his recent modifications. Thanks!

Sincerely,
Yiming

@yurymalkov
Copy link
Member

Hi @tony-kuo. Sure! Thanks!
I guess there is an option to run the CI tests in the forks of the repository as well, thought this requires turning on github actions.

@tony-kuo tony-kuo requested a review from yurymalkov January 13, 2022 15:45
setup.py Outdated
}
if not os.environ.get("HNSWLIB_NO_NATIVE"):
if "conda" not in sys.version.lower():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick question here - why not anaconda?
It seems like it cancels the use of AVX for all anaconda python builds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Originally I was trying to catch when it was building the binary in the conda-forge systems, but with the flag now, that isn't necessary. I'll remove the "conda" check and resubmit for review.

@tony-kuo tony-kuo requested a review from yurymalkov January 13, 2022 20:23
@tony-kuo tony-kuo requested a review from yurymalkov January 17, 2022 13:57
@tony-kuo tony-kuo requested a review from yurymalkov January 17, 2022 16:48
Copy link
Member

@yurymalkov yurymalkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more compatibility concern and question about a check for cpuid logic.
In other places looks good to me!

I also (somewhat strangely) do not see much changes between AVX and AVX512 speed in my tests (and I've checked that different implementations are running) - I wonder what is your experience?

@yurymalkov
Copy link
Member

yurymalkov commented Jan 30, 2022

Thank you so much for the contribution!
The AVX512 speed seems weird, but that is outside of the scope of this PR. Will check the reason separately.

@yurymalkov yurymalkov merged commit 41998c9 into nmslib:develop Jan 30, 2022
@tony-kuo tony-kuo deleted the avx_runtime branch January 31, 2022 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants