IP space roundup bug fix #361
Conversation
Hi @GuyAv46. Thank you for the PR! One major factor is that #357 will hopefully be merged pretty soon with some changes in the same file, so this PR will probably need to be updated. I guess it might also make sense to rename I also wonder whether you know what the difference is between AVX and non-AVX computation on the same CPU?
Happy to help. :) About the last question: do you mean the difference in performance?
Tested on a Lenovo ThinkPad P15v with AVX512 capabilities, on 128-dimension float vectors, using
@GuyAv46 Thank you!
As far as I can tell, my PR fixes this problem; there is no longer any difference in result accuracy between the different optimizations. Every function eventually multiplies each pair of same-index elements and then sums them all up, using Before the change, I noticed that when I used InnerProductSIMD16ExtResiduals with two close vectors, the first internal function returned 1 (the dot product was 0), and the second internal function (on the "tail") returned ~0 (1-{almost 1}), and instead of returning ~0 (5.9604645e-08), the function returned Now I'm getting the same result regardless of the specific function.
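The loss described in this comment is consistent with how single-precision addition behaves near 1.0. The value 5.9604645e-08 is 2^-24, which is half the gap between 1.0f and the next representable float, so adding it to 1.0f rounds back to 1.0f and the small term vanishes. The sketch below only illustrates that floating-point effect; `add_partials` and `demonstrate_absorption` are hypothetical names and are not taken from the library.

```cpp
#include <cassert>

// Hypothetical helper: adds two partial "1 - IP" results in single precision.
// The volatile store forces the sum to be rounded to a 32-bit float.
float add_partials(float head, float tail) {
    volatile float sum = head + tail;
    return sum;
}

// Shows that a tail of 2^-24 is absorbed when the head result is 1.0f,
// while the same tail survives when nothing large is added to it.
void demonstrate_absorption() {
    const float tail = 5.9604645e-08f;  // 2^-24, the value quoted in the comment

    // 1.0f + 2^-24 is a tie between 1.0f and the next float; it rounds to 1.0f.
    assert(add_partials(1.0f, tail) == 1.0f);

    // Added to 0.0f, the small value is preserved exactly.
    assert(add_partials(0.0f, tail) == tail);
}
```

Once a value of about 1 has been mixed into the sum, the small distance between two close vectors can no longer be recovered by subtracting a constant afterwards, which is why the order of operations matters here.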
Got it! Thanks!
7b41eca to 49ef6bc
Hi @GuyAv46.
Yes, it is ready.
Thanks!
Fixes a bug that causes the IP functions, in some optimizations and cases, to round up the calculated result.
When using InnerProductSIMD16ExtResiduals or InnerProductSIMD4ExtResiduals, we call InnerProductSIMD16Ext or InnerProductSIMD4Ext respectively, and then also call InnerProduct. Each of these results is already calculated as 1.0f - {IP}, so we need to sum them and then subtract one 1.0f to get 1.0f - {IP16/4 + IPrest} and not 2.0f - {IP16/4 + IPrest}.
When comparing close vectors, this can lead to losing the actual IP and rounding up the result, and to inconsistent results between different optimizations.
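The arithmetic in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: a plain scalar loop stands in for the SIMD kernels, and the names `dot`, `ip_distance`, `residuals_distance_buggy`, and `residuals_distance_fixed` are hypothetical. The vector is split into a head whose length is a multiple of 16 and a tail holding the remainder, and each part returns 1.0f - {IP}.

```cpp
#include <cassert>
#include <cstddef>

// Scalar dot product over n floats (stand-in for the SIMD kernels).
static float dot(const float* a, const float* b, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// Each partial kernel returns 1 - IP over its own slice.
static float ip_distance(const float* a, const float* b, std::size_t n) {
    return 1.0f - dot(a, b, n);
}

// Adding the two partial results yields 2 - (IP_head + IP_tail).
float residuals_distance_buggy(const float* a, const float* b, std::size_t dim) {
    assert(a != nullptr && b != nullptr);
    const std::size_t head = dim - dim % 16;  // largest multiple of 16 <= dim
    return ip_distance(a, b, head) + ip_distance(a + head, b + head, dim - head);
}

// Subtracting one 1.0f yields 1 - (IP_head + IP_tail).
float residuals_distance_fixed(const float* a, const float* b, std::size_t dim) {
    assert(a != nullptr && b != nullptr);
    const std::size_t head = dim - dim % 16;
    const float d_head = ip_distance(a, b, head);
    const float d_tail = ip_distance(a + head, b + head, dim - head);
    return d_head + d_tail - 1.0f;
}
```

For example, with 20 elements where every a[i] is 0.25f and every b[i] is 0.125f, the full inner product is 0.625, so the expected result is 0.375. The buggy combination returns 1.375 and the fixed one returns 0.375.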