-
Notifications
You must be signed in to change notification settings - Fork 130
[SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison #885
Conversation
51b5647
to
7cb7c0e
Compare
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
@@ -99,26 +100,74 @@ bool write_binary_file(const char *fname, const std::vector<T> &vec, | |||
} | |||
|
|||
template <typename T> | |||
bool cmp_binary_files(const char *fname1, const char *fname2, T tolerance) { | |||
bool cmp_binary_files(const char *fname1, const char *fname2, const T tolerance, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion: can either reuse this function template, or make an overload.
… (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
7cb7c0e
to
49bd916
Compare
std::cerr << std::endl; | ||
return false; | ||
} else { | ||
std::cerr << ". Current mismatch rate: " << std::setprecision(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is it ok to output mismatches (up to the threshold) or output only in case of failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say, always output mismatch summary, even if they are "tolerated" and test passes.
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
bool cmp_binary_files(const char *fname1, const char *fname2, T tolerance) { | ||
bool cmp_binary_files(const char *fname1, const char *fname2, const T tolerance, | ||
const double toleratedMismatchRate = 0, | ||
const int mismatchReportThrottleLimit = 5) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is mismatchReportThrottleLimit? please add comment. what "throttle" in the name signifies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Threshold" might be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
std::cerr << (int)vec1[i] << " vs " << (int)vec2[i] << std::endl; | ||
} else { | ||
std::cerr << vec1[i] << " vs " << vec2[i] << std::endl; | ||
if (!toleratedMismatchRate || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if toleratedMismatchRate == 0 and there are errors in every vector element, there will be size lines of output, which is bad. I understand that this is how it was before, but since you are refactoring this anyway, please output at most mismatchReportThrottleLimit
errors (also consider changing the name to something simpler - mismatchReportLimit
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if toleratedMismatchRate == 0 and there are errors in every vector element, there will be size lines of output, which is bad.
Correction: there will be one line of output and return:
if (!toleratedMismatchRate) {
std::cerr << std::endl;
return false;
}
// Yes, this is return from inside a loop which is hard to notice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. ok then
std::cerr << std::endl; | ||
return false; | ||
} else { | ||
std::cerr << ". Current mismatch rate: " << std::setprecision(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say, always output mismatch summary, even if they are "tolerated" and test passes.
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
return false; | ||
} else { | ||
std::cerr << ". Current mismatch rate: " << std::setprecision(8) | ||
<< std::fixed << totalMismatches / size << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<< std::fixed << totalMismatches / size << std::endl; | |
<< std::fixed << totalMismatches / (double)size << std::endl; |
accompanies the above suggestion to make size
of size_t
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, I was relying on implicit conversions.
Yes, implicit conversions can be evil (int promotion is just one to name among those, not related here
however).
If you explicitly oppose any implicit conversions, I can change. If not, let's leave it as is, not to make more code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if size becomes size_t as I suggested above, then totalMismatches / size would be 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know where confusion comes from:
yes it would be 0 if totalMismatches was of integral type, BUT totalMismatches is of type double, so totalMismatches / size will not be 0 unless totalMismatches is 0.0.
"size" will be implicitly converted to double in this expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totalMismatches should be integral as well then. I did not even expect it to be double, very counter-intuitive. Please make them both integral and do casting where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type of totalMismatches conforms to its use case (also, intuition is subjective).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please change totalMismatches
and size
to integer type to avoid hard-to-detect conversion errors when double can't accurately represent these integer (by nature) values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For double after reaching 2^53 we'd stop incrementing the value if incrementing by 1, that's true.
If there'll ever be a > petabyte file checked in llvm-test-suite, well, yes, that could be a problem.
Another argument against double could be performance.
Anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
} | ||
|
||
} else if (totalMismatches == mismatchReportThrottleLimit) { | ||
std::cerr << "Mismatch output throttled ... " << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand meaning of "throttled" here. "truncated", "stopped" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, stopped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, please reword then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworded.
SYCL/ESIMD/esimd_test_utils.hpp
Outdated
return false; | ||
} | ||
|
||
std::cerr << "Total mismatch rate is " << totalMismatchRate << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With introduction of tolerated mismatched rate, we need to also introduce and report maximum detected relative mismatch (in %). I would say if this is more than 10-20% in any element, that would indicate a problem, and test test developer should pay attention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is more than 10-20% in any element, that would indicate a problem
If I understand correctly, you mean checking for absolute difference (distance) between reference vs tested value of each data element (and expressing it as a percentage).
That's subtle: interpretation of this distance depends on generator semantics, e.g. in case of overflow allowed during generation for uint8_t what's the distance between 255 and 0? It may be considered as 255 in one direction, but also as 1+N*(255+1) if we go the other way over the circle, where N is from 0 to +inf.
And the same goes for Mandelbrot set image generator: color value overflows and we are not certain about what's the distance: ambiguity we can't interpret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean relative, not absolute, just as I wrote. abs((expected-val)/expected)*100.
So, please report this info rather than silently returning "pass".
Note that I did not propose return "fail", irrespective of the max relative mismatch. I would like test developers or people who run tests see it and decide themselves if it is OK or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean relative, ... abs((expected-val)/expected)*100
Konstantin, I understand, but formally it's derived from absolute difference [ abs((expected-val) ... ]
and it doesn't matter if later it's expressed as a percentage from the expected value or anything else: it gives no practically useful information.
Example. Let's look at some full error log:
...
byte 1: got 1 instead of 255
byte 2: got 10 instead of 9
byte 3: got 5 instead of 4
byte 100: got 12 instead of 9
byte 101: got 11 instead of 8
byte 102: got 6 instead of 3
...
With this info we can do at least something: observe the whole picture, try to infer a pattern.
But if we report max relative divergency for a single byte (byte 1 in this case) it will give no useful information by itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that it does not give practically useful information in this particular case with mandelbrot does not imply it won't in other cases. Since this is a library function to be used by other cases, we should care about other use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Konstantin,
It's not applicable in case of toleratedMismatchRate == 0 as we fail right away there.
In case of toleratedMismatchRate != 0 log contains plenty of records that allow a user to make conclusions about deviations or increase print limit to observe the full log.
There's practically no use case for the suggested data.
I could understand adding something if there's a use case, but, I'm sorry, I can't understand adding more code if there's no foreseeable use case (taking into account all mentioned above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I outlined the use case above - please take a look.
Having user scan through all tolerated mismatches in case there are many of these is error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: in current implementation arguments fname1, fname2 are not assigned to any particular role of being reference or not.
Adding the requested printout will imply assigning these roles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
Co-authored-by: kbobrovs <[email protected]>
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
… tolerated-mismatch-rate
…/llvm-test-suite into tolerated-mismatch-rate
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
bool cmp_binary_files(const char *testOutFile, const char *referenceFile, | ||
const T tolerance = 0, | ||
const double mismatchRateTolerance = 0, | ||
const int mismatchReportLimit = 9) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbobrovs , maybe it's better to make it overridable via environment for convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Could be a separate patch
typename std::enable_if<std::is_integral<T>::value || | ||
std::is_floating_point<T>::value, | ||
int>::type = 0> | ||
bool cmp_binary_files(const char *testOutFile, const char *referenceFile, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Args renamed to reflect roles of test output vs reference source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# command stderr:
Tolerated mismatch rate set to 0.0027
Mismatch at 135942 133 vs 118. Current mismatch rate: 0.00000000
Mismatch at 135943 45 vs 38. Current mismatch rate: 0.00000069
Mismatch at 135944 129 vs 126. Current mismatch rate: 0.00000139
Mismatch at 183876 146 vs 236. Current mismatch rate: 0.00000208
Mismatch at 183877 34 vs 76. Current mismatch rate: 0.00000278
Mismatch at 183878 234 vs 252. Current mismatch rate: 0.00000347
Mismatch at 183918 11 vs 252. Current mismatch rate: 0.00000417
Mismatch at 183919 227 vs 220. Current mismatch rate: 0.00000486
Mismatch at 183920 207 vs 204. Current mismatch rate: 0.00000556
Mismatch output stopped.
Total mismatch rate is 0.00267497 with max relative difference of 0.95634921
PASSED
…intel#885) * [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
…intel/llvm-test-suite#885) * [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
…g involding floating point computations (due to deviations in GPU vs host-based FP).