Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison #885

Merged
merged 10 commits into from
Mar 24, 2022

Conversation

lsatanov
Copy link

@lsatanov lsatanov commented Mar 2, 2022

…g involding floating point computations (due to deviations in GPU vs host-based FP).

@lsatanov lsatanov force-pushed the tolerated-mismatch-rate branch 8 times, most recently from 51b5647 to 7cb7c0e Compare March 3, 2022 14:49
@@ -99,26 +100,74 @@ bool write_binary_file(const char *fname, const std::vector<T> &vec,
}

template <typename T>
bool cmp_binary_files(const char *fname1, const char *fname2, T tolerance) {
bool cmp_binary_files(const char *fname1, const char *fname2, const T tolerance,
Copy link
Author

@lsatanov lsatanov Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussion: can either reuse this function template, or make an overload.

… (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
@lsatanov lsatanov force-pushed the tolerated-mismatch-rate branch from 7cb7c0e to 49bd916 Compare March 7, 2022 09:10
@lsatanov lsatanov changed the title DRAFT [SYCL][ESIMD][EMU] DRAFT [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison Mar 7, 2022
@lsatanov lsatanov changed the title DRAFT [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison Mar 7, 2022
std::cerr << std::endl;
return false;
} else {
std::cerr << ". Current mismatch rate: " << std::setprecision(8)
Copy link
Author

@lsatanov lsatanov Mar 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is it ok to output mismatches (up to the threshold) or output only in case of failure?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say, always output mismatch summary, even if they are "tolerated" and test passes.

@lsatanov lsatanov marked this pull request as ready for review March 7, 2022 11:47
@lsatanov lsatanov requested a review from a team as a code owner March 7, 2022 11:47
bool cmp_binary_files(const char *fname1, const char *fname2, T tolerance) {
bool cmp_binary_files(const char *fname1, const char *fname2, const T tolerance,
const double toleratedMismatchRate = 0,
const int mismatchReportThrottleLimit = 5) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is mismatchReportThrottleLimit? please add comment. what "throttle" in the name signifies?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Threshold" might be a better name.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

std::cerr << (int)vec1[i] << " vs " << (int)vec2[i] << std::endl;
} else {
std::cerr << vec1[i] << " vs " << vec2[i] << std::endl;
if (!toleratedMismatchRate ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if toleratedMismatchRate == 0 and there are errors in every vector element, there will be size lines of output, which is bad. I understand that this is how it was before, but since you are refactoring this anyway, please output at most mismatchReportThrottleLimit errors (also consider changing the name to something simpler - mismatchReportLimit)

Copy link
Author

@lsatanov lsatanov Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if toleratedMismatchRate == 0 and there are errors in every vector element, there will be size lines of output, which is bad.

Correction: there will be one line of output and return:

if (!toleratedMismatchRate) {
std::cerr << std::endl;
return false;
}

// Yes, this is return from inside a loop which is hard to notice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. ok then

std::cerr << std::endl;
return false;
} else {
std::cerr << ". Current mismatch rate: " << std::setprecision(8)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say, always output mismatch summary, even if they are "tolerated" and test passes.

return false;
} else {
std::cerr << ". Current mismatch rate: " << std::setprecision(8)
<< std::fixed << totalMismatches / size << std::endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<< std::fixed << totalMismatches / size << std::endl;
<< std::fixed << totalMismatches / (double)size << std::endl;

accompanies the above suggestion to make size of size_t type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I was relying on implicit conversions.
Yes, implicit conversions can be evil (int promotion is just one to name among those, not related here
however).

If you explicitly oppose any implicit conversions, I can change. If not, let's leave it as is, not to make more code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if size becomes size_t as I suggested above, then totalMismatches / size would be 0

Copy link
Author

@lsatanov lsatanov Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know where confusion comes from:
yes it would be 0 if totalMismatches was of integral type, BUT totalMismatches is of type double, so totalMismatches / size will not be 0 unless totalMismatches is 0.0.
"size" will be implicitly converted to double in this expression.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalMismatches should be integral as well then. I did not even expect it to be double, very counter-intuitive. Please make them both integral and do casting where needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of totalMismatches conforms to its use case (also, intuition is subjective).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change totalMismatches and size to integer type to avoid hard-to-detect conversion errors when double can't accurately represent these integer (by nature) values.

Copy link
Author

@lsatanov lsatanov Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For double after reaching 2^53 we'd stop incrementing the value if incrementing by 1, that's true.
If there'll ever be a > petabyte file checked in llvm-test-suite, well, yes, that could be a problem.
Another argument against double could be performance.

Anyways.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

}

} else if (totalMismatches == mismatchReportThrottleLimit) {
std::cerr << "Mismatch output throttled ... " << std::endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand meaning of "throttled" here. "truncated", "stopped" ?

Copy link
Author

@lsatanov lsatanov Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, stopped.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, please reword then

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded.

return false;
}

std::cerr << "Total mismatch rate is " << totalMismatchRate << std::endl;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With introduction of tolerated mismatched rate, we need to also introduce and report maximum detected relative mismatch (in %). I would say if this is more than 10-20% in any element, that would indicate a problem, and test test developer should pay attention.

Copy link
Author

@lsatanov lsatanov Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is more than 10-20% in any element, that would indicate a problem

If I understand correctly, you mean checking for absolute difference (distance) between reference vs tested value of each data element (and expressing it as a percentage).

That's subtle: interpretation of this distance depends on generator semantics, e.g. in case of overflow allowed during generation for uint8_t what's the distance between 255 and 0? It may be considered as 255 in one direction, but also as 1+N*(255+1) if we go the other way over the circle, where N is from 0 to +inf.

And the same goes for Mandelbrot set image generator: color value overflows and we are not certain about what's the distance: ambiguity we can't interpret

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean relative, not absolute, just as I wrote. abs((expected-val)/expected)*100.
So, please report this info rather than silently returning "pass".

Note that I did not propose return "fail", irrespective of the max relative mismatch. I would like test developers or people who run tests see it and decide themselves if it is OK or not.

Copy link
Author

@lsatanov lsatanov Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean relative, ... abs((expected-val)/expected)*100

Konstantin, I understand, but formally it's derived from absolute difference [ abs((expected-val) ... ]
and it doesn't matter if later it's expressed as a percentage from the expected value or anything else: it gives no practically useful information.

Example. Let's look at some full error log:
...
byte 1: got 1 instead of 255
byte 2: got 10 instead of 9
byte 3: got 5 instead of 4
byte 100: got 12 instead of 9
byte 101: got 11 instead of 8
byte 102: got 6 instead of 3
...

With this info we can do at least something: observe the whole picture, try to infer a pattern.
But if we report max relative divergency for a single byte (byte 1 in this case) it will give no useful information by itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that it does not give practically useful information in this particular case with mandelbrot does not imply it won't in other cases. Since this is a library function to be used by other cases, we should care about other use cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Konstantin,
It's not applicable in case of toleratedMismatchRate == 0 as we fail right away there.
In case of toleratedMismatchRate != 0 log contains plenty of records that allow a user to make conclusions about deviations or increase print limit to observe the full log.

There's practically no use case for the suggested data.

I could understand adding something if there's a use case, but, I'm sorry, I can't understand adding more code if there's no foreseeable use case (taking into account all mentioned above).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I outlined the use case above - please take a look.
Having user scan through all tolerated mismatches in case there are many of these is error prone.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: in current implementation arguments fname1, fname2 are not assigned to any particular role of being reference or not.
Adding the requested printout will imply assigning these roles.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

lsatanov and others added 5 commits March 16, 2022 13:02
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
…es comparison (for cases resulted from deviations in GPU vs host-based FP computations).
bool cmp_binary_files(const char *testOutFile, const char *referenceFile,
const T tolerance = 0,
const double mismatchRateTolerance = 0,
const int mismatchReportLimit = 9) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbobrovs , maybe it's better to make it overridable via environment for convenience.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Could be a separate patch

typename std::enable_if<std::is_integral<T>::value ||
std::is_floating_point<T>::value,
int>::type = 0>
bool cmp_binary_files(const char *testOutFile, const char *referenceFile,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Args renamed to reflect roles of test output vs reference source.

Copy link
Author

@lsatanov lsatanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# command stderr:
Tolerated mismatch rate set to 0.0027
Mismatch at 135942 133 vs 118. Current mismatch rate: 0.00000000
Mismatch at 135943 45 vs 38. Current mismatch rate: 0.00000069
Mismatch at 135944 129 vs 126. Current mismatch rate: 0.00000139
Mismatch at 183876 146 vs 236. Current mismatch rate: 0.00000208
Mismatch at 183877 34 vs 76. Current mismatch rate: 0.00000278
Mismatch at 183878 234 vs 252. Current mismatch rate: 0.00000347
Mismatch at 183918 11 vs 252. Current mismatch rate: 0.00000417
Mismatch at 183919 227 vs 220. Current mismatch rate: 0.00000486
Mismatch at 183920 207 vs 204. Current mismatch rate: 0.00000556
Mismatch output stopped.
Total mismatch rate is 0.00267497 with max relative difference of 0.95634921
PASSED

@kbobrovs kbobrovs merged commit b63c248 into intel:intel Mar 24, 2022
myler pushed a commit to myler/llvm-test-suite that referenced this pull request Jun 17, 2022
…intel#885)

* [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
…intel/llvm-test-suite#885)

* [SYCL][ESIMD][EMU] tolerated mismatch rate in binary files comparison (e.g. for cases resulted from deviations in GPU vs host-based FP computations).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants