-
-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Remove charset tests with race conditions #12571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These tests don't seem reliable (with documented race conditions). Remove them since they regularly kill the CI.
Codecov Report
@@ Coverage Diff @@
## master #12571 +/- ##
=======================================
Coverage 43.43% 43.43%
=======================================
Files 643 643
Lines 71165 71165
=======================================
+ Hits 30910 30913 +3
- Misses 35246 35247 +1
+ Partials 5009 5005 -4
Continue to review full report at Codecov.
|
Could also just comment it out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to disagree - if this race is still happening 4 out of 5 our preferred charset stuff isn't working correctly and people's documents will be being detected incorrectly.
We need to fix that not just paper over by deleting the test that is showing the problem.
For example, what is relative score of iso-8859-1 Vs the iso-8859-2? Would be as simple as adding a small boost to preferred charsets?
This test is causing like every fourth CI run to fail which wastes a lot of valuable time of contributors. We can keep an issue around to keep it a "known issue" but as it is now, this test randomly failing does not gain us any additional knowledge. |
I really wonder how something like charset detection can be non-deterministic. Do we know if the bug comes from gogs/chardet or our code?
|
That this test is still failing means that we are detecting the wrong character set frequently in certain cases. For example adding this would allow us to investigate what is going on with scores: diff --git a/modules/charset/charset.go b/modules/charset/charset.go
index a7e427db9..8a7fd90c5 100644
--- a/modules/charset/charset.go
+++ b/modules/charset/charset.go
@@ -149,6 +149,17 @@ func DetectEncoding(content []byte) (string, error) {
return "", err
}
+ if log.IsTrace() {
+ logText := &strings.Builder{}
+ logValues := []interface{}{}
+ _, _ = logText.WriteString("Detected Encodings:\n")
+ for _, result := range results {
+ _, _ = logText.WriteString("%d %s\n")
+ logValues = append(logValues, result.Confidence, result.Charset)
+ }
+ log.Trace(logText.String(), logValues...)
+ }
+
topConfidence := results[0].Confidence
topResult := results[0]
priority, has := setting.Repository.DetectedCharsetScore[strings.ToLower(strings.TrimSpace(topResult.Charset))] |
OK the issue is that although I fixed this in Gitea proper the Test does not set the default detectedcharsetsorder. The below would fix this:
|
TestToUTF8WithFallback is the cause of recurrent spurious test failures even despite code to set the detected charset order. The reason why this happens is because the preferred detected charset order is not being initialised for these tests. This PR simply ensures that this is set at the start of each test and would allow different tests to be written to allow differing orders. Replaces go-gitea#12571 Close go-gitea#12571 Signed-off-by: Andrew Thornton <[email protected]>
good :) |
TestToUTF8WithFallback is the cause of recurrent spurious test failures even despite code to set the detected charset order. The reason why this happens is because the preferred detected charset order is not being initialised for these tests. This PR simply ensures that this is set at the start of each test and would allow different tests to be written to allow differing orders. Replaces #12571 Close #12571 Signed-off-by: Andrew Thornton <[email protected]>
…ea#12574) Backport go-gitea#12574 TestToUTF8WithFallback is the cause of recurrent spurious test failures even despite code to set the detected charset order. The reason why this happens is because the preferred detected charset order is not being initialised for these tests. This PR simply ensures that this is set at the start of each test and would allow different tests to be written to allow differing orders. Replaces go-gitea#12571 Close go-gitea#12571 Signed-off-by: Andrew Thornton <[email protected]>
These tests don't seem reliable (with documented race conditions). Remove them since they regularly kill the CI.
For both
gitea/modules/charset/charset_test.go
Lines 91 to 95 in 2026d88
and
gitea/modules/charset/charset_test.go
Lines 136 to 141 in 2026d88
It is definitely hitting this race condition that we already 'accept' in the tests below:
gitea/modules/charset/charset_test.go
Lines 228 to 230 in 2026d88
Both tests pass when we detect it as
ISO-8859-1
but fail when they sometimes gets detected asISO-8859-2
With these tests locally it fails one out of every 4 or 5 tries, after removing it never fails