-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Fix potential read of uninitialized padding data in DOM #17628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The fix for phpGH-17481 introduced a regression that can cause the read of uninitialized padding data when going over a chunk boundary during HTML parsing of UTF-8. The wrong offset was computed with respect to the input buffer, the length of the error-corrected UTF-8 code point is not necessarily the same as the input code point length. This was not noticed because no CI jobs run with Valgrind nor I do it regularly, and ASAN doesn't catch uninitialized accesses.
cc @NattyNarwhal @SakiTakamachi heads up that when this is approved I'd like to request to cherry-pick this into PHP-8.4.4 as the regression is not in a released version yet; and I don't think we should release versions with known regressions. It's kind of unfortunate to discover this the day after tagging, sorry for the trouble. |
We just tagged RC1, so even if I can't get it into the RC1 tarball if it's locked in, we should be able to get it into 8.4.4 release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we can't have an MSAN build either for libxml as that would require building libxml ourselves. :/
The fix for GH-17481 introduced a regression that can cause the read of uninitialized padding data when going over a chunk boundary during HTML parsing of UTF-8. The wrong offset was computed with respect to the input buffer, the length of the error-corrected UTF-8 code point is not necessarily the same as the input code point length. This was not noticed because no CI jobs run with Valgrind nor I do it regularly, and ASAN doesn't catch uninitialized accesses.
* PHP-8.4: Fix potential read of uninitialized padding data in DOM (#17628)
The fix for GH-17481 introduced a regression that can cause the read of uninitialized padding data when going over a chunk boundary during HTML parsing of UTF-8. The wrong offset was computed with respect to the input buffer, the length of the error-corrected UTF-8 code point is not necessarily the same as the input code point length. This was not noticed because no CI jobs run with Valgrind nor I do it regularly, and ASAN doesn't catch uninitialized accesses.
Is it possible to add a test case for this? |
Technically, the test suite already tests this, it's just that we don't see the failure because extensions like DOM are not tested under MSAN (it would require an MSAN-instrumented libxml), and we don't use Valgrind to run the test suite in CI because that's too slow. So this like these are easy to miss. |
The fix for GH-17481 introduced a regression that can cause the read of uninitialized padding data when going over a chunk boundary during HTML parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the length of the error-corrected UTF-8 code point is not necessarily the same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it regularly, and ASAN doesn't catch uninitialized accesses.