Skip to content

Fix potential read of uninitialized padding data in DOM #17628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

nielsdos
Copy link
Member

The fix for GH-17481 introduced a regression that can cause the read of uninitialized padding data when going over a chunk boundary during HTML parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the length of the error-corrected UTF-8 code point is not necessarily the same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it regularly, and ASAN doesn't catch uninitialized accesses.

The fix for phpGH-17481 introduced a regression that can cause the read of
uninitialized padding data when going over a chunk boundary during HTML
parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the
length of the error-corrected UTF-8 code point is not necessarily the
same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it
regularly, and ASAN doesn't catch uninitialized accesses.
@nielsdos
Copy link
Member Author

cc @NattyNarwhal @SakiTakamachi heads up that when this is approved I'd like to request to cherry-pick this into PHP-8.4.4 as the regression is not in a released version yet; and I don't think we should release versions with known regressions. It's kind of unfortunate to discover this the day after tagging, sorry for the trouble.

@NattyNarwhal
Copy link
Member

We just tagged RC1, so even if I can't get it into the RC1 tarball if it's locked in, we should be able to get it into 8.4.4 release.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can't have an MSAN build either for libxml as that would require building libxml ourselves. :/

@NattyNarwhal NattyNarwhal merged commit 81803b9 into php:PHP-8.4 Jan 30, 2025
9 of 10 checks passed
NattyNarwhal pushed a commit that referenced this pull request Jan 30, 2025
The fix for GH-17481 introduced a regression that can cause the read of
uninitialized padding data when going over a chunk boundary during HTML
parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the
length of the error-corrected UTF-8 code point is not necessarily the
same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it
regularly, and ASAN doesn't catch uninitialized accesses.
NattyNarwhal added a commit that referenced this pull request Jan 30, 2025
* PHP-8.4:
  Fix potential read of uninitialized padding data in DOM (#17628)
NattyNarwhal pushed a commit that referenced this pull request Jan 30, 2025
The fix for GH-17481 introduced a regression that can cause the read of
uninitialized padding data when going over a chunk boundary during HTML
parsing of UTF-8.
The wrong offset was computed with respect to the input buffer, the
length of the error-corrected UTF-8 code point is not necessarily the
same as the input code point length.
This was not noticed because no CI jobs run with Valgrind nor I do it
regularly, and ASAN doesn't catch uninitialized accesses.
NattyNarwhal added a commit that referenced this pull request Jan 30, 2025
Include regression fix from GH-17628 and another regression fix for
zlib-ng in ed1d51f.
@xPaw
Copy link
Contributor

xPaw commented Feb 7, 2025

Is it possible to add a test case for this?

@nielsdos
Copy link
Member Author

nielsdos commented Feb 7, 2025

Technically, the test suite already tests this, it's just that we don't see the failure because extensions like DOM are not tested under MSAN (it would require an MSAN-instrumented libxml), and we don't use Valgrind to run the test suite in CI because that's too slow. So this like these are easy to miss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants