-
Notifications
You must be signed in to change notification settings - Fork 455
[CDRIVER-5611] Refactor of validation fuzz test to expand coverage and speed #1822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDRIVER-5611] Refactor of validation fuzz test to expand coverage and speed #1822
Conversation
- Fuzz test accepts the inner body of a document as the test input, rather than the full document. This greatly accelerates fuzz corpus discovery. - Treat the first byte of the test input as a set of flags to control behavior. This allows the fuzzer to discover different branches that are hidden behind the presence/absence of certain validation flags. - Add a separate exe that reads a corpus item and runs the same validation code on it (for issue repro)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor feedback; otherwise, LGTM.
src/libbson/fuzz/CMakeLists.txt
Outdated
@@ -5,8 +5,10 @@ set_property(TARGET fuzz_test_properties APPEND PROPERTY INTERFACE_LINK_OPTIONS | |||
add_executable(fuzz_test_init_from_json EXCLUDE_FROM_ALL fuzz_test_init_from_json.c) | |||
target_link_libraries(fuzz_test_init_from_json PRIVATE fuzz_test_properties) | |||
|
|||
add_executable(fuzz_test_validate EXCLUDE_FROM_ALL fuzz_test_validate.c) | |||
add_executable(fuzz_test_validate EXCLUDE_FROM_ALL fuzz_test_validate.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend updating the description of the MONGO_FUZZ
CMake option to include "(requires C++)" to document the additional toolchain dependency required to build fuzz testing targets. Also consider guarding these targets with if (CMAKE_CXX_COMPILER)
as done for the mongoc-cxx-check
target.
src/libbson/fuzz/CMakeLists.txt
Outdated
@@ -5,8 +5,10 @@ set_property(TARGET fuzz_test_properties APPEND PROPERTY INTERFACE_LINK_OPTIONS | |||
add_executable(fuzz_test_init_from_json EXCLUDE_FROM_ALL fuzz_test_init_from_json.c) | |||
target_link_libraries(fuzz_test_init_from_json PRIVATE fuzz_test_properties) | |||
|
|||
add_executable(fuzz_test_validate EXCLUDE_FROM_ALL fuzz_test_validate.c) | |||
add_executable(fuzz_test_validate EXCLUDE_FROM_ALL fuzz_test_validate.cpp) | |||
add_executable(validate-repro EXCLUDE_FROM_ALL validate-repro.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add_executable(validate-repro EXCLUDE_FROM_ALL validate-repro.cpp) | |
add_executable(validate_repro EXCLUDE_FROM_ALL validate-repro.cpp) |
Consistent casing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to switch everything else to be hyphenated? Most components are hyphenated currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
Aside: a quick grep reveals the list of user-defined CMake targets (which are --target
-able, excluding those generated by the build system) which use _
is currently limited to the following list:
bson_shared
bson_static
hello_mongoc
mcd_rpc
mongo_c_driver_examples
mongo_c_driver_tests
mongoc_shared
mongoc_static
@@ -0,0 +1,9 @@ | |||
#include <stdint.h> | |||
|
|||
#include "./fuzz_test_validate.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "./fuzz_test_validate.hpp" | |
#include "fuzz_test_validate.hpp" |
The #include ""
form is already relative to current file's directory (?).
src/libbson/fuzz/validate-repro.cpp
Outdated
#include <fstream> | ||
#include <sstream> | ||
|
||
#include "./fuzz_test_validate.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include "./fuzz_test_validate.hpp" | |
#include "fuzz_test_validate.hpp" |
The #include ""
form is already relative to current file's directory (?).
// Treat the first byte as the flags, and the BSON data is what remains: | ||
const uint8_t *bson_doc_data = data + 1; | ||
const size_t bson_data_size = size - 1; | ||
if (bson_data_size > INT32_MAX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (bson_data_size > INT32_MAX) { | |
if (bson_data_size > static_cast<std::size_t>(INT32_MAX)) { |
Signedness consistency.
src/libbson/fuzz/make-dicts.py
Outdated
|
||
|
||
def len_prefix(b: BytesIter) -> bytes: | ||
"""Prepend a u32le byte-length prefix to a set of bytes""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Prepend a u32le byte-length prefix to a set of bytes""" | |
"""Prepend a i32le byte-length prefix to a set of bytes""" |
I expect this is encoded as a signed int32 (which matches the BSON spec for encoded lengths).
src/libbson/fuzz/make-dicts.py
Outdated
def escape(b: bytes) -> Iterable[str]: | ||
s = b.decode("ascii", "backslashreplace") | ||
for u8 in b: | ||
try: | ||
s = chr(u8) | ||
s.encode("ascii") # Try to encode as ASCII | ||
except ValueError: | ||
# Byte is not valid UTF-8, or not valid ASCII | ||
pass | ||
else: | ||
if s.isprintable(): | ||
yield s | ||
continue | ||
# Byte is not valid ASCII, or is not a printable char | ||
yield f"\\x{u8:0>2x}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested simplification:
def escape(b: bytes) -> Iterable[str]: | |
s = b.decode("ascii", "backslashreplace") | |
for u8 in b: | |
try: | |
s = chr(u8) | |
s.encode("ascii") # Try to encode as ASCII | |
except ValueError: | |
# Byte is not valid UTF-8, or not valid ASCII | |
pass | |
else: | |
if s.isprintable(): | |
yield s | |
continue | |
# Byte is not valid ASCII, or is not a printable char | |
yield f"\\x{u8:0>2x}" | |
def escape(b: bytes) -> Iterable[str]: | |
for u8 in b: | |
s = chr(u8) | |
if s.isascii() and s.isprintable(): | |
yield s | |
continue | |
# Byte is not valid ASCII, or is not a printable char | |
yield f"\\x{u8:0>2x}" |
src/libbson/fuzz/validate-repro.cpp
Outdated
if (rc == 0) { | ||
std::fprintf (stderr, "Validation returned normally (no bug?)\n"); | ||
} else if (rc == -1) { | ||
std::fprintf (stderr, "Test case is was rejected by the fuzzer (not a valid corpus item?)\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::fprintf (stderr, "Test case is was rejected by the fuzzer (not a valid corpus item?)\n"); | |
std::fprintf (stderr, "Test case is/was rejected by the fuzzer (not a valid corpus item?)\n"); |
- `fuzz_test_` prefix is redundant in a directory for fuzz tests. - Use `kebab-case` for target and file names.
Summary
The existing
fuzz_test_validate
suffered from two issues that made fuzzing less than optimal:bson_validate
, which will reduce the branch coverage that is reachable by the fuzzer.bson_init_static
until the fuzzer would coincidentally discover the need to update the document header to match the new data size. Fuzzing would stagnate at around 475 corpus items.This changeset addresses both issues:
bson_validate
. This allows the fuzzer to test multiple branches within validation.bson_init_static
will always succeed and we can actually get to testingbson_validate
. This allows the fuzzer to generate over 500 valid inputs in the first 30 seconds of running (with-fork=32
).Additionally:
bson_validate
on a corpus item directly. This can be used to repro any issue discovered by the fuzz tester.