Skip to content

Fixed PHPC-922: Improve specificity in generic corrupt BSON exception message #776

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 9, 2018

Conversation

derickr
Copy link
Contributor

@derickr derickr commented Mar 7, 2018

No description provided.

@derickr derickr changed the title Fixed PHPC-922: Improve specificity in generic corrupt BSON exception message [WIP] Fixed PHPC-922: Improve specificity in generic corrupt BSON exception message Mar 7, 2018
@derickr derickr requested a review from jmikola March 7, 2018 16:32
@derickr derickr changed the title [WIP] Fixed PHPC-922: Improve specificity in generic corrupt BSON exception message Fixed PHPC-922: Improve specificity in generic corrupt BSON exception message Mar 7, 2018
@@ -0,0 +1,21 @@
--TEST--
PHPC-531: Segfault due to double free by corrupt BSON visitor (nested document)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd for this to add a new PHPC-531 test; however, bson-toPHP_error-004.phpt is the only other place where we test for "Detected unknown BSON type" outside of BSON spec tests.

Ideally, this would be a new test case in that file (using pack()), or a newly created bson-toPHP_error-005.phpt file (there is a gap between 4 and 6). The latter would be fairly low effort, but I won't argue if you want to leave this as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it made more sense to clone this already existing test. But whatever.

@@ -0,0 +1,36 @@
--TEST--
MongoDB\BSON\toPHP(): Field path values with bson_iter_visit_all() failures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this is 006 if 005 doesn't exist?

@@ -0,0 +1,47 @@
--TEST--
BSON\fromPHP(): PHP documents with circular references
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MongoDB\BSON\fromPHP()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd how that snuck in. Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know how this snuck in, -004 had the same. I'll make sure to keep this in a separate commit.

echo "\nTesting packed array with circular reference\n";

echo throws(function() {
$document = ['x' => 1, 'y' => [1, 2, 3]];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks redundant, since bson-fromPHP_error-004.phpt already tests $document = ['x' => 1, 'y' => []]; and asserts that the field path is "y.0". All this is testing is that $document['y'][] = &$document['y']; on the subsequent line appends the fourth element in a packed array instead of the first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it tests that the error message shows "3" instead of "0", so I'd rather keep this.

$document = (object) ['x' => 1, 'y' => (object) []];
$document->y->z = &$document->y;
fromPHP($document);
}, 'MongoDB\Driver\Exception\UnexpectedValueException'), "\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case is identical to the one in bson-fromPHP_error-004.phpt and should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might extend it with another nested level actually.

$document = ['x' => 1, 'y' => []];
$document['y'][0]['z'] = &$document['y'];
fromPHP($document);
}, 'MongoDB\Driver\Exception\UnexpectedValueException'), "\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be the only new test, which is only checking that ".0." is incorporated into the exception message's field path. I think this should be be added to the 004 test as an additional case since it's so closely related to the other cases. The test case description should then be: "Testing associative array in packed array with circular reference" or something to that effect.

Noting the historic argument of not modifying existing test cases, I'd argue that spreading closely related tests out across multiple files only complicates maintainability. This would also only be a test case addition to an existing file (not altering existing tests in any way).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll conflate them. I am still very much against changing test cases, but this just really adds things, so I'm less against it.

@@ -7,7 +7,7 @@ require_once __DIR__ . '/../utils/tools.php';

class MyRecursiveSerializable implements MongoDB\BSON\Serializable
{
public $x = 1;
public $child = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the message, I assume this change was necessary to demonstrate that BSON serialization never even reaches this property. That would have been difficult before since both properties shared the same name ("x").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly.

@@ -32,15 +32,15 @@ foreach ($tests as $bson) {
?>
===DONE===
<?php exit(0); ?>
--EXPECTF--
--EXPECT--
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have some concerns about the amount of memory allocations this is introducing during most every stage of BSON serialization and deserialization. I reckon some immediate improvements can be made by reducing php_phongo_field_path_write_item_at_current_level() calls in some places where it's not needed.

IIRC you had implemented some driver benchmarks some time ago (at least for BSON). Assuming that's still accessible, can we get that added to the repository so we can run it and measure the impact of this change?

If we can make some assumptions about the field path during BSON parsing (e.g. dots never appear in keys), I'd also like to test a different implementation of php_phongo_field_path using a single string buffer instead of a linked list. That's not an immediate task, but something I'd create a ticket to circle back on down the line and revisit once we can do benchmarks.

php_phongo_bson_to_zval_ex(bson_get_data(doc), doc->len, &cursor->visitor_data);
php_phongo_bson_state_dtor(&cursor->visitor_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are additional calls to php_phongo_bson_to_zval_ex() outside of Cursor.c that weren't wrapped with php_phongo_bson_state_ctor() and dtor() (e.g. php_phongo_server_to_zval(), APM). I assume those cases don't require manual ctor() and dtor() calls, as php_phongo_bson_to_zval_ex() contains logic to allocate on its own:

if (!php_phongo_bson_state_is_initialized(state)) {
	php_phongo_bson_state_ctor(state);
	must_dtor_state = true;
}

In that case, why is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the state in cursor->visitor_data is being reused multiple times here (once with each iteration), and there is/was something that was going wrong without it. I'll have another look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you going to look into this before merging? If not, can you create another ticket in the epic to go back and look into it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have a look, and the fix was as simple as:

diff --git a/src/bson.c b/src/bson.c
index 2a0a79b6..8d9001e1 100644
--- a/src/bson.c
+++ b/src/bson.c
@@ -170,6 +170,7 @@ void php_phongo_bson_state_dtor(php_phongo_bson_state* state)
                if (state->field_path->ref_count < 1) {
                        php_phongo_field_path_free(state->field_path);
                }
+               state->field_path = NULL;
        }
 } /* }}} */

src/bson.c Outdated
phongo_throw_exception(PHONGO_ERROR_UNEXPECTED_VALUE TSRMLS_CC, "Detected corrupt BSON data");
char* path = php_phongo_field_path_as_string(state->field_path);

if (path) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this the only place that you check for a NULL return value from php_phongo_field_path_as_string()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's the only place where it segfaulted if I didn't ;-). I wouldn't mind changing the path string function to return an empty string instead, if it was really empty or unset. That'd be a bit cleaner.

src/bson.c Outdated
char* path = php_phongo_field_path_as_string(state->field_path);

if (path) {
phongo_throw_exception(PHONGO_ERROR_UNEXPECTED_VALUE TSRMLS_CC, "Detected corrupt BSON data for field path '%s' at offset %d", path, iter.err_off);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We previously verified that err_off is only relative to the current nesting level, as it's reset to zero in bson_iter_init() (called in our array/document visitor callback) and bson_iter_recurse(). Do you still think it's worth including in the error message alongside the field path?

My opinion is that it's free information so I don't see the harm with including it. But I'm also not sure it tells us anything that field path doesn't already. If we're not certain that it's entirely redundant, we can keep it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, it doesn't do any harm. So I vote for keeping it for now.

add_next_index_zval(retval, zchild);
} else {
ADD_ASSOC_ZVAL(retval, key, zchild);
}
#endif /* PHP_VERSION_ID >= 70000 */

php_phongo_field_path_write_item_at_current_level(state->field_path, key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do some visitor functions (excluding array/document) call this at the top of the function and others at the bottom? I'd prefer to structure of the visitor functions consistent unless there's good reason not to.

On a separate note, is there even any reason to track the field path at this point? Excluding array/document, these visitors are all leaves in the field path tree. php_phongo_field_path_as_string() is only needed for exceptions for corrupt BSON data when php_phongo_bson_to_zval_ex() aborts its bson_iter_visit_all() early with a non-Exception error or in php_phongo_bson_visit_unsupported_type(), which correctly adds the current field name to the state before generating the path string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider the second point, but thought this was cleaner.

The reason why I put it some times at the bottom, is because of the #ifdef sections defining another variable first, so I had to put it here. But I did that after I had put the other ones at the top. So, I can change the ones that have it at the top, to the bottom.

Copy link
Member

@jmikola jmikola Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if we can use the compatibility macros for PHP 5/7 zval initialization in these. php_phongo_field_path_write_item_at_current_level() can then be moved to the top of each function, since you'll probably need that information for implementing other ODM functionality.

src/bson.c Outdated

bool php_phongo_field_path_pop(php_phongo_field_path* field_path)
{
field_path->current_level--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense for this to also check if field_path->elements[field_path->current_level] is null before decrementing? Currently, it looks like this might leave an element beyond the post-pop current element allocated. That runs contrary to logic you have in php_phongo_field_path_ensure_allocation() to ensure that pointers beyond the current level are set to null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should always decrement, but I can change it so that it throws (an assertion) if curent_level is already <= 1. It doesn't particularly matter if we don't free the current level, as it gets cleaned up anyway (either when re-written at this level, or on _free).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of just ensuring unused pointers are set to NULL, let's do the follwowing before decrementing the current level, too:

if (field_path->elements[field_path->current_level] != NULL) {
	field_path->elements[field_path->current_level] = NULL;
}

src/bson.c Outdated
return true;
}

static bool php_phongo_bson_state_is_initialized(php_phongo_bson_state* state)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use an inline hint here? AFAIK, it's used very sparingly (BulkWrite.c and php_array_api.h).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows isn't often keen on inline. But I guess as when it's a static, it shouldn't matter. That said, I believe compilers are clever enough anyway that a hint isn't needed. I'll add it though, as long as Windows doesn't bitch.

src/bson.c Outdated
void php_phongo_bson_state_copy_ctor(php_phongo_bson_state* dst, php_phongo_bson_state* src)
{
dst->map = src->map;
src->field_path->ref_count++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be prudent to check that src->field_path is allocated before doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should do that. Although logically that shouldn't happen as the _copy_ctor only happens with a nested structure.

src/bson.c Outdated
TSRMLS_FETCH();

php_phongo_field_path_push(((php_phongo_bson_state*) data)->field_path, key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

((php_phongo_bson_state*) data) can just be parent_state, as you did in php_phongo_bson_visit_document().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will fix that.

Copy link
Contributor Author

@derickr derickr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated with your comments.

@@ -0,0 +1,47 @@
--TEST--
BSON\fromPHP(): PHP documents with circular references
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know how this snuck in, -004 had the same. I'll make sure to keep this in a separate commit.

src/bson.c Outdated
continue;
}
field_path->elements[i] = NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do without this for loop, since we're going to free the entire elements array anyway. This may be a leftover from when you were actually freeing each element string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about Cursor.c and a suggestion to remove a for loop when freeing the field path (can be changed). Other than that, LGTM.


for (i = level; i < field_path->allocated_levels; i++) {
field_path->elements[i] = NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest using memset() here to remove another loop, but https://stackoverflow.com/a/2141373/162228 suggests why this is preferable. That said, I doubt PHP is running on any platforms where a NULL pointer is not zero bytes (famous last words).

php_phongo_bson_to_zval_ex(bson_get_data(doc), doc->len, &cursor->visitor_data);
php_phongo_bson_state_dtor(&cursor->visitor_data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you going to look into this before merging? If not, can you create another ticket in the epic to go back and look into it later?

@derickr derickr force-pushed the PHPC-922-improve-error-message branch from b769e32 to 6ad27e6 Compare March 9, 2018 16:58
@derickr derickr force-pushed the PHPC-922-improve-error-message branch from 6ad27e6 to 8136d50 Compare March 9, 2018 17:17
@derickr derickr merged commit 8136d50 into mongodb:master Mar 9, 2018
derickr added a commit that referenced this pull request Mar 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants