Skip to content
This repository was archived by the owner on Feb 28, 2025. It is now read-only.

PHPLIB-1339 Add tests on Bitwise Query Operators #37

Merged
merged 5 commits into from
Jan 22, 2024

Conversation

GromNaN
Copy link
Member

@GromNaN GromNaN commented Jan 18, 2024

Fix PHPLIB-1339

https://www.mongodb.com/docs/manual/reference/operator/query/#bitwise

  • $bitsAllClear
  • $bitsAllSet
  • $bitsAnyClear
  • $bitsAnySet

I made all this operators variadics, as suggested previously for $type.

@GromNaN GromNaN requested a review from alcaeus January 18, 2024 18:00
- $match:
a:
$bitsAllClear:
- { $binary: { base64: 'TUM9PQ==', subType: '00' } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no idea what I'm doing with this binary data. This is the Extended JSON representation of the PHP new Binary('MC=='), which I want to be the equivalent of the JS BinData(0, "ID==")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Binary::__construct() takes a string argument representing binary data. So if you're constructing this in PHP, you should be doing new Binary(base64_decode(...)).

I have no idea why the docs use base64 ID== as the representation of binary 00100000. Even the shell seems to disagree:

> BinData(0, "ID==")
Binary.createFromBase64("IA==", 0)

Using a tool like aaronluna.dev/base64, you can see that ID== has the following representation:

ASCII
HEX     2   0
8-BIT   00100000

6-BIT   001000 000011
DECIMAL 8      3
BASE64  I      D     =     =

IA== is:

ASCII
HEX     2   0
8-BIT   00100000

6-BIT   001000 000000
DECIMAL 8      0
BASE64  I      D     =     =

That suggests ID== is non-canonical, since it denotes 12 bits of data. Only by truncating the last four bits do you end up with the same value as IA==. I think this is worth a DOCSP ticket to suggest a change to the server docs.

libbson actually raises an error from _bson_json_parse_binary_elem() when parsing ID== (perhaps rightfully so):

Invalid input string "ID==", looking for base64-encoded binary

I'd suggest switching over to IA== in this example and leaving a comment in the YAML explaining that you modified the value from the docs to use the more canonical representation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm fixing the BSON binary type in #34 using a Yaml tag.

The tag !binary 'IA==' will be converted to new MongoDB\BSON\Binary(base64_decode('IA=='))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN GromNaN requested a review from jmikola January 18, 2024 19:51
- $match:
a:
$bitsAllClear:
- { $binary: { base64: 'TUM9PQ==', subType: '00' } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Binary::__construct() takes a string argument representing binary data. So if you're constructing this in PHP, you should be doing new Binary(base64_decode(...)).

I have no idea why the docs use base64 ID== as the representation of binary 00100000. Even the shell seems to disagree:

> BinData(0, "ID==")
Binary.createFromBase64("IA==", 0)

Using a tool like aaronluna.dev/base64, you can see that ID== has the following representation:

ASCII
HEX     2   0
8-BIT   00100000

6-BIT   001000 000011
DECIMAL 8      3
BASE64  I      D     =     =

IA== is:

ASCII
HEX     2   0
8-BIT   00100000

6-BIT   001000 000000
DECIMAL 8      0
BASE64  I      D     =     =

That suggests ID== is non-canonical, since it denotes 12 bits of data. Only by truncating the last four bits do you end up with the same value as IA==. I think this is worth a DOCSP ticket to suggest a change to the server docs.

libbson actually raises an error from _bson_json_parse_binary_elem() when parsing ID== (perhaps rightfully so):

Invalid input string "ID==", looking for base64-encoded binary

I'd suggest switching over to IA== in this example and leaving a comment in the YAML explaining that you modified the value from the docs to use the more canonical representation.

- $match:
a:
$bitsAllSet:
- { $binary: { base64: 'TUM9PQ==', subType: '00' } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.mongodb.com/docs/manual/reference/operator/query/bitsAllSet/#bindata-bitmask uses MC==, which also isn't canonical:

> BinData(0, "MC==")
Binary.createFromBase64("MA==", 0)

I'd suggest including this in your DOCSP ticket as well. I've no idea where the author(s) got these values from.

- $match:
a:
$bitsAnyClear:
- { $binary: { base64: 'TUM9PQ==', subType: '00' } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue:

> BinData(0, "MC==")
Binary.createFromBase64("MA==", 0)

- $match:
a:
$bitsAnySet:
- { $binary: { base64: 'TUM9PQ==', subType: '00' } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue:

> BinData(0, "MC==")
Binary.createFromBase64("MA==", 0)

- $match:
a:
$bitsAllClear:
- !binary 'IA=='
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about YAML binary strings. Since this is a single line and we're not dealing with whitespace, I think it'd be reasonable to use the canonical !!binary notation.

But I'm also curious if a binary string is even appropriate here. The corresponding example uses a BSON binary type, so shouldn't you be using extended JSON notation here? I think that was in an earlier draft of the PR.

Same question applies to other files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I rename the !binary tag to !bson_binary, and the same for other tags?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these tags are custom on your part (wasn't clear to me if you were re-using existing YAML syntax), I think that'd be a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #43

- $match:
a:
# $bitsAllClear: 35
$bitsAllClear: [35]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the array? The original Integer Example uses 35, which seems to be what you have in the comment above.

Same question applies to other files.

Copy link
Member Author

@GromNaN GromNaN Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment.

Example uses the short form, the builder always generates the array form

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I'm being obtuse, but $bitsAllClear: [35] seems entirely different than $bitsAllClear: 35. The former would be checking for the 35th bit and the latter checks for the bits according to the binary representation of int(35). These are functionally different APIs (for all of these operators).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for insisting, it's fixed.

- $match:
a:
$bitsAllSet:
- !binary 'IA=='
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: apart from bitsAllClear, all of the other examples in the server docs used "MC==", which should be "MA==".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not very relevant for what we are testing, but I fixed. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, I don't imagine it affects the test, but thought it may be helpful to keep things in sync with the doc examples.

"a": {
"$bitsAllClear": [
{
"$binary": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the !binary syntax in YAML is producing extended JSON for a BSON binary type here. Where does that conversion happen? I'd have expected the YAML to just produce a binary string, which would still be parsed as a string type.

Do any of the YAML templates include extended JSON?

Edit: I just saw https://github.com/mongodb/mongo-php-builder/pull/34/files#diff-a715bdcf88cd771d7ae09bd01863bab314ff194ba28e984689e9040d87846db9R20 and https://github.com/mongodb/mongo-php-builder/pull/34/files#diff-41189b0703bfb03f394530009a7d9c88c508f36a39585abd9f73017d6f77745eR135 from PHPLIB-1350 -- so this appears intentional.

- $match:
a:
# $bitsAllClear: 35
$bitsAllClear: [35]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I'm being obtuse, but $bitsAllClear: [35] seems entirely different than $bitsAllClear: 35. The former would be checking for the 35th bit and the latter checks for the bits according to the binary representation of int(35). These are functionally different APIs (for all of these operators).

*/
public function __construct(Binary|PackedArray|BSONArray|array|int|string $bitmask)
public function __construct(Binary|int|string ...$bitmask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re:

I think the problem I'm pointing out in the above threads is due to ambiguity when a single integer is passed here. If 5 is passed to this method, it is interpreted as [5], which is not the same.

I think this needs to be reworked to accept either:

  • An array of ints
  • A single int
  • A single BSON Binary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, there are 3 distinct behaviors. I'll remove the variadic and change the type to int|binary|list<int>.

@GromNaN GromNaN requested a review from jmikola January 22, 2024 15:08
@GromNaN GromNaN merged commit 7c9a36a into mongodb:0.1 Jan 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants