Skip to content

PHPLIB-1220: Support codec option for GridFS buckets #1149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions src/GridFS/Bucket.php
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@
namespace MongoDB\GridFS;

use Iterator;
use MongoDB\BSON\Document;
use MongoDB\Codec\DocumentCodec;
use MongoDB\Collection;
use MongoDB\Driver\CursorInterface;
use MongoDB\Driver\Exception\RuntimeException as DriverRuntimeException;
Expand All @@ -35,6 +37,7 @@
use MongoDB\Operation\Find;

use function array_intersect_key;
use function array_key_exists;
use function assert;
use function fopen;
use function get_resource_type;
Expand Down Expand Up @@ -75,6 +78,8 @@ class Bucket

private const STREAM_WRAPPER_PROTOCOL = 'gridfs';

private ?DocumentCodec $codec = null;

private CollectionWrapper $collectionWrapper;

private string $databaseName;
Expand Down Expand Up @@ -142,6 +147,10 @@ public function __construct(Manager $manager, string $databaseName, array $optio
throw new InvalidArgumentException(sprintf('Expected "chunkSizeBytes" option to be >= 1, %d given', $options['chunkSizeBytes']));
}

if (isset($options['codec']) && ! $options['codec'] instanceof DocumentCodec) {
throw InvalidArgumentException::invalidType('"codec" option', $options['codec'], DocumentCodec::class);
}

if (! is_bool($options['disableMD5'])) {
throw InvalidArgumentException::invalidType('"disableMD5" option', $options['disableMD5'], 'boolean');
}
Expand All @@ -162,10 +171,15 @@ public function __construct(Manager $manager, string $databaseName, array $optio
throw InvalidArgumentException::invalidType('"writeConcern" option', $options['writeConcern'], WriteConcern::class);
}

if (isset($options['codec']) && isset($options['typeMap'])) {
throw InvalidArgumentException::cannotCombineCodecAndTypeMap();
}

$this->manager = $manager;
$this->databaseName = $databaseName;
$this->bucketName = $options['bucketName'];
$this->chunkSizeBytes = $options['chunkSizeBytes'];
$this->codec = $options['codec'] ?? null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your explanation for not injecting codec into $collectionOptions and only applying on-demand in specific methods, but now I'm curious if we're handling typeMap incorrectly. That is set directly into $collectionOptions and ends up applying to most calls on the files and chunks collections.

Bucket uses getRawFileDocumentForStream() internally as needed to access a raw document (without the type map). Does that not also cover us for a codec? Various CollectionWrapper methods explicitly set a simple typeMap when we need consistent access to a document, and that should avoid a codec being inherited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with the codec is that it also applies on incoming data for operations like insertOne, while typeMap only applies to outgoing data. Applying a codec on the collection broke inserting GridFS files, which is when I noticed that something was up.

In an ideal world, we'd be handling everything as raw BSON data internally and only decoding it using typeMap or codec when necessary to do so. However, I believe this goes beyond the scope of what we want to do here. I'd be in favour of revisiting this when we make changes to our GridFS API to accommodate the recent request to make the stream protocol available to consumers. The refactoring necessary to accommodate this would also allow us to change the Bucket to internally always work with BSON documents and removing this special handling entirely. The collections would then receive a ['root' => 'bson'] type map, and all handling regarding codecs and type maps for the file document would be handled by the Bucket class.

$this->disableMD5 = $options['disableMD5'];
$this->readConcern = $options['readConcern'] ?? $this->manager->getReadConcern();
$this->readPreference = $options['readPreference'] ?? $this->manager->getReadPreference();
Expand All @@ -188,6 +202,7 @@ public function __debugInfo()
{
return [
'bucketName' => $this->bucketName,
'codec' => $this->codec,
'databaseName' => $this->databaseName,
'disableMD5' => $this->disableMD5,
'manager' => $this->manager,
Expand Down Expand Up @@ -309,6 +324,10 @@ public function drop()
*/
public function find($filter = [], array $options = [])
{
if ($this->codec && ! array_key_exists('codec', $options)) {
$options['codec'] = $this->codec;
}

return $this->collectionWrapper->findFiles($filter, $options);
}

Expand All @@ -326,6 +345,10 @@ public function find($filter = [], array $options = [])
*/
public function findOne($filter = [], array $options = [])
{
if ($this->codec && ! array_key_exists('codec', $options)) {
$options['codec'] = $this->codec;
}

return $this->collectionWrapper->findOneFile($filter, $options);
}

Expand Down Expand Up @@ -381,6 +404,10 @@ public function getFileDocumentForStream($stream)
{
$file = $this->getRawFileDocumentForStream($stream);

if ($this->codec) {
return $this->codec->decode(Document::fromPHP($file));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you re-create a BSON document that have already been decoded into PHP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes: the document may not have come from the database but may be stored by the WritableStream instance as a hash.

}

// Filter the raw document through the specified type map
return apply_type_map_to_document($file, $this->typeMap);
}
Expand Down
2 changes: 2 additions & 0 deletions src/GridFS/WritableStream.php
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,8 @@ public function __construct(CollectionWrapper $collectionWrapper, string $filena
'_id' => $options['_id'],
'chunkSize' => $this->chunkSize,
'filename' => $filename,
'length' => null,
'uploadDate' => null,
] + array_intersect_key($options, ['aliases' => 1, 'contentType' => 1, 'metadata' => 1]);
}

Expand Down
71 changes: 71 additions & 0 deletions tests/Fixtures/Codec/TestFileCodec.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
<?php

namespace MongoDB\Tests\Fixtures\Codec;

use DateTimeImmutable;
use MongoDB\BSON\Document;
use MongoDB\BSON\UTCDateTime;
use MongoDB\Codec\DecodeIfSupported;
use MongoDB\Codec\DocumentCodec;
use MongoDB\Codec\EncodeIfSupported;
use MongoDB\Exception\UnsupportedValueException;
use MongoDB\Tests\Fixtures\Document\TestFile;

use function assert;

final class TestFileCodec implements DocumentCodec
{
use DecodeIfSupported;
use EncodeIfSupported;

public function canDecode($value): bool
{
return $value instanceof Document;
}

public function decode($value): TestFile
{
if (! $value instanceof Document) {
throw UnsupportedValueException::invalidDecodableValue($value);
}

$fileObject = new TestFile();
$fileObject->id = $value->get('_id');
$fileObject->length = (int) $value->get('length');
$fileObject->chunkSize = (int) $value->get('chunkSize');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the int casts here? I notice you don't do casts for filename, but perhaps that's handled by PHP enforcing TestFile's property type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're operating on raw BSON, we may get Int64 instances, however unlikely they may be (especially for chunkSize). I decided to be prudent and cast these to int to avoid that issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my assumption. Thanks for confirming!

$fileObject->filename = (string) $value->get('filename');

$uploadDate = $value->get('uploadDate');
if ($uploadDate instanceof UTCDateTime) {
$fileObject->uploadDate = DateTimeImmutable::createFromMutable($uploadDate->toDateTime());
}

if ($value->has('metadata')) {
$metadata = $value->get('metadata');
assert($metadata instanceof Document);
$fileObject->metadata = $metadata->toPHP();
}

return $fileObject;
}

public function canEncode($value): bool
{
return $value instanceof TestFile;
}

public function encode($value): Document
{
if (! $value instanceof TestFile) {
throw UnsupportedValueException::invalidEncodableValue($value);
}

return Document::fromPHP([
'_id' => $value->id,
'length' => $value->length,
'chunkSize' => $value->chunkSize,
'uploadDate' => new UTCDateTime($value->uploadDate),
'filename' => $value->filename,
]);
}
}
31 changes: 31 additions & 0 deletions tests/Fixtures/Document/TestFile.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
<?php
/*
* Copyright 2023-present MongoDB, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

namespace MongoDB\Tests\Fixtures\Document;

use DateTimeImmutable;
use stdClass;

final class TestFile
{
public $id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that GridFS permits custom _id values, so we'd be wrong to force ObjectId here.

public int $length;
public int $chunkSize;
public ?DateTimeImmutable $uploadDate = null;
public string $filename;
public ?stdClass $metadata = null;
}
125 changes: 125 additions & 0 deletions tests/GridFS/BucketFunctionalTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@
use MongoDB\Model\BSONDocument;
use MongoDB\Model\IndexInfo;
use MongoDB\Operation\ListIndexes;
use MongoDB\Tests\Fixtures\Codec\TestDocumentCodec;
use MongoDB\Tests\Fixtures\Codec\TestFileCodec;
use MongoDB\Tests\Fixtures\Document\TestFile;
use stdClass;

use function array_merge;
use function call_user_func;
Expand Down Expand Up @@ -68,6 +72,7 @@ public function provideInvalidConstructorOptions()
return $this->createOptionDataProvider([
'bucketName' => $this->getInvalidStringValues(true),
'chunkSizeBytes' => $this->getInvalidIntegerValues(true),
'codec' => $this->getInvalidDocumentCodecValues(),
'disableMD5' => $this->getInvalidBooleanValues(true),
'readConcern' => $this->getInvalidReadConcernValues(),
'readPreference' => $this->getInvalidReadPreferenceValues(),
Expand All @@ -83,6 +88,17 @@ public function testConstructorShouldRequireChunkSizeBytesOptionToBePositive():
new Bucket($this->manager, $this->getDatabaseName(), ['chunkSizeBytes' => 0]);
}

public function testConstructorWithCodecAndTypeMapOptions(): void
{
$options = [
'codec' => new TestDocumentCodec(),
'typeMap' => ['root' => 'array', 'document' => 'array'],
];

$this->expectExceptionObject(InvalidArgumentException::cannotCombineCodecAndTypeMap());
new Bucket($this->manager, $this->getDatabaseName(), $options);
}

/** @dataProvider provideInputDataAndExpectedChunks */
public function testDelete($input, $expectedChunks): void
{
Expand Down Expand Up @@ -317,6 +333,41 @@ public function testFindUsesTypeMap(): void
$this->assertInstanceOf(BSONDocument::class, $fileDocument);
}

public function testFindUsesCodec(): void
{
$this->bucket->uploadFromStream('a', $this->createStream('foo'));

$cursor = $this->bucket->find([], ['codec' => new TestFileCodec()]);
$fileDocument = current($cursor->toArray());

$this->assertInstanceOf(TestFile::class, $fileDocument);
$this->assertSame('a', $fileDocument->filename);
}

public function testFindInheritsBucketCodec(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['codec' => new TestFileCodec()]);
$bucket->uploadFromStream('a', $this->createStream('foo'));

$cursor = $bucket->find();
$fileDocument = current($cursor->toArray());

$this->assertInstanceOf(TestFile::class, $fileDocument);
$this->assertSame('a', $fileDocument->filename);
}

public function testFindResetsInheritedBucketCodec(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['codec' => new TestFileCodec()]);
$bucket->uploadFromStream('a', $this->createStream('foo'));

$cursor = $bucket->find([], ['codec' => null]);
$fileDocument = current($cursor->toArray());

$this->assertInstanceOf(BSONDocument::class, $fileDocument);
$this->assertSame('a', $fileDocument->filename);
}

public function testFindOne(): void
{
$this->bucket->uploadFromStream('a', $this->createStream('foo'));
Expand All @@ -339,6 +390,64 @@ public function testFindOne(): void
$this->assertSameDocument(['filename' => 'b', 'length' => 6], $fileDocument);
}

public function testFindOneUsesCodec(): void
{
$this->bucket->uploadFromStream('a', $this->createStream('foo'));
$this->bucket->uploadFromStream('b', $this->createStream('foobar'));
$this->bucket->uploadFromStream('c', $this->createStream('foobarbaz'));

$fileDocument = $this->bucket->findOne(
['length' => ['$lte' => 6]],
[
'sort' => ['length' => -1],
'codec' => new TestFileCodec(),
],
);

$this->assertInstanceOf(TestFile::class, $fileDocument);
$this->assertSame('b', $fileDocument->filename);
$this->assertSame(6, $fileDocument->length);
}

public function testFindOneInheritsBucketCodec(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['codec' => new TestFileCodec()]);

$bucket->uploadFromStream('a', $this->createStream('foo'));
$bucket->uploadFromStream('b', $this->createStream('foobar'));
$bucket->uploadFromStream('c', $this->createStream('foobarbaz'));

$fileDocument = $bucket->findOne(
['length' => ['$lte' => 6]],
['sort' => ['length' => -1]],
);

$this->assertInstanceOf(TestFile::class, $fileDocument);
$this->assertSame('b', $fileDocument->filename);
$this->assertSame(6, $fileDocument->length);
}

public function testFindOneResetsInheritedBucketCodec(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['codec' => new TestFileCodec()]);

$bucket->uploadFromStream('a', $this->createStream('foo'));
$bucket->uploadFromStream('b', $this->createStream('foobar'));
$bucket->uploadFromStream('c', $this->createStream('foobarbaz'));

$fileDocument = $bucket->findOne(
['length' => ['$lte' => 6]],
[
'sort' => ['length' => -1],
'codec' => null,
],
);

$this->assertInstanceOf(BSONDocument::class, $fileDocument);
$this->assertSame('b', $fileDocument->filename);
$this->assertSame(6, $fileDocument->length);
}

public function testGetBucketNameWithCustomValue(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['bucketName' => 'custom_fs']);
Expand Down Expand Up @@ -388,6 +497,22 @@ public function testGetFileDocumentForStreamUsesTypeMap(): void
$this->assertSame(['foo' => 'bar'], $fileDocument['metadata']->getArrayCopy());
}

public function testGetFileDocumentForStreamUsesCodec(): void
{
$bucket = new Bucket($this->manager, $this->getDatabaseName(), ['codec' => new TestFileCodec()]);

$metadata = ['foo' => 'bar'];
$stream = $bucket->openUploadStream('filename', ['_id' => 1, 'metadata' => $metadata]);

$fileDocument = $bucket->getFileDocumentForStream($stream);

$this->assertInstanceOf(TestFile::class, $fileDocument);

$this->assertSame('filename', $fileDocument->filename);
$this->assertInstanceOf(stdClass::class, $fileDocument->metadata);
$this->assertSame($metadata, (array) $fileDocument->metadata);
}

public function testGetFileDocumentForStreamWithReadableStream(): void
{
$metadata = ['foo' => 'bar'];
Expand Down