DOCS-655 GridFS main page #418

Merged
merged 3 commits into from
Dec 12, 2012
230 changes: 230 additions & 0 deletions source/applications/gridfs.txt
@@ -0,0 +1,230 @@
.. index:: GridFS

======
GridFS
======

.. default-domain:: mongodb

:term:`GridFS` is a specification for storing and retrieving files that
exceed the :term:`BSON`-document :ref:`size limit
<limit-bson-document-size>` of 16MB.

Instead of storing a file in a single document, GridFS divides a file
into chunks and stores each of those chunks as a separate document. By
default, GridFS limits chunk size to 256k. GridFS uses two collections to
store files. One collection stores the file chunks, and the other stores
file metadata.
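
The chunking step described above can be sketched in plain JavaScript. This is a minimal illustration, not a driver implementation: the ``splitIntoChunks`` helper and the string file ID are hypothetical, and only the ``files_id``, ``n``, and ``data`` field names come from the ``chunks`` prototype later on this page.

```javascript
// Hypothetical helper (not part of any MongoDB driver): split a Buffer
// into GridFS-style chunk documents. Runs under Node.js with no dependencies.
const DEFAULT_CHUNK_SIZE = 256 * 1024; // the 256k default stated above

function splitIntoChunks(fileBuffer, filesId, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < fileBuffer.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: filesId, // assumed ID of the parent "files" document
      n: n,              // sequence number, starting at 0
      // subarray clamps the end index, so the last chunk may be shorter
      data: fileBuffer.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// A 600 KB payload does not divide evenly, so the final chunk is partial.
const demoChunks = splitIntoChunks(Buffer.alloc(600 * 1024), "demo-file-id");
console.log(demoChunks.map((c) => c.data.length));
```

Only the last chunk can be smaller than the chunk size; every other chunk is exactly ``chunkSize`` bytes.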

When you query for a file stored through GridFS, GridFS reassembles the chunks
as needed. You can perform range queries on files stored through GridFS.
You can also access arbitrary sections of a file without reading the
whole file, for example to skip into the middle of a video.
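
Random access works because a byte offset maps directly to a chunk number. The following sketch shows that arithmetic under the assumption of a fixed chunk size; the ``chunksForRange`` function is hypothetical and is not part of any driver API.

```javascript
// Hypothetical helper: find which chunk numbers cover the byte range
// [start, end) of a file, assuming every chunk but the last is chunkSize bytes.
function chunksForRange(start, end, chunkSize = 256 * 1024) {
  if (start < 0 || end <= start) {
    throw new RangeError("expected 0 <= start < end");
  }
  return {
    first: Math.floor(start / chunkSize),      // chunk holding the first byte
    last: Math.floor((end - 1) / chunkSize),   // chunk holding the last byte
    offsetInFirst: start % chunkSize,          // where to begin inside the first chunk
  };
}

// Bytes 1,000,000 up to (not including) 1,500,000 of a large file.
const range = chunksForRange(1000000, 1500000);
console.log(range);
```

A reader only needs the chunks numbered ``first`` through ``last``, which is why seeking does not require loading the entire file.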

GridFS is useful not only for storing files that exceed 16MB but also
for storing any files for which you want access without having to load the
entire file into memory. For more information on when to use GridFS, see
:ref:`faq-developers-when-to-use-gridfs`.

.. index:: GridFS; initialize
.. _gridfs-implement:

Implement GridFS
----------------

To store and retrieve files using :term:`GridFS`, use either of the following:

- A MongoDB driver. See the :doc:`drivers </applications/drivers>`
  documentation for information on using GridFS with your driver.

- The :program:`mongofiles` command-line tool. See
  :doc:`/reference/mongofiles`.

.. index:: GridFS; collections
.. _gridfs-collections:

GridFS Collections
------------------

:term:`GridFS` stores files in two collections:

- ``chunks`` stores the binary chunks. For details, see
  :ref:`gridfs-chunks-collection`.

- ``files`` stores the file's metadata. For details, see
  :ref:`gridfs-files-collection`.

GridFS places the collections in a common bucket by prefixing each with
the bucket name. By default, GridFS stores the collections in the ``fs``
bucket:

- ``fs.files``
- ``fs.chunks``

You can choose a bucket name other than ``fs``, and you can create
additional buckets.
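
The naming rule is simple enough to express as a one-line helper. The ``bucketCollections`` function below is a hypothetical illustration of the prefixing convention, not a driver call.

```javascript
// Hypothetical helper: derive the two collection names for a bucket
// by prefixing each collection with the bucket name.
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,   // file metadata
    chunks: `${bucketName}.chunks`, // binary chunks
  };
}

console.log(bucketCollections());         // default "fs" bucket
console.log(bucketCollections("photos")); // a custom bucket
```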

To access files, you use the bucket name. For example, if you use GridFS
to create a ``photos`` bucket, you would query its file metadata from the
:program:`mongo` shell by calling :method:`findOne()
<db.collection.findOne()>` on the bucket's ``files`` collection:

.. code-block:: javascript

   db.photos.files.findOne()

Review comment (Contributor): This is confusing. I'm not sure what the last bit here with the findOne() is supposed to accomplish from an illustrative or operational perspective.

Review comment (Contributor): I see where this came from; we need to tweak it.

Review comment (Contributor): Kill the shell example; tell people to use mongofiles or drivers.


.. index:: GridFS; chunks collection
.. _gridfs-chunks-collection:

The chunks Collection
~~~~~~~~~~~~~~~~~~~~~

Each document in the ``chunks`` collection represents a distinct chunk
of a file stored through :term:`GridFS`. The following is a
prototype document from the ``chunks`` collection:

.. code-block:: javascript

   {
     "_id" : <ObjectID>,
     "files_id" : <ObjectID>,
     "n" : <num>,
     "data" : <binary>
   }

A document from the ``chunks`` collection contains the following fields:

.. data:: chunks._id

   The unique :term:`ObjectID` of the chunk.

.. data:: chunks.files_id

   The ``_id`` of the "parent" document, as specified in the ``files``
   collection.

.. data:: chunks.n

   The sequence number of the chunk. Chunks are numbered in order,
   starting with 0.

.. data:: chunks.data

   The chunk's payload as a :term:`BSON` binary type.

The ``chunks`` collection uses a :term:`compound index` on ``files_id`` and
``n``, as described in :ref:`gridfs-index`.
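
Because ``files_id`` identifies the parent file and ``n`` gives the order, reassembly amounts to filtering, sorting, and concatenating. The sketch below works on an in-memory array of chunk documents; the ``reassemble`` function and the string IDs are assumptions for illustration, and a real driver would read the chunks from the collection instead.

```javascript
// Hypothetical helper: rebuild a file from chunk documents held in memory.
function reassemble(chunkDocs, filesId) {
  const ordered = chunkDocs
    .filter((c) => c.files_id === filesId) // keep only this file's chunks
    .sort((a, b) => a.n - b.n);            // order by sequence number

  // Sequence numbers start at 0 and must be contiguous.
  ordered.forEach((c, i) => {
    if (c.n !== i) throw new Error(`missing chunk ${i}`);
  });
  return Buffer.concat(ordered.map((c) => c.data));
}

// Chunks may arrive in any order and may be mixed with other files' chunks.
const shuffled = [
  { files_id: "a", n: 1, data: Buffer.from("lo, ") },
  { files_id: "b", n: 0, data: Buffer.from("other") },
  { files_id: "a", n: 2, data: Buffer.from("GridFS") },
  { files_id: "a", n: 0, data: Buffer.from("Hel") },
];
const restored = reassemble(shuffled, "a").toString();
console.log(restored);
```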

.. index:: GridFS; files collection
.. _gridfs-files-collection:

The files Collection
~~~~~~~~~~~~~~~~~~~~

Review comment (Contributor): ``files``

Each document in the ``files`` collection represents a
file that has been stored through :term:`GridFS`. The following is a
prototype of a ``files`` collection document:

.. code-block:: javascript

   {
     "_id" : <ObjectID>,
     "length" : <num>,
     "chunkSize" : <num>,
     "uploadDate" : <timestamp>,
     "md5" : <hash>,

     "filename" : <string>,
     "contentType" : <string>,
     "aliases" : <string array>,
     "metadata" : <dataObject>
   }

A document from the ``files`` collection contains some or all of the
following fields. You can create additional fields:

.. data:: files._id

   The unique ID for this document. The ``_id`` is of the data type you
   chose for the original document. The default type for MongoDB
   documents is :term:`BSON` :term:`ObjectID`.

.. data:: files.length

   The size of the file in bytes.

.. data:: files.chunkSize

   The size of each chunk. GridFS divides the file into chunks of
   the size specified here. The default size is 256 kilobytes.

.. data:: files.uploadDate

   The date the file was first stored by GridFS. This value has the
   ``Date`` data type.

.. data:: files.md5

   An MD5 hash returned from the ``filemd5`` API. This value has the
   ``String`` data type.

.. data:: files.filename

   A human-readable name for the file. This field is optional.

.. data:: files.contentType

   A valid MIME type for the file. This field is optional.

.. data:: files.aliases

   An array of alias strings. This field is optional.

.. data:: files.metadata

   Any additional information you want to store. This field is optional.
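
To make the field list concrete, the sketch below builds a ``files``-style document for an in-memory buffer. It rests on several assumptions: ``buildFilesDocument`` and ``expectedChunkCount`` are hypothetical helpers, the ``_id`` is omitted, and the MD5 is computed locally with the Node.js ``crypto`` module as a stand-in for the value that the ``filemd5`` API would return.

```javascript
const crypto = require("crypto");

// Hypothetical helper: assemble the metadata fields described above.
function buildFilesDocument(fileBuffer, options = {}) {
  const doc = {
    length: fileBuffer.length,                  // size of the file in bytes
    chunkSize: options.chunkSize || 256 * 1024, // 256 kilobyte default
    uploadDate: new Date(),
    // Assumption: a local hash standing in for the filemd5 result.
    md5: crypto.createHash("md5").update(fileBuffer).digest("hex"),
  };
  // Optional fields are added only when the caller supplies them.
  for (const key of ["filename", "contentType", "aliases", "metadata"]) {
    if (options[key] !== undefined) doc[key] = options[key];
  }
  return doc;
}

// The number of chunk documents implied by length and chunkSize.
function expectedChunkCount(filesDoc) {
  return Math.ceil(filesDoc.length / filesDoc.chunkSize);
}

const filesDoc = buildFilesDocument(Buffer.alloc(600 * 1024), {
  filename: "largething.mpg",
  contentType: "video/mpeg",
});
console.log(filesDoc.length, filesDoc.chunkSize, expectedChunkCount(filesDoc));
```

The ``length`` and ``chunkSize`` fields together determine how many documents to expect in the ``chunks`` collection for a given file.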

.. index:: GridFS; index
.. _gridfs-index:

GridFS Index
------------

:term:`GridFS` uses a :term:`unique <unique index>`, :term:`compound
<compound index>` index on the ``chunks`` collection for ``files_id``
and ``n``. The index allows efficient retrieval of chunks using the
``files_id`` and ``n`` values, as shown in the following example:

.. code-block:: javascript

   cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});

See the :doc:`/applications/drivers` documentation for your driver to
learn whether this index is created by default.

The following command creates this index from the shell:

.. code-block:: javascript

   db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});

Example Interface
-----------------

The following is an example of the GridFS interface in Java. The example
is for demonstration purposes only. For API specifics, see the
:doc:`/applications/drivers` documentation for your driver.

.. code-block:: java

   /*
    * default root collection usage - must be supported
    */
   GridFS myFS = new GridFS(myDatabase);             // returns a default GridFS (e.g. "fs" bucket collection)
   myFS.storeFile(new File("/tmp/largething.mpg"));  // saves the file into the "fs" GridFS store

   /*
    * specified root collection usage - optional
    */
   GridFS myContracts = new GridFS(myDatabase, "contracts");  // returns a GridFS where "contracts" is root
   myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf"));  // retrieves object whose filename is "smithco"