DOCSP-46701: Serialization #168

Merged
merged 7 commits into from
Feb 10, 2025
17 changes: 17 additions & 0 deletions source/index.txt
@@ -24,6 +24,7 @@ MongoDB {+driver-short+} Documentation
Data Formats </data-formats>
Logging </logging>
Monitoring </monitoring>
Serialization </serialization>
Third-Party Tools </tools>
FAQ </faq>
Troubleshooting </troubleshooting>
@@ -100,6 +101,22 @@ Specialized Data Formats
Learn how to work with specialized data formats and custom types in the
:ref:`pymongo-data-formats` section.

Logging
-------

Learn how to configure logging in the :ref:`pymongo-logging` section.

Monitoring
----------

Learn how to monitor changes to your application in the :ref:`pymongo-monitoring` section.

Serialization
-------------

Learn how {+driver-short+} serializes and deserializes data in the
:ref:`pymongo-serialization` section.

Third-Party Tools
-----------------

92 changes: 92 additions & 0 deletions source/serialization.txt
@@ -0,0 +1,92 @@
.. _pymongo-serialization:

=============
Serialization
=============

.. facet::
   :name: genre
   :values: reference

.. meta::
   :keywords: class, map, deserialize

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

In this guide, you can learn how to use {+driver-short+} to perform
serialization.

Serialization is the process of mapping a {+language+} object to a BSON
document for storage in MongoDB. {+driver-short+} automatically converts basic {+language+}
types into BSON when you insert a document into a collection. Similarly, when you retrieve a
document from a collection, {+driver-short+} automatically converts the returned BSON
back into the corresponding {+language+} types.

You can use {+driver-short+} to serialize and deserialize the following {+language+}
types:

- ``str``
- ``int``
- ``float``
- ``bool``
- ``datetime.datetime``
- ``list``
- ``dict``
- ``None``

For a complete list of {+language+}-to-BSON mappings, see the `bson <{+api-root+}bson/index.html>`__
API documentation.

Custom Classes
--------------

To serialize and deserialize custom {+language+} classes, you must implement custom logic
to handle the conversion. The following sections show how to serialize and deserialize
custom classes.

Review comment on the preceding sentence: Does it make sense to explicitly call out that serialization and deserialization are required in order to work with custom data classes?

Collaborator Author: That was my intention with this sentence, but if you feel it's not clear enough I can make it more explicit.

Reply: I think it's clear enough.

Serializing Custom Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~

To serialize a custom class, you must convert the class to a dictionary. The following
example serializes a custom class by using the built-in ``vars()`` function, and then
inserts the serialized object into a collection:

.. code-block:: python

   class Restaurant:
       def __init__(self, name, cuisine):
           self.name = name
           self.cuisine = cuisine

   restaurant = Restaurant("Example Cafe", "Coffee")
   restaurant_dict = vars(restaurant)

   collection.insert_one(restaurant_dict)

To learn more about inserting documents into a collection, see the :ref:`pymongo-write-insert`
guide.
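
One detail to keep in mind: ``vars()`` returns the object's own ``__dict__`` rather than a copy, and ``insert_one()`` adds an ``_id`` key to the dictionary it receives if the key is missing. The following sketch copies the attributes first so that the insert does not change the original object. The ``to_document()`` helper is hypothetical and not part of {+driver-short+}, and a plain dictionary assignment stands in for the insert so that the example runs without a database connection:

```python
class Restaurant:
    def __init__(self, name, cuisine):
        self.name = name
        self.cuisine = cuisine


def to_document(obj):
    """Hypothetical helper: return a shallow copy of the object's attributes."""
    return dict(vars(obj))


restaurant = Restaurant("Example Cafe", "Coffee")
restaurant_doc = to_document(restaurant)

# Stand-in for the _id key that insert_one() would add to the dictionary.
restaurant_doc["_id"] = 1

# The copy has the new key, but the original object is unchanged.
print(restaurant_doc)
print(hasattr(restaurant, "_id"))
```

Because the copy is shallow, nested mutable attributes such as lists are still shared with the original object.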

Deserializing Custom Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To deserialize a custom class, you must convert the dictionary back into an instance of
the class. The following example retrieves a document from a collection, and then converts
it back into an instance of the ``Restaurant`` class from the preceding example:

.. code-block:: python

   def deserialize_restaurant(doc):
       return Restaurant(name=doc["name"], cuisine=doc["cuisine"])

   restaurant_doc = collection.find_one({"name": "Example Cafe"})
   restaurant = deserialize_restaurant(restaurant_doc)

To learn more about retrieving documents from a collection, see the :ref:`pymongo-retrieve`
guide.
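
``find_one()`` returns ``None`` when no document matches, and returned documents include the ``_id`` field that MongoDB assigns. The following sketch shows one way to account for both cases. The ``from_document()`` function is a hypothetical helper, and a plain dictionary stands in for the query result so that the example runs without a database connection:

```python
class Restaurant:
    def __init__(self, name, cuisine):
        self.name = name
        self.cuisine = cuisine


def from_document(doc):
    """Hypothetical helper: build a Restaurant, or return None if there is no document."""
    if doc is None:
        return None
    # Read only the fields the class defines, so extra keys such as _id are ignored.
    return Restaurant(name=doc["name"], cuisine=doc["cuisine"])


# Stand-in for the result of collection.find_one({"name": "Example Cafe"}).
found = {"_id": 1, "name": "Example Cafe", "cuisine": "Coffee"}

restaurant = from_document(found)
missing = from_document(None)
```

Reading fields by name, rather than unpacking the whole document into the constructor, keeps the class independent of any extra fields stored alongside it.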
143 changes: 0 additions & 143 deletions source/troubleshooting.txt
@@ -110,149 +110,6 @@ frameworks.
   if __name__ == "__main__":
       app.run()

Query Works in the Shell But Not in {+driver-short+}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the ``_id`` field, which is always first, the key-value pairs in a BSON document can
be in any order. The ``mongo`` shell preserves key order when reading and writing
data, as shown by the fields "b" and "a" in the following code example:

.. code-block:: javascript

   // mongo shell
   db.collection.insertOne( { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } } )
   // Returns: WriteResult({ "nInserted" : 1 })

   db.collection.findOne()
   // Returns: { "_id" : 1, "subdocument" : { "b" : 1, "a" : 1 } }

{+driver-short+} represents BSON documents as Python dictionaries by default,
and the order of keys in dictionaries is not defined. In Python, a dictionary declared with
the "a" key first is the same as one with the "b" key first. In the following example,
the keys are displayed in the same order regardless of their order in the ``print``
statement:

.. code-block:: python

   print({'a': 1.0, 'b': 1.0})
   # Returns: {'a': 1.0, 'b': 1.0}

   print({'b': 1.0, 'a': 1.0})
   # Returns: {'a': 1.0, 'b': 1.0}

Similarly, Python dictionaries might not show keys in the order they are
stored in BSON. The following example shows the result of printing the document
inserted in a preceding example:

.. code-block:: python

   print(collection.find_one())
   # Returns: {'_id': 1.0, 'subdocument': {'a': 1.0, 'b': 1.0}}

To preserve the order of keys when reading BSON, use the ``SON`` class,
which is a dictionary that remembers its key order.

The following code example shows how to create a collection
configured to use the ``SON`` class:

.. code-block:: python

   from bson import CodecOptions, SON

   opts = CodecOptions(document_class=SON)

   CodecOptions(document_class=...SON..., tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME)
   collection_son = collection.with_options(codec_options=opts)

When you find the preceding subdocument, the driver represents query results with
``SON`` objects and preserves key order:

.. io-code-block::

   .. input::
      :language: python

      print(collection_son.find_one())

   .. output::

      SON([('_id', 1.0), ('subdocument', SON([('b', 1.0), ('a', 1.0)]))])

The subdocument's actual storage layout is now visible: "b" is before "a".

Because a Python dictionary's key order is not defined, you cannot predict how it will be
serialized to BSON. However, MongoDB considers subdocuments equal only if their
keys have the same order. If you use a Python dictionary to query on a subdocument, it may
not match:

.. io-code-block::

   .. input::
      :language: python

      collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None

   .. output::

      True

Because Python considers the two dictionaries the same, swapping the key order in your query
makes no difference:

.. io-code-block::

   .. input::
      :language: python

      collection.find_one({'subdocument': {'b': 1.0, 'a': 1.0}}) is None

   .. output::

      True

You can solve this in two ways. First, you can match the subdocument field-by-field:

.. io-code-block::

   .. input::
      :language: python

      collection.find_one({'subdocument.a': 1.0,
                           'subdocument.b': 1.0})

   .. output::

      {'_id': 1.0, 'subdocument': {'a': 1.0, 'b': 1.0}}

The query matches any subdocument with an "a" of 1.0 and a "b" of 1.0,
regardless of the order in which you specify them in Python, or the order in which they're
stored in BSON. This query also now matches subdocuments with additional
keys besides "a" and "b", whereas the previous query required an exact match.

The second solution is to use a ``~bson.son.SON`` object to specify the key order:

.. io-code-block::

   .. input::
      :language: python

      query = {'subdocument': SON([('b', 1.0), ('a', 1.0)])}
      collection.find_one(query)

   .. output::

      {'_id': 1.0, 'subdocument': {'a': 1.0, 'b': 1.0}}

The driver preserves the key order you use when you create a ``~bson.son.SON``
when serializing it to BSON and using it as a query. Thus, you can create a
subdocument that exactly matches the subdocument in the collection.

.. note::

   For more information about subdocument matching, see the
   `Query on Embedded/Nested Documents <https://www.mongodb.com/docs/manual/tutorial/query-embedded-documents/>`__
   guide in the {+mdb-server+} documentation.

Cursors
-------
