Commit 0154a29

DRIVERS-2789 Convert ObjectID Spec to Markdown
1 parent aac7087 commit 0154a29

4 files changed

+98
-136
lines changed

source/crud/bulk-write.md

Lines changed: 2 additions & 2 deletions
@@ -499,8 +499,8 @@ operation should be performed as its value. The documents have the following for
499499
}
500500
```
501501

502-
If the document to be inserted does not contain an `_id` field, drivers MUST generate a new
503-
[`ObjectId`](../objectid.rst) and add it as the `_id` field at the beginning of the document.
502+
If the document to be inserted does not contain an `_id` field, drivers MUST generate a new [`ObjectId`](../objectid.md)
503+
and add it as the `_id` field at the beginning of the document.
504504

505505
#### Update
506506

source/index.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@
2727
- [Max Staleness](max-staleness/max-staleness.md)
2828
- [Max Staleness Tests](max-staleness/max-staleness-tests.md)
2929
- [OP_MSG](message/OP_MSG.md)
30+
- [ObjectID format](./objectid.md)
3031
- [Performance Benchmarking](benchmarking/benchmarking.md)
3132
- [Retryable Reads](retryable-reads/retryable-reads.md)
3233
- [Retryable Writes](retryable-writes/retryable-writes.md)

source/objectid.md

Lines changed: 91 additions & 134 deletions
@@ -1,182 +1,139 @@
1-
.. role:: javascript(code)
2-
:language: javascript
1+
# ObjectID format
32

4-
===============
5-
ObjectID format
6-
===============
3+
- Status: Accepted
4+
- Minimum Server Version: N/A
75

8-
:Status: Accepted
9-
:Minimum Server Version: N/A
6+
______________________________________________________________________
107

11-
.. contents::
8+
## Abstract
129

13-
--------
10+
This specification documents the format and data contents of ObjectID BSON values that the drivers and the server
11+
generate when no field values have been specified (e.g. creating an ObjectID BSON value when no `_id` field is present
12+
in a document). It primarily aims to provide an alternative to the historical use of the MD5 hashing algorithm for
13+
the machine information field of the ObjectID, which is problematic when providing a FIPS compliant implementation. It
14+
also documents existing best practices for the timestamp and counter fields.
1415

15-
Abstract
16-
========
16+
## META
1717

18-
This specification documents the format and data contents of ObjectID BSON
19-
values that the drivers and the server generate when no field values have been
20-
specified (e.g. creating an ObjectID BSON value when no _id field is present
21-
in a document). It is primarily aimed to provide an alternative to the
22-
historical use of the MD5 hashing algorithm for the machine information field
23-
of the ObjectID, which is problematic when providing a FIPS compliant
24-
implementation. It also documents existing best practices for the timestamp
25-
and counter fields.
18+
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
19+
"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
2620

27-
META
28-
====
21+
## Specification
2922

30-
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
31-
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be
32-
interpreted as described in `RFC 2119 <https://www.ietf.org/rfc/rfc2119.txt>`_.
23+
The [ObjectID](https://www.mongodb.com/docs/manual/reference/method/ObjectId/) BSON type is a 12-byte value consisting
24+
of three different portions (fields):
3325

34-
Specification
35-
=============
36-
37-
The ObjectID_ BSON type is a 12-byte value consisting of three different
38-
portions (fields):
39-
40-
- a 4-byte value representing the seconds since the Unix epoch in the highest
41-
order bytes,
26+
- a 4-byte value representing the seconds since the Unix epoch in the highest order bytes,
4227
- a 5-byte random number unique to a machine and process,
4328
- a 3-byte counter, starting with a random value.
4429

45-
::
46-
47-
4 byte timestamp 5 byte process unique 3 byte counter
48-
|<----------------->|<---------------------->|<------------>|
49-
[----|----|----|----|----|----|----|----|----|----|----|----]
50-
0 4 8 12
30+
```
31+
4 byte timestamp 5 byte process unique 3 byte counter
32+
|<----------------->|<---------------------->|<------------>|
33+
[----|----|----|----|----|----|----|----|----|----|----|----]
34+
0 4 8 12
35+
```
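As a minimal sketch of the layout above, the three fields can be recovered by slicing the 12 bytes at offsets 4 and 9. The `ObjectIdFields` and `split_object_id` names are hypothetical helpers for illustration and are not part of the specification or of any driver API.

```python
from typing import NamedTuple


class ObjectIdFields(NamedTuple):
    """Hypothetical container for the three raw ObjectID fields."""
    timestamp: bytes       # bytes 0-3, big endian seconds since the Unix epoch
    process_unique: bytes  # bytes 4-8, random value unique to machine and process
    counter: bytes         # bytes 9-11, big endian counter


def split_object_id(raw: bytes) -> ObjectIdFields:
    """Slice a 12-byte ObjectID into its three fields."""
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes")
    return ObjectIdFields(raw[0:4], raw[4:9], raw[9:12])


# Example value made up for illustration: 4 + 5 + 3 bytes.
fields = split_object_id(bytes.fromhex("65a1b2c3" "0102030405" "00002a"))
```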
5136

52-
.. _ObjectID: https://www.mongodb.com/docs/manual/reference/method/ObjectId/
37+
### Timestamp Field
5338

54-
Timestamp Field
55-
---------------
39+
This 4-byte big endian field represents the seconds since the Unix epoch (Jan 1st, 1970, midnight UTC). It is an ever
40+
increasing value that will have a range until about Feb 7th, 2106.
5641

57-
This 4-byte big endian field represents the seconds since the Unix epoch (Jan
58-
1st, 1970, midnight UTC). It is an ever increasing value that will have a
59-
range until about Jan 7th, 2106.
42+
Drivers MUST create ObjectIDs with this value representing the number of seconds since the Unix epoch.
6043

61-
Drivers MUST create ObjectIDs with this value representing the number of
62-
seconds since the Unix epoch.
44+
Drivers MUST interpret this value as an **unsigned 32-bit integer** when conversions to language specific date/time
45+
values are created, and when converting this to a timestamp.
6346

64-
Drivers MUST interpret this value as an **unsigned 32-bit integer** when
65-
conversions to language specific date/time values are created, and when
66-
converting this to a timestamp.
47+
Drivers SHOULD have an accessor method on an ObjectID class for obtaining the timestamp value.
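The unsigned interpretation can be sketched as follows. The `object_id_timestamp` function is a hypothetical accessor, not a name taken from the specification; it decodes the first four bytes as a big endian unsigned 32-bit integer and adds them to the epoch, which avoids platform limits on large timestamps.

```python
import struct
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def object_id_timestamp(raw: bytes) -> datetime:
    """Return the creation time encoded in the first 4 bytes of an ObjectID."""
    # ">I" means big endian, unsigned 32-bit. A signed format (">i") would
    # turn 0x80000000 into a negative number, i.e. a date before 1970.
    (seconds,) = struct.unpack(">I", raw[:4])
    return _EPOCH + timedelta(seconds=seconds)


# First second that no longer fits in a signed 32-bit integer.
oid = bytes.fromhex("80000000") + bytes(8)
print(object_id_timestamp(oid).isoformat())
```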
6748

68-
Drivers SHOULD have an accessor method on an ObjectID class for obtaining the
69-
timestamp value.
49+
### Random Value
7050

71-
Random Value
72-
------------
51+
A 5-byte field consisting of a random value generated once per process. This random value is unique to the machine and
52+
process.
7353

74-
A 5-byte field consisting of a random value generated once per process. This
75-
random value is unique to the machine and process.
54+
Drivers MUST NOT have an accessor method on an ObjectID class for obtaining this value.
7655

77-
Drivers MUST NOT have an accessor method on an ObjectID class for obtaining
78-
this value.
56+
The random number does not have to be cryptographic. If possible, use a PRNG with OS supplied entropy that SHOULD NOT
57+
block to wait for more entropy to become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of process
58+
and machine by combining time, process ID, and hostname.
7959

80-
The random number does not have to be cryptographic. If possible, use a PRNG
81-
with OS supplied entropy that SHOULD NOT block to wait for more entropy to
82-
become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of
83-
process and machine by combining time, process ID, and hostname.
84-
85-
Counter
86-
-------
60+
### Counter
8761

8862
A 3-byte big endian counter.
8963

90-
This counter MUST be initialised to a random value when the driver is first
91-
activated. After initialisation, the counter MUST be increased by 1 for every
92-
ObjectID creation.
64+
This counter MUST be initialised to a random value when the driver is first activated. After initialisation, the counter
65+
MUST be increased by 1 for every ObjectID creation.
9366

94-
When the counter overflows (i.e., hits 16777215+1), the counter MUST be reset
95-
to 0.
67+
When the counter overflows (i.e., hits 16777215+1), the counter MUST be reset to 0.
9668

97-
Drivers MUST NOT have an accessor method on an ObjectID class for obtaining
98-
this value.
69+
Drivers MUST NOT have an accessor method on an ObjectID class for obtaining this value.
9970

100-
The random number does not have to be cryptographic. If possible, use a PRNG
101-
with OS supplied entropy that SHOULD NOT block to wait for more entropy to
102-
become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of
103-
process and machine by combining time, process ID, and hostname.
71+
The random number does not have to be cryptographic. If possible, use a PRNG with OS supplied entropy that SHOULD NOT
72+
block to wait for more entropy to become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of process
73+
and machine by combining time, process ID, and hostname.
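The counter rules can be sketched as a small class. `ObjectIdCounter` and its `start` parameter are hypothetical and exist only to make the wrap-around easy to demonstrate; the lock is an assumption for thread safety that the specification does not mention.

```python
import os
import threading

_COUNTER_MAX = 0xFFFFFF  # 16777215, the largest value that fits in 3 bytes


class ObjectIdCounter:
    """3-byte counter that starts at a random value and wraps to 0."""

    def __init__(self, start=None):
        if start is None:
            # Random initial value; it does not need to be cryptographic.
            start = int.from_bytes(os.urandom(3), "big")
        self._value = start & _COUNTER_MAX
        self._lock = threading.Lock()

    def next_bytes(self) -> bytes:
        """Return the current value as 3 big endian bytes, then advance by 1."""
        with self._lock:
            current = self._value
            # Masking resets the counter to 0 after it passes 16777215.
            self._value = (current + 1) & _COUNTER_MAX
        return current.to_bytes(3, "big")
```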
10474

105-
Test Plan
106-
=========
75+
## Test Plan
10776

10877
Drivers MUST:
10978

110-
- Ensure that the Timestamp field is represented as an unsigned 32-bit
111-
representing the number of seconds since the Epoch for the Timestamp values:
79+
- Ensure that the Timestamp field is represented as an unsigned 32-bit integer representing the number of seconds since the
80+
Epoch for the Timestamp values:
81+
- `0x00000000`: To match `"Jan 1st, 1970 00:00:00 UTC"`
82+
- `0x7FFFFFFF`: To match `"Jan 19th, 2038 03:14:07 UTC"`
83+
- `0x80000000`: To match `"Jan 19th, 2038 03:14:08 UTC"`
84+
- `0xFFFFFFFF`: To match `"Feb 7th, 2106 06:28:15 UTC"`
85+
- Ensure that the Counter field successfully overflows its sequence from `0xFFFFFF` to `0x000000`.
86+
- Ensure that after a new process is created through a fork() or similar process creation operation, the "random number
87+
unique to a machine and process" is no longer the same as the parent process that created the new process.
11288

113-
- ``0x00000000``: To match ``"Jan 1st, 1970 00:00:00 UTC"``
114-
- ``0x7FFFFFFF``: To match ``"Jan 19th, 2038 03:14:07 UTC"``
115-
- ``0x80000000``: To match ``"Jan 19th, 2038 03:14:08 UTC"``
116-
- ``0xFFFFFFFF``: To match ``"Feb 7th, 2106 06:28:15 UTC"``
89+
## Motivation for Change
11790

118-
- Ensure that the Counter field successfully overflows its sequence from
119-
``0xFFFFFF`` to ``0x000000``.
91+
Besides the specific exclusion of MD5 as an allowed hashing algorithm, the information in this specification is meant to
92+
align the ObjectID generation algorithm of both drivers and the server.
12093

121-
- Ensure that after a new process is created through a fork() or similar
122-
process creation operation, the "random number unique to a machine and
123-
process" is no longer the same as the parent process that created the new
124-
process.
94+
## Design Rationale
12595

126-
Motivation for Change
127-
=====================
96+
**Timestamp:** The timestamp is a 32-bit **unsigned** integer, as it allows us to extend the furthest date that the
97+
timestamp can represent from the year 2038 to 2106. There is no reason why MongoDB would generate a timestamp to mean a
98+
date before 1970, as MongoDB did not exist back then.
12899

129-
Besides the specific exclusion of MD5 as an allowed hashing algorithm, the
130-
information in this specification is meant to align the ObjectID generation
131-
algorithm of both drivers and the server.
100+
**Random Value:** Originally, this field consisted of the Machine ID and Process ID fields. There were numerous
101+
divergences between drivers due to implementation choices, and the Machine ID field traditionally used the MD5 hashing
102+
algorithm which can't be used on FIPS compliant machines. In order to allow for a similar behaviour among all drivers
103+
**and** the MongoDB Server, these two fields have been collated together into a single 5-byte random value, unique to a
104+
machine and process.
132105

133-
Design Rationale
134-
================
106+
**Counter:** The counter makes it possible to have multiple ObjectIDs per second, per server, and per process. As the
107+
counter can overflow, there is a possibility of having duplicate ObjectIDs if you create more than 16 million ObjectIDs
108+
per second in the same process on a single machine.
135109

136-
**Timestamp:** The timestamp is a 32-bit **unsigned** integer, as it allows us
137-
to extend the furthest date that the timestamp can represent from the year 2038
138-
to 2106. There is no reason why MongoDB would generate a timestamp to mean a
139-
date before 1970, as MongoDB did not exist back then.
110+
**Endianness:** The *Timestamp* and *Counter* are big endian because we can then use `memcmp` to order ObjectIDs, and we
111+
want to ensure an increasing order.
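The ordering argument can be illustrated with a short sketch. Python compares `bytes` objects lexicographically, byte by byte, which is the same ordering `memcmp` produces. The `make_object_id` helper is hypothetical, and the comparison uses a fixed process-unique value so that only the timestamp and counter differ.

```python
import struct


def make_object_id(seconds: int, process_unique: bytes, counter: int) -> bytes:
    """Assemble 12 bytes: big endian timestamp, 5-byte value, big endian counter."""
    return struct.pack(">I", seconds) + process_unique + counter.to_bytes(3, "big")


unique = bytes(5)  # fixed placeholder so only timestamp and counter vary
older = make_object_id(255, unique, 0)
newer = make_object_id(256, unique, 0)

# Big endian: byte-wise comparison agrees with numeric order.
assert older < newer

# Little endian counter-example: 255 encodes as ff 00 00 00 and 256 as
# 00 01 00 00, so byte-wise comparison would put the later timestamp first.
assert struct.pack("<I", 255) > struct.pack("<I", 256)
```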
140112

141-
**Random Value:** Originally, this field consisted of the Machine ID and
142-
Process ID fields. There were numerous divergences between drivers due to
143-
implementation choices, and the Machine ID field traditionally used the MD5
144-
hashing algorithm which can't be used on FIPS compliant machines. In order to
145-
allow for a similar behaviour among all drivers **and** the MongoDB Server,
146-
these two fields have been collated together into a single 5-byte random value,
147-
unique to a machine and process.
113+
## Backwards Compatibility
148114

149-
**Counter:** The counter makes it possible to have multiple ObjectIDs per
150-
second, per server, and per process. As the counter can overflow, there is a
151-
possibility of having duplicate ObjectIDs if you create more than 16 million
152-
ObjectIDs per second in the same process on a single machine.
115+
This specification requires that the existing *Machine ID* and *Process ID* fields are merged into a single 5-byte
116+
value. This will change the behaviour of ObjectID generation, as well as the behaviour of drivers that currently have
117+
getters and setters for the original *Machine ID* and *Process ID* fields.
153118

154-
**Endianness:** The *Timestamp* and *Counter* are big endian because we can
155-
then use ``memcmp`` to order ObjectIDs, and we want to ensure an increasing order.
119+
## Reference Implementation
156120

121+
There is currently no full reference implementation.
157122

158-
Backwards Compatibility
159-
=======================
123+
## Changelog
160124

161-
This specification requires that the existing *Machine ID* and *Process ID*
162-
fields are merged into a single 5-byte value. This will change the behaviour of
163-
ObjectID generation, as well as the behaviour of drivers that currently have
164-
getters and setters for the original *Machine ID* and *Process ID* fields.
125+
- 2024-07-30: Migrated from reStructuredText to Markdown.
165126

166-
Reference Implementation
167-
========================
127+
- 2022-10-05: Remove spec front matter and reformat changelog.
168128

169-
Currently there is no full reference implementation yet.
129+
- 2019-01-14: Clarify that the random numbers don't need to be cryptographically\
130+
secure. Add a test to test that the
131+
unique value is different in forked processes.
132+
133+
- 2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian,\
134+
and add the reason why.
135+
136+
- 2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte\
137+
unique value
170138

171-
Changelog
172-
=========
173-
174-
:2022-10-05: Remove spec front matter and reformat changelog.
175-
:2019-01-14: Clarify that the random numbers don't need to be cryptographically
176-
secure. Add a test to test that the unique value is different in
177-
forked processes.
178-
:2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian,
179-
and add the reason why.
180-
:2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte
181-
unique value
182-
:2018-05-22: Initial Release
139+
- 2018-05-22: Initial Release

source/objectid.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
2+
.. note::
3+
This specification has been converted to Markdown and renamed to
4+
`objectid.md <objectid.md>`_.
