|
1 |
| -.. role:: javascript(code) |
2 |
| - :language: javascript |
| 1 | +# ObjectID format |
3 | 2 |
|
4 |
| -=============== |
5 |
| -ObjectID format |
6 |
| -=============== |
| 3 | +- Status: Accepted |
| 4 | +- Minimum Server Version: N/A |
7 | 5 |
|
8 |
| -:Status: Accepted |
9 |
| -:Minimum Server Version: N/A |
| 6 | +______________________________________________________________________ |
10 | 7 |
|
11 |
| -.. contents:: |
| 8 | +## Abstract |
12 | 9 |
|
13 |
| --------- |
| 10 | +This specification documents the format and data contents of ObjectID BSON values that the drivers and the server |
| 11 | +generate when no field values have been specified (e.g. creating an ObjectID BSON value when no `_id` field is present |
| 12 | +in a document). It is primarily aimed to provide an alternative to the historical use of the MD5 hashing algorithm for |
| 13 | +the machine information field of the ObjectID, which is problematic when providing a FIPS compliant implementation. It |
| 14 | +also documents existing best practices for the timestamp and counter fields. |
14 | 15 |
|
15 |
| -Abstract |
16 |
| -======== |
| 16 | +## META |
17 | 17 |
|
18 |
| -This specification documents the format and data contents of ObjectID BSON |
19 |
| -values that the drivers and the server generate when no field values have been |
20 |
| -specified (e.g. creating an ObjectID BSON value when no _id field is present |
21 |
| -in a document). It is primarily aimed to provide an alternative to the |
22 |
| -historical use of the MD5 hashing algorithm for the machine information field |
23 |
| -of the ObjectID, which is problematic when providing a FIPS compliant |
24 |
| -implementation. It also documents existing best practices for the timestamp |
25 |
| -and counter fields. |
| 18 | +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and |
| 19 | +"OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). |
26 | 20 |
|
27 |
| -META |
28 |
| -==== |
| 21 | +## Specification |
29 | 22 |
|
30 |
| -The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, |
31 |
| -“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be |
32 |
| -interpreted as described in `RFC 2119 <https://www.ietf.org/rfc/rfc2119.txt>`_. |
| 23 | +The [ObjectID](https://www.mongodb.com/docs/manual/reference/method/ObjectId/) BSON type is a 12-byte value consisting |
| 24 | +of three different portions (fields): |
33 | 25 |
|
34 |
| -Specification |
35 |
| -============= |
36 |
| - |
37 |
| -The ObjectID_ BSON type is a 12-byte value consisting of three different |
38 |
| -portions (fields): |
39 |
| - |
40 |
| -- a 4-byte value representing the seconds since the Unix epoch in the highest |
41 |
| - order bytes, |
| 26 | +- a 4-byte value representing the seconds since the Unix epoch in the highest order bytes, |
42 | 27 | - a 5-byte random number unique to a machine and process,
|
43 | 28 | - a 3-byte counter, starting with a random value.
|
44 | 29 |
|
45 |
| -:: |
46 |
| - |
47 |
| - 4 byte timestamp 5 byte process unique 3 byte counter |
48 |
| - |<----------------->|<---------------------->|<------------>| |
49 |
| - [----|----|----|----|----|----|----|----|----|----|----|----] |
50 |
| - 0 4 8 12 |
| 30 | +``` |
| 31 | +4 byte timestamp 5 byte process unique 3 byte counter |
| 32 | +|<----------------->|<---------------------->|<------------>| |
| 33 | +[----|----|----|----|----|----|----|----|----|----|----|----] |
| 34 | +0 4 8 12 |
| 35 | +``` |
51 | 36 |
|
52 |
| -.. _ObjectID: https://www.mongodb.com/docs/manual/reference/method/ObjectId/ |
| 37 | +### Timestamp Field |
53 | 38 |
|
54 |
| -Timestamp Field |
55 |
| ---------------- |
| 39 | +This 4-byte big endian field represents the seconds since the Unix epoch (Jan 1st, 1970, midnight UTC). It is an ever |
| 40 | +increasing value that will have a range until about Feb 7th, 2106. |
56 | 41 |
|
57 |
| -This 4-byte big endian field represents the seconds since the Unix epoch (Jan |
58 |
| -1st, 1970, midnight UTC). It is an ever increasing value that will have a |
59 |
| -range until about Jan 7th, 2106. |
| 42 | +Drivers MUST create ObjectIDs with this value representing the number of seconds since the Unix epoch. |
60 | 43 |
|
61 |
| -Drivers MUST create ObjectIDs with this value representing the number of |
62 |
| -seconds since the Unix epoch. |
| 44 | +Drivers MUST interpret this value as an **unsigned 32-bit integer** when conversions to language specific date/time |
| 45 | +values are created, and when converting this to a timestamp. |
63 | 46 |
|
64 |
| -Drivers MUST interpret this value as an **unsigned 32-bit integer** when |
65 |
| -conversions to language specific date/time values are created, and when |
66 |
| -converting this to a timestamp. |
| 47 | +Drivers SHOULD have an accessor method on an ObjectID class for obtaining the timestamp value. |
67 | 48 |
|
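The unsigned interpretation required by the Timestamp Field section can be sketched as follows. This is a minimal illustration, not a driver implementation; the function name `objectid_timestamp` is hypothetical. It reads the first four bytes as a big endian unsigned 32-bit integer, so values at or above `0x80000000` map to dates after January 2038 rather than wrapping to negative values.

```python
import struct
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def objectid_timestamp(oid: bytes) -> datetime:
    """Return the UTC time stored in the first 4 bytes of a 12-byte ObjectID.

    Hypothetical helper for illustration only.
    """
    if len(oid) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes")
    # ">I" means big endian, unsigned 32-bit; a signed format (">i") would
    # turn 0x80000000 and above into negative second counts.
    (seconds,) = struct.unpack(">I", oid[:4])
    # Adding a timedelta avoids platform limits of datetime.fromtimestamp.
    return UNIX_EPOCH + timedelta(seconds=seconds)
```

For example, passing `bytes.fromhex("ffffffff") + bytes(8)` yields the last representable second, which falls in February 2106.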
68 |
| -Drivers SHOULD have an accessor method on an ObjectID class for obtaining the |
69 |
| -timestamp value. |
| 49 | +### Random Value |
70 | 50 |
|
71 |
| -Random Value |
72 |
| ------------- |
| 51 | +A 5-byte field consisting of a random value generated once per process. This random value is unique to the machine and |
| 52 | +process. |
73 | 53 |
|
74 |
| -A 5-byte field consisting of a random value generated once per process. This |
75 |
| -random value is unique to the machine and process. |
| 54 | +Drivers MUST NOT have an accessor method on an ObjectID class for obtaining this value. |
76 | 55 |
|
77 |
| -Drivers MUST NOT have an accessor method on an ObjectID class for obtaining |
78 |
| -this value. |
| 56 | +The random number does not have to be cryptographic. If possible, use a PRNG with OS supplied entropy that SHOULD NOT |
| 57 | +block to wait for more entropy to become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of process |
| 58 | +and machine by combining time, process ID, and hostname. |
79 | 59 |
|
80 |
| -The random number does not have to be cryptographic. If possible, use a PRNG |
81 |
| -with OS supplied entropy that SHOULD NOT block to wait for more entropy to |
82 |
| -become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of |
83 |
| -process and machine by combining time, process ID, and hostname. |
84 |
| - |
85 |
| -Counter |
86 |
| -------- |
| 60 | +### Counter |
87 | 61 |
|
88 | 62 | A 3-byte big endian counter.
|
89 | 63 |
|
90 |
| -This counter MUST be initialised to a random value when the driver is first |
91 |
| -activated. After initialisation, the counter MUST be increased by 1 for every |
92 |
| -ObjectID creation. |
| 64 | +This counter MUST be initialised to a random value when the driver is first activated. After initialisation, the counter |
| 65 | +MUST be increased by 1 for every ObjectID creation. |
93 | 66 |
|
94 |
| -When the counter overflows (i.e., hits 16777215+1), the counter MUST be reset |
95 |
| -to 0. |
| 67 | +When the counter overflows (i.e., hits 16777215+1), the counter MUST be reset to 0. |
96 | 68 |
|
97 |
| -Drivers MUST NOT have an accessor method on an ObjectID class for obtaining |
98 |
| -this value. |
| 69 | +Drivers MUST NOT have an accessor method on an ObjectID class for obtaining this value. |
99 | 70 |
|
100 |
| -The random number does not have to be cryptographic. If possible, use a PRNG |
101 |
| -with OS supplied entropy that SHOULD NOT block to wait for more entropy to |
102 |
| -become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of |
103 |
| -process and machine by combining time, process ID, and hostname. |
| 71 | +The random number does not have to be cryptographic. If possible, use a PRNG with OS supplied entropy that SHOULD NOT |
| 72 | +block to wait for more entropy to become available. Otherwise, seed a deterministic PRNG to ensure uniqueness of process |
| 73 | +and machine by combining time, process ID, and hostname. |
104 | 74 |
|
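The counter rules above (random start, increment by 1 per ObjectID, reset to 0 after `0xFFFFFF`) can be illustrated with a small sketch. The class name `ObjectIdCounter`, its `start` parameter (added only to make the wrap-around testable), and the use of a lock for thread safety are assumptions of this example, not requirements stated in the specification.

```python
import os
import threading


class ObjectIdCounter:
    """3-byte big endian counter (hypothetical illustration)."""

    MODULUS = 0x1000000  # 16777215 + 1

    def __init__(self, start=None):
        if start is None:
            # Initialise to a random 24-bit value.
            start = int.from_bytes(os.urandom(3), "big")
        self._value = start % self.MODULUS
        self._lock = threading.Lock()

    def next_bytes(self) -> bytes:
        """Return the current value as 3 big endian bytes, then advance by 1."""
        with self._lock:
            current = self._value
            # Wrap to 0 when the counter would exceed 0xFFFFFF.
            self._value = (current + 1) % self.MODULUS
        return current.to_bytes(3, "big")
```

Starting a counter at `0xFFFFFF` and calling `next_bytes()` twice returns `ff ff ff` followed by `00 00 00`, which is the overflow case listed in the Test Plan.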
105 |
| -Test Plan |
106 |
| -========= |
| 75 | +## Test Plan |
107 | 76 |
|
108 | 77 | Drivers MUST:
|
109 | 78 |
|
110 |
| -- Ensure that the Timestamp field is represented as an unsigned 32-bit |
111 |
| - representing the number of seconds since the Epoch for the Timestamp values: |
| 79 | +- Ensure that the Timestamp field is represented as an unsigned 32-bit integer representing the number of seconds since the |
| 80 | + Epoch for the Timestamp values: |
| 81 | + - `0x00000000`: To match `"Jan 1st, 1970 00:00:00 UTC"` |
| 82 | + - `0x7FFFFFFF`: To match `"Jan 19th, 2038 03:14:07 UTC"` |
| 83 | + - `0x80000000`: To match `"Jan 19th, 2038 03:14:08 UTC"` |
| 84 | + - `0xFFFFFFFF`: To match `"Feb 7th, 2106 06:28:15 UTC"` |
| 85 | +- Ensure that the Counter field successfully overflows its sequence from `0xFFFFFF` to `0x000000`. |
| 86 | +- Ensure that after a new process is created through a fork() or similar process creation operation, the "random number |
| 87 | + unique to a machine and process" is no longer the same as the parent process that created the new process. |
112 | 88 |
|
113 |
| - - ``0x00000000``: To match ``"Jan 1st, 1970 00:00:00 UTC"`` |
114 |
| - - ``0x7FFFFFFF``: To match ``"Jan 19th, 2038 03:14:07 UTC"`` |
115 |
| - - ``0x80000000``: To match ``"Jan 19th, 2038 03:14:08 UTC"`` |
116 |
| - - ``0xFFFFFFFF``: To match ``"Feb 7th, 2106 06:28:15 UTC"`` |
| 89 | +## Motivation for Change |
117 | 90 |
|
118 |
| -- Ensure that the Counter field successfully overflows its sequence from |
119 |
| - ``0xFFFFFF`` to ``0x000000``. |
| 91 | +Besides the specific exclusion of MD5 as an allowed hashing algorithm, the information in this specification is meant to |
| 92 | +align the ObjectID generation algorithm of both drivers and the server. |
120 | 93 |
|
121 |
| -- Ensure that after a new process is created through a fork() or similar |
122 |
| - process creation operation, the "random number unique to a machine and |
123 |
| - process" is no longer the same as the parent process that created the new |
124 |
| - process. |
| 94 | +## Design Rationale |
125 | 95 |
|
126 |
| -Motivation for Change |
127 |
| -===================== |
| 96 | +**Timestamp:** The timestamp is a 32-bit **unsigned** integer, as it allows us to extend the furthest date that the |
| 97 | +timestamp can represent from the year 2038 to 2106. There is no reason why MongoDB would generate a timestamp to mean a |
| 98 | +date before 1970, as MongoDB did not exist back then. |
128 | 99 |
|
129 |
| -Besides the specific exclusion of MD5 as an allowed hashing algorithm, the |
130 |
| -information in this specification is meant to align the ObjectID generation |
131 |
| -algorithm of both drivers and the server. |
| 100 | +**Random Value:** Originally, this field consisted of the Machine ID and Process ID fields. There were numerous |
| 101 | +divergences between drivers due to implementation choices, and the Machine ID field traditionally used the MD5 hashing |
| 102 | +algorithm which can't be used on FIPS compliant machines. In order to allow for a similar behaviour among all drivers |
| 103 | +**and** the MongoDB Server, these two fields have been collated together into a single 5-byte random value, unique to a |
| 104 | +machine and process. |
132 | 105 |
|
133 |
| -Design Rationale |
134 |
| -================ |
| 106 | +**Counter:** The counter makes it possible to have multiple ObjectIDs per second, per server, and per process. As the |
| 107 | +counter can overflow, there is a possibility of having duplicate ObjectIDs if you create more than 16 million ObjectIDs |
| 108 | +per second in the same process on a single machine. |
135 | 109 |
|
136 |
| -**Timestamp:** The timestamp is a 32-bit **unsigned** integer, as it allows us |
137 |
| -to extend the furthest date that the timestamp can represent from the year 2038 |
138 |
| -to 2106. There is no reason why MongoDB would generate a timestamp to mean a |
139 |
| -date before 1970, as MongoDB did not exist back then. |
| 110 | +**Endianness:** The *Timestamp* and *Counter* are big endian because we can then use `memcmp` to order ObjectIDs, and we |
| 111 | +want to ensure an increasing order. |
140 | 112 |
|
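The endianness rationale can be checked with a short sketch. Python compares `bytes` objects lexicographically, which is the same ordering `memcmp` produces for equal-length buffers. The helper `make_oid` is hypothetical and only assembles the three fields in the documented order; the little endian values are shown purely for contrast.

```python
import struct


def make_oid(seconds: int, process_unique: bytes, counter: int) -> bytes:
    """Assemble a 12-byte ObjectID from its three fields (illustration only)."""
    if len(process_unique) != 5:
        raise ValueError("the process-unique value must be 5 bytes")
    # Big endian timestamp, then the 5-byte value, then a big endian counter.
    return struct.pack(">I", seconds) + process_unique + counter.to_bytes(3, "big")


unique = bytes(5)
earlier = make_oid(255, unique, 0)
later = make_oid(256, unique, 0)

# Big endian: byte-wise comparison agrees with chronological order.
big_endian_ordered = earlier < later

# Little endian timestamps would break that property:
# 255 -> ff 00 00 00 and 256 -> 00 01 00 00, so the earlier value sorts last.
little_endian_ordered = struct.pack("<I", 255) < struct.pack("<I", 256)
```

Here `big_endian_ordered` is `True` while `little_endian_ordered` is `False`, which is why a plain byte comparison is only sufficient with the big endian layout.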
141 |
| -**Random Value:** Originally, this field consisted of the Machine ID and |
142 |
| -Process ID fields. There were numerous divergences between drivers due to |
143 |
| -implementation choices, and the Machine ID field traditionally used the MD5 |
144 |
| -hashing algorithm which can't be used on FIPS compliant machines. In order to |
145 |
| -allow for a similar behaviour among all drivers **and** the MongoDB Server, |
146 |
| -these two fields have been collated together into a single 5-byte random value, |
147 |
| -unique to a machine and process. |
| 113 | +## Backwards Compatibility |
148 | 114 |
|
149 |
| -**Counter:** The counter makes it possible to have multiple ObjectIDs per |
150 |
| -second, per server, and per process. As the counter can overflow, there is a |
151 |
| -possibility of having duplicate ObjectIDs if you create more than 16 million |
152 |
| -ObjectIDs per second in the same process on a single machine. |
| 115 | +This specification requires that the existing *Machine ID* and *Process ID* fields are merged into a single 5-byte |
| 116 | +value. This will change the behaviour of ObjectID generation, as well as the behaviour of drivers that currently have |
| 117 | +getters and setters for the original *Machine ID* and *Process ID* fields. |
153 | 118 |
|
154 |
| -**Endianness:** The *Timestamp* and *Counter* are big endian because we can |
155 |
| -then use ``memcmp`` to order ObjectIDs, and we want to ensure an increasing order. |
| 119 | +## Reference Implementation |
156 | 120 |
|
| 121 | +Currently there is no full reference implementation yet. |
157 | 122 |
|
158 |
| -Backwards Compatibility |
159 |
| -======================= |
| 123 | +## Changelog |
160 | 124 |
|
161 |
| -This specification requires that the existing *Machine ID* and *Process ID* |
162 |
| -fields are merged into a single 5-byte value. This will change the behaviour of |
163 |
| -ObjectID generation, as well as the behaviour of drivers that currently have |
164 |
| -getters and setters for the original *Machine ID* and *Process ID* fields. |
| 125 | +- 2024-07-30: Migrated from reStructuredText to Markdown. |
165 | 126 |
|
166 |
| -Reference Implementation |
167 |
| -======================== |
| 127 | +- 2022-10-05: Remove spec front matter and reformat changelog. |
168 | 128 |
|
169 |
| -Currently there is no full reference implementation yet. |
| 129 | +- 2019-01-14: Clarify that the random numbers don't need to be cryptographically\ |
| 130 | + secure. Add a test to test that the |
| 131 | + unique value is different in forked processes. |
| 132 | + |
| 133 | +- 2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian,\ |
| 134 | + and add the reason why. |
| 135 | + |
| 136 | +- 2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte\ |
| 137 | + unique value |
170 | 138 |
|
171 |
| -Changelog |
172 |
| -========= |
173 |
| - |
174 |
| -:2022-10-05: Remove spec front matter and reformat changelog. |
175 |
| -:2019-01-14: Clarify that the random numbers don't need to be cryptographically |
176 |
| - secure. Add a test to test that the unique value is different in |
177 |
| - forked processes. |
178 |
| -:2018-10-11: Clarify that the *Timestamp* and *Counter* fields are big endian, |
179 |
| - and add the reason why. |
180 |
| -:2018-07-02: Replaced Machine ID and Process ID fields with a single 5-byte |
181 |
| - unique value |
182 |
| -:2018-05-22: Initial Release |
| 139 | +- 2018-05-22: Initial Release |