
Commit f94ca83

abhishekkumar2718 authored and gitster committed
doc: add corrected commit date info
With generation data chunk and corrected commit dates implemented, let's update the technical documentation for commit-graph.

Signed-off-by: Abhishek Kumar <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 4f69b8c commit f94ca83

2 files changed: 69 additions, 14 deletions

Documentation/technical/commit-graph-format.txt

Lines changed: 15 additions & 6 deletions
@@ -4,11 +4,7 @@ Git commit graph format
 The Git commit graph stores a list of commit OIDs and some associated
 metadata, including:
 
-- The generation number of the commit. Commits with no parents have
-generation number 1; commits with parents have generation number
-one more than the maximum generation number of its parents. We
-reserve zero as special, and can be used to mark a generation
-number invalid or as "not computed".
+- The generation number of the commit.
 
 - The root tree OID.
 
@@ -86,13 +82,26 @@ CHUNK DATA:
 position. If there are more than two parents, the second value
 has its most-significant bit on and the other bits store an array
 position into the Extra Edge List chunk.
-* The next 8 bytes store the generation number of the commit and
+* The next 8 bytes store the topological level (generation number v1)
+of the commit and
 the commit time in seconds since EPOCH. The generation number
 uses the higher 30 bits of the first 4 bytes, while the commit
 time uses the 32 bits of the second 4 bytes, along with the lowest
 2 bits of the lowest byte, storing the 33rd and 34th bit of the
 commit time.
 
+Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes)
+* This list of 4-byte values stores corrected commit date offsets for the
+commits, arranged in the same order as the Commit Data chunk.
+* If the corrected commit date offset cannot be stored within 31 bits,
+the value has its most-significant bit on and the other bits store
+the position of the corrected commit date in the Generation Data Overflow
+chunk.
+
+Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
+* This list of 8-byte values stores the corrected commit dates for commits
+with corrected commit date offsets that cannot be stored within 31 bits.
+
 Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
 This list of 4-byte values store the second through nth parents for
 all octopus merges. The second parent value in the commit data stores
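A short decoder can exercise the two chunk descriptions above. The following C sketch is illustrative only and is not Git's reader. It assumes the network (big-endian) byte order that the commit-graph format uses for multi-byte values. It treats a Generation Data offset as the difference between a commit's corrected commit date and its commit time. In the overflow case it stops at the position and does not interpret the Generation Data Overflow entry. The names `read_be32()`, `decode_cdat_gen()`, `decode_gdat()`, and `corrected_date_inline()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network byte order) value. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decoded 8-byte level/date field of one Commit Data row. */
struct cdat_gen_info {
	uint32_t topo_level;  /* upper 30 bits of the first word */
	uint64_t commit_time; /* 34-bit commit time */
};

struct cdat_gen_info decode_cdat_gen(const unsigned char field[8])
{
	struct cdat_gen_info info;
	uint32_t hi = read_be32(field);
	uint32_t lo = read_be32(field + 4);

	info.topo_level = hi >> 2;
	/* lowest 2 bits of the first word are bits 33 and 34 of the time */
	info.commit_time = ((uint64_t)(hi & 0x3) << 32) | lo;
	return info;
}

#define GDAT_OVERFLOW_BIT 0x80000000u

/* Decoded 4-byte Generation Data entry. */
struct gdat_entry {
	bool overflow;  /* true: value is a position in the overflow chunk */
	uint32_t value; /* offset if !overflow, overflow position otherwise */
};

struct gdat_entry decode_gdat(const unsigned char entry[4])
{
	struct gdat_entry e;
	uint32_t v = read_be32(entry);

	e.overflow = (v & GDAT_OVERFLOW_BIT) != 0;
	e.value = v & ~GDAT_OVERFLOW_BIT; /* remaining 31 bits */
	return e;
}

/* Non-overflow case only: corrected commit date = commit time + offset. */
uint64_t corrected_date_inline(uint64_t commit_time, struct gdat_entry e)
{
	assert(!e.overflow);
	return commit_time + e.value;
}
```

For example, the bytes `00 00 00 15 23 45 67 89` decode to topological level 5 and commit time 0x123456789. That commit time needs the 33rd bit.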

Documentation/technical/commit-graph.txt

Lines changed: 54 additions & 8 deletions
@@ -38,14 +38,31 @@ A consumer may load the following info for a commit from the graph:
 
 Values 1-4 satisfy the requirements of parse_commit_gently().
 
-Define the "generation number" of a commit recursively as follows:
+There are two definitions of generation number:
+1. Corrected committer dates (generation number v2)
+2. Topological levels (generation number v1)
 
-* A commit with no parents (a root commit) has generation number one.
+Define the "corrected committer date" of a commit recursively as follows:
 
-* A commit with at least one parent has generation number one more than
-the largest generation number among its parents.
+* A commit with no parents (a root commit) has corrected committer date
+equal to its committer date.
 
-Equivalently, the generation number of a commit A is one more than the
+* A commit with at least one parent has corrected committer date equal to
+the maximum of its committer date and one more than the largest corrected
+committer date among its parents.
+
+* As a special case, a root commit with timestamp zero has corrected commit
+date of 1, to be able to distinguish it from GENERATION_NUMBER_ZERO
+(that is, an uncomputed corrected commit date).
+
+Define the "topological level" of a commit recursively as follows:
+
+* A commit with no parents (a root commit) has topological level of one.
+
+* A commit with at least one parent has topological level one more than
+the largest topological level among its parents.
+
+Equivalently, the topological level of a commit A is one more than the
 length of a longest path from A to a root commit. The recursive definition
 is easier to use for computation and observing the following property:
 
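Both recursive definitions can be evaluated in a single pass, provided each commit's parents are visited before the commit itself. The sketch below assumes the commits already arrive in that order. It uses a hypothetical `struct commit_node` with a fixed-size parent array and a hypothetical `compute_generations()`. It illustrates the definitions and is not Git's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARENTS 4 /* hypothetical fixed limit for this sketch */

struct commit_node {
	uint64_t date;               /* committer date, seconds since epoch */
	size_t nr_parents;
	size_t parents[MAX_PARENTS]; /* indices of parents in the array */
	uint64_t corrected_date;     /* output: generation number v2 */
	uint32_t topo_level;         /* output: generation number v1 */
};

/* Assumes every parent appears in commits[] before its children. */
void compute_generations(struct commit_node *commits, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct commit_node *c = &commits[i];
		uint64_t max_parent_date = 0;
		uint32_t max_parent_level = 0;

		for (size_t j = 0; j < c->nr_parents; j++) {
			const struct commit_node *p = &commits[c->parents[j]];

			assert(c->parents[j] < i); /* parents must come first */
			if (p->corrected_date > max_parent_date)
				max_parent_date = p->corrected_date;
			if (p->topo_level > max_parent_level)
				max_parent_level = p->topo_level;
		}

		/* root: 0 + 1 = 1; otherwise one more than the largest parent level */
		c->topo_level = max_parent_level + 1;

		if (c->nr_parents == 0) {
			/* root: its own date, but 0 is reserved, so use 1 */
			c->corrected_date = c->date ? c->date : 1;
		} else {
			uint64_t candidate = max_parent_date + 1;

			c->corrected_date = c->date > candidate ? c->date : candidate;
		}
	}
}
```

Consider a root dated 100 with a child dated 90, which can happen under clock skew. The child gets corrected committer date 101, while its topological level is 2.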
@@ -60,14 +77,19 @@ is easier to use for computation and observing the following property:
 generation numbers, then we always expand the boundary commit with highest
 generation number and can easily detect the stopping condition.
 
+The property applies to both versions of generation number, that is, both
+corrected committer dates and topological levels.
+
 This property can be used to significantly reduce the time it takes to
 walk commits and determine topological relationships. Without generation
 numbers, the general heuristic is the following:
 
 If A and B are commits with commit time X and Y, respectively, and
 X < Y, then A _probably_ cannot reach B.
 
-This heuristic is currently used whenever the computation is allowed to
+In the absence of corrected commit dates (for example, old versions of Git or
+mixed generation graph chains),
+this heuristic is currently used whenever the computation is allowed to
 violate topological relationships due to clock skew (such as "git log"
 with default order), but is not used when the topological order is
 required (such as merge base calculations, "git log --graph").
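Under both definitions, every commit has a value strictly greater than each of its parents. So if A and B are distinct commits and gen(A) <= gen(B), A cannot reach B. The C sketch below applies that cut-off in a plain depth-first walk over parent links. The array layout, `can_reach()`, and the `walked` counter are hypothetical. Git's real walks are organized differently; as noted above, they expand the boundary commit with the highest generation number first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PARENTS 4 /* hypothetical fixed limit for this sketch */

struct gen_commit {
	uint64_t generation;         /* corrected commit date or topological level */
	size_t nr_parents;
	size_t parents[MAX_PARENTS]; /* indices into the commits array */
};

/*
 * Returns true if commit a can reach commit b by following parent links.
 * A commit other than b whose generation is <= gen(b) cannot reach b, so
 * its parents are never pushed. If walked is non-NULL, it receives the
 * number of commits popped from the stack.
 */
bool can_reach(const struct gen_commit *commits, size_t n,
	       size_t a, size_t b, size_t *walked)
{
	size_t *stack = malloc(n * sizeof(*stack)); /* each index pushed once */
	bool *seen = calloc(n, sizeof(*seen));
	size_t top = 0, count = 0;
	bool found = false;

	if (!stack || !seen)
		abort();

	stack[top++] = a;
	seen[a] = true;
	while (top > 0) {
		size_t cur = stack[--top];

		count++;
		if (cur == b) {
			found = true;
			break;
		}
		/* generation cut-off: cur cannot reach b */
		if (commits[cur].generation <= commits[b].generation)
			continue;
		for (size_t j = 0; j < commits[cur].nr_parents; j++) {
			size_t p = commits[cur].parents[j];

			if (!seen[p]) {
				seen[p] = true;
				stack[top++] = p;
			}
		}
	}

	free(stack);
	free(seen);
	if (walked)
		*walked = count;
	return found;
}
```

Take a linear history 0 <- 1 <- 2 <- 3 with a side branch 4 off commit 0. Asking whether 3 reaches 4 stops after three commits, because commit 1 already has the same generation as commit 4.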
@@ -77,7 +99,7 @@ in the commit graph. We can treat these commits as having "infinite"
 generation number and walk until reaching commits with known generation
 number.
 
-We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not
+We use the macro GENERATION_NUMBER_INFINITY to mark commits not
 in the commit-graph file. If a commit-graph file was written by a version
 of Git that did not compute generation numbers, then those commits will
 have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
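The special values matter when generation numbers are only partly available. The sketch below uses hypothetical stand-ins for GENERATION_NUMBER_INFINITY and GENERATION_NUMBER_ZERO. The diff removes the literal value of the former, so `UINT64_MAX` here is an assumption, not Git's definition. The sketch applies the strict-inequality test that this document mentions: A cannot reach B only when gen(A) < gen(B). The struct and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for Git's GENERATION_NUMBER_* macros. */
#define SKETCH_GENERATION_NUMBER_ZERO     0ULL
#define SKETCH_GENERATION_NUMBER_INFINITY UINT64_MAX

struct maybe_graph_commit {
	bool in_graph;       /* false: not in any commit-graph file */
	uint64_t stored_gen; /* value read from the graph; 0 if never computed */
};

uint64_t effective_generation(const struct maybe_graph_commit *c)
{
	if (!c->in_graph)
		return SKETCH_GENERATION_NUMBER_INFINITY;
	return c->stored_gen; /* may be SKETCH_GENERATION_NUMBER_ZERO */
}

/*
 * Weaker pruning rule: only a strict inequality N < M lets us conclude
 * that A cannot reach B. INFINITY is never less than anything, so commits
 * outside the graph are always walked.
 */
bool cannot_reach(const struct maybe_graph_commit *a,
		  const struct maybe_graph_commit *b)
{
	return effective_generation(a) < effective_generation(b);
}
```

A commit inside the graph is pruned against one outside it. This is safe because a commit-graph file is closed under reachability, so no commit in the graph has an ancestor outside it.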
@@ -93,7 +115,7 @@ fully-computed generation numbers. Using strict inequality may result in
 walking a few extra commits, but the simplicity in dealing with commits
 with generation number *_INFINITY or *_ZERO is valuable.
 
-We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose
+We use the macro GENERATION_NUMBER_MAX for commits whose
 generation numbers are computed to be at least this value. We limit at
 this value since it is the largest value that can be stored in the
 commit-graph file using the 30 bits available to generation numbers. This
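The 30-bit limit can be sketched as a saturating store. Below, `SKETCH_TOPO_LEVEL_MAX` is the 0x3FFFFFFF value from the removed line. `pack_cdat_gen()` is a hypothetical writer for the 8-byte field described in commit-graph-format.txt:

- the clamped level goes in the upper 30 bits of the first big-endian word;
- bits 33 and 34 of the commit time go in that word's lowest 2 bits;
- the low 32 bits of the commit time form the second word.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in 30 bits (0x3FFFFFFF). */
#define SKETCH_TOPO_LEVEL_MAX ((1u << 30) - 1)

/* Saturate a computed topological level so it fits in 30 bits. */
uint32_t clamp_topo_level(uint64_t level)
{
	return level >= SKETCH_TOPO_LEVEL_MAX ? SKETCH_TOPO_LEVEL_MAX
					      : (uint32_t)level;
}

/*
 * Build the 8-byte level/date field of a Commit Data row:
 * first word  = (level << 2) | bits 33-34 of the commit time,
 * second word = low 32 bits of the commit time; both big-endian.
 */
void pack_cdat_gen(unsigned char out[8], uint64_t level, uint64_t commit_time)
{
	uint32_t hi = (clamp_topo_level(level) << 2) |
		      (uint32_t)((commit_time >> 32) & 0x3);
	uint32_t lo = (uint32_t)(commit_time & 0xFFFFFFFFu);

	for (int i = 0; i < 4; i++) {
		out[i] = (unsigned char)(hi >> (24 - 8 * i));
		out[4 + i] = (unsigned char)(lo >> (24 - 8 * i));
	}
}
```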
@@ -267,6 +289,30 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum
 number of commits) could be extracted into config settings for full
 flexibility.
 
+## Handling Mixed Generation Number Chains
+
+With the introduction of generation number v2 and generation data chunk, the
+following scenario is possible:
+
+1. "New" Git writes a commit-graph with the corrected commit dates.
+2. "Old" Git writes a split commit-graph on top without corrected commit dates.
+
+A naive approach of using the newest available generation number from
+each layer would lead to violated expectations: the lower layer would
+use corrected commit dates which are much larger than the topological
+levels of the higher layer. For this reason, Git inspects each layer to
+see if any layer is missing corrected commit dates. In such a case, Git
+only uses topological levels.
+
+When writing a new layer in a split commit-graph, we write corrected commit
+dates if the topmost layer has corrected commit dates written. This
+guarantees that if a layer has corrected commit dates, all lower layers
+must have corrected commit dates as well.
+
+When merging layers, we do not consider whether the merged layers had corrected
+commit dates. Instead, the new layer will have corrected commit dates if and
+only if all existing layers below the new layer have corrected commit dates.
+
 ## Deleting graph-{hash} files
 
 After a new tip file is written, some `graph-{hash}` files may no longer
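The three rules for mixed chains (reading, writing on top, and merging) reduce to simple checks over the layers of a chain. In this sketch, `struct graph_layer` and the three functions are hypothetical, and layers are ordered from the base upward. The text does not cover writing the first layer of an empty chain, so the behavior for that case is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one commit-graph layer; index 0 is the base. */
struct graph_layer {
	bool has_generation_data; /* layer carries corrected commit dates */
};

/* Read side: use corrected commit dates only if every layer has them. */
bool chain_uses_corrected_dates(const struct graph_layer *layers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!layers[i].has_generation_data)
			return false;
	return true;
}

/*
 * Write side, new layer on top of n existing layers: write corrected
 * dates if the current topmost layer has them. For an empty chain this
 * sketch assumes they are written.
 */
bool write_dates_on_top(const struct graph_layer *layers, size_t n)
{
	return n == 0 || layers[n - 1].has_generation_data;
}

/*
 * Merge: layers[first_merged..] are combined into one new layer. The
 * merged layers are ignored; only the layers kept below it matter.
 */
bool write_dates_when_merging(const struct graph_layer *layers,
			      size_t first_merged)
{
	return chain_uses_corrected_dates(layers, first_merged);
}
```

Consider a chain whose two lower layers have corrected commit dates and whose top layer was written by an older Git. Reads fall back to topological levels, and a new layer added on top is written without corrected commit dates. Merging just the top layer, however, produces a layer that has them.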
