Commit 82d8475

commit-graph: merge commit-graph chains
When searching for a commit in a commit-graph chain of G graphs with N commits, the search takes O(G log N) time. If we always add a new tip graph with every write, the linear G term will start to dominate and slow the lookup process.

To keep lookups fast, but also keep most incremental writes fast, create a strategy for merging levels of the commit-graph chain. The strategy is detailed in the commit-graph design document, but is summarized by these two conditions:

1. If the number of commits we are adding is more than half the number of commits in the graph below, then merge with that graph.

2. If we are writing more than 64,000 commits into a single graph, then merge with all lower graphs.

The numeric values in the conditions above are currently constant, but can become config options in a future update.

As we merge levels of the commit-graph chain, check that the commits still exist in the repository. A garbage-collection operation may have removed those commits from the object store and we do not want to persist them in the commit-graph chain. This is a non-issue if the 'git gc' process wrote a new, single-level commit-graph file.

After we merge levels, the old graph-{hash}.graph files are no longer referenced by the commit-graph-chain file. We will expire these files in a future change.

Signed-off-by: Derrick Stolee <[email protected]>
1 parent 55e4289 commit 82d8475
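
The two conditions in the commit message can be sketched as a small standalone function. This is a simplified illustration, not the patch's code: `layers_after_write` and the plain `sizes` array are hypothetical stand-ins for the `struct commit_graph` chain, and the sketch ignores de-duplication and the check that commits still exist. The loop condition mirrors the one in `split_graph_merge_strategy()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants mirror the values introduced by the patch. */
#define MAX_COMMITS 64000u
#define SIZE_MULT 2.0f

/*
 * Hypothetical helper (not part of the patch).
 * sizes[0] is the base layer and sizes[height - 1] is the current tip.
 * Returns the number of layers in the chain after writing new_commits.
 */
static uint32_t layers_after_write(const uint32_t *sizes, uint32_t height,
				   uint32_t new_commits)
{
	uint32_t after = height + 1;	/* default: add a new tip layer */
	uint32_t num_commits = new_commits;
	uint32_t level = height;

	assert(sizes != NULL || height == 0);

	/* Cascade downward while either merge condition holds. */
	while (level > 0 &&
	       (sizes[level - 1] <= SIZE_MULT * num_commits ||
		num_commits > MAX_COMMITS)) {
		num_commits += sizes[level - 1];
		level--;
		after--;
	}
	return after;
}
```

With layers of 1000, 100 and 10 commits (base to tip), writing 1 commit leaves 4 layers, writing 5 merges only the tip and leaves 3, writing 60 collapses two layers and leaves 2, and writing 70,000 collapses everything into 1.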

3 files changed: +251 -33 lines changed

Documentation/technical/commit-graph.txt

Lines changed: 81 additions & 0 deletions
@@ -186,6 +186,87 @@ positions to refer to their parents, which may be in `graph-{hash1}.graph` or
 its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 +
 X2).
 
+Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data
+specifying the hashes of all files in the lower layers. In the above example,
+`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
+`{hash0}` and `{hash1}`.
+
+## Merging commit-graph files
+
+If we only added a new commit-graph file on every write, we would run into a
+linear search problem through many commit-graph files. Instead, we use a merge
+strategy to decide when the stack should collapse some number of levels.
+
+The diagram below shows such a collapse. As a set of new commits are added, it
+is determined by the merge strategy that the files should collapse to
+`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and
+the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
+file.
+
+                            +---------------------+
+                            |                     |
+                            |    (new commits)    |
+                            |                     |
+                            +---------------------+
+                            |                     |
+ +-----------------------+  +---------------------+
+ |  graph-{hash2}        |->|                     |
+ +-----------------------+  +---------------------+
+          |                 |                     |
+ +-----------------------+  +---------------------+
+ |                       |  |                     |
+ |  graph-{hash1}        |->|                     |
+ |                       |  |                     |
+ +-----------------------+  +---------------------+
+          |                  tmp_graphXXX
+ +-----------------------+
+ |                       |
+ |                       |
+ |                       |
+ |  graph-{hash0}        |
+ |                       |
+ |                       |
+ |                       |
+ +-----------------------+
+
+During this process, the commits to write are combined, sorted and we write the
+contents to a temporary file, all while holding a `commit-graph-chain.lock`
+lock-file. When the file is flushed, we rename it to `graph-{hash3}`
+according to the computed `{hash3}`. Finally, we write the new chain data to
+`commit-graph-chain.lock`:
+
+```
+	{hash3}
+	{hash0}
+```
+
+We then close the lock-file.
+
+## Merge Strategy
+
+When writing a set of commits that do not exist in the commit-graph stack of
+height N, we default to creating a new file at level N + 1. We then decide to
+merge with the Nth level if one of two conditions hold:
+
+  1. The expected file size for level N + 1 is at least half the file size for
+     level N.
+
+  2. Level N + 1 contains more than MAX_SPLIT_COMMITS commits (64,000
+     commits).
+
+This decision cascades down the levels: when we merge a level we create a new
+set of commits that then compares to the next level.
+
+The first condition bounds the number of levels to be logarithmic in the total
+number of commits. The second condition bounds the total number of commits in
+a `graph-{hashN}` file and not in the `commit-graph` file, preventing
+significant performance issues when the stack merges and another process only
+partially reads the previous stack.
+
+The merge strategy values (2 for the size multiple, 64,000 for the maximum
+number of commits) could be extracted into config settings for full
+flexibility.
+
 Related Links
 -------------
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
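
The design document's claim that the first condition keeps the number of levels logarithmic can be checked with a small simulation. The sketch below is an illustration under simplifying assumptions: `struct chain`, `chain_write` and `max_height_after` are hypothetical names, each write adds exactly one new commit, and de-duplication and expired commits are ignored. Whenever a new layer is kept, the layer below it holds more than twice as many commits, so layer sizes more than double going down the stack.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LAYERS 64
#define MAX_COMMITS 64000u
#define SIZE_MULT 2.0f

/* Hypothetical model of a chain: only the per-layer commit counts. */
struct chain {
	uint32_t sizes[MAX_LAYERS];	/* sizes[0] is the base layer */
	uint32_t height;
};

/* Apply one write of new_commits commits, merging layers as needed. */
static void chain_write(struct chain *c, uint32_t new_commits)
{
	uint32_t num_commits = new_commits;

	while (c->height > 0 &&
	       (c->sizes[c->height - 1] <= SIZE_MULT * num_commits ||
		num_commits > MAX_COMMITS)) {
		num_commits += c->sizes[c->height - 1];
		c->height--;
	}
	assert(c->height < MAX_LAYERS);
	c->sizes[c->height++] = num_commits;
}

/* Perform `writes` single-commit writes; return the tallest chain seen. */
static uint32_t max_height_after(uint32_t writes)
{
	struct chain c = { {0}, 0 };
	uint32_t i, max_height = 0;

	for (i = 0; i < writes; i++) {
		chain_write(&c, 1);
		if (c.height > max_height)
			max_height = c.height;
	}
	return max_height;
}
```

In this model the chain first reaches two layers on the fourth write and three layers on the twelfth, so the height grows far more slowly than the number of writes.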

commit-graph.c

Lines changed: 157 additions & 33 deletions
@@ -1277,36 +1277,6 @@ static int write_graph_chunk_base(struct hashfile *f,
 	return 0;
 }
 
-static void init_commit_graph_chain(struct write_commit_graph_context *ctx)
-{
-	struct commit_graph *g = ctx->r->objects->commit_graph;
-	uint32_t i;
-
-	ctx->new_base_graph = g;
-	ctx->base_graph_name = xstrdup(g->filename);
-	ctx->new_num_commits_in_base = g->num_commits + g->num_commits_in_base;
-
-	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
-
-	ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after);
-	ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after);
-
-	for (i = 0; i < ctx->num_commit_graphs_before - 1; i++)
-		ctx->commit_graph_filenames_after[i] = xstrdup(ctx->commit_graph_filenames_before[i]);
-
-	if (ctx->num_commit_graphs_before)
-		ctx->commit_graph_filenames_after[ctx->num_commit_graphs_before - 1] =
-			get_split_graph_filename(ctx->obj_dir, oid_to_hex(&g->oid));
-
-	i = ctx->num_commit_graphs_before - 1;
-
-	while (g) {
-		ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid));
-		i--;
-		g = g->base_graph;
-	}
-}
-
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
@@ -1484,6 +1454,155 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	return 0;
 }
 
+static int split_strategy_max_commits = 64000;
+static float split_strategy_size_mult = 2.0f;
+
+static void split_graph_merge_strategy(struct write_commit_graph_context *ctx)
+{
+	struct commit_graph *g = ctx->r->objects->commit_graph;
+	uint32_t num_commits = ctx->commits.nr;
+	uint32_t i;
+
+	g = ctx->r->objects->commit_graph;
+	ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1;
+
+	while (g && (g->num_commits <= split_strategy_size_mult * num_commits ||
+		     num_commits > split_strategy_max_commits)) {
+		num_commits += g->num_commits;
+		g = g->base_graph;
+
+		ctx->num_commit_graphs_after--;
+	}
+
+	ctx->new_base_graph = g;
+
+	ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after);
+	ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after);
+
+	for (i = 0; i < ctx->num_commit_graphs_after &&
+		    i < ctx->num_commit_graphs_before; i++)
+		ctx->commit_graph_filenames_after[i] = xstrdup(ctx->commit_graph_filenames_before[i]);
+
+	i = ctx->num_commit_graphs_before - 1;
+	g = ctx->r->objects->commit_graph;
+
+	while (g) {
+		if (i < ctx->num_commit_graphs_after)
+			ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid));
+
+		i--;
+		g = g->base_graph;
+	}
+}
+
+static void merge_commit_graph(struct write_commit_graph_context *ctx,
+			       struct commit_graph *g)
+{
+	uint32_t i;
+	uint32_t offset = g->num_commits_in_base;
+
+	ALLOC_GROW(ctx->commits.list, ctx->commits.nr + g->num_commits, ctx->commits.alloc);
+
+	for (i = 0; i < g->num_commits; i++) {
+		struct object_id oid;
+		struct commit *result;
+
+		display_progress(ctx->progress, i + 1);
+
+		load_oid_from_graph(g, i + offset, &oid);
+
+		/* only add commits if they still exist in the repo */
+		result = lookup_commit_reference_gently(ctx->r, &oid, 1);
+
+		if (result) {
+			ctx->commits.list[ctx->commits.nr] = result;
+			ctx->commits.nr++;
+		}
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	const struct commit *a = *(const struct commit **)_a;
+	const struct commit *b = *(const struct commit **)_b;
+	return oidcmp(&a->object.oid, &b->object.oid);
+}
+
+static void deduplicate_commits(struct write_commit_graph_context *ctx)
+{
+	uint32_t i, num_parents, last_distinct = 0, duplicates = 0;
+	struct commit_list *parent;
+
+	if (ctx->report_progress)
+		ctx->progress = start_delayed_progress(
+					_("De-duplicating merged commits"),
+					ctx->commits.nr);
+
+	QSORT(ctx->commits.list, ctx->commits.nr, commit_compare);
+
+	ctx->num_extra_edges = 0;
+	for (i = 1; i < ctx->commits.nr; i++) {
+		display_progress(ctx->progress, i);
+
+		if (oideq(&ctx->commits.list[last_distinct]->object.oid,
+			  &ctx->commits.list[i]->object.oid)) {
+			duplicates++;
+		} else {
+			if (duplicates)
+				ctx->commits.list[last_distinct + 1] = ctx->commits.list[i];
+			last_distinct++;
+
+			num_parents = 0;
+			for (parent = ctx->commits.list[i]->parents; parent; parent = parent->next)
+				num_parents++;
+
+			if (num_parents > 2)
+				ctx->num_extra_edges += num_parents - 2;
+		}
+	}
+
+	ctx->commits.nr -= duplicates;
+	stop_progress(&ctx->progress);
+}
+
+static void merge_commit_graphs(struct write_commit_graph_context *ctx)
+{
+	struct commit_graph *g = ctx->r->objects->commit_graph;
+	uint32_t current_graph_number = ctx->num_commit_graphs_before;
+	struct strbuf progress_title = STRBUF_INIT;
+
+	while (g && current_graph_number >= ctx->num_commit_graphs_after) {
+		current_graph_number--;
+
+		if (ctx->report_progress) {
+			if (current_graph_number)
+				strbuf_addf(&progress_title,
+					    _("Merging commit-graph-%d"),
+					    current_graph_number);
+			else
+				strbuf_addstr(&progress_title,
+					      _("Merging commit-graph"));
+			ctx->progress = start_delayed_progress(progress_title.buf, 0);
+		}
+
+		merge_commit_graph(ctx, g);
+		stop_progress(&ctx->progress);
+		strbuf_release(&progress_title);
+
+		g = g->base_graph;
+	}
+
+	if (g) {
+		ctx->new_base_graph = g;
+		ctx->new_num_commits_in_base = g->num_commits + g->num_commits_in_base;
+	}
+
+	if (ctx->new_base_graph)
+		ctx->base_graph_name = xstrdup(ctx->new_base_graph->filename);
+
+	deduplicate_commits(ctx);
+}
+
 int write_commit_graph(const char *obj_dir,
 		       struct string_list *pack_indexes,
 		       struct string_list *commit_hex,
@@ -1529,6 +1648,9 @@ int write_commit_graph(const char *obj_dir,
 	ctx->approx_nr_objects = approximate_object_count();
 	ctx->oids.alloc = ctx->approx_nr_objects / 32;
 
+	if (ctx->split && ctx->oids.alloc > split_strategy_max_commits)
+		ctx->oids.alloc = split_strategy_max_commits;
+
 	if (ctx->append) {
 		prepare_commit_graph_one(ctx->r, ctx->obj_dir);
 		if (ctx->r->objects->commit_graph)
@@ -1582,9 +1704,11 @@ int write_commit_graph(const char *obj_dir,
 	if (!ctx->commits.nr)
 		goto cleanup;
 
-	if (ctx->split)
-		init_commit_graph_chain(ctx);
-	else
+	if (ctx->split) {
+		split_graph_merge_strategy(ctx);
+
+		merge_commit_graphs(ctx);
+	} else
 		ctx->num_commit_graphs_after = 1;
 
 	compute_generation_numbers(ctx);
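
The sort-then-compact pattern used by `deduplicate_commits()` can be shown in isolation. The sketch below is a simplified stand-in, not the patch's code: it works on plain `uint32_t` values instead of `struct commit` pointers compared by object ID, uses the standard `qsort()` rather than Git's `QSORT` macro, and omits the extra-edge counting. `cmp_u32` and `sort_and_dedup` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison that avoids overflow from subtraction. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

/*
 * Sort list[0..nr) and remove duplicates in place.
 * Returns the number of distinct entries left at the front of the list.
 */
static size_t sort_and_dedup(uint32_t *list, size_t nr)
{
	size_t i, last_distinct = 0;

	assert(list != NULL || nr == 0);
	if (nr == 0)
		return 0;

	qsort(list, nr, sizeof(*list), cmp_u32);

	/* After sorting, equal values are adjacent; keep the first of each run. */
	for (i = 1; i < nr; i++) {
		if (list[i] != list[last_distinct])
			list[++last_distinct] = list[i];
	}
	return last_distinct + 1;
}
```

Sorting first makes duplicates adjacent, so a single pass with a `last_distinct` cursor is enough to compact the array without extra memory.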

t/t5323-split-commit-graph.sh

Lines changed: 13 additions & 0 deletions
@@ -117,4 +117,17 @@ test_expect_success 'add one commit, write a tip graph' '
 
 graph_git_behavior 'three-layer commit-graph: commit 11 vs 6' commits/11 commits/6
 
+test_expect_success 'add one commit, write a merged graph' '
+	test_commit 12 &&
+	git branch commits/12 &&
+	git commit-graph write --reachable --split &&
+	test_path_is_file $graphdir/commit-graph-chain &&
+	test_line_count = 2 $graphdir/commit-graph-chain &&
+	ls $graphdir/graph-*.graph >graph-files &&
+	test_line_count = 4 graph-files &&
+	verify_chain_files_exist $graphdir
+'
+
+graph_git_behavior 'merged commit-graph: commit 12 vs 6' commits/12 commits/6
+
 test_done
