Commit 8c52fd7

Merge branch 'ds/multi-pack-index' into pu
When there are too many packfiles in a repository (which is not recommended), looking up an object in these would require consulting many pack .idx files; a new mechanism to have a single file that consolidates all of these .idx files is introduced.

* ds/multi-pack-index: (23 commits)
  midx: clear midx on repack
  packfile: skip loading index if in multi-pack-index
  midx: prevent duplicate packfile loads
  midx: use midx in approximate_object_count
  midx: use existing midx when writing new one
  midx: use midx in abbreviation calculations
  midx: read objects from multi-pack-index
  config: create core.multiPackIndex setting
  midx: write object offsets
  midx: write object id fanout chunk
  midx: write object ids in a chunk
  midx: sort and deduplicate objects from packfiles
  midx: read pack names into array
  multi-pack-index: write pack names in chunk
  multi-pack-index: read packfile list
  packfile: generalize pack directory list
  t5319: expand test data
  multi-pack-index: load into memory
  midx: write header information to lockfile
  multi-pack-index: add 'write' verb
  ...
2 parents 3be7092 + 1fe8a5e commit 8c52fd7

21 files changed, +1720 -43 lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -100,8 +100,9 @@
 /git-mergetool--lib
 /git-mktag
 /git-mktree
-/git-name-rev
+/git-multi-pack-index
 /git-mv
+/git-name-rev
 /git-notes
 /git-p4
 /git-pack-redundant

Documentation/config.txt

Lines changed: 5 additions & 0 deletions
@@ -924,6 +924,11 @@ gc.commitGraph::
 	required. Default is false. See linkgit:git-commit-graph[1]
 	for details.
 
+core.multiPackIndex::
+	Use the multi-pack-index file to track multiple packfiles using a
+	single index. See link:technical/multi-pack-index.html[the
+	multi-pack-index design document].
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+git-multi-pack-index(1)
+======================
+
+NAME
+----
+git-multi-pack-index - Write and verify multi-pack-indexes
+
+
+SYNOPSIS
+--------
+[verse]
+'git multi-pack-index' [--object-dir=<dir>] <verb>
+
+DESCRIPTION
+-----------
+Write or verify a multi-pack-index (MIDX) file.
+
+OPTIONS
+-------
+
+--object-dir=<dir>::
+	Use the given directory for the location of Git objects. We check
+	`<dir>/pack/multi-pack-index` for the current MIDX file, and
+	`<dir>/pack` for the pack-files to index.
+
+write::
+	When given as the verb, write a new MIDX file to
+	`<dir>/pack/multi-pack-index`.
+
+
+EXAMPLES
+--------
+
+* Write a MIDX file for the packfiles in the current .git folder.
++
+-----------------------------------------------
+$ git multi-pack-index write
+-----------------------------------------------
+
+* Write a MIDX file for the packfiles in an alternate object store.
++
+-----------------------------------------------
+$ git multi-pack-index --object-dir <alt> write
+-----------------------------------------------
+
+
+SEE ALSO
+--------
+See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
+Document] and link:technical/pack-format.html[The Multi-Pack-Index
+Format] for more information on the multi-pack-index feature.
+
+
+GIT
+---
+Part of the linkgit:git[1] suite
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+Multi-Pack-Index (MIDX) Design Notes
+====================================
+
+The Git object directory contains a 'pack' directory containing
+packfiles (with suffix ".pack") and pack-indexes (with suffix
+".idx"). The pack-indexes provide a way to look up objects and
+navigate to their offset within the pack, but these must come
+in pairs with the packfiles. This pairing depends on the file
+names, as the pack-index differs only in suffix from its pack-
+file. While the pack-indexes provide fast lookup per packfile,
+this performance degrades as the number of packfiles increases,
+because abbreviations need to inspect every packfile and we are
+more likely to have a miss on our most-recently-used packfile.
+For some large repositories, repacking into a single packfile
+is not feasible due to storage space or excessive repack times.
+
+The multi-pack-index (MIDX for short) stores a list of objects
+and their offsets into multiple packfiles. It contains:
+
+- A list of packfile names.
+- A sorted list of object IDs.
+- A list of metadata for the ith object ID including:
+  - A value j referring to the jth packfile.
+  - An offset within the jth packfile for the object.
+- If large offsets are required, we use another list of large
+  offsets similar to version 2 pack-indexes.
+
+Thus, we can provide O(log N) lookup time for any number
+of packfiles.
+
+Design Details
+--------------
+
+- The MIDX is stored in a file named 'multi-pack-index' in the
+  .git/objects/pack directory. This could be stored in the pack
+  directory of an alternate. It refers only to packfiles in that
+  same directory.
+
+- The core.multiPackIndex config setting must be on to consume MIDX files.
+
+- The file format includes parameters for the object ID hash
+  function, so a future change of hash algorithm does not require
+  a change in format.
+
+- The MIDX keeps only one record per object ID. If an object appears
+  in multiple packfiles, then the MIDX selects the copy in the most-
+  recently modified packfile.
+
+- If there exist packfiles in the pack directory not registered in
+  the MIDX, then those packfiles are loaded into the `packed_git`
+  list and `packed_git_mru` cache.
+
+- The pack-indexes (.idx files) remain in the pack directory so we
+  can delete the MIDX file, set core.multiPackIndex to false, or
+  downgrade without any loss of information.
+
+- The MIDX file format uses a chunk-based approach (similar to the
+  commit-graph file) that allows optional data to be added.
+
+Future Work
+-----------
+
+- Add a 'verify' subcommand to the 'git multi-pack-index' builtin to
+  verify the contents of the multi-pack-index file match the offsets
+  listed in the corresponding pack-indexes.
+
+- The multi-pack-index allows many packfiles, especially in a context
+  where repacking is expensive (such as a very large repo), or
+  unexpected maintenance time is unacceptable (such as a high-demand
+  build machine). However, the multi-pack-index needs to be rewritten
+  in full every time. We can extend the format to be incremental, so
+  writes are fast. By storing a small "tip" multi-pack-index that
+  points to large "base" MIDX files, we can keep writes fast while
+  still reducing the number of binary searches required for object
+  lookups.
+
+- The reachability bitmap is currently paired directly with a single
+  packfile, using the pack-order as the object order to hopefully
+  compress the bitmaps well using run-length encoding. This could be
+  extended to pair a reachability bitmap with a multi-pack-index. If
+  the multi-pack-index is extended to store a "stable object order"
+  (a function Order(hash) = integer that is constant for a given hash,
+  even as the multi-pack-index is updated) then a reachability bitmap
+  could point to a multi-pack-index and be updated independently.
+
+- Packfiles can be marked as "special" using empty files that share
+  the initial name but replace ".pack" with ".keep" or ".promisor".
+  We can add an optional chunk of data to the multi-pack-index that
+  records flags of information about the packfiles. This allows new
+  states, such as 'repacked' or 'redeltified', that can help with
+  pack maintenance in a multi-pack environment. It may also be
+  helpful to organize packfiles by object type (commit, tree, blob,
+  etc.) and use this metadata to help that maintenance.
+
+- The partial clone feature records special "promisor" packs that
+  may point to objects that are not stored locally, but are available
+  on request from a server. The multi-pack-index does not currently
+  track these promisor packs.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=6
+    Chromium work item for: Multi-Pack Index (MIDX)
+
+[1] https://public-inbox.org/git/[email protected]/
+    An earlier RFC for the multi-pack-index feature
+
+[2] https://public-inbox.org/git/alpine.DEB.2.20.1803091557510.23109@alexmv-linux/
+    Git Merge 2018 Contributor's summit notes (includes discussion of MIDX)
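The O(log N) lookup bound stated in the design notes comes from pairing a 256-entry fanout table with a binary search over the sorted object IDs. The C below is a minimal, hypothetical sketch of that idea, not code from midx.c: it assumes 20-byte SHA-1 object IDs already loaded into one contiguous sorted array, and the names `midx_sketch`, `midx_sketch_build_fanout`, and `midx_sketch_find` are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HASH_LEN 20 /* assumption: SHA-1 object IDs */

/* Hypothetical in-memory view of the OID Fanout and OID Lookup data. */
struct midx_sketch {
	uint32_t fanout[256];      /* fanout[i] = number of OIDs with first byte <= i */
	const unsigned char *oids; /* fanout[255] sorted OIDs, 20 bytes each */
};

/* Fill the fanout table from n OIDs that are already sorted. */
static void midx_sketch_build_fanout(struct midx_sketch *m, uint32_t n)
{
	uint32_t i, count = 0;

	for (i = 0; i < 256; i++) {
		while (count < n &&
		       m->oids[(size_t)count * SKETCH_HASH_LEN] <= i)
			count++;
		m->fanout[i] = count;
	}
}

/*
 * Return 1 and set *pos if oid is present; otherwise return 0 and set
 * *pos to the position where oid would be inserted.
 */
static int midx_sketch_find(const struct midx_sketch *m,
			    const unsigned char *oid, uint32_t *pos)
{
	uint32_t lo, hi;

	assert(m && oid && pos);
	/* The fanout bounds the slice of OIDs sharing oid's first byte. */
	lo = oid[0] ? m->fanout[oid[0] - 1] : 0;
	hi = m->fanout[oid[0]];

	/* Binary search within that slice: O(log N) comparisons. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, m->oids + (size_t)mid * SKETCH_HASH_LEN,
				 SKETCH_HASH_LEN);

		if (!cmp) {
			*pos = mid;
			return 1;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	*pos = lo;
	return 0;
}
```

On a miss the sketch still reports the insertion point, which gives the neighboring IDs that an abbreviation check could compare against.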

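The design notes say the MIDX keeps one record per object ID and, for duplicates, selects the copy in the most recently modified packfile. The following is a minimal sketch of that sort-and-deduplicate step, assuming each candidate entry already carries its pack's modification time as a plain integer (`pack_mtime`). The struct and function names are hypothetical, and this is not presented as Git's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_HASH_LEN 20 /* assumption: SHA-1 object IDs */

/* Hypothetical candidate gathered from one pack-index (.idx) file. */
struct midx_sketch_entry {
	unsigned char oid[SKETCH_HASH_LEN];
	uint32_t pack_int_id; /* position of the pack in the pack-name list */
	uint64_t offset;      /* offset of the object within that pack */
	int64_t pack_mtime;   /* assumption: modification time of the pack */
};

/* Order by OID; among equal OIDs, put the newest pack first. */
static int midx_sketch_cmp(const void *va, const void *vb)
{
	const struct midx_sketch_entry *a = va, *b = vb;
	int cmp = memcmp(a->oid, b->oid, SKETCH_HASH_LEN);

	if (cmp)
		return cmp;
	if (a->pack_mtime != b->pack_mtime)
		return a->pack_mtime > b->pack_mtime ? -1 : 1;
	return 0;
}

/*
 * Sort the entries in place and keep one record per OID: the copy from
 * the most recently modified pack. Returns the number of entries kept.
 */
static size_t midx_sketch_dedup(struct midx_sketch_entry *e, size_t n)
{
	size_t i, kept = 0;

	assert(e || n == 0);
	if (n == 0)
		return 0;
	qsort(e, n, sizeof(*e), midx_sketch_cmp);
	for (i = 0; i < n; i++) {
		/* Skip older copies of the OID that was just kept. */
		if (kept && !memcmp(e[kept - 1].oid, e[i].oid, SKETCH_HASH_LEN))
			continue;
		e[kept++] = e[i];
	}
	return kept;
}
```

Sorting by OID first and by descending mtime second means the first entry of each run of equal OIDs is the one to keep, so a single linear pass finishes the deduplication.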
Documentation/technical/pack-format.txt

Lines changed: 77 additions & 0 deletions
@@ -252,3 +252,80 @@ Pack file entry: <+
 corresponding packfile.
 
 20-byte SHA-1-checksum of all of the above.
+
+== multi-pack-index (MIDX) files have the following format:
+
+The multi-pack-index files refer to multiple pack-files and loose objects.
+
+In order to allow extensions that add extra data to the MIDX, we organize
+the body into "chunks" and provide a lookup table at the beginning of the
+body. The header includes certain length values, such as the number of packs,
+the number of base MIDX files, hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+    4-byte signature:
+        The signature is: {'M', 'I', 'D', 'X'}
+
+    1-byte version number:
+        Git only writes or recognizes version 1.
+
+    1-byte Object Id Version:
+        Git only writes or recognizes version 1 (SHA-1).
+
+    1-byte number of "chunks":
+
+    1-byte number of base multi-pack-index files:
+        This value is currently always zero.
+
+    4-byte number of pack files:
+
+CHUNK LOOKUP:
+
+    (C + 1) * 12 bytes providing the chunk offsets, where C is the number of chunks:
+        First 4 bytes describe chunk id. Value 0 is a terminating label.
+        Other 8 bytes provide offset in current file for chunk to start.
+        (Chunks are provided in file-order, so you can infer the length
+        using the next chunk position if necessary.)
+
+The remaining data in the body is described one chunk at a time, and
+these chunks may be given in any order. Chunks are required unless
+otherwise specified.
+
+CHUNK DATA:
+
+    Packfile Names (ID: {'P', 'N', 'A', 'M'})
+        Stores the packfile names as concatenated, null-terminated strings.
+        Packfiles must be listed in lexicographic order for fast lookups by
+        name. This is the only chunk not guaranteed to be a multiple of four
+        bytes in length, so should be the last chunk for alignment reasons.
+
+    OID Fanout (ID: {'O', 'I', 'D', 'F'})
+        The ith entry, F[i], stores the number of OIDs with first
+        byte at most i. Thus F[255] stores the total
+        number of objects.
+
+    OID Lookup (ID: {'O', 'I', 'D', 'L'})
+        The OIDs for all objects in the MIDX are stored in lexicographic
+        order in this chunk.
+
+    Object Offsets (ID: {'O', 'O', 'F', 'F'})
+        Stores two 4-byte values for every object.
+        1: The pack-int-id for the pack storing this object.
+        2: The offset within the pack.
+            If all offsets are less than 2^32, then the large offset chunk
+            will not exist and offsets are stored as in IDX v1.
+            If there is at least one offset value larger than 2^32-1, then
+            the large offset chunk must exist, and offsets larger than
+            2^31-1 must be stored in it instead. If the large offset chunk
+            exists and the 31st bit is on, then removing that bit reveals the
+            row in the large offsets containing the 8-byte offset of this object.
+
+    [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
+        8-byte offsets into large packfiles.
+
+TRAILER:
+
+    20-byte SHA-1 checksum of the above contents.
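The fixed 12-byte header and the (C + 1) * 12-byte chunk lookup table described in the format above can be decoded with plain big-endian loads. This is a hypothetical C sketch, not Git's parser: it reads from an in-memory buffer, caps the table at an arbitrary `MIDX_SKETCH_MAX_CHUNKS`, and every `*_sketch` name is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIDX_SKETCH_HEADER_SIZE 12 /* 4 + 1 + 1 + 1 + 1 + 4 bytes */
#define MIDX_SKETCH_CHUNK_ENTRY 12 /* 4-byte chunk id + 8-byte offset */
#define MIDX_SKETCH_MAX_CHUNKS 16  /* assumption: arbitrary sketch limit */

static uint32_t be32_sketch(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint64_t be64_sketch(const unsigned char *p)
{
	return (uint64_t)be32_sketch(p) << 32 | be32_sketch(p + 4);
}

/* Hypothetical decoded form of the header and chunk lookup table. */
struct midx_sketch_header {
	unsigned char version, oid_version, num_chunks, num_base;
	uint32_t num_packs;
	uint32_t chunk_id[MIDX_SKETCH_MAX_CHUNKS];
	uint64_t chunk_offset[MIDX_SKETCH_MAX_CHUNKS];
};

/* Return 0 on success, -1 if buf does not hold a version-1 MIDX header. */
static int midx_sketch_parse(const unsigned char *buf, size_t len,
			     struct midx_sketch_header *h)
{
	const unsigned char *table;
	size_t i;

	assert(buf && h);
	if (len < MIDX_SKETCH_HEADER_SIZE || memcmp(buf, "MIDX", 4))
		return -1;
	h->version = buf[4];
	h->oid_version = buf[5];
	h->num_chunks = buf[6];
	h->num_base = buf[7];
	h->num_packs = be32_sketch(buf + 8);
	if (h->version != 1 || h->oid_version != 1 ||
	    h->num_chunks > MIDX_SKETCH_MAX_CHUNKS)
		return -1;

	/* C entries plus the terminating entry must fit in the buffer. */
	if (len < MIDX_SKETCH_HEADER_SIZE +
		  ((size_t)h->num_chunks + 1) * MIDX_SKETCH_CHUNK_ENTRY)
		return -1;
	table = buf + MIDX_SKETCH_HEADER_SIZE;
	for (i = 0; i < h->num_chunks; i++) {
		h->chunk_id[i] = be32_sketch(table + i * MIDX_SKETCH_CHUNK_ENTRY);
		h->chunk_offset[i] =
			be64_sketch(table + i * MIDX_SKETCH_CHUNK_ENTRY + 4);
	}
	/* The entry after the last chunk must carry the terminating id 0. */
	return be32_sketch(table + i * MIDX_SKETCH_CHUNK_ENTRY) == 0 ? 0 : -1;
}
```

Because chunk IDs are read as 32-bit big-endian integers, `{'P', 'N', 'A', 'M'}` comes back as `0x504e414d`, and a chunk's length can be inferred from the next entry's offset as the format notes.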

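The Object Offsets rule in the format above, where a set most significant bit redirects into the Object Large Offsets chunk when that chunk exists, can be sketched as follows. Assumptions: the OOFF and LOFF chunk payloads are already located in memory, all values are big-endian, and the helper names are hypothetical rather than taken from midx.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIDX_SKETCH_LARGE_OFFSET_NEEDED 0x80000000u /* bit 31 */
#define MIDX_SKETCH_OOFF_WIDTH 8 /* 4-byte pack-int-id + 4-byte offset */

static uint32_t be32_sketch(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

static uint64_t be64_sketch(const unsigned char *p)
{
	return (uint64_t)be32_sketch(p) << 32 | be32_sketch(p + 4);
}

/*
 * Decode the nth Object Offsets entry. large_offsets points at the
 * Object Large Offsets chunk, or is NULL when that optional chunk is
 * absent. The pack-int-id is returned through *pack_int_id.
 */
static uint64_t midx_sketch_offset(const unsigned char *ooff,
				   const unsigned char *large_offsets,
				   uint32_t n, uint32_t *pack_int_id)
{
	const unsigned char *entry;
	uint32_t offset32;

	assert(ooff && pack_int_id);
	entry = ooff + (size_t)n * MIDX_SKETCH_OOFF_WIDTH;
	*pack_int_id = be32_sketch(entry);
	offset32 = be32_sketch(entry + 4);

	if (large_offsets && (offset32 & MIDX_SKETCH_LARGE_OFFSET_NEEDED)) {
		/* Clearing bit 31 leaves the row index into the 8-byte table. */
		uint32_t row = offset32 & ~MIDX_SKETCH_LARGE_OFFSET_NEEDED;
		return be64_sketch(large_offsets + (size_t)row * 8);
	}
	return offset32;
}
```

When the large offset chunk is absent, the 32-bit value is used directly, which is why offsets up to 2^32-1 fit without it.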
Makefile

Lines changed: 3 additions & 0 deletions
@@ -719,6 +719,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-path-utils.o
 TEST_BUILTINS_OBJS += test-prio-queue.o
 TEST_BUILTINS_OBJS += test-read-cache.o
+TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -895,6 +896,7 @@ LIB_OBJS += merge.o
 LIB_OBJS += merge-blobs.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += mergesort.o
+LIB_OBJS += midx.o
 LIB_OBJS += name-hash.o
 LIB_OBJS += negotiator/default.o
 LIB_OBJS += notes.o
@@ -1057,6 +1059,7 @@ BUILTIN_OBJS += builtin/merge-recursive.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/mktag.o
 BUILTIN_OBJS += builtin/mktree.o
+BUILTIN_OBJS += builtin/multi-pack-index.o
 BUILTIN_OBJS += builtin/mv.o
 BUILTIN_OBJS += builtin/name-rev.o
 BUILTIN_OBJS += builtin/notes.o

builtin.h

Lines changed: 1 addition & 0 deletions
@@ -191,6 +191,7 @@ extern int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
 extern int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mktag(int argc, const char **argv, const char *prefix);
 extern int cmd_mktree(int argc, const char **argv, const char *prefix);
+extern int cmd_multi_pack_index(int argc, const char **argv, const char *prefix);
 extern int cmd_mv(int argc, const char **argv, const char *prefix);
 extern int cmd_name_rev(int argc, const char **argv, const char *prefix);
 extern int cmd_notes(int argc, const char **argv, const char *prefix);

builtin/multi-pack-index.c

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "parse-options.h"
+#include "midx.h"
+
+static char const * const builtin_multi_pack_index_usage[] = {
+	N_("git multi-pack-index [--object-dir=<dir>] write"),
+	NULL
+};
+
+static struct opts_multi_pack_index {
+	const char *object_dir;
+} opts;
+
+int cmd_multi_pack_index(int argc, const char **argv,
+			 const char *prefix)
+{
+	static struct option builtin_multi_pack_index_options[] = {
+		OPT_FILENAME(0, "object-dir", &opts.object_dir,
+		  N_("object directory containing set of packfile and pack-index pairs")),
+		OPT_END(),
+	};
+
+	git_config(git_default_config, NULL);
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_multi_pack_index_options,
+			     builtin_multi_pack_index_usage, 0);
+
+	if (!opts.object_dir)
+		opts.object_dir = get_object_directory();
+
+	if (argc == 0)
+		goto usage;
+
+	if (!strcmp(argv[0], "write")) {
+		if (argc > 1)
+			goto usage;
+
+		return write_midx_file(opts.object_dir);
+	}
+
+usage:
+	usage_with_options(builtin_multi_pack_index_usage,
+			   builtin_multi_pack_index_options);
+}

builtin/repack.c

Lines changed: 9 additions & 0 deletions
@@ -8,6 +8,7 @@
 #include "strbuf.h"
 #include "string-list.h"
 #include "argv-array.h"
+#include "midx.h"
 #include "remote-odb.h"
 
 static int delta_base_offset = 1;
@@ -175,6 +176,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int no_update_server_info = 0;
 	int quiet = 0;
 	int local = 0;
+	int midx_cleared = 0;
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -334,6 +336,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
 			char *fname, *fname_old;
+
+			if (!midx_cleared) {
+				/* if we move a packfile, it will invalidate the midx */
+				clear_midx_file(get_object_directory());
+				midx_cleared = 1;
+			}
+
 			fname = mkpathdup("%s/pack-%s%s", packdir,
 						item->string, exts[ext].name);
 			if (!file_exists(fname)) {

command-list.txt

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@ git-merge-index plumbingmanipulators
 git-merge-one-file purehelpers
 git-mergetool ancillarymanipulators complete
 git-merge-tree ancillaryinterrogators
+git-multi-pack-index plumbingmanipulators
 git-mktag plumbingmanipulators
 git-mktree plumbingmanipulators
 git-mv mainporcelain worktree

git.c

Lines changed: 1 addition & 0 deletions
@@ -508,6 +508,7 @@ static struct cmd_struct commands[] = {
 	{ "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT },
 	{ "mktree", cmd_mktree, RUN_SETUP },
+	{ "multi-pack-index", cmd_multi_pack_index, RUN_SETUP_GENTLY },
 	{ "mv", cmd_mv, RUN_SETUP | NEED_WORK_TREE },
 	{ "name-rev", cmd_name_rev, RUN_SETUP },
 	{ "notes", cmd_notes, RUN_SETUP },
