
Commit ee1f0c2

derrickstolee authored and gitster committed
read-cache: add index.skipHash config option
The previous change allowed skipping the hashing portion of the hashwrite API, using it instead as a buffered write API. Disabling the hashwrite can be particularly helpful when the write operation is in a critical path.

One such critical path is the writing of the index. This operation is so critical that the sparse index was created specifically to reduce the size of the index to make these writes (and reads) faster.

This trade-off between file stability at rest and write-time performance is not easy to balance. The index is an interesting case for a couple of reasons:

1. Writes block users. Writing the index takes place in many user-blocking foreground operations. The speed improvement directly impacts their use. Other file formats are typically written in the background (commit-graph, multi-pack-index) or are super-critical to correctness (pack-files).

2. Index files are short lived. It is rare that a user leaves an index for a long time with many staged changes. Outside of staged changes, the index can be completely destroyed and rewritten with minimal impact to the user.

Following a similar approach to one used in the microsoft/git fork [1], add a new config option (index.skipHash) that allows disabling this hashing during the index write. The cost is that we can no longer validate the contents for corruption-at-rest using the trailing hash.

[1] microsoft@21fed2d

We load this config from the repository config given by istate->repo, with a fallback to the_repository if it is not set.

While older Git versions will not recognize the null hash as a special case, the file still conforms to the structure of the index format. Using this null hash will therefore still allow Git operations to function across older versions.

The one exception is 'git fsck', which checks the hash of the index file. This used to be a check on every index read, but a33fc72 (read-cache: force_verify_index_checksum, 2017-04-14) split it out so that only 'git fsck' performs it; that change was first released in Git 2.13.0.
Document the versions that relaxed these restrictions, with the optimistic expectation that this change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a warning to the config option that this may cause undesirable behavior with older Git versions.

As a quick comparison, I tested 'git update-index --force-write' with and without index.skipHash=true on a copy of the Linux kernel repository.

    Benchmark 1: with hash
      Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
      Range (min … max):    34.3 ms …  79.1 ms    82 runs

    Benchmark 2: without hash
      Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
      Range (min … max):    16.3 ms …  42.0 ms    69 runs

    Summary
      'without hash' ran
        1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to justify letting users opt in to this feature, even with the potential confusion with older 'git fsck' versions.

Test this new config option, both at a command-line level and within a submodule. The confirmation is currently limited to checking that 'git fsck' does not complain about the index. Future updates will make this test more robust.

It is critical that this test is placed before the test_index_version tests, since those tests obliterate the .git/config file and hence lose the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
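The benchmark summary in the commit message reports a speedup of 1.78 ± 0.76. As a cross-check, both numbers follow from the two means and standard deviations if one assumes the benchmarking tool propagates relative errors to first order for a quotient (that assumption about the tool is ours, not the commit's):

```latex
\frac{\bar t_{\text{hash}}}{\bar t_{\text{skip}}} = \frac{46.3}{26.0} \approx 1.78,
\qquad
\sigma_{\text{ratio}} \approx 1.78 \sqrt{\left(\frac{13.8}{46.3}\right)^{2} + \left(\frac{7.9}{26.0}\right)^{2}}
\approx 1.78 \sqrt{0.0888 + 0.0923}
\approx 1.78 \times 0.426
\approx 0.76
```

The wide interval comes from run-to-run variance: in both benchmarks σ is roughly 30% of the mean, so the ratio's relative spread is about √2 × 30%.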
1 parent 1687150 commit ee1f0c2


3 files changed, 37 additions, 1 deletion

Documentation/config/index.txt

Lines changed: 11 additions & 0 deletions
@@ -30,3 +30,14 @@ index.version::
 	Specify the version with which new index files should be
 	initialized. This does not affect existing repositories.
 	If `feature.manyFiles` is enabled, then the default is 4.
+
+index.skipHash::
+	When enabled, do not compute the trailing hash for the index file.
+	This accelerates Git commands that manipulate the index, such as
+	`git add`, `git commit`, or `git status`. Instead of storing the
+	checksum, write a trailing set of bytes with value zero, indicating
+	that the computation was skipped.
++
+If you enable `index.skipHash`, then Git clients older than 2.13.0 will
+refuse to parse the index and Git clients older than 2.40.0 will report an
+error during `git fsck`.

read-cache.c

Lines changed: 12 additions & 1 deletion
@@ -1817,6 +1817,8 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	git_hash_ctx c;
 	unsigned char hash[GIT_MAX_RAWSZ];
 	int hdr_version;
+	unsigned char *start, *end;
+	struct object_id oid;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
 		return error(_("bad signature 0x%08x"), hdr->hdr_signature);
@@ -1827,10 +1829,16 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	if (!verify_index_checksum)
 		return 0;
 
+	end = (unsigned char *)hdr + size;
+	start = end - the_hash_algo->rawsz;
+	oidread(&oid, start);
+	if (oideq(&oid, null_oid()))
+		return 0;
+
 	the_hash_algo->init_fn(&c);
 	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
 	the_hash_algo->final_fn(hash, &c);
-	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
+	if (!hasheq(hash, start))
 		return error(_("bad index file sha1 signature"));
 	return 0;
 }
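The new early return in verify_hdr() can be sketched in isolation. This is a minimal standalone illustration, not Git's code: `verify_trailer()`, `toy_hash()` and the 8-byte `TRAILER_SZ` are hypothetical, and a 64-bit FNV-1a checksum stands in for SHA-1/SHA-256 (the patch uses `the_hash_algo->rawsz` trailer bytes). As in the patch, an all-zero trailer short-circuits verification; otherwise the checksum over everything before the trailer must match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trailer size; Git uses the_hash_algo->rawsz (20 or 32). */
#define TRAILER_SZ 8

/* Toy stand-in for the real hash: 64-bit FNV-1a, stored big-endian. */
static void toy_hash(const unsigned char *data, size_t len,
		     unsigned char out[TRAILER_SZ])
{
	uint64_t h = 1469598103934665603ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 1099511628211ULL;
	}
	for (int i = TRAILER_SZ - 1; i >= 0; i--) {
		out[i] = (unsigned char)(h & 0xff);
		h >>= 8;
	}
}

/* An all-zero trailer means the writer skipped the hash. */
static int is_null_trailer(const unsigned char *t)
{
	for (size_t i = 0; i < TRAILER_SZ; i++)
		if (t[i])
			return 0;
	return 1;
}

/*
 * Returns 0 if the buffer is acceptable (checksum matches, or the
 * trailer is all zeros and verification is skipped), -1 otherwise.
 */
int verify_trailer(const unsigned char *buf, size_t size)
{
	const unsigned char *start;
	unsigned char hash[TRAILER_SZ];

	if (size < TRAILER_SZ)
		return -1;
	start = buf + size - TRAILER_SZ;
	if (is_null_trailer(start))
		return 0;
	toy_hash(buf, size - TRAILER_SZ, hash);
	return memcmp(hash, start, TRAILER_SZ) ? -1 : 0;
}
```

The trade-off mirrors the commit message: a zeroed trailer is accepted without any check, so corruption in a skipHash-written file goes undetected by this path.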
@@ -2915,9 +2923,12 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
+	struct repository *r = istate->repo ? istate->repo : the_repository;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
+	repo_config_get_bool(r, "index.skiphash", &f->skip_hash);
+
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
 			removed++;
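Setting `f->skip_hash` turns the hashfile into a plain buffered writer. The sketch below is loosely modeled on that idea and is not Git's hashfile implementation: `struct toy_hashfile` and the `toy_*` functions are hypothetical, FNV-1a again stands in for the real hash, and the buffer is deliberately tiny so that flushing is exercised. When `skip_hash` is set, the flush path skips the hash update and the finalizer appends zeros, matching the trailer documented for `index.skipHash`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRAILER_SZ 8  /* hypothetical; the real trailer is rawsz bytes */
#define BUF_SZ 16     /* deliberately tiny so flushes happen often */

/* Hypothetical analogue of a hashing buffered writer. */
struct toy_hashfile {
	FILE *fp;
	unsigned char buf[BUF_SZ];
	size_t used;
	uint64_t h;     /* running FNV-1a state (stand-in for the real hash) */
	int skip_hash;  /* when set, behave as a plain buffered writer */
};

static void toy_init(struct toy_hashfile *f, FILE *fp, int skip_hash)
{
	f->fp = fp;
	f->used = 0;
	f->h = 1469598103934665603ULL;
	f->skip_hash = skip_hash;
}

/* Hash the buffered bytes (unless skipped), then write them out. */
static void toy_flush(struct toy_hashfile *f)
{
	if (!f->skip_hash) {
		for (size_t i = 0; i < f->used; i++) {
			f->h ^= f->buf[i];
			f->h *= 1099511628211ULL;
		}
	}
	fwrite(f->buf, 1, f->used, f->fp);
	f->used = 0;
}

static void toy_write(struct toy_hashfile *f, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len) {
		size_t n = BUF_SZ - f->used;

		if (n > len)
			n = len;
		memcpy(f->buf + f->used, p, n);
		f->used += n;
		p += n;
		len -= n;
		if (f->used == BUF_SZ)
			toy_flush(f);
	}
}

/* Flush, then append the checksum, or zeros when skip_hash is set. */
static void toy_finalize(struct toy_hashfile *f)
{
	unsigned char trailer[TRAILER_SZ] = {0};

	toy_flush(f);
	if (!f->skip_hash) {
		uint64_t h = f->h;

		for (int i = TRAILER_SZ - 1; i >= 0; i--) {
			trailer[i] = (unsigned char)(h & 0xff);
			h >>= 8;
		}
	}
	fwrite(trailer, 1, TRAILER_SZ, f->fp);
	fflush(f->fp);
}
```

Because the hash is updated only when buffered data is flushed, skipping it removes the per-byte hashing work from the write path while leaving the buffering and the file layout unchanged.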

t/t1600-index.sh

Lines changed: 14 additions & 0 deletions
@@ -65,6 +65,20 @@ test_expect_success 'out of bounds index.version issues warning' '
 	)
 '
 
+test_expect_success 'index.skipHash config option' '
+	rm -f .git/index &&
+	git -c index.skipHash=true add a &&
+	git fsck &&
+
+	test_commit start &&
+	git -c protocol.file.allow=always submodule add ./ sub &&
+	git config index.skipHash false &&
+	git -C sub config index.skipHash true &&
+	>sub/file &&
+	git -C sub add a &&
+	git -C sub fsck
+'
+
 test_index_version () {
 	INDEX_VERSION_CONFIG=$1 &&
 	FEATURE_MANY_FILES=$2 &&
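The test above only confirms that `git fsck` stays quiet. A more direct check is to read the index file and see whether its trailer is all zeros. The helper below is a hypothetical sketch, not part of Git or its test suite. It relies on the index beginning with the 4-byte signature `DIRC`, a 4-byte version and a 4-byte entry count, and it assumes the caller supplies the trailer size (20 bytes for SHA-1 repositories, 32 for SHA-256).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the index at `path` ends in an
 * all-zero trailer of `rawsz` bytes (as written with index.skipHash),
 * 0 if the trailer is non-zero, and -1 on any error.
 */
int index_has_null_trailer(const char *path, size_t rawsz)
{
	unsigned char sig[4], trailer[32];
	long size;
	FILE *fp;
	int ret = -1;

	if (!rawsz || rawsz > sizeof(trailer) || !(fp = fopen(path, "rb")))
		return -1;
	/* Header: "DIRC" signature, 4-byte version, 4-byte entry count. */
	if (fread(sig, 1, 4, fp) != 4 || memcmp(sig, "DIRC", 4))
		goto out;
	if (fseek(fp, 0, SEEK_END) || (size = ftell(fp)) < 0 ||
	    (size_t)size < 12 + rawsz)
		goto out;
	if (fseek(fp, size - (long)rawsz, SEEK_SET) ||
	    fread(trailer, 1, rawsz, fp) != rawsz)
		goto out;
	ret = 1;
	for (size_t i = 0; i < rawsz; i++) {
		if (trailer[i]) {
			ret = 0;
			break;
		}
	}
out:
	fclose(fp);
	return ret;
}
```

For example, in a SHA-1 repository, `index_has_null_trailer(".git/index", 20)` would be expected to return 1 after `git -c index.skipHash=true add a`, and 0 after a write with the option disabled.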
