Commit b986df5

jeffhostetler authored and gitster committed
read-cache: speed up has_dir_name (part 2)
Teach has_dir_name() to see if the path of the new item is greater than the last path in the index array before attempting to search for it.

has_dir_name() is looking for file/directory collisions in the index and has to consider each sub-directory prefix in turn. This can cause multiple binary searches for each path.

During operations like checkout, merge_working_tree() populates the new index in sorted order, so we expect to be able to append in many cases.

This commit is part 2 of 2. This commit handles the additional possible short-cuts as we look at each sub-directory prefix.

The net-net gains for add_index_entry_with_check() and both has_dir_name() commits are best seen for very large repos.

Here are results for an INFLATED version of linux.git with 1M files.

    $ GIT_PERF_REPO=/mnt/test/linux_inflated.git/ ./run upstream/base HEAD ./p0006-read-tree-checkout.sh
    Test                                                           upstream/base     HEAD
    0006.2: read-tree br_base br_ballast (1043893)                 3.79(3.63+0.15)   2.68(2.52+0.15) -29.3%
    0006.3: switch between br_base br_ballast (1043893)            7.55(6.58+0.44)   6.03(4.60+0.43) -20.1%
    0006.4: switch between br_ballast br_ballast_plus_1 (1043893)  10.84(9.26+0.59)  8.44(7.06+0.65) -22.1%
    0006.5: switch between aliases (1043893)                       10.93(9.39+0.58)  10.24(7.04+0.63) -6.3%

Here are results for a synthetic repo with 4.2M files.

    $ GIT_PERF_REPO=~/work/gfw/t/perf/repos/gen-many-files-10.4.3.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
    Test                                                           HEAD~3              HEAD
    0006.2: read-tree br_base br_ballast (4194305)                 29.96(19.26+10.50)  23.76(13.42+10.12) -20.7%
    0006.3: switch between br_base br_ballast (4194305)            56.95(36.08+16.83)  45.54(25.94+15.68) -20.0%
    0006.4: switch between br_ballast br_ballast_plus_1 (4194305)  90.94(51.50+31.52)  78.22(39.39+30.70) -14.0%
    0006.5: switch between aliases (4194305)                       93.72(51.63+34.09)  77.94(39.00+30.88) -16.8%

Results for medium repos (like linux.git) are mixed and have more variance (probably due to disk IO unrelated to this test).

    $ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
    Test                                                         HEAD~3            HEAD
    0006.2: read-tree br_base br_ballast (57994)                 0.25(0.21+0.03)   0.20(0.17+0.02) -20.0%
    0006.3: switch between br_base br_ballast (57994)            10.67(6.06+2.92)  10.51(5.94+2.91) -1.5%
    0006.4: switch between br_ballast br_ballast_plus_1 (57994)  0.59(0.47+0.16)   0.52(0.40+0.13) -11.9%
    0006.5: switch between aliases (57994)                       0.59(0.44+0.17)   0.51(0.38+0.14) -13.6%

    $ GIT_PERF_REPO=/mnt/test/linux.git/ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
    Test                                                         HEAD~3            HEAD
    0006.2: read-tree br_base br_ballast (57994)                 0.24(0.21+0.02)   0.21(0.18+0.02) -12.5%
    0006.3: switch between br_base br_ballast (57994)            10.42(5.98+2.91)  10.66(5.86+3.09) +2.3%
    0006.4: switch between br_ballast br_ballast_plus_1 (57994)  0.59(0.49+0.13)   0.53(0.37+0.16) -10.2%
    0006.5: switch between aliases (57994)                       0.59(0.43+0.17)   0.50(0.37+0.14) -15.3%

Results for smaller repos (like git.git) are not significant.

    $ ./run HEAD~3 HEAD ./p0006-read-tree-checkout.sh
    Test                                                        HEAD~3           HEAD
    0006.2: read-tree br_base br_ballast (3043)                 0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%
    0006.3: switch between br_base br_ballast (3043)            0.31(0.17+0.11)  0.29(0.19+0.08) -6.5%
    0006.4: switch between br_ballast br_ballast_plus_1 (3043)  0.03(0.02+0.00)  0.03(0.02+0.00) +0.0%
    0006.5: switch between aliases (3043)                       0.03(0.02+0.00)  0.03(0.02+0.00) +0.0%

Signed-off-by: Jeff Hostetler <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
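
The commit message says has_dir_name() has to consider each sub-directory prefix in turn, which is why one path can trigger several binary searches. A minimal sketch of that prefix walk follows, as an illustration only and not git's code. `dir_prefix_lengths` is a hypothetical helper invented for this example. It scans backwards for slashes, much like the `*--slash == '/'` loop in read-cache.c, and records the length of each directory prefix, longest first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): record the length of every
 * sub-directory prefix of "name", longest first, excluding the
 * trailing slash.  At most "cap" lengths are stored in out[];
 * the number of stored lengths is returned.
 */
static size_t dir_prefix_lengths(const char *name, size_t *out, size_t cap)
{
	size_t n = 0;
	const char *slash = name + strlen(name);

	assert(out != NULL || cap == 0);

	/* Walk backwards from the end of the path, one byte at a time. */
	while (slash > name) {
		if (*--slash == '/' && n < cap)
			out[n++] = (size_t)(slash - name);
	}
	return n;
}
```

For the path "a/b/c.txt" this yields the lengths 3 ("a/b") and 1 ("a"). Without a short-cut, each of those prefixes would be looked up in the index separately, so deeper paths cost more searches.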
1 parent 06b6d81 commit b986df5

1 file changed: +62 -1 lines changed
read-cache.c

Lines changed: 62 additions & 1 deletion
@@ -965,7 +965,7 @@ static int has_dir_name(struct index_state *istate,
 	}
 
 	for (;;) {
-		int len;
+		size_t len;
 
 		for (;;) {
 			if (*--slash == '/')
@@ -975,6 +975,67 @@ static int has_dir_name(struct index_state *istate,
 		}
 		len = slash - name;
 
+		if (cmp_last > 0) {
+			/*
+			 * (len + 1) is a directory boundary (including
+			 * the trailing slash).  And since the loop is
+			 * decrementing "slash", the first iteration is
+			 * the longest directory prefix; subsequent
+			 * iterations consider parent directories.
+			 */
+
+			if (len + 1 <= len_eq_last) {
+				/*
+				 * The directory prefix (including the trailing
+				 * slash) also appears as a prefix in the last
+				 * entry, so the remainder cannot collide (because
+				 * strcmp said the whole path was greater).
+				 *
+				 * EQ: last: xxx/A
+				 *     this: xxx/B
+				 *
+				 * LT: last: xxx/file_A
+				 *     this: xxx/file_B
+				 */
+				return retval;
+			}
+
+			if (len > len_eq_last) {
+				/*
+				 * This part of the directory prefix (excluding
+				 * the trailing slash) is longer than the known
+				 * equal portions, so this sub-directory cannot
+				 * collide with a file.
+				 *
+				 * GT: last: xxxA
+				 *     this: xxxB/file
+				 */
+				return retval;
+			}
+
+			if (istate->cache_nr > 0 &&
+				ce_namelen(istate->cache[istate->cache_nr - 1]) > len) {
+				/*
+				 * The directory prefix lines up with part of
+				 * a longer file or directory name, but sorts
+				 * after it, so this sub-directory cannot
+				 * collide with a file.
+				 *
+				 * last: xxx/yy-file (because '-' sorts before '/')
+				 * this: xxx/yy/abc
+				 */
+				return retval;
+			}
+
+			/*
+			 * This is a possible collision. Fall through and
+			 * let the regular search code handle it.
+			 *
+			 * last: xxx
+			 * this: xxx/file
+			 */
+		}
+
 		pos = index_name_stage_pos(istate, name, len, stage);
 		if (pos >= 0) {
 			/*
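
The three short-cuts added in this hunk can be tried outside of git with the standalone sketch below. It is an illustration under stated assumptions, not the code from read-cache.c. `cmp_with_offset` and `can_skip_search` are hypothetical names, and `cmp_with_offset` only stands in for however `cmp_last` and `len_eq_last` are computed outside this hunk. The sketch takes the last index entry as a plain string, so it has no `cache_nr > 0` check. It also reports a yes/no answer for one prefix, whereas each short-cut in the diff returns `retval` from has_dir_name() directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the comparison done before the loop:
 * a strcmp()-style result plus the number of leading bytes on which
 * the two strings agree.
 */
static int cmp_with_offset(const char *a, const char *b, size_t *eq_len)
{
	size_t i = 0;

	while (a[i] && a[i] == b[i])
		i++;
	*eq_len = i;
	return (unsigned char)a[i] - (unsigned char)b[i];
}

/*
 * Return 1 if the directory prefix name[0..len) cannot collide with
 * an existing entry given "last" (the final entry of a sorted index),
 * or 0 if a regular search is still needed.
 */
static int can_skip_search(const char *name, size_t len, const char *last)
{
	size_t len_eq_last;
	int cmp_last = cmp_with_offset(name, last, &len_eq_last);

	assert(name[len] == '/');	/* len must mark a directory boundary */

	if (cmp_last <= 0)
		return 0;	/* not an append: search as usual */
	if (len + 1 <= len_eq_last)
		return 1;	/* "prefix/" is shared with the last entry */
	if (len > len_eq_last)
		return 1;	/* prefix differs from the last entry */
	if (strlen(last) > len)
		return 1;	/* last is longer but sorts before "prefix/" */
	return 0;		/* e.g. last "xxx", this "xxx/file" */
}
```

Calling `can_skip_search("xxx/yy/abc", 6, "xxx/yy-file")` returns 1, matching the `'-' sorts before '/'` case in the comments. `can_skip_search("xxx/file", 3, "xxx")` returns 0, which is the possible collision left to the regular search.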