Skip to content

Commit 2a75ef0

Browse files
committed
diffcore-rename, merge-ort: use handle_early_known_dir_renames()
Put the work of the last several patches to work by calling the new handle_early_known_dir_renames() function just after doing basename-guided rename detection. For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.671 s ± 0.052 s 5.714 s ± 0.087 s mega-renames: 13.653 s ± 0.104 s 11.439 s ± 0.192 s just-one-mega: 499.4 ms ± 3.6 ms 501.1 ms ± 3.8 ms While this improvement looks rather modest for these testcases (because all the previous optimizations were sufficient to nearly remove all time spent in rename detection already), there was a cherry-pick in a real world (but private) repository at $DAYJOB that showed a speedup factor of ~7x from this optimization. An important side note for future further optimization: There is a possible improvement to this optimization that I have not yet attempted: we could first check whether exact renames provide enough information for us to determine directory renames, and avoid doing basename-guided rename detection on some or all of the RELEVANT_LOCATION files within those directories. In effect, this variant would mean doing the handle_early_known_dir_renames() both after exact rename detection and again after basename-guided rename detection, though I remember thinking at one point that there was some extra cleanup needed between the steps if we were to try that. Checking for skippable renames an extra time might be a valuable optimization based on an understanding of the difference in performance of exact rename detection and basename-guided rename detection: While basename-guided rename detection is faster than full inexact rename detection with the latter's NxM file content comparisons, basename-guided rename detection is vastly slower than exact rename detection. The reason for this is that the first time you need to compare a file's contents to another is the most expensive; after diffcore-rename has generated a sequence of hashes for a file's contents, any subsequent comparisons of that file (via the sequence of hashes) to another file is much cheaper. So, removing files from rename detection before doing some basename-guided rename detection could potentially improve this optimization further. However, this particular optimization was actually the last one I did in original implementation order, and by the time I implemented this idea, every testcase I had was sufficiently fast that further optimization was unwarranted. If future testcases arise that tax rename detection more heavily, it may be worth implementing this more involved variant. Signed-off-by: Elijah Newren <[email protected]>
1 parent 79321ab commit 2a75ef0

File tree

1 file changed

+8
-1
lines changed

1 file changed

+8
-1
lines changed

diffcore-rename.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1078,7 +1078,6 @@ static int remove_unneeded_paths_from_src(int num_src,
10781078
return new_num_src;
10791079
}
10801080

1081-
MAYBE_UNUSED
10821081
static int handle_early_known_dir_renames(int num_src,
10831082
struct dir_rename_info *info,
10841083
struct strintmap *relevant_sources,
@@ -1312,11 +1311,19 @@ void diffcore_rename_extended(struct diff_options *options,
13121311
* Cull sources, again:
13131312
* - remove ones involved in renames (found via basenames)
13141313
* - remove ones not found in relevant_sources
1314+
* and
1315+
* - remove ones in relevant_sources which are needed only
1316+
* for directory renames, if no ancestory directory actually
1317+
* needs to know any more individual path renames under them
13151318
*/
13161319
trace2_region_enter("diff", "cull basename", options->repo);
13171320
num_sources = remove_unneeded_paths_from_src(num_sources,
13181321
want_copies,
13191322
relevant_sources);
1323+
num_sources = handle_early_known_dir_renames(num_sources,
1324+
&info,
1325+
relevant_sources,
1326+
dirs_removed);
13201327
trace2_region_leave("diff", "cull basename", options->repo);
13211328
}
13221329

0 commit comments

Comments
 (0)