You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
diffcore-rename, merge-ort: use handle_early_known_dir_renames()
Put the work of the last several patches to work by calling the new
handle_early_known_dir_renames() function just after doing
basename-guided rename detection.
For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.671 s ± 0.052 s 5.714 s ± 0.087 s
mega-renames: 13.653 s ± 0.104 s 11.439 s ± 0.192 s
just-one-mega: 499.4 ms ± 3.6 ms 501.1 ms ± 3.8 ms
While this improvement looks rather modest for these testcases (because
all the previous optimizations were sufficient to nearly remove all time
spent in rename detection already), there was a cherry-pick in a real
world (but private) repository at $DAYJOB that showed a speedup factor
of ~7x from this optimization.
An important side note for future further optimization:
There is a possible improvement to this optimization that I have not yet
attempted: we could first check whether exact renames provide enough
information for us to determine directory renames, and avoid doing
basename-guided rename detection on some or all of the RELEVANT_LOCATION
files within those directories. In effect, this variant would mean
doing the handle_early_known_dir_renames() both after exact rename
detection and again after basename-guided rename detection, though I
remember thinking at one point that there was some extra cleanup needed
between the steps if we were to try that. Checking for skippable
renames an extra time might be a valuable optimization based on an
understanding of the difference in performance of exact rename detection
and basename-guided rename detection:
While basename-guided rename detection is faster than full inexact
rename detection with the latter's NxM file content comparisons,
basename-guided rename detection is vastly slower than exact rename
detection. The reason for this is that the first time you need to
compare a file's contents to another is the most expensive; after
diffcore-rename has generated a sequence of hashes for a file's
contents, any subsequent comparisons of that file (via the sequence
of hashes) to another file is much cheaper.
So, removing files from rename detection before doing some
basename-guided rename detection could potentially improve this
optimization further. However, this particular optimization was
actually the last one I did in original implementation order, and by the
time I implemented this idea, every testcase I had was sufficiently fast
that further optimization was unwarranted. If future testcases arise
that tax rename detection more heavily, it may be worth implementing
this more involved variant.
Signed-off-by: Elijah Newren <[email protected]>
0 commit comments