
Commit b454241

derrickstolee authored and gitster committed
revision.c: generation-based topo-order algorithm
The current --topo-order algorithm requires walking all reachable commits up front and topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. This can dramatically reduce the computation time to write a fixed number of commits, such as when limiting with "-n <N>" or filling the first page of a pager.

When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps:

1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING, then it may terminate without walking all reachable commits. This does not occur if we do not specify UNINTERESTING commits.

2. Run sort_in_topological_order(), which is an implementation of Kahn's algorithm. It first iterates through the entire set of important commits and computes the in-degree of each (plus one, as we use 'zero' as a special value here). Then, we walk the commits in priority order, adding them to the priority queue if and only if their in-degree is one. As we remove commits from this priority queue, we decrement the in-degree of their parents.

3. While we are peeling commits for output, get_revision_1() uses pop_commit on the full list of commits computed by sort_in_topological_order().

In the new algorithm, these three steps correspond to three different commit walks. We run these walks simultaneously, and advance each only as far as necessary to satisfy the requirements of the 'higher order' walk. We know when we can pause each walk by using generation numbers from the commit-graph feature.

Recall that the generation number of a commit satisfies:

* If the commit has at least one parent, then the generation number is one more than the maximum generation number among its parents.

* If the commit has no parent, then the generation number is one.
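As an illustration of the two rules above, here is a minimal sketch that assigns generation numbers on a toy graph. It is not Git's commit-graph code: `struct toy_commit`, `compute_generations`, and the array-index parent encoding are hypothetical names invented for this example, and the sketch assumes the array is ordered so that every parent comes before its children.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARENTS 2
#define NO_PARENT -1

/* Hypothetical toy commit: parents are indices into the same array. */
struct toy_commit {
	int parents[MAX_PARENTS];
	uint32_t generation;
};

/*
 * Assumption: commits[] is ordered so that each parent has a smaller
 * index than its children, so a single forward pass is enough.
 */
static void compute_generations(struct toy_commit *commits, int nr)
{
	for (int i = 0; i < nr; i++) {
		uint32_t max_parent_gen = 0;

		for (int j = 0; j < MAX_PARENTS; j++) {
			int p = commits[i].parents[j];

			if (p == NO_PARENT)
				continue;
			assert(p < i); /* the ordering assumption above */
			if (commits[p].generation > max_parent_gen)
				max_parent_gen = commits[p].generation;
		}

		/* No parents: 0 + 1 = 1. Otherwise: max over parents, plus one. */
		commits[i].generation = max_parent_gen + 1;
	}
}
```

On a chain 0 <- 1 <- 2 with a merge commit 3 whose parents are 0 and 2, this assigns generations 1, 2, 3 and 4: the merge takes its value from its highest-generation parent.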
There are two special generation numbers:

* GENERATION_NUMBER_INFINITY: this value is 0xffffffff and indicates that the commit is not stored in the commit-graph and the generation number was not previously calculated.

* GENERATION_NUMBER_ZERO: this value (0) is a special indicator to say that the commit-graph was generated by a version of Git that does not compute generation numbers (such as v2.18.0).

Since we use generation_numbers_enabled() before using the new algorithm, we do not need to worry about GENERATION_NUMBER_ZERO. However, the existence of GENERATION_NUMBER_INFINITY implies the following statement, which is weaker than the one we usually expect from generation numbers:

If A and B are commits with generation numbers gen(A) and gen(B) and gen(A) < gen(B), then A cannot reach B.

Thus, we will walk in each of our stages until the "maximum unexpanded generation number" is strictly lower than the generation number of a commit we are about to use.

The walks are as follows:

1. EXPLORE: using the explore_queue priority queue (ordered by maximizing the generation number), parse each reachable commit until all commits in the queue have generation number strictly lower than needed. During this walk, update the UNINTERESTING flags as necessary.

2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in-degree of each parent for each commit that is walked. Since we walk in order of decreasing generation number, we know that discovering an in-degree value of 0 means the value for that commit was not initialized, so it should be initialized to two. (Recall that in-degree value "1" is what we use to say a commit is ready for output.) As we iterate the parents of a commit during this walk, ensure the EXPLORE walk has walked beyond their generation numbers.

3. TOPO: using the topo_queue priority queue (ordered based on the sort_order given, which could be commit-date, author-date, or typical topo-order which treats the queue as a LIFO stack), remove a commit from the queue and decrement the in-degree of each parent. If a parent has an in-degree of one, then we add it to the topo_queue. Before we decrement the in-degree, however, ensure the INDEGREE walk has walked beyond that generation number.

The implementations of these walks are in the following methods:

* explore_walk_step and explore_to_depth
* indegree_walk_step and compute_indegrees_to_depth
* next_topo_commit and expand_topo_walk

These methods have some patterns that may seem strange at first, but they are probably carry-overs from their equivalents in limit_list and sort_in_topological_order.

One thing that is missing from this implementation is a proper way to stop walking when the entire queue is UNINTERESTING, so this implementation is not enabled for comparisons, such as in 'git rev-list --topo-order A..B'. This can be updated in the future.

In my local testing, I used the following Git commands on the Linux repository in three modes: HEAD~1 with no commit-graph, HEAD~1 with a commit-graph, and HEAD with a commit-graph. This allows comparing the benefits we get from parsing commits from the commit-graph and then again the benefits we get by restricting the set of commits we walk.

Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
  HEAD, w/ commit-graph: 0.02 s

Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
  HEAD, w/ commit-graph: 0.06 s

This speedup is due to a few things. First, the new generation-number-enabled algorithm walks a number of commits on the order of the number of results output (subject to some branching structure expectations). Since we limit to 100 results, we are running a query similar to filling a single page of results.
Second, when specifying a path, we must parse the root tree object for each commit we walk. The previous benefits from the commit-graph are entirely from reading the commit-graph instead of parsing commits. Since we need to parse trees for the same number of commits as before, we slow down significantly from the non-path-based query.

For the test above, I specifically selected a path that is changed frequently, including by merge commits. A less-frequently-changed path (such as 'README') has similar end-to-end time since we need to walk the same number of commits (before determining we do not have 100 hits). However, we get the benefit that the output is presented to the user as it is discovered, much the same as a normal 'git log' command (no '--topo-order'). This is an improved user experience, even if the command has the same runtime.

Helped-by: Jeff King <[email protected]>
Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 5284fc5 commit b454241
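The in-degree convention from the commit message (zero means "not initialized", one means "ready for output", so a first discovery sets the value to two) can be sketched on a toy graph. This is a simplified, non-incremental illustration of the bookkeeping only, closer in spirit to the old Kahn's-algorithm approach than to the new three-walk code. All names here (`struct toy_graph`, `toy_topo_order`, the `TOY_*` constants) are hypothetical, and a plain LIFO stack stands in for the priority queue.

```c
#include <assert.h>

#define TOY_MAX 16
#define TOY_MAX_PARENTS 2
#define TOY_NONE -1

/* Hypothetical toy graph: parents[c][j] is a commit index or TOY_NONE. */
struct toy_graph {
	int nr;
	int parents[TOY_MAX][TOY_MAX_PARENTS];
};

/*
 * Writes every commit reachable from `tip` into out[], children before
 * parents, and returns how many commits were written.
 */
static int toy_topo_order(const struct toy_graph *g, int tip, int *out)
{
	int indegree[TOY_MAX] = { 0 }; /* 0 = not initialized */
	int seen[TOY_MAX] = { 0 };
	int stack[TOY_MAX];
	int top = 0, nr_out = 0;

	assert(g->nr <= TOY_MAX);

	/* Pass 1: count in-degrees (plus one) over all reachable commits. */
	indegree[tip] = 1; /* the tip has no children, so it is ready */
	seen[tip] = 1;
	stack[top++] = tip;
	while (top) {
		int c = stack[--top];

		for (int j = 0; j < TOY_MAX_PARENTS; j++) {
			int p = g->parents[c][j];

			if (p == TOY_NONE)
				continue;
			if (indegree[p])
				indegree[p]++;
			else
				indegree[p] = 2; /* first discovery */
			if (!seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}

	/* Pass 2: emit a commit once its in-degree has dropped to one. */
	stack[top++] = tip;
	while (top) {
		int c = stack[--top];

		indegree[c] = 0;
		out[nr_out++] = c;
		for (int j = 0; j < TOY_MAX_PARENTS; j++) {
			int p = g->parents[c][j];

			if (p != TOY_NONE && --indegree[p] == 1)
				stack[top++] = p;
		}
	}
	return nr_out;
}
```

On a diamond (root 0, commits 1 and 2 both on top of 0, merge 3 on top of 1 and 2), the root's in-degree reaches three after the first pass and it is only emitted after both 1 and 2 have been output.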

3 files changed: +194 -7 lines changed

object.h

Lines changed: 2 additions & 2 deletions
@@ -59,7 +59,7 @@ struct object_array {

 /*
  * object flag allocation:
- * revision.h: 0---------10 2526
+ * revision.h: 0---------10 25----28
  * fetch-pack.c: 01
  * negotiator/default.c: 2--5
  * walker.c: 0-2
@@ -78,7 +78,7 @@ struct object_array {
  * builtin/show-branch.c: 0-------------------------------------------26
  * builtin/unpack-objects.c: 2021
  */
-#define FLAG_BITS 27
+#define FLAG_BITS 29

 /*
  * The object type is stored in 3 bits.

revision.c

Lines changed: 190 additions & 5 deletions
@@ -26,6 +26,7 @@
 #include "argv-array.h"
 #include "commit-reach.h"
 #include "commit-graph.h"
+#include "prio-queue.h"

 volatile show_early_output_fn_t show_early_output;

@@ -2895,31 +2896,215 @@ static int mark_uninteresting(const struct object_id *oid,
 	return 0;
 }

-struct topo_walk_info {};
+define_commit_slab(indegree_slab, int);
+define_commit_slab(author_date_slab, timestamp_t);
+
+struct topo_walk_info {
+	uint32_t min_generation;
+	struct prio_queue explore_queue;
+	struct prio_queue indegree_queue;
+	struct prio_queue topo_queue;
+	struct indegree_slab indegree;
+	struct author_date_slab author_date;
+};
+
+static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag)
+{
+	if (c->object.flags & flag)
+		return;
+
+	c->object.flags |= flag;
+	prio_queue_put(q, c);
+}
+
+static void explore_walk_step(struct rev_info *revs)
+{
+	struct topo_walk_info *info = revs->topo_walk_info;
+	struct commit_list *p;
+	struct commit *c = prio_queue_get(&info->explore_queue);
+
+	if (!c)
+		return;
+
+	if (parse_commit_gently(c, 1) < 0)
+		return;
+
+	if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
+		record_author_date(&info->author_date, c);
+
+	if (revs->max_age != -1 && (c->date < revs->max_age))
+		c->object.flags |= UNINTERESTING;
+
+	if (process_parents(revs, c, NULL, NULL) < 0)
+		return;
+
+	if (c->object.flags & UNINTERESTING)
+		mark_parents_uninteresting(c);
+
+	for (p = c->parents; p; p = p->next)
+		test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
+}
+
+static void explore_to_depth(struct rev_info *revs,
+			     uint32_t gen_cutoff)
+{
+	struct topo_walk_info *info = revs->topo_walk_info;
+	struct commit *c;
+	while ((c = prio_queue_peek(&info->explore_queue)) &&
+	       c->generation >= gen_cutoff)
+		explore_walk_step(revs);
+}
+
+static void indegree_walk_step(struct rev_info *revs)
+{
+	struct commit_list *p;
+	struct topo_walk_info *info = revs->topo_walk_info;
+	struct commit *c = prio_queue_get(&info->indegree_queue);
+
+	if (!c)
+		return;
+
+	if (parse_commit_gently(c, 1) < 0)
+		return;
+
+	explore_to_depth(revs, c->generation);
+
+	for (p = c->parents; p; p = p->next) {
+		struct commit *parent = p->item;
+		int *pi = indegree_slab_at(&info->indegree, parent);
+
+		if (*pi)
+			(*pi)++;
+		else
+			*pi = 2;
+
+		test_flag_and_insert(&info->indegree_queue, parent, TOPO_WALK_INDEGREE);
+
+		if (revs->first_parent_only)
+			return;
+	}
+}
+
+static void compute_indegrees_to_depth(struct rev_info *revs,
+				       uint32_t gen_cutoff)
+{
+	struct topo_walk_info *info = revs->topo_walk_info;
+	struct commit *c;
+	while ((c = prio_queue_peek(&info->indegree_queue)) &&
+	       c->generation >= gen_cutoff)
+		indegree_walk_step(revs);
+}

 static void init_topo_walk(struct rev_info *revs)
 {
 	struct topo_walk_info *info;
+	struct commit_list *list;
 	revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
 	info = revs->topo_walk_info;
 	memset(info, 0, sizeof(struct topo_walk_info));

-	limit_list(revs);
-	sort_in_topological_order(&revs->commits, revs->sort_order);
+	init_indegree_slab(&info->indegree);
+	memset(&info->explore_queue, 0, sizeof(info->explore_queue));
+	memset(&info->indegree_queue, 0, sizeof(info->indegree_queue));
+	memset(&info->topo_queue, 0, sizeof(info->topo_queue));
+
+	switch (revs->sort_order) {
+	default: /* REV_SORT_IN_GRAPH_ORDER */
+		info->topo_queue.compare = NULL;
+		break;
+	case REV_SORT_BY_COMMIT_DATE:
+		info->topo_queue.compare = compare_commits_by_commit_date;
+		break;
+	case REV_SORT_BY_AUTHOR_DATE:
+		init_author_date_slab(&info->author_date);
+		info->topo_queue.compare = compare_commits_by_author_date;
+		info->topo_queue.cb_data = &info->author_date;
+		break;
+	}
+
+	info->explore_queue.compare = compare_commits_by_gen_then_commit_date;
+	info->indegree_queue.compare = compare_commits_by_gen_then_commit_date;
+
+	info->min_generation = GENERATION_NUMBER_INFINITY;
+	for (list = revs->commits; list; list = list->next) {
+		struct commit *c = list->item;
+
+		if (parse_commit_gently(c, 1))
+			continue;
+
+		test_flag_and_insert(&info->explore_queue, c, TOPO_WALK_EXPLORED);
+		test_flag_and_insert(&info->indegree_queue, c, TOPO_WALK_INDEGREE);
+
+		if (c->generation < info->min_generation)
+			info->min_generation = c->generation;
+
+		*(indegree_slab_at(&info->indegree, c)) = 1;
+
+		if (revs->sort_order == REV_SORT_BY_AUTHOR_DATE)
+			record_author_date(&info->author_date, c);
+	}
+	compute_indegrees_to_depth(revs, info->min_generation);
+
+	for (list = revs->commits; list; list = list->next) {
+		struct commit *c = list->item;
+
+		if (*(indegree_slab_at(&info->indegree, c)) == 1)
+			prio_queue_put(&info->topo_queue, c);
+	}
+
+	/*
+	 * This is unfortunate; the initial tips need to be shown
+	 * in the order given from the revision traversal machinery.
+	 */
+	if (revs->sort_order == REV_SORT_IN_GRAPH_ORDER)
+		prio_queue_reverse(&info->topo_queue);
 }

 static struct commit *next_topo_commit(struct rev_info *revs)
 {
-	return pop_commit(&revs->commits);
+	struct commit *c;
+	struct topo_walk_info *info = revs->topo_walk_info;
+
+	/* pop next off of topo_queue */
+	c = prio_queue_get(&info->topo_queue);
+
+	if (c)
+		*(indegree_slab_at(&info->indegree, c)) = 0;
+
+	return c;
 }

 static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
 {
-	if (process_parents(revs, commit, &revs->commits, NULL) < 0) {
+	struct commit_list *p;
+	struct topo_walk_info *info = revs->topo_walk_info;
+	if (process_parents(revs, commit, NULL, NULL) < 0) {
 		if (!revs->ignore_missing_links)
 			die("Failed to traverse parents of commit %s",
 			    oid_to_hex(&commit->object.oid));
 	}
+
+	for (p = commit->parents; p; p = p->next) {
+		struct commit *parent = p->item;
+		int *pi;
+
+		if (parse_commit_gently(parent, 1) < 0)
+			continue;
+
+		if (parent->generation < info->min_generation) {
+			info->min_generation = parent->generation;
+			compute_indegrees_to_depth(revs, info->min_generation);
+		}
+
+		pi = indegree_slab_at(&info->indegree, parent);
+
+		(*pi)--;
+		if (*pi == 1)
+			prio_queue_put(&info->topo_queue, parent);
+
+		if (revs->first_parent_only)
+			return;
+	}
 }

 int prepare_revision_walk(struct rev_info *revs)
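
The "walk only while the queue's maximum generation is at or above a cutoff" pattern used by explore_to_depth and compute_indegrees_to_depth can be illustrated with a small standalone sketch. This is not the code from revision.c: the `toy_*` names are hypothetical, each toy commit has at most one parent, and a linear scan over an array stands in for Git's prio_queue.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX 16
#define TOY_NONE -1

struct toy_commit {
	uint32_t generation;
	int parent;   /* index of the single parent, or TOY_NONE */
	int explored; /* plays the role of the TOPO_WALK_EXPLORED flag */
};

struct toy_queue {
	int items[TOY_MAX];
	int nr;
};

/* Position of the highest-generation entry, or TOY_NONE if empty. */
static int toy_peek(const struct toy_queue *q, const struct toy_commit *commits)
{
	int best = TOY_NONE;

	for (int i = 0; i < q->nr; i++)
		if (best == TOY_NONE ||
		    commits[q->items[i]].generation >
		    commits[q->items[best]].generation)
			best = i;
	return best;
}

/* Insert a commit unless it was already queued once (flag check). */
static void toy_put(struct toy_queue *q, struct toy_commit *commits, int c)
{
	if (commits[c].explored)
		return;
	commits[c].explored = 1;
	assert(q->nr < TOY_MAX);
	q->items[q->nr++] = c;
}

/*
 * Pop commits while the maximum generation in the queue is >= gen_cutoff,
 * queueing each popped commit's parent. Returns the number of commits
 * popped; everything left in the queue is strictly below the cutoff.
 */
static int toy_explore_to_depth(struct toy_queue *q,
				struct toy_commit *commits,
				uint32_t gen_cutoff)
{
	int walked = 0, pos;

	while ((pos = toy_peek(q, commits)) != TOY_NONE &&
	       commits[q->items[pos]].generation >= gen_cutoff) {
		int c = q->items[pos];

		q->items[pos] = q->items[--q->nr]; /* remove from queue */
		if (commits[c].parent != TOY_NONE)
			toy_put(q, commits, commits[c].parent);
		walked++;
	}
	return walked;
}
```

On a linear history of five commits with generations 1 through 5, starting from the tip and asking for a cutoff of 4 pops only the two newest commits and leaves the generation-3 commit queued. A later call with a lower cutoff resumes from there, which is what lets the walk advance only as far as the next stage needs.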

revision.h

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@
 #define USER_GIVEN (1u<<25) /* given directly by the user */
 #define TRACK_LINEAR (1u<<26)
 #define ALL_REV_FLAGS (((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
+#define TOPO_WALK_EXPLORED (1u<<27)
+#define TOPO_WALK_INDEGREE (1u<<28)

 #define DECORATE_SHORT_REFS 1
 #define DECORATE_FULL_REFS 2
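
The FLAG_BITS bump in object.h and the two new flags in revision.h go together: bits 27 and 28 need 29 flag bits of room. A small standalone check of that arithmetic might look like the following. The macro values are copied from the diff, but `flags_fit` is a hypothetical helper written for this illustration and is not part of Git.

```c
#include <assert.h>

/* Values copied from the diff above. */
#define USER_GIVEN (1u<<25)
#define TRACK_LINEAR (1u<<26)
#define ALL_REV_FLAGS (((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
#define TOPO_WALK_EXPLORED (1u<<27)
#define TOPO_WALK_INDEGREE (1u<<28)

/*
 * Hypothetical helper: returns 1 if every bit set in `flags` lies
 * within the low `flag_bits` bits, 0 otherwise.
 */
static int flags_fit(unsigned int flags, unsigned int flag_bits)
{
	unsigned int usable;

	assert(flag_bits < 32); /* avoid an out-of-range shift */
	usable = (1u << flag_bits) - 1;
	return (flags & ~usable) == 0;
}
```

With `TOPO_WALK_EXPLORED | TOPO_WALK_INDEGREE` as input, `flags_fit` returns 0 for the old width of 27 and 1 for the new width of 29. The two new bits also do not overlap ALL_REV_FLAGS, as the values above show.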
