Skip to content

Commit ae3f36d

Browse files
Barret Rhodengitster
authored andcommitted
blame: add the ability to ignore commits and their changes
Commits that make formatting changes or function renames are often not interesting when blaming a file. A user may deem such a commit as 'not interesting' and want to ignore and its changes it when assigning blame. For example, say a file has the following git history / rev-list: ---O---A---X---B---C---D---Y---E---F Commits X and Y both touch a particular line, and the other commits do not: X: "Take a third parameter" -MyFunc(1, 2); +MyFunc(1, 2, 3); Y: "Remove camelcase" -MyFunc(1, 2, 3); +my_func(1, 2, 3); git-blame will blame Y for the change. I'd like to be able to ignore Y: both the existence of the commit as well as any changes it made. This differs from -S rev-list, which specifies the list of commits to process for the blame. We would still process Y, but just don't let the blame 'stick.' This patch adds the ability for users to ignore a revision with --ignore-rev=rev, which may be repeated. They can specify a set of files of full object names of revs, e.g. SHA-1 hashes, one per line. A single file may be specified with the blame.ignoreRevFile config option or with --ignore-rev-file=file. Both the config option and the command line option may be repeated multiple times. An empty file name "" will clear the list of revs from previously processed files. Config options are processed before command line options. For a typical use case, projects will maintain the file containing revisions for commits that perform mass reformatting, and their users have the option to ignore all of the commits in that file. Additionally, a user can use the --ignore-rev option for one-off investigation. To go back to the example above, X was a substantive change to the function, but not the change the user is interested in. The user inspected X, but wanted to find the previous change to that line - perhaps a commit that introduced that function call. To make this work, we can't simply remove all ignored commits from the rev-list. We need to diff the changes introduced by Y so that we can ignore them. We let the blames get passed to Y, just like when processing normally. When Y is the target, we make sure that Y does not *keep* any blames. Any changes that Y is responsible for get passed to its parent. Note we make one pass through all of the scapegoats (parents) to attempt to pass blame normally; we don't know if we *need* to ignore the commit until we've checked all of the parents. The blame_entry will get passed up the tree until we find a commit that has a diff chunk that affects those lines. One issue is that the ignored commit *did* make some change, and there is no general solution to finding the line in the parent commit that corresponds to a given line in the ignored commit. That makes it hard to attribute a particular line within an ignored commit's diff correctly. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) #include "a.h" commit-b 12) #include "b.h" Commit X, which we will ignore, swaps these lines: commit-X 11) #include "b.h" commit-X 12) #include "a.h" We can pass that blame entry to the parent, but line 11 will be attributed to commit A, even though "include b.h" came from commit B. The blame mechanism will be looking at the parent's view of the file at line number 11. ignore_blame_entry() is set up to allow alternative algorithms for guessing per-line blames. Any line that is not attributed to the parent will continue to be blamed on the ignored commit as if that commit was not ignored. Upcoming patches have the ability to detect these lines and mark them in the blame output. The existing algorithm is simple: blame each line on the corresponding line in the parent's diff chunk. Any lines beyond that stay with the target. For example, the parent of an ignored commit has this, say at line 11: commit-a 11) void new_func_1(void *x, void *y); commit-b 12) void new_func_2(void *x, void *y); commit-c 13) some_line_c commit-d 14) some_line_d After a commit 'X', we have: commit-X 11) void new_func_1(void *x, commit-X 12) void *y); commit-X 13) void new_func_2(void *x, commit-X 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Commit X nets two additionally lines: 13 and 14. The current guess_line_blames() algorithm will not attribute these to the parent, whose diff chunk is only two lines - not four. When we ignore with the current algorithm, we get: commit-a 11) void new_func_1(void *x, commit-b 12) void *y); commit-X 13) void new_func_2(void *x, commit-X 14) void *y); commit-c 15) some_line_c commit-d 16) some_line_d Note that line 12 was blamed on B, though B was the commit for new_func_2(), not new_func_1(). Even when guess_line_blames() finds a line in the parent, it may still be incorrect. Signed-off-by: Barret Rhoden <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 55f808f commit ae3f36d

File tree

7 files changed

+432
-9
lines changed

7 files changed

+432
-9
lines changed

Documentation/blame-options.txt

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,5 +110,19 @@ commit. And the default value is 40. If there are more than one
110110
`-C` options given, the <num> argument of the last `-C` will
111111
take effect.
112112

113+
--ignore-rev <rev>::
114+
Ignore changes made by the revision when assigning blame, as if the
115+
change never happened. Lines that were changed or added by an ignored
116+
commit will be blamed on the previous commit that changed that line or
117+
nearby lines. This option may be specified multiple times to ignore
118+
more than one revision.
119+
120+
--ignore-revs-file <file>::
121+
Ignore revisions listed in `file`, which must be in the same format as an
122+
`fsck.skipList`. This option may be repeated, and these files will be
123+
processed after any files specified with the `blame.ignoreRevsFile` config
124+
option. An empty file name, `""`, will clear the list of revs from
125+
previously processed files.
126+
113127
-h::
114128
Show help message.

Documentation/config/blame.txt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,3 +19,10 @@ blame.showEmail::
1919
blame.showRoot::
2020
Do not treat root commits as boundaries in linkgit:git-blame[1].
2121
This option defaults to false.
22+
23+
blame.ignoreRevsFile::
24+
Ignore revisions listed in the file, one unabbreviated object name per
25+
line, in linkgit:git-blame[1]. Whitespace and comments beginning with
26+
`#` are ignored. This option may be repeated multiple times. Empty
27+
file names will reset the list of ignored revisions. This option will
28+
be handled before the command line option `--ignore-revs-file`.

Documentation/git-blame.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ SYNOPSIS
1010
[verse]
1111
'git blame' [-c] [-b] [-l] [--root] [-t] [-f] [-n] [-s] [-e] [-p] [-w] [--incremental]
1212
[-L <range>] [-S <revs-file>] [-M] [-C] [-C] [-C] [--since=<date>]
13+
[--ignore-rev <rev>] [--ignore-revs-file <file>]
1314
[--progress] [--abbrev=<n>] [<rev> | --contents <file> | --reverse <rev>..<rev>]
1415
[--] <file>
1516

blame.c

Lines changed: 167 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -859,6 +859,103 @@ static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
859859
return n;
860860
}
861861

862+
struct blame_line_tracker {
863+
int is_parent;
864+
int s_lno;
865+
};
866+
867+
static int are_lines_adjacent(struct blame_line_tracker *first,
868+
struct blame_line_tracker *second)
869+
{
870+
return first->is_parent == second->is_parent &&
871+
first->s_lno + 1 == second->s_lno;
872+
}
873+
874+
/*
875+
* This cheap heuristic assigns lines in the chunk to their relative location in
876+
* the parent's chunk. Any additional lines are left with the target.
877+
*/
878+
static void guess_line_blames(struct blame_origin *parent,
879+
struct blame_origin *target,
880+
int tlno, int offset, int same, int parent_len,
881+
struct blame_line_tracker *line_blames)
882+
{
883+
int i, best_idx, target_idx;
884+
int parent_slno = tlno + offset;
885+
886+
for (i = 0; i < same - tlno; i++) {
887+
target_idx = tlno + i;
888+
best_idx = target_idx + offset;
889+
if (best_idx < parent_slno + parent_len) {
890+
line_blames[i].is_parent = 1;
891+
line_blames[i].s_lno = best_idx;
892+
} else {
893+
line_blames[i].is_parent = 0;
894+
line_blames[i].s_lno = target_idx;
895+
}
896+
}
897+
}
898+
899+
/*
900+
* This decides which parts of a blame entry go to the parent (added to the
901+
* ignoredp list) and which stay with the target (added to the diffp list). The
902+
* actual decision was made in a separate heuristic function, and those answers
903+
* for the lines in 'e' are in line_blames. This consumes e, essentially
904+
* putting it on a list.
905+
*
906+
* Note that the blame entries on the ignoredp list are not necessarily sorted
907+
* with respect to the parent's line numbers yet.
908+
*/
909+
static void ignore_blame_entry(struct blame_entry *e,
910+
struct blame_origin *parent,
911+
struct blame_origin *target,
912+
struct blame_entry **diffp,
913+
struct blame_entry **ignoredp,
914+
struct blame_line_tracker *line_blames)
915+
{
916+
int entry_len, nr_lines, i;
917+
918+
/*
919+
* We carve new entries off the front of e. Each entry comes from a
920+
* contiguous chunk of lines: adjacent lines from the same origin
921+
* (either the parent or the target).
922+
*/
923+
entry_len = 1;
924+
nr_lines = e->num_lines; /* e changes in the loop */
925+
for (i = 0; i < nr_lines; i++) {
926+
struct blame_entry *next = NULL;
927+
928+
/*
929+
* We are often adjacent to the next line - only split the blame
930+
* entry when we have to.
931+
*/
932+
if (i + 1 < nr_lines) {
933+
if (are_lines_adjacent(&line_blames[i],
934+
&line_blames[i + 1])) {
935+
entry_len++;
936+
continue;
937+
}
938+
next = split_blame_at(e, entry_len,
939+
blame_origin_incref(e->suspect));
940+
}
941+
if (line_blames[i].is_parent) {
942+
blame_origin_decref(e->suspect);
943+
e->suspect = blame_origin_incref(parent);
944+
e->s_lno = line_blames[i - entry_len + 1].s_lno;
945+
e->next = *ignoredp;
946+
*ignoredp = e;
947+
} else {
948+
/* e->s_lno is already in the target's address space. */
949+
e->next = *diffp;
950+
*diffp = e;
951+
}
952+
assert(e->num_lines == entry_len);
953+
e = next;
954+
entry_len = 1;
955+
}
956+
assert(!e);
957+
}
958+
862959
/*
863960
* Process one hunk from the patch between the current suspect for
864961
* blame_entry e and its parent. This first blames any unfinished
@@ -868,13 +965,20 @@ static struct blame_entry *split_blame_at(struct blame_entry *e, int len,
868965
* -C options may lead to overlapping/duplicate source line number
869966
* ranges, all we can rely on from sorting/merging is the order of the
870967
* first suspect line number.
968+
*
969+
* tlno: line number in the target where this chunk begins
970+
* same: line number in the target where this chunk ends
971+
* offset: add to tlno to get the chunk starting point in the parent
972+
* parent_len: number of lines in the parent chunk
871973
*/
872974
static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
873-
int tlno, int offset, int same,
874-
struct blame_origin *parent)
975+
int tlno, int offset, int same, int parent_len,
976+
struct blame_origin *parent,
977+
struct blame_origin *target, int ignore_diffs)
875978
{
876979
struct blame_entry *e = **srcq;
877-
struct blame_entry *samep = NULL, *diffp = NULL;
980+
struct blame_entry *samep = NULL, *diffp = NULL, *ignoredp = NULL;
981+
struct blame_line_tracker *line_blames = NULL;
878982

879983
while (e && e->s_lno < tlno) {
880984
struct blame_entry *next = e->next;
@@ -923,6 +1027,14 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
9231027
*/
9241028
samep = NULL;
9251029
diffp = NULL;
1030+
1031+
if (ignore_diffs && same - tlno > 0) {
1032+
line_blames = xcalloc(sizeof(struct blame_line_tracker),
1033+
same - tlno);
1034+
guess_line_blames(parent, target, tlno, offset, same,
1035+
parent_len, line_blames);
1036+
}
1037+
9261038
while (e && e->s_lno < same) {
9271039
struct blame_entry *next = e->next;
9281040

@@ -942,10 +1054,29 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
9421054
n->next = samep;
9431055
samep = n;
9441056
}
945-
e->next = diffp;
946-
diffp = e;
1057+
if (ignore_diffs) {
1058+
ignore_blame_entry(e, parent, target, &diffp, &ignoredp,
1059+
line_blames + e->s_lno - tlno);
1060+
} else {
1061+
e->next = diffp;
1062+
diffp = e;
1063+
}
9471064
e = next;
9481065
}
1066+
free(line_blames);
1067+
if (ignoredp) {
1068+
/*
1069+
* Note ignoredp is not sorted yet, and thus neither is dstq.
1070+
* That list must be sorted before we queue_blames(). We defer
1071+
* sorting until after all diff hunks are processed, so that
1072+
* guess_line_blames() can pick *any* line in the parent. The
1073+
* slight drawback is that we end up sorting all blame entries
1074+
* passed to the parent, including those that are unrelated to
1075+
* changes made by the ignored commit.
1076+
*/
1077+
**dstq = reverse_blame(ignoredp, **dstq);
1078+
*dstq = &ignoredp->next;
1079+
}
9491080
**srcq = reverse_blame(diffp, reverse_blame(samep, e));
9501081
/* Move across elements that are in the unblamable portion */
9511082
if (diffp)
@@ -954,7 +1085,9 @@ static void blame_chunk(struct blame_entry ***dstq, struct blame_entry ***srcq,
9541085

9551086
struct blame_chunk_cb_data {
9561087
struct blame_origin *parent;
1088+
struct blame_origin *target;
9571089
long offset;
1090+
int ignore_diffs;
9581091
struct blame_entry **dstq;
9591092
struct blame_entry **srcq;
9601093
};
@@ -967,7 +1100,8 @@ static int blame_chunk_cb(long start_a, long count_a,
9671100
if (start_a - start_b != d->offset)
9681101
die("internal error in blame::blame_chunk_cb");
9691102
blame_chunk(&d->dstq, &d->srcq, start_b, start_a - start_b,
970-
start_b + count_b, d->parent);
1103+
start_b + count_b, count_a, d->parent, d->target,
1104+
d->ignore_diffs);
9711105
d->offset = start_a + count_a - (start_b + count_b);
9721106
return 0;
9731107
}
@@ -979,7 +1113,7 @@ static int blame_chunk_cb(long start_a, long count_a,
9791113
*/
9801114
static void pass_blame_to_parent(struct blame_scoreboard *sb,
9811115
struct blame_origin *target,
982-
struct blame_origin *parent)
1116+
struct blame_origin *parent, int ignore_diffs)
9831117
{
9841118
mmfile_t file_p, file_o;
9851119
struct blame_chunk_cb_data d;
@@ -989,7 +1123,9 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
9891123
return; /* nothing remains for this target */
9901124

9911125
d.parent = parent;
1126+
d.target = target;
9921127
d.offset = 0;
1128+
d.ignore_diffs = ignore_diffs;
9931129
d.dstq = &newdest; d.srcq = &target->suspects;
9941130

9951131
fill_origin_blob(&sb->revs->diffopt, parent, &file_p, &sb->num_read_blob);
@@ -1001,8 +1137,13 @@ static void pass_blame_to_parent(struct blame_scoreboard *sb,
10011137
oid_to_hex(&parent->commit->object.oid),
10021138
oid_to_hex(&target->commit->object.oid));
10031139
/* The rest are the same as the parent */
1004-
blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, parent);
1140+
blame_chunk(&d.dstq, &d.srcq, INT_MAX, d.offset, INT_MAX, 0,
1141+
parent, target, 0);
10051142
*d.dstq = NULL;
1143+
if (ignore_diffs)
1144+
newdest = llist_mergesort(newdest, get_next_blame,
1145+
set_next_blame,
1146+
compare_blame_suspect);
10061147
queue_blames(sb, parent, newdest);
10071148

10081149
return;
@@ -1506,11 +1647,28 @@ static void pass_blame(struct blame_scoreboard *sb, struct blame_origin *origin,
15061647
blame_origin_incref(porigin);
15071648
origin->previous = porigin;
15081649
}
1509-
pass_blame_to_parent(sb, origin, porigin);
1650+
pass_blame_to_parent(sb, origin, porigin, 0);
15101651
if (!origin->suspects)
15111652
goto finish;
15121653
}
15131654

1655+
/*
1656+
* Pass remaining suspects for ignored commits to their parents.
1657+
*/
1658+
if (oidset_contains(&sb->ignore_list, &commit->object.oid)) {
1659+
for (i = 0, sg = first_scapegoat(revs, commit, sb->reverse);
1660+
i < num_sg && sg;
1661+
sg = sg->next, i++) {
1662+
struct blame_origin *porigin = sg_origin[i];
1663+
1664+
if (!porigin)
1665+
continue;
1666+
pass_blame_to_parent(sb, origin, porigin, 1);
1667+
if (!origin->suspects)
1668+
goto finish;
1669+
}
1670+
}
1671+
15141672
/*
15151673
* Optionally find moves in parents' files.
15161674
*/

blame.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,8 @@ struct blame_scoreboard {
117117
/* linked list of blames */
118118
struct blame_entry *ent;
119119

120+
struct oidset ignore_list;
121+
120122
/* look-up a line in the final buffer */
121123
int num_lines;
122124
int *lineno;

builtin/blame.c

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,7 @@ static int no_whole_file_rename;
5252
static int show_progress;
5353
static char repeated_meta_color[COLOR_MAXLEN];
5454
static int coloring_mode;
55+
static struct string_list ignore_revs_file_list = STRING_LIST_INIT_NODUP;
5556

5657
static struct date_mode blame_date_mode = { DATE_ISO8601 };
5758
static size_t blame_date_width;
@@ -695,6 +696,16 @@ static int git_blame_config(const char *var, const char *value, void *cb)
695696
parse_date_format(value, &blame_date_mode);
696697
return 0;
697698
}
699+
if (!strcmp(var, "blame.ignorerevsfile")) {
700+
const char *str;
701+
int ret;
702+
703+
ret = git_config_pathname(&str, var, value);
704+
if (ret)
705+
return ret;
706+
string_list_insert(&ignore_revs_file_list, str);
707+
return 0;
708+
}
698709
if (!strcmp(var, "color.blame.repeatedlines")) {
699710
if (color_parse_mem(value, strlen(value), repeated_meta_color))
700711
warning(_("invalid color '%s' in color.blame.repeatedLines"),
@@ -774,6 +785,27 @@ static int is_a_rev(const char *name)
774785
return OBJ_NONE < oid_object_info(the_repository, &oid, NULL);
775786
}
776787

788+
static void build_ignorelist(struct blame_scoreboard *sb,
789+
struct string_list *ignore_revs_file_list,
790+
struct string_list *ignore_rev_list)
791+
{
792+
struct string_list_item *i;
793+
struct object_id oid;
794+
795+
oidset_init(&sb->ignore_list, 0);
796+
for_each_string_list_item(i, ignore_revs_file_list) {
797+
if (!strcmp(i->string, ""))
798+
oidset_clear(&sb->ignore_list);
799+
else
800+
oidset_parse_file(&sb->ignore_list, i->string);
801+
}
802+
for_each_string_list_item(i, ignore_rev_list) {
803+
if (get_oid_committish(i->string, &oid))
804+
die(_("cannot find revision %s to ignore"), i->string);
805+
oidset_insert(&sb->ignore_list, &oid);
806+
}
807+
}
808+
777809
int cmd_blame(int argc, const char **argv, const char *prefix)
778810
{
779811
struct rev_info revs;
@@ -785,6 +817,7 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
785817
struct progress_info pi = { NULL, 0 };
786818

787819
struct string_list range_list = STRING_LIST_INIT_NODUP;
820+
struct string_list ignore_rev_list = STRING_LIST_INIT_NODUP;
788821
int output_option = 0, opt = 0;
789822
int show_stats = 0;
790823
const char *revs_file = NULL;
@@ -806,6 +839,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
806839
OPT_BIT('s', NULL, &output_option, N_("Suppress author name and timestamp (Default: off)"), OUTPUT_NO_AUTHOR),
807840
OPT_BIT('e', "show-email", &output_option, N_("Show author email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL),
808841
OPT_BIT('w', NULL, &xdl_opts, N_("Ignore whitespace differences"), XDF_IGNORE_WHITESPACE),
842+
OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("Ignore <rev> when blaming")),
843+
OPT_STRING_LIST(0, "ignore-revs-file", &ignore_revs_file_list, N_("file"), N_("Ignore revisions from <file>")),
809844
OPT_BIT(0, "color-lines", &output_option, N_("color redundant metadata from previous line differently"), OUTPUT_COLOR_LINE),
810845
OPT_BIT(0, "color-by-age", &output_option, N_("color lines by age"), OUTPUT_SHOW_AGE_WITH_COLOR),
811846

@@ -995,6 +1030,9 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
9951030
sb.contents_from = contents_from;
9961031
sb.reverse = reverse;
9971032
sb.repo = the_repository;
1033+
build_ignorelist(&sb, &ignore_revs_file_list, &ignore_rev_list);
1034+
string_list_clear(&ignore_revs_file_list, 0);
1035+
string_list_clear(&ignore_rev_list, 0);
9981036
setup_scoreboard(&sb, path, &o);
9991037
lno = sb.num_lines;
10001038

0 commit comments

Comments
 (0)