Skip to content

Commit be5c9fb

Browse files
peffgitster
authored andcommitted
logmsg_reencode: lazily load missing commit buffers
Usually a commit that makes it to logmsg_reencode will have been parsed, and the commit->buffer struct member will be valid. However, some code paths will free commit buffers after having used them (for example, the log traversal machinery will do so to keep memory usage down). Most of the time this is fine; log should only show a commit once, and then exits. However, there are some code paths where this does not work. At least two are known: 1. A commit may be shown as part of a regular ref, and then it may be shown again as part of a submodule diff (e.g., if a repo contains refs to both the superproject and subproject). 2. A notes-cache commit may be shown during "log --all", and then later used to access a textconv cache during a diff. Lazily loading in logmsg_reencode does not necessarily catch all such cases, but it should catch most of them. Users of the commit buffer tend to be either parsing for structure (in which they will call parse_commit, and either we will already have parsed, or we will load commit->buffer lazily there), or outputting (either to the user, or fetching a part of the commit message via format_commit_message). In the latter case, we should always be using logmsg_reencode anyway (and typically we do so via the pretty-print machinery). If there are any cases that this misses, we can fix them up to use logmsg_reencode (or handle them on a case-by-case basis if that is inappropriate). Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent dd0d388 commit be5c9fb

File tree

3 files changed

+57
-21
lines changed

3 files changed

+57
-21
lines changed

builtin/blame.c

Lines changed: 0 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1424,19 +1424,6 @@ static void get_commit_info(struct commit *commit,
14241424

14251425
commit_info_init(ret);
14261426

1427-
/*
1428-
* We've operated without save_commit_buffer, so
1429-
* we now need to populate them for output.
1430-
*/
1431-
if (!commit->buffer) {
1432-
enum object_type type;
1433-
unsigned long size;
1434-
commit->buffer =
1435-
read_sha1_file(commit->object.sha1, &type, &size);
1436-
if (!commit->buffer)
1437-
die("Cannot read commit %s",
1438-
sha1_to_hex(commit->object.sha1));
1439-
}
14401427
encoding = get_log_output_encoding();
14411428
message = logmsg_reencode(commit, encoding);
14421429
get_ac_line(message, "\nauthor ",

pretty.c

Lines changed: 49 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -592,18 +592,59 @@ char *logmsg_reencode(const struct commit *commit,
592592
char *msg = commit->buffer;
593593
char *out;
594594

595+
if (!msg) {
596+
enum object_type type;
597+
unsigned long size;
598+
599+
msg = read_sha1_file(commit->object.sha1, &type, &size);
600+
if (!msg)
601+
die("Cannot read commit object %s",
602+
sha1_to_hex(commit->object.sha1));
603+
if (type != OBJ_COMMIT)
604+
die("Expected commit for '%s', got %s",
605+
sha1_to_hex(commit->object.sha1), typename(type));
606+
}
607+
595608
if (!output_encoding || !*output_encoding)
596609
return msg;
597610
encoding = get_header(commit, msg, "encoding");
598611
use_encoding = encoding ? encoding : utf8;
599-
if (same_encoding(use_encoding, output_encoding))
600-
if (encoding) /* we'll strip encoding header later */
601-
out = xstrdup(commit->buffer);
602-
else
603-
return msg; /* nothing to do */
604-
else
605-
out = reencode_string(commit->buffer,
606-
output_encoding, use_encoding);
612+
if (same_encoding(use_encoding, output_encoding)) {
613+
/*
614+
* No encoding work to be done. If we have no encoding header
615+
* at all, then there's nothing to do, and we can return the
616+
* message verbatim (whether newly allocated or not).
617+
*/
618+
if (!encoding)
619+
return msg;
620+
621+
/*
622+
* Otherwise, we still want to munge the encoding header in the
623+
* result, which will be done by modifying the buffer. If we
624+
* are using a fresh copy, we can reuse it. But if we are using
625+
* the cached copy from commit->buffer, we need to duplicate it
626+
* to avoid munging commit->buffer.
627+
*/
628+
out = msg;
629+
if (out == commit->buffer)
630+
out = xstrdup(out);
631+
}
632+
else {
633+
/*
634+
* There's actual encoding work to do. Do the reencoding, which
635+
* still leaves the header to be replaced in the next step. At
636+
* this point, we are done with msg. If we allocated a fresh
637+
* copy, we can free it.
638+
*/
639+
out = reencode_string(msg, output_encoding, use_encoding);
640+
if (out && msg != commit->buffer)
641+
free(msg);
642+
}
643+
644+
/*
645+
* This replacement actually consumes the buffer we hand it, so we do
646+
* not have to worry about freeing the old "out" here.
647+
*/
607648
if (out)
608649
out = replace_encoding_header(out, output_encoding);
609650

t/t4042-diff-textconv-caching.sh

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,4 +106,12 @@ test_expect_success 'switching diff driver produces correct results' '
106106
test_cmp expect actual
107107
'
108108

109+
# The point here is to test that we can log the notes cache and still use it to
110+
# produce a diff later (older versions of git would segfault on this). It's
111+
# much more likely to come up in the real world with "log --all -p", but using
112+
# --no-walk lets us reliably reproduce the order of traversal.
113+
test_expect_success 'log notes cache and still use cache for -p' '
114+
git log --no-walk -p refs/notes/textconv/magic HEAD
115+
'
116+
109117
test_done

0 commit comments

Comments
 (0)