Commit 748af44

npitre authored and gitster committed
sha1_file: be paranoid when creating loose objects
We don't want the data being deflated and stored into loose objects to be different from what we expect. While the deflated data is protected by a CRC, which is good enough for safe data retrieval operations, we still want to be doubly sure that the source data used at object creation time is still what we expected once that data has been deflated and its CRC32 computed.

The most plausible data corruption would occur if the source file is modified while Git is deflating and writing it out to a loose object. Or Git itself could have a bug causing memory corruption. Or even bad RAM could cause trouble. So it is best to make sure everything is coherent and checksum protected from beginning to end.

To do so we compute the SHA1 of the data being deflated _after_ the deflate operation has consumed that data, and make sure it matches the expected SHA1. This way we can rely on the CRC32 checked by the inflate operation to provide a good indication that the data is still coherent with its SHA1 hash. One pathological case we ignore is when the data is modified before (or during) the deflate call, but changed back before it is hashed.

There is some overhead, of course. Using 'git add' on a set of large files:

Before:
    real    0m25.210s
    user    0m23.783s
    sys     0m1.408s

After:
    real    0m26.537s
    user    0m25.175s
    sys     0m1.358s

The overhead is around 5% for a full data coherency guarantee.

Signed-off-by: Nicolas Pitre <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
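
A minimal, library-free sketch of the check described above follows. It assumes a 64-bit FNV-1a hash as a stand-in for SHA-1 and a chunked `memcpy()` as a stand-in for `deflate()`, and it mirrors the commit's order of operations: each chunk of the source is hashed only after the "compressor" has consumed it, and the running hash is compared with the expected value at the end. `store_paranoid()`, `flip_byte()`, `hash_*()` and `CHUNK` are hypothetical names for illustration, not Git or zlib APIs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SHA-1: 64-bit FNV-1a, used only so this sketch needs no
 * external library. Git itself uses SHA-1 here. */
typedef struct {
	uint64_t h;
} hash_ctx;

static void hash_init(hash_ctx *c)
{
	c->h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
}

static void hash_update(hash_ctx *c, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		c->h ^= p[i];
		c->h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
	}
}

static uint64_t hash_buf(const unsigned char *p, size_t n)
{
	hash_ctx c;

	hash_init(&c);
	hash_update(&c, p, n);
	return c.h;
}

#define CHUNK 4 /* hypothetical per-call input limit of the "compressor" */

/*
 * Copies buf to out CHUNK bytes at a time (stand-in for deflate() reading
 * its input) and hashes each chunk of the *source* only after it has been
 * consumed. If tamper is non-NULL it runs before each chunk to simulate
 * the source changing mid-write. Returns 0 when the hash of the consumed
 * data equals expected, -1 otherwise.
 */
static int store_paranoid(uint64_t expected, unsigned char *buf, size_t len,
			  unsigned char *out,
			  void (*tamper)(unsigned char *buf, size_t done))
{
	hash_ctx c;
	size_t done = 0;

	hash_init(&c);
	while (done < len) {
		size_t n = len - done < CHUNK ? len - done : CHUNK;

		if (tamper)
			tamper(buf, done);
		memcpy(out + done, buf + done, n); /* "consume" n bytes */
		hash_update(&c, buf + done, n);    /* then hash those same bytes */
		done += n;
	}
	return c.h == expected ? 0 : -1;
}

/* Example tamper hook (needs len >= 7): flips byte 6 just before the
 * second chunk is read. */
static void flip_byte(unsigned char *buf, size_t done)
{
	if (done == CHUNK)
		buf[6] ^= 0xff;
}
```

Computing `expected` with `hash_buf()` and then calling `store_paranoid()` with no hook returns 0; passing `flip_byte` returns -1 because the second chunk changes before it is consumed. Like the commit, this sketch cannot detect a byte that changes before it is consumed and is restored before it is hashed.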
1 parent 9892beb commit 748af44

1 file changed (+9 -0 lines)

sha1_file.c

Lines changed: 9 additions & 0 deletions
```diff
@@ -2283,6 +2283,8 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	int fd, ret;
 	unsigned char compressed[4096];
 	z_stream stream;
+	git_SHA_CTX c;
+	unsigned char parano_sha1[20];
 	char *filename;
 	static char tmpfile[PATH_MAX];
 
```

```diff
@@ -2302,18 +2304,22 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	deflateInit(&stream, zlib_compression_level);
 	stream.next_out = compressed;
 	stream.avail_out = sizeof(compressed);
+	git_SHA1_Init(&c);
 
 	/* First header.. */
 	stream.next_in = (unsigned char *)hdr;
 	stream.avail_in = hdrlen;
 	while (deflate(&stream, 0) == Z_OK)
 		/* nothing */;
+	git_SHA1_Update(&c, hdr, hdrlen);
 
 	/* Then the data itself.. */
 	stream.next_in = buf;
 	stream.avail_in = len;
 	do {
+		unsigned char *in0 = stream.next_in;
 		ret = deflate(&stream, Z_FINISH);
+		git_SHA1_Update(&c, in0, stream.next_in - in0);
 		if (write_buffer(fd, compressed, stream.next_out - compressed) < 0)
 			die("unable to write sha1 file");
 		stream.next_out = compressed;
```
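
The `in0` / `stream.next_in - in0` pair in the hunk above hashes exactly the bytes each `deflate()` call consumed, which matters because `deflate()` may return before it has taken all of `avail_in` (for instance when the 4096-byte output buffer fills). The sketch below checks that idea in isolation. `struct mock_stream`, `mock_deflate()` and the byte `collector` are hypothetical stand-ins for `z_stream`, `deflate()` and the SHA-1 context; the mock consumes 1, 2 or 3 bytes per call to imitate partial consumption, and recording raw bytes instead of hashing them makes it easy to confirm that the header and then the data are each covered exactly once, in order. As in the commit, the header is hashed in one call after its deflate loop finishes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two z_stream input fields the commit uses. */
struct mock_stream {
	const unsigned char *next_in;
	size_t avail_in;
};

/*
 * Hypothetical stand-in for deflate(): consumes 1, 2, 3, 1, 2, 3, ... bytes
 * on successive calls (never more than avail_in), imitating a compressor
 * that does not always take all of its input at once. Returns nonzero
 * while input remains.
 */
static int mock_deflate(struct mock_stream *s, unsigned *calls)
{
	size_t want = 1 + (*calls)++ % 3;
	size_t n = want < s->avail_in ? want : s->avail_in;

	s->next_in += n;
	s->avail_in -= n;
	return s->avail_in > 0;
}

/* Stand-in for the SHA-1 context: records the bytes it is fed, in order. */
struct collector {
	unsigned char data[256];
	size_t len;
};

static void collect(struct collector *c, const unsigned char *p, size_t n)
{
	if (n > sizeof(c->data) - c->len)
		n = sizeof(c->data) - c->len; /* sketch only: truncate, don't overflow */
	memcpy(c->data + c->len, p, n);
	c->len += n;
}

/* Mirrors the order of operations in write_loose_object() after the patch. */
static void feed(struct collector *c, const unsigned char *hdr, size_t hdrlen,
		 const unsigned char *buf, size_t len)
{
	struct mock_stream s;
	unsigned calls = 0;
	int more;

	/* First header: deflate it fully, then hash it in one go. */
	s.next_in = hdr;
	s.avail_in = hdrlen;
	while (mock_deflate(&s, &calls))
		/* nothing */;
	collect(c, hdr, hdrlen);

	/* Then the data: hash exactly what each call consumed. */
	s.next_in = buf;
	s.avail_in = len;
	do {
		const unsigned char *in0 = s.next_in;

		more = mock_deflate(&s, &calls);
		collect(c, in0, (size_t)(s.next_in - in0));
	} while (more);
}
```

Feeding an 8-byte header ("blob 11" plus its terminating NUL) and the 11-byte data "hello world" leaves 19 recorded bytes: the header followed by the data, with no byte skipped or repeated despite the uneven per-call consumption.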
```diff
@@ -2325,6 +2331,9 @@ static int write_loose_object(const unsigned char *sha1, char *hdr, int hdrlen,
 	ret = deflateEnd(&stream);
 	if (ret != Z_OK)
 		die("deflateEnd on object %s failed (%d)", sha1_to_hex(sha1), ret);
+	git_SHA1_Final(parano_sha1, &c);
+	if (hashcmp(sha1, parano_sha1) != 0)
+		die("confused by unstable object source data for %s", sha1_to_hex(sha1));
 
 	close_sha1_file(fd);
 
```
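
The final `hashcmp()` can only judge what `git_SHA1_Update()` read after `deflate()` had consumed it. A small simulation of what that catches and what it misses is sketched below. It is an assumption-laden model, not Git code: it processes one byte at a time, compares raw bytes in place of SHA-1 digests, and uses hypothetical hooks (`modify()`, `modify_and_restore()`) to change the source around each read. The second hook reproduces the pathological case the commit message says is ignored.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hook run around each byte read: phase 0 is just before
 * the "compressor" reads byte i, phase 1 is after that read but before
 * the byte is hashed.
 */
typedef void (*hook_fn)(unsigned char *src, size_t i, int phase);

/*
 * Simulates the patched write path one byte at a time. stored[] receives
 * what the compressor read; hashed[] receives what the paranoid hash read
 * afterwards. Comparing hashed[] with expected[] stands in for comparing
 * SHA-1 digests with hashcmp(). Returns 0 if the check passes, -1 if it
 * would die().
 */
static int simulate(unsigned char *src, size_t len,
		    const unsigned char *expected,
		    unsigned char *stored, unsigned char *hashed, hook_fn hook)
{
	for (size_t i = 0; i < len; i++) {
		if (hook)
			hook(src, i, 0);
		stored[i] = src[i]; /* deflate() consumes the byte */
		if (hook)
			hook(src, i, 1);
		hashed[i] = src[i]; /* git_SHA1_Update() reads it afterwards */
	}
	return memcmp(hashed, expected, len) == 0 ? 0 : -1;
}

/* Changes byte 2 before it is read and leaves it changed: detected. */
static void modify(unsigned char *src, size_t i, int phase)
{
	if (i == 2 && phase == 0)
		src[2] = 'X';
}

/*
 * Changes byte 2 before it is read and restores it before it is hashed:
 * the pathological case the commit message says is ignored.
 */
static void modify_and_restore(unsigned char *src, size_t i, int phase)
{
	static unsigned char saved;

	if (i != 2)
		return;
	if (phase == 0) {
		saved = src[2];
		src[2] = 'X';
	} else {
		src[2] = saved;
	}
}
```

With the source "abcdef", `modify` makes `simulate()` return -1, while `modify_and_restore` makes it return 0 even though `stored` then holds "abXdef": the corrupted bytes were written but the hash only ever saw the restored ones.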
