Commit 73320e4

ttaylorr authored and gitster committed
builtin/repack.c: only collect fully-formed packs
To partition the set of packs based on which ones are "kept" (either
they have a .keep file, or were otherwise marked via the `--keep-pack`
option) and "non-kept" ones (anything else), `git repack` uses its
`collect_pack_filenames()` function.

Ordinarily, we would rely on a convenience function such as
`get_all_packs()` to enumerate and partition the set of packs. But
`collect_pack_filenames()` uses `readdir()` directly to read the
contents of the "$GIT_DIR/objects/pack" directory, and adds each entry
ending in ".pack" to the appropriate list (either kept, or non-kept as
above).

This is subtly racy, since `collect_pack_filenames()` may see a pack
that is not fully staged (i.e., it is missing its ".idx" file).
Ordinarily, this doesn't cause a problem. But it can cause issues when
generating a cruft pack.

This is because `git repack` feeds (among other things) the list of
existing kept packs down to `git pack-objects --cruft` to indicate that
any kept packs will not be removed from the repository (so that the
cruft pack machinery can avoid packing objects that appear in those
packs as cruft).

But `read_cruft_objects()` lists packfiles by calling
`get_all_packs()`. So if a ".pack" file exists (necessary to get that
pack to appear to `collect_pack_filenames()`), but doesn't have a
corresponding ".idx" file (necessary to get that pack to appear via
`get_all_packs()`), we'll complain with:

    fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'

Fix the above by teaching `collect_pack_filenames()` to only collect
packs with their corresponding `*.idx` files in place, indicating that
those packs have been fully staged.

There are a couple of things worth noting:

  - Since each entry in the `extra_keep` list (which contains the
    `--keep-pack` names) has a `*.pack` suffix, while we now match
    directory entries ending in ".idx", we'll have to swap each
    entry's suffix from ".idx" to ".pack", and compare that instead.

  - Since we use the `fname_nonkept_list` to figure out which packs to
    delete (with `git repack -d`), we would have previously deleted a
    `*.pack` with no index (since the existence of a ".pack" file was
    necessary and sufficient to include that pack in the list of
    existing non-kept packs).

    Now we will leave it alone (since that pack won't appear in the
    list). This is far more correct behavior, since we don't want
    to race with a pack being staged. Deleting a partially staged pack
    is unlikely, however, since the window of time between staging a
    pack and moving its .idx file into place is minuscule.

    Note that this window does *not* include the time it takes to
    receive and index the pack, since the incoming data goes into
    "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
    and is thus ignored by `collect_pack_filenames()`.

In the future, this function should probably be rewritten as a callback
to `for_each_file_in_pack_dir()`, but this is the simplest change we
could do in the short-term.

Reported-by: Michael Haggerty <[email protected]>
Signed-off-by: Taylor Blau <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
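The rule described above (only a pack whose ".idx" is in place counts as staged) can be sketched outside of Git in plain C. This is a minimal illustration, not the code from this commit: `struct pack_counts`, `ends_with()`, `classify_entry()` and `scan_pack_dir()` are hypothetical names, `strcmp()` stands in for Git's `fspathcmp()`, the ".keep" file check is left out, and packs are only counted rather than added to string lists.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tally of how many packs landed in each list. */
struct pack_counts {
	size_t kept;
	size_t nonkept;
};

/*
 * Plain-C stand-in for a suffix check: returns 1 and stores the length
 * of the stem in *len when name ends with suffix, otherwise returns 0.
 */
int ends_with(const char *name, const char *suffix, size_t *len)
{
	size_t n = strlen(name), s = strlen(suffix);

	if (n < s || strcmp(name + n - s, suffix))
		return 0;
	*len = n - s;
	return 1;
}

/*
 * Classify one directory entry. Only "*.idx" entries count, so a
 * ".pack" whose index has not been moved into place yet is ignored.
 * keep[] holds --keep-pack style names, each ending in ".pack".
 */
void classify_entry(const char *d_name, const char **keep, size_t nr_keep,
		    struct pack_counts *out)
{
	char pack_name[1024];
	size_t len, i;
	int n;

	if (!ends_with(d_name, ".idx", &len))
		return;

	/* Rebuild the ".pack" name so it can be compared with keep[]. */
	n = snprintf(pack_name, sizeof(pack_name), "%.*s.pack",
		     (int)len, d_name);
	if (n < 0 || (size_t)n >= sizeof(pack_name))
		return; /* name too long for this sketch */

	for (i = 0; i < nr_keep; i++)
		if (!strcmp(pack_name, keep[i]))
			break;

	if (i < nr_keep)
		out->kept++;
	else
		out->nonkept++;
}

/* Walk packdir with readdir(); returns -1 if it cannot be opened. */
int scan_pack_dir(const char *packdir, const char **keep, size_t nr_keep,
		  struct pack_counts *out)
{
	DIR *dir = opendir(packdir);
	struct dirent *e;

	if (!dir)
		return -1;
	while ((e = readdir(dir)) != NULL)
		classify_entry(e->d_name, keep, nr_keep, out);
	closedir(dir);
	return 0;
}
```

With this shape, an entry such as ".tmp-5841-pack-<hash>.pack" that has no matching ".idx" never reaches either tally, which mirrors why the partially staged pack is neither passed along as a kept pack nor deleted.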
1 parent fe86abd commit 73320e4

2 files changed: +33 -4 lines changed

builtin/repack.c

Lines changed: 10 additions & 4 deletions
@@ -95,8 +95,8 @@ static int repack_config(const char *var, const char *value, void *cb)
 }
 
 /*
- * Adds all packs hex strings to either fname_nonkept_list or
- * fname_kept_list based on whether each pack has a corresponding
+ * Adds all packs hex strings (pack-$HASH) to either fname_nonkept_list
+ * or fname_kept_list based on whether each pack has a corresponding
  * .keep file or not. Packs without a .keep file are not to be kept
  * if we are going to pack everything into one file.
  */
@@ -107,6 +107,7 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
 	DIR *dir;
 	struct dirent *e;
 	char *fname;
+	struct strbuf buf = STRBUF_INIT;
 
 	if (!(dir = opendir(packdir)))
 		return;
@@ -115,11 +116,15 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
 		size_t len;
 		int i;
 
-		if (!strip_suffix(e->d_name, ".pack", &len))
+		if (!strip_suffix(e->d_name, ".idx", &len))
 			continue;
 
+		strbuf_reset(&buf);
+		strbuf_add(&buf, e->d_name, len);
+		strbuf_addstr(&buf, ".pack");
+
 		for (i = 0; i < extra_keep->nr; i++)
-			if (!fspathcmp(e->d_name, extra_keep->items[i].string))
+			if (!fspathcmp(buf.buf, extra_keep->items[i].string))
 				break;
 
 		fname = xmemdupz(e->d_name, len);
@@ -136,6 +141,7 @@ static void collect_pack_filenames(struct string_list *fname_nonkept_list,
 		}
 	}
 	closedir(dir);
+	strbuf_release(&buf);
 
 	string_list_sort(fname_kept_list);
 }
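The hunks above reuse one buffer for every directory entry: it is reset at the top of each iteration, filled with the stem plus ".pack", and released once after the loop. A rough plain-C sketch of that pattern follows. `struct namebuf` and its helpers are hypothetical stand-ins for Git's `strbuf`, written only for illustration; they are not Git's API, and unlike Git's helpers they report allocation failure to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable, NUL-terminated string buffer. */
struct namebuf {
	char *buf;
	size_t len, alloc;
};

/* Append n bytes of s, growing as needed. Returns -1 if out of memory. */
int namebuf_add(struct namebuf *nb, const char *s, size_t n)
{
	if (nb->len + n + 1 > nb->alloc) {
		size_t want = (nb->len + n + 1) * 2;
		char *p = realloc(nb->buf, want);

		if (!p)
			return -1;
		nb->buf = p;
		nb->alloc = want;
	}
	memcpy(nb->buf + nb->len, s, n);
	nb->len += n;
	nb->buf[nb->len] = '\0';
	return 0;
}

/* Forget the contents but keep the allocation for the next iteration. */
void namebuf_reset(struct namebuf *nb)
{
	nb->len = 0;
	if (nb->buf)
		nb->buf[0] = '\0';
}

/* Free the allocation; call once, after the loop is done. */
void namebuf_release(struct namebuf *nb)
{
	free(nb->buf);
	nb->buf = NULL;
	nb->len = nb->alloc = 0;
}

/* Rewrite "<stem>.idx" as "<stem>.pack"; stem_len is the stem's length. */
int idx_to_pack_name(struct namebuf *nb, const char *idx_name, size_t stem_len)
{
	namebuf_reset(nb);
	if (namebuf_add(nb, idx_name, stem_len) < 0)
		return -1;
	return namebuf_add(nb, ".pack", strlen(".pack"));
}
```

Resetting instead of releasing inside the loop keeps the allocation alive across entries, so a directory with many packs costs at most a few reallocations rather than one per entry.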

t/t7700-repack.sh

Lines changed: 23 additions & 0 deletions
@@ -10,6 +10,10 @@ test_description='git repack works correctly'
 commit_and_pack () {
 	test_commit "$@" 1>&2 &&
 	incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
+	# Remove any loose object(s) created by test_commit, since they have
+	# already been packed. Leaving these around can create subtly different
+	# packs with `pack-objects`'s `--unpacked` option.
+	git prune-packed 1>&2 &&
 	echo pack-${incrpackid}.pack
 }
 
@@ -209,6 +213,8 @@ test_expect_success 'repack --keep-pack' '
 	test_create_repo keep-pack &&
 	(
 		cd keep-pack &&
+		# avoid producing difference packs to delta/base choices
+		git config pack.window 0 &&
 		P1=$(commit_and_pack 1) &&
 		P2=$(commit_and_pack 2) &&
 		P3=$(commit_and_pack 3) &&
@@ -220,6 +226,23 @@ test_expect_success 'repack --keep-pack' '
 		grep -q $P1 new-counts &&
 		grep -q $P4 new-counts &&
 		test_line_count = 3 new-counts &&
+		git fsck &&
+
+		P5=$(commit_and_pack --no-tag 5) &&
+		git reset --hard HEAD^ &&
+		git reflog expire --all --expire=all &&
+		rm -f ".git/objects/pack/${P5%.pack}.idx" &&
+		rm -f ".git/objects/info/commit-graph" &&
+		for from in $(find .git/objects/pack -type f -name "${P5%.pack}.*")
+		do
+			to="$(dirname "$from")/.tmp-1234-$(basename "$from")" &&
+			mv "$from" "$to" || return 1
+		done &&
+
+		git repack --cruft -d --keep-pack $P1 --keep-pack $P4 &&
+
+		ls .git/objects/pack/*.pack >newer-counts &&
+		test_cmp new-counts newer-counts &&
 		git fsck
 	)
 '
