Commit 802f878

Merge branch 'jk/in-pack-size-measurement'

"git cat-file --batch-check=<format>" is added, primarily to allow on-disk footprint of objects in packfiles (often they are a lot smaller than their true size, when expressed as deltas) to be reported.

* jk/in-pack-size-measurement:
  pack-revindex: radix-sort the revindex
  pack-revindex: use unsigned to store number of objects
  cat-file: split --batch input lines on whitespace
  cat-file: add %(objectsize:disk) format atom
  cat-file: add --batch-check=<format>
  cat-file: refactor --batch option parsing
  cat-file: teach --batch to stream blob objects
  t1006: modernize output comparisons
  teach sha1_object_info_extended a "disk_size" query
  zero-initialize object_info structs

2 parents b12aecd + 8b8dfd5 commit 802f878
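
The merge message is about reporting how much space each object occupies inside a packfile, and it lists both a "disk_size" query and a radix sort of the pack revindex. A minimal sketch of how those two ideas can fit together, assuming (as in git's pack format) that each entry's bytes run until the next entry begins and that the last entry is followed by a 20-byte SHA-1 trailer: sort the entry offsets, and an entry's on-disk size is then the gap to the next offset. This is not git's pack-revindex.c; `radix_sort_offsets`, `disk_size_at` and `PACK_TRAILER_LEN` are hypothetical names, and the 8-bit-digit LSD radix sort is just one way to do the sort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a SHA-1 pack ends with a 20-byte checksum trailer. */
#define PACK_TRAILER_LEN 20

/*
 * LSD radix sort of pack offsets, one 8-bit digit per pass. Each pass is
 * a stable counting sort, so after all 8 passes the array is ascending.
 */
static void radix_sort_offsets(uint64_t *v, size_t n)
{
	uint64_t *tmp;
	unsigned shift;

	if (n < 2)
		return;
	tmp = malloc(n * sizeof(*tmp));
	assert(tmp);
	for (shift = 0; shift < 64; shift += 8) {
		size_t count[257] = { 0 };
		size_t i;
		int d;

		for (i = 0; i < n; i++)
			count[((v[i] >> shift) & 0xff) + 1]++;
		for (d = 0; d < 256; d++)
			count[d + 1] += count[d]; /* count[d] = first slot for digit d */
		for (i = 0; i < n; i++)
			tmp[count[(v[i] >> shift) & 0xff]++] = v[i];
		memcpy(v, tmp, n * sizeof(*v));
	}
	free(tmp);
}

/*
 * Binary-search the sorted offsets for off and return the entry's size
 * on disk: the distance to the next entry, or to the trailer for the
 * last one. Returns 0 if off is not the start of any entry.
 */
static uint64_t disk_size_at(const uint64_t *sorted, size_t n,
			     uint64_t pack_size, uint64_t off)
{
	size_t lo = 0, hi = n;

	assert(pack_size >= PACK_TRAILER_LEN);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sorted[mid] == off) {
			uint64_t end = mid + 1 < n ? sorted[mid + 1]
						   : pack_size - PACK_TRAILER_LEN;
			return end - off;
		}
		if (sorted[mid] < off)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

In this sketch the sort is paid once per pack, after which every size lookup is a binary search, which suits a `--batch-check` run that asks about many objects in the same pack.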

7 files changed, +390 -106 lines changed

Documentation/git-cat-file.txt

Lines changed: 68 additions & 11 deletions
@@ -58,12 +58,16 @@ OPTIONS
 	to apply the filter to the content recorded in the index at <path>.
 
 --batch::
-	Print the SHA-1, type, size, and contents of each object provided on
-	stdin. May not be combined with any other options or arguments.
+--batch=<format>::
+	Print object information and contents for each object provided
+	on stdin. May not be combined with any other options or arguments.
+	See the section `BATCH OUTPUT` below for details.
 
 --batch-check::
-	Print the SHA-1, type, and size of each object provided on stdin. May not
-	be combined with any other options or arguments.
+--batch-check=<format>::
+	Print object information for each object provided on stdin. May
+	not be combined with any other options or arguments. See the
+	section `BATCH OUTPUT` below for details.
 
 OUTPUT
 ------
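
The documentation now lists `--batch=<format>` and `--batch-check=<format>` next to the bare options, and the builtin/cat-file.c changes implement both through one parse-options callback marked `PARSE_OPT_OPTARG`. The stand-alone sketch below mimics what that callback does, without git's parse-options: the bare form leaves the format NULL so the default is used, `--batch` additionally turns on printing contents, and the `--no-` form resets everything. `parse_batch_arg` and `struct batch_opts` are hypothetical; the sketch only accepts the attached `--batch=<format>` spelling, which is also how git expects optional arguments to be given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same three fields as struct batch_options in the cat-file.c diff. */
struct batch_opts {
	int enabled;
	int print_contents; /* 1 for --batch, 0 for --batch-check */
	const char *format; /* NULL: use the default format */
};

/*
 * Hypothetical parser for one command-line argument. Returns 1 and
 * updates *bo if arg is one of the batch options, 0 otherwise.
 */
static int parse_batch_arg(const char *arg, struct batch_opts *bo)
{
	static const char *const names[] = { "batch", "batch-check" };
	size_t i;

	assert(arg && bo);
	if (strncmp(arg, "--", 2))
		return 0;
	arg += 2;
	for (i = 0; i < 2; i++) {
		size_t len = strlen(names[i]);

		/* --no-batch / --no-batch-check clear all batch settings */
		if (!strncmp(arg, "no-", 3) && !strcmp(arg + 3, names[i])) {
			memset(bo, 0, sizeof(*bo));
			return 1;
		}
		/* the name must be followed by end-of-string or "=" */
		if (strncmp(arg, names[i], len) ||
		    (arg[len] != '\0' && arg[len] != '='))
			continue;
		bo->enabled = 1;
		bo->print_contents = (i == 0);
		bo->format = arg[len] == '=' ? arg + len + 1 : NULL;
		return 1;
	}
	return 0;
}
```

As with the callback in the diff, each batch option overwrites the previous one, so `--batch --batch-check` ends up in check-only mode.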
@@ -78,28 +82,81 @@ If '-p' is specified, the contents of <object> are pretty-printed.
 If <type> is specified, the raw (though uncompressed) contents of the <object>
 will be returned.
 
-If '--batch' is specified, output of the following form is printed for each
-object specified on stdin:
+BATCH OUTPUT
+------------
+
+If `--batch` or `--batch-check` is given, `cat-file` will read objects
+from stdin, one per line, and print information about them.
+
+Each line is split at the first whitespace boundary. All characters
+before that whitespace are considered as a whole object name, and are
+parsed as if given to linkgit:git-rev-parse[1]. Characters after that
+whitespace can be accessed using the `%(rest)` atom (see below).
+
+You can specify the information shown for each object by using a custom
+`<format>`. The `<format>` is copied literally to stdout for each
+object, with placeholders of the form `%(atom)` expanded, followed by a
+newline. The available atoms are:
+
+`objectname`::
+	The 40-hex object name of the object.
+
+`objecttype`::
+	The type of of the object (the same as `cat-file -t` reports).
+
+`objectsize`::
+	The size, in bytes, of the object (the same as `cat-file -s`
+	reports).
+
+`objectsize:disk`::
+	The size, in bytes, that the object takes up on disk. See the
+	note about on-disk sizes in the `CAVEATS` section below.
+
+`rest`::
+	The text (if any) found after the first run of whitespace on the
+	input line (i.e., the "rest" of the line).
+
+If no format is specified, the default format is `%(objectname)
+%(objecttype) %(objectsize)`.
+
+If `--batch` is specified, the object information is followed by the
+object contents (consisting of `%(objectsize)` bytes), followed by a
+newline.
+
+For example, `--batch` without a custom format would produce:
 
 ------------
 <sha1> SP <type> SP <size> LF
 <contents> LF
 ------------
 
-If '--batch-check' is specified, output of the following form is printed for
-each object specified on stdin:
+Whereas `--batch-check='%(objectname) %(objecttype)'` would produce:
 
 ------------
-<sha1> SP <type> SP <size> LF
+<sha1> SP <type> LF
 ------------
 
-For both '--batch' and '--batch-check', output of the following form is printed
-for each object specified on stdin that does not exist in the repository:
+If a name is specified on stdin that cannot be resolved to an object in
+the repository, then `cat-file` will ignore any custom format and print:
 
 ------------
 <object> SP missing LF
 ------------
 
+
+CAVEATS
+-------
+
+Note that the sizes of objects on disk are reported accurately, but care
+should be taken in drawing conclusions about which refs or objects are
+responsible for disk usage. The size of a packed non-delta object may be
+much larger than the size of objects which delta against it, but the
+choice of which object is the base and which is the delta is arbitrary
+and is subject to change during a repack. Note also that multiple copies
+of an object may be present in the object database; in this case, it is
+undefined which copy's size will be reported.
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
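
The splitting rule in the new BATCH OUTPUT section can be sketched as a small stand-alone C function. It follows the same `strpbrk` loop that the builtin/cat-file.c diff adds to `batch_objects()`; the function name `split_batch_line` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an input line in place: the object name ends at the first space
 * or tab, the whole run of whitespace after it is overwritten with NULs,
 * and a pointer to the remainder is returned (NULL if there was none).
 */
static char *split_batch_line(char *line)
{
	char *p;

	assert(line);
	p = strpbrk(line, " \t");
	if (p) {
		while (*p == ' ' || *p == '\t')
			*p++ = '\0';
	}
	return p;
}
```

Only the first run of whitespace is consumed, so `%(rest)` keeps any later spaces, and a line that ends in whitespace yields an empty (not NULL) rest. One consequence of the rule is that a `<rev>:<path>` name whose path contains a space is cut at that space.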

builtin/cat-file.c

Lines changed: 173 additions & 37 deletions
@@ -13,9 +13,6 @@
 #include "userdiff.h"
 #include "streaming.h"
 
-#define BATCH 1
-#define BATCH_CHECK 2
-
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 {
 	unsigned char sha1[20];
@@ -117,54 +114,174 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	return 0;
 }
 
-static int batch_one_object(const char *obj_name, int print_contents)
-{
+struct expand_data {
 	unsigned char sha1[20];
-	enum object_type type = 0;
+	enum object_type type;
 	unsigned long size;
-	void *contents = NULL;
+	unsigned long disk_size;
+	const char *rest;
+
+	/*
+	 * If mark_query is true, we do not expand anything, but rather
+	 * just mark the object_info with items we wish to query.
+	 */
+	int mark_query;
+
+	/*
+	 * After a mark_query run, this object_info is set up to be
+	 * passed to sha1_object_info_extended. It will point to the data
+	 * elements above, so you can retrieve the response from there.
+	 */
+	struct object_info info;
+};
+
+static int is_atom(const char *atom, const char *s, int slen)
+{
+	int alen = strlen(atom);
+	return alen == slen && !memcmp(atom, s, alen);
+}
+
+static void expand_atom(struct strbuf *sb, const char *atom, int len,
+			void *vdata)
+{
+	struct expand_data *data = vdata;
+
+	if (is_atom("objectname", atom, len)) {
+		if (!data->mark_query)
+			strbuf_addstr(sb, sha1_to_hex(data->sha1));
+	} else if (is_atom("objecttype", atom, len)) {
+		if (!data->mark_query)
+			strbuf_addstr(sb, typename(data->type));
+	} else if (is_atom("objectsize", atom, len)) {
+		if (data->mark_query)
+			data->info.sizep = &data->size;
+		else
+			strbuf_addf(sb, "%lu", data->size);
+	} else if (is_atom("objectsize:disk", atom, len)) {
+		if (data->mark_query)
+			data->info.disk_sizep = &data->disk_size;
+		else
+			strbuf_addf(sb, "%lu", data->disk_size);
+	} else if (is_atom("rest", atom, len)) {
+		if (!data->mark_query && data->rest)
+			strbuf_addstr(sb, data->rest);
+	} else
+		die("unknown format element: %.*s", len, atom);
+}
+
+static size_t expand_format(struct strbuf *sb, const char *start, void *data)
+{
+	const char *end;
+
+	if (*start != '(')
+		return 0;
+	end = strchr(start + 1, ')');
+	if (!end)
+		die("format element '%s' does not end in ')'", start);
+
+	expand_atom(sb, start + 1, end - start - 1, data);
+
+	return end - start + 1;
+}
+
+static void print_object_or_die(int fd, const unsigned char *sha1,
+				enum object_type type, unsigned long size)
+{
+	if (type == OBJ_BLOB) {
+		if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+			die("unable to stream %s to stdout", sha1_to_hex(sha1));
+	}
+	else {
+		enum object_type rtype;
+		unsigned long rsize;
+		void *contents;
+
+		contents = read_sha1_file(sha1, &rtype, &rsize);
+		if (!contents)
+			die("object %s disappeared", sha1_to_hex(sha1));
+		if (rtype != type)
+			die("object %s changed type!?", sha1_to_hex(sha1));
+		if (rsize != size)
+			die("object %s change size!?", sha1_to_hex(sha1));
+
+		write_or_die(fd, contents, size);
+		free(contents);
+	}
+}
+
+struct batch_options {
+	int enabled;
+	int print_contents;
+	const char *format;
+};
+
+static int batch_one_object(const char *obj_name, struct batch_options *opt,
+			    struct expand_data *data)
+{
+	struct strbuf buf = STRBUF_INIT;
 
 	if (!obj_name)
 		return 1;
 
-	if (get_sha1(obj_name, sha1)) {
+	if (get_sha1(obj_name, data->sha1)) {
 		printf("%s missing\n", obj_name);
 		fflush(stdout);
 		return 0;
 	}
 
-	if (print_contents == BATCH)
-		contents = read_sha1_file(sha1, &type, &size);
-	else
-		type = sha1_object_info(sha1, &size);
-
-	if (type <= 0) {
+	data->type = sha1_object_info_extended(data->sha1, &data->info);
+	if (data->type <= 0) {
 		printf("%s missing\n", obj_name);
 		fflush(stdout);
-		if (print_contents == BATCH)
-			free(contents);
 		return 0;
 	}
 
-	printf("%s %s %lu\n", sha1_to_hex(sha1), typename(type), size);
-	fflush(stdout);
+	strbuf_expand(&buf, opt->format, expand_format, data);
+	strbuf_addch(&buf, '\n');
+	write_or_die(1, buf.buf, buf.len);
+	strbuf_release(&buf);
 
-	if (print_contents == BATCH) {
-		write_or_die(1, contents, size);
-		printf("\n");
-		fflush(stdout);
-		free(contents);
+	if (opt->print_contents) {
+		print_object_or_die(1, data->sha1, data->type, data->size);
+		write_or_die(1, "\n", 1);
 	}
-
 	return 0;
 }
 
-static int batch_objects(int print_contents)
+static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf buf = STRBUF_INIT;
+	struct expand_data data;
+
+	if (!opt->format)
+		opt->format = "%(objectname) %(objecttype) %(objectsize)";
+
+	/*
+	 * Expand once with our special mark_query flag, which will prime the
+	 * object_info to be handed to sha1_object_info_extended for each
+	 * object.
+	 */
+	memset(&data, 0, sizeof(data));
+	data.mark_query = 1;
+	strbuf_expand(&buf, opt->format, expand_format, &data);
+	data.mark_query = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		int error = batch_one_object(buf.buf, print_contents);
+		char *p;
+		int error;
+
+		/*
+		 * Split at first whitespace, tying off the beginning of the
+		 * string and saving the remainder (or NULL) in data.rest.
+		 */
+		p = strpbrk(buf.buf, " \t");
+		if (p) {
+			while (*p && strchr(" \t", *p))
+				*p++ = '\0';
+		}
+		data.rest = p;
+
+		error = batch_one_object(buf.buf, opt, &data);
 		if (error)
 			return error;
 	}
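
`batch_objects()` runs the format through the expander once before reading any input, with `mark_query` set, so that `expand_atom()` only records which fields `sha1_object_info_extended()` must fill in; each object is then expanded normally to produce its output line. The sketch below reproduces that two-pass idea outside git, under stated assumptions: `struct obj_data`, `lookup_atom()` and `expand_format()` are hypothetical stand-ins, a fixed-size buffer replaces `strbuf`, and errors are returned as -1 where git calls `die()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct expand_data. */
struct obj_data {
	const char *name;              /* hex object name */
	const char *type;              /* "blob", "tree", ... */
	unsigned long size;
	unsigned long disk_size;
	const char *rest;              /* text after the object name, or NULL */
	int want_size, want_disk_size; /* set by the query pass */
};

/*
 * Return the text for one atom, or NULL if the atom is unknown. In the
 * query pass nothing is expanded; size atoms only record what is wanted.
 */
static const char *lookup_atom(const char *atom, size_t len,
			       struct obj_data *d, int mark_query,
			       char *num, size_t numcap)
{
#define IS_ATOM(s) (len == strlen(s) && !memcmp(atom, s, len))
	if (IS_ATOM("objectname"))
		return mark_query ? "" : d->name;
	if (IS_ATOM("objecttype"))
		return mark_query ? "" : d->type;
	if (IS_ATOM("objectsize") || IS_ATOM("objectsize:disk")) {
		int disk = IS_ATOM("objectsize:disk");

		if (mark_query) {
			if (disk)
				d->want_disk_size = 1;
			else
				d->want_size = 1;
			return "";
		}
		snprintf(num, numcap, "%lu", disk ? d->disk_size : d->size);
		return num;
	}
	if (IS_ATOM("rest"))
		return (mark_query || !d->rest) ? "" : d->rest;
#undef IS_ATOM
	return NULL;
}

/*
 * Copy fmt into out, replacing each %(atom); a "%" not followed by "("
 * is copied literally. Returns 0, or -1 on an unknown or unterminated
 * atom, or if out (cap bytes) is too small.
 */
static int expand_format(const char *fmt, struct obj_data *d, int mark_query,
			 char *out, size_t cap)
{
	size_t used = 0;

	assert(fmt && d && out && cap > 0);
	out[0] = '\0';
	while (*fmt) {
		char num[32];
		const char *end, *val;
		size_t vlen;

		if (fmt[0] != '%' || fmt[1] != '(') {
			if (used + 1 >= cap)
				return -1;
			out[used++] = *fmt++;
			out[used] = '\0';
			continue;
		}
		end = strchr(fmt + 2, ')');
		if (!end)
			return -1;
		val = lookup_atom(fmt + 2, (size_t)(end - fmt - 2), d, mark_query,
				  num, sizeof(num));
		if (!val)
			return -1;
		vlen = strlen(val);
		if (used + vlen >= cap)
			return -1;
		memcpy(out + used, val, vlen + 1);
		used += vlen;
		fmt = end + 1;
	}
	return 0;
}
```

Because the query pass runs once up front, the per-object lookup in this scheme asks only for what the format mentions; as in the diff, a format without `%(objectsize:disk)` never requests a disk size.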
@@ -186,10 +303,29 @@ static int git_cat_file_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static int batch_option_callback(const struct option *opt,
+				 const char *arg,
+				 int unset)
+{
+	struct batch_options *bo = opt->value;
+
+	if (unset) {
+		memset(bo, 0, sizeof(*bo));
+		return 0;
+	}
+
+	bo->enabled = 1;
+	bo->print_contents = !strcmp(opt->long_name, "batch");
+	bo->format = arg;
+
+	return 0;
+}
+
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-	int opt = 0, batch = 0;
+	int opt = 0;
 	const char *exp_type = NULL, *obj_name = NULL;
+	struct batch_options batch = {0};
 
 	const struct option options[] = {
 		OPT_GROUP(N_("<type> can be one of: blob, tree, commit, tag")),
@@ -200,12 +336,12 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		OPT_SET_INT('p', NULL, &opt, N_("pretty-print object's content"), 'p'),
 		OPT_SET_INT(0, "textconv", &opt,
 			    N_("for blob objects, run textconv on object's content"), 'c'),
-		OPT_SET_INT(0, "batch", &batch,
-			    N_("show info and content of objects fed from the standard input"),
-			    BATCH),
-		OPT_SET_INT(0, "batch-check", &batch,
-			    N_("show info about objects fed from the standard input"),
-			    BATCH_CHECK),
+		{ OPTION_CALLBACK, 0, "batch", &batch, "format",
+			N_("show info and content of objects fed from the standard input"),
+			PARSE_OPT_OPTARG, batch_option_callback },
+		{ OPTION_CALLBACK, 0, "batch-check", &batch, "format",
+			N_("show info about objects fed from the standard input"),
+			PARSE_OPT_OPTARG, batch_option_callback },
 		OPT_END()
 	};
 
@@ -222,19 +358,19 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		else
 			usage_with_options(cat_file_usage, options);
 	}
-	if (!opt && !batch) {
+	if (!opt && !batch.enabled) {
 		if (argc == 2) {
 			exp_type = argv[0];
 			obj_name = argv[1];
 		} else
 			usage_with_options(cat_file_usage, options);
 	}
-	if (batch && (opt || argc)) {
+	if (batch.enabled && (opt || argc)) {
 		usage_with_options(cat_file_usage, options);
 	}
 
-	if (batch)
-		return batch_objects(batch);
+	if (batch.enabled)
+		return batch_objects(&batch);
 
 	return cat_one_file(opt, exp_type, obj_name);
 }

cache.h

Lines changed: 1 addition & 0 deletions
@@ -1130,6 +1130,7 @@ extern int unpack_object_header(struct packed_git *, struct pack_window **, off_
 struct object_info {
 	/* Request */
 	unsigned long *sizep;
+	unsigned long *disk_sizep;
 
 	/* Response */
 	enum {
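
The new `disk_sizep` member sits in the Request part of `struct object_info`, next to `sizep`: a caller points it at storage it wants filled, and a NULL pointer means "not requested". That convention is presumably also why the series includes "zero-initialize object_info structs", since a stray non-NULL request pointer would be written through. A minimal sketch of the convention, with hypothetical names (`struct sketch_info`, `sketch_object_info()`) and none of git's actual lookup logic:

```c
#include <assert.h>
#include <stddef.h>

/* Same numbering as git's enum object_type for these four types. */
enum sketch_type { SK_COMMIT = 1, SK_TREE = 2, SK_BLOB = 3, SK_TAG = 4 };

/* Hypothetical miniature of the request half of struct object_info. */
struct sketch_info {
	unsigned long *sizep;      /* NULL: caller does not want the size */
	unsigned long *disk_sizep; /* NULL: caller does not want the disk size */
};

/* Hypothetical record for an object the store knows about. */
struct sketch_obj {
	enum sketch_type type;
	unsigned long size;
	unsigned long disk_size;
};

/*
 * Fill in only the fields whose request pointers are non-NULL and return
 * the type, or -1 if the object is missing (obj == NULL).
 */
static int sketch_object_info(const struct sketch_obj *obj,
			      struct sketch_info *oi)
{
	assert(oi);
	if (!obj)
		return -1;
	if (oi->sizep)
		*oi->sizep = obj->size;
	if (oi->disk_sizep)
		*oi->disk_sizep = obj->disk_size;
	return obj->type;
}
```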
