
Commit 440c705

john-cai authored and gitster committed
cat-file: add --batch-command mode
Add a new flag --batch-command that accepts commands and arguments from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long-running cat-file processes when accessing object content: one for iterating over object metadata with --batch-check, and the other to grab object contents with --batch. However, if we had --batch-command, we wouldn't need to keep both processes around, and could instead have just one --batch-command process where we can flip between getting object info and getting object contents. Since we have a pair of cat-file processes per repository, this means we can get rid of roughly half of the long-lived git cat-file processes. Given that many repositories are being accessed at any given time, this can lead to huge savings.

git cat-file --batch-command will enter an interactive command mode whereby the user can enter commands and their arguments:

	<command1> [arg1] [arg2] LF
	<command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a flush command is issued that executes them:

	flush LF

The reason for a flush command is that when a consumer process (A) talks to a git cat-file process (B) and interactively writes to and reads from it in --buffer mode, (A) needs to be able to control when the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either:

1. kill (B)'s process, or
2. send an invalid object to stdin.

Option 1 is not ideal from a performance perspective, as it requires spawning a new cat-file process each time, and option 2 is hacky and not a good long-term solution.

With this mechanism of queueing up commands and letting (A) issue a flush command, process (A) can control when the buffer is flushed and can guarantee it will receive all of the output when in --buffer mode. --batch-command also will not allow (B) to flush to stdout until a flush is received.
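The (A)/(B) arrangement above is an ordinary coprocess: (A) holds the write end of (B)'s stdin and the read end of (B)'s stdout. Below is a minimal POSIX sketch of that setup in C. `struct coproc`, `coproc_start()` and `coproc_finish()` are hypothetical helpers, not part of git, and error handling is deliberately thin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical consumer-side handle on a long-running child such as (B). */
struct coproc {
	pid_t pid;
	FILE *to_child;   /* our write end, connected to the child's stdin */
	FILE *from_child; /* our read end, connected to the child's stdout */
};

/* Start argv[0] with its stdin and stdout attached to two pipes. */
static int coproc_start(struct coproc *cp, char *const argv[])
{
	int in[2], out[2];

	assert(cp && argv && argv[0]);
	if (pipe(in) < 0)
		return -1;
	if (pipe(out) < 0) {
		close(in[0]);
		close(in[1]);
		return -1;
	}

	cp->pid = fork();
	if (cp->pid < 0) {
		close(in[0]);
		close(in[1]);
		close(out[0]);
		close(out[1]);
		return -1;
	}
	if (cp->pid == 0) {
		/* Child: wire the pipes to stdin/stdout, then exec. */
		dup2(in[0], STDIN_FILENO);
		dup2(out[1], STDOUT_FILENO);
		close(in[0]);
		close(in[1]);
		close(out[0]);
		close(out[1]);
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}

	/* Parent: keep only our ends of the two pipes. */
	close(in[0]);
	close(out[1]);
	cp->to_child = fdopen(in[1], "w");
	cp->from_child = fdopen(out[0], "r");
	return (cp->to_child && cp->from_child) ? 0 : -1;
}

/* Send EOF to the child, drop our ends, and return its exit status. */
static int coproc_finish(struct coproc *cp)
{
	int status;

	fclose(cp->to_child);
	fclose(cp->from_child);
	if (waitpid(cp->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For cat-file, `argv` would be `{"git", "cat-file", "--batch-command", "--buffer", NULL}`. (A) must also `fflush()` its own end after writing `flush`. Otherwise the command can sit in (A)'s stdio buffer while (A) blocks waiting for a reply that (B) never received the request for.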
This patch adds the basic structure for adding commands, which can be extended in the future to add more commands. It also adds the following two commands (on top of the flush command):

	contents <object> LF
	info <object> LF

The contents command takes an <object> argument and prints out the object contents.

The info command takes an <object> argument and prints out the object metadata.

These can be used in the following way with --buffer:

	info <object> LF
	contents <object> LF
	contents <object> LF
	info <object> LF
	flush LF
	info <object> LF
	flush LF

When used without --buffer:

	info <object> LF
	contents <object> LF
	contents <object> LF
	info <object> LF
	info <object> LF

Helped-by: Ævar Arnfjörð Bjarmason <[email protected]>
Signed-off-by: John Cai <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
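The two sequences above differ only in when output appears. Below is a hypothetical C model of that rule, with strings standing in for object output. `struct session`, `session_line()` and `session_end()` are illustrative names, not git's API. Like the patch (which uses `xstrdup_or_null()`), the model copies each queued line, because the caller's input buffer is reused for the next line.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of --batch-command queueing; not git's code. */
struct session {
	int buffered;          /* models --buffer */
	char **queue;          /* pending command lines, each a private copy */
	size_t nr, alloc;
	char out[256];         /* stand-in for what (A) reads on stdout */
};

/* "Run" one command; real cat-file would print object info/contents. */
static void run(struct session *s, const char *cmd)
{
	strncat(s->out, cmd, sizeof(s->out) - strlen(s->out) - 1);
	strncat(s->out, "\n", sizeof(s->out) - strlen(s->out) - 1);
}

/* Execute and free everything queued so far. */
static void drain(struct session *s)
{
	size_t i;

	for (i = 0; i < s->nr; i++) {
		run(s, s->queue[i]);
		free(s->queue[i]);
	}
	s->nr = 0;
}

/* Handle one input line; returns -1 where cat-file would die(). */
static int session_line(struct session *s, const char *line)
{
	assert(s && line);
	if (!strcmp(line, "flush")) {
		if (!s->buffered)
			return -1; /* "flush is only for --buffer mode" */
		drain(s);
		return 0;
	}
	if (!s->buffered) {
		run(s, line); /* unbuffered: output right away */
		return 0;
	}
	if (s->nr == s->alloc) {
		size_t n = s->alloc ? 2 * s->alloc : 8;
		char **q = realloc(s->queue, n * sizeof(*q));

		if (!q)
			return -1;
		s->queue = q;
		s->alloc = n;
	}
	/* Copy: the caller may overwrite `line` before the next flush. */
	s->queue[s->nr] = strdup(line);
	if (!s->queue[s->nr])
		return -1;
	s->nr++;
	return 0;
}

/* At end of input, buffered commands still run; then memory is freed. */
static void session_end(struct session *s)
{
	if (s->buffered)
		drain(s);
	free(s->queue);
	s->queue = NULL;
	s->alloc = 0;
}
```

The patch likewise executes pending commands at end of input in `--buffer` mode. The real code skips that step when `GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT` is set, which this model ignores.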
1 parent 4cf5d53 commit 440c705

3 files changed, +300 -6 lines changed


Documentation/git-cat-file.txt

Lines changed: 38 additions & 4 deletions
@@ -96,6 +96,33 @@ OPTIONS
 	need to specify the path, separated by whitespace. See the
 	section `BATCH OUTPUT` below for details.

+--batch-command::
+--batch-command=<format>::
+	Enter a command mode that reads commands and arguments from stdin. May
+	only be combined with `--buffer`, `--textconv` or `--filters`. In the
+	case of `--textconv` or `--filters`, the input lines also need to specify
+	the path, separated by whitespace. See the section `BATCH OUTPUT` below
+	for details.
++
+`--batch-command` recognizes the following commands:
++
+--
+contents <object>::
+	Print object contents for object reference `<object>`. This corresponds to
+	the output of `--batch`.
+
+info <object>::
+	Print object info for object reference `<object>`. This corresponds to the
+	output of `--batch-check`.
+
+flush::
+	Used with `--buffer` to execute all preceding commands that were issued
+	since the beginning or since the last flush was issued. When `--buffer`
+	is used, no output will come until a `flush` is issued. When `--buffer`
+	is not used, commands are flushed each time without issuing `flush`.
+--
++
+
 --batch-all-objects::
 	Instead of reading a list of objects on stdin, perform the
 	requested batch operation on all objects in the repository and
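A `--buffer` process prints nothing until `flush`, so the consumer has to remember what it queued in order to know what to read back afterwards. Below is a minimal consumer-side sketch. It assumes that each queued `info` or `contents` command yields exactly one reply record. `struct pending`, `queue_request()` and `send_flush()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical consumer-side bookkeeping for one --buffer batch. */
enum req_kind { REQ_INFO, REQ_CONTENTS };

#define MAX_PENDING 64

struct pending {
	enum req_kind kind[MAX_PENDING];
	size_t nr;
};

/* Write "info <object>" or "contents <object>" and remember its kind. */
static int queue_request(FILE *to_catfile, struct pending *p,
			 enum req_kind kind, const char *object)
{
	assert(to_catfile && p && object);
	/* One command per line, so an object name must not contain LF. */
	if (!*object || strchr(object, '\n') || p->nr == MAX_PENDING)
		return -1;
	if (fprintf(to_catfile, "%s %s\n",
		    kind == REQ_INFO ? "info" : "contents", object) < 0)
		return -1;
	p->kind[p->nr++] = kind;
	return 0;
}

/* Send "flush" and push it past our own stdio buffer to cat-file. */
static int send_flush(FILE *to_catfile)
{
	if (fputs("flush\n", to_catfile) == EOF)
		return -1;
	return fflush(to_catfile) == 0 ? 0 : -1;
}
```

After `send_flush()`, the caller would read `p.nr` replies in order, using `p.kind[i]` to decide whether a reply carries contents, and then reset `p.nr` to 0.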
@@ -110,7 +137,7 @@ OPTIONS
 	that a process can interactively read and write from
 	`cat-file`. With this option, the output uses normal stdio
 	buffering; this is much more efficient when invoking
-	`--batch-check` on a large number of objects.
+	`--batch-check` or `--batch-command` on a large number of objects.

 --unordered::
 	When `--batch-all-objects` is in use, visit objects in an
@@ -202,6 +229,13 @@ from stdin, one per line, and print information about them. By default,
 the whole line is considered as an object, as if it were fed to
 linkgit:git-rev-parse[1].

+When `--batch-command` is given, `cat-file` will read commands from stdin,
+one per line, and print information based on the command given. With
+`--batch-command`, the `info` command followed by an object will print
+information about the object the same way `--batch-check` would, and the
+`contents` command followed by an object prints contents in the same way
+`--batch` would.
+
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
 object, with placeholders of the form `%(atom)` expanded, followed by a
@@ -237,9 +271,9 @@ newline. The available atoms are:
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.

-If `--batch` is specified, the object information is followed by the
-object contents (consisting of `%(objectsize)` bytes), followed by a
-newline.
+If `--batch` is specified, or if `--batch-command` is used with the `contents`
+command, the object information is followed by the object contents (consisting
+of `%(objectsize)` bytes), followed by a newline.

 For example, `--batch` without a custom format would produce:

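A consumer reading `contents` replies needs the size from the header line to know how many bytes follow. Below is a minimal sketch of that parse. It handles only the default format `%(objectname) %(objecttype) %(objectsize)`, plus the `<object> missing` reply, and it assumes object names contain no spaces. `struct batch_header`, `parse_header()` and `payload_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parsed form of one reply header in the default batch format. */
struct batch_header {
	char name[256];      /* object name as printed (hex oid or input spec) */
	char type[16];
	unsigned long size;
	int missing;         /* reply was "<object> missing" */
};

/* Parse "<oid> <type> <size>\n" or "<object> missing\n". */
static int parse_header(const char *line, struct batch_header *h)
{
	int end = 0;

	assert(line && h);
	memset(h, 0, sizeof(*h));
	if (sscanf(line, "%255s %15s %lu%n", h->name, h->type, &h->size, &end) == 3 &&
	    (line[end] == '\n' || line[end] == '\0'))
		return 0;

	memset(h, 0, sizeof(*h));
	if (sscanf(line, "%255s %15s", h->name, h->type) == 2 &&
	    !strcmp(h->type, "missing")) {
		h->missing = 1;
		return 0;
	}
	return -1;
}

/* Bytes that follow the header: %(objectsize) bytes plus the newline. */
static unsigned long payload_bytes(const struct batch_header *h, int is_contents)
{
	if (h->missing || !is_contents)
		return 0;
	return h->size + 1;
}
```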
builtin/cat-file.c

Lines changed: 143 additions & 1 deletion
@@ -20,6 +20,7 @@
 enum batch_mode {
 	BATCH_MODE_CONTENTS,
 	BATCH_MODE_INFO,
+	BATCH_MODE_QUEUE_AND_DISPATCH,
 };

 struct batch_options {
@@ -513,6 +514,135 @@ static int batch_unordered_packed(const struct object_id *oid,
 				      data);
 }

+typedef void (*parse_cmd_fn_t)(struct batch_options *, const char *,
+			       struct strbuf *, struct expand_data *);
+
+struct queued_cmd {
+	parse_cmd_fn_t fn;
+	char *line;
+};
+
+static void parse_cmd_contents(struct batch_options *opt,
+			       const char *line,
+			       struct strbuf *output,
+			       struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_CONTENTS;
+	batch_one_object(line, output, opt, data);
+}
+
+static void parse_cmd_info(struct batch_options *opt,
+			   const char *line,
+			   struct strbuf *output,
+			   struct expand_data *data)
+{
+	opt->batch_mode = BATCH_MODE_INFO;
+	batch_one_object(line, output, opt, data);
+}
+
+static void dispatch_calls(struct batch_options *opt,
+			   struct strbuf *output,
+			   struct expand_data *data,
+			   struct queued_cmd *cmd,
+			   int nr)
+{
+	int i;
+
+	if (!opt->buffer_output)
+		die(_("flush is only for --buffer mode"));
+
+	for (i = 0; i < nr; i++)
+		cmd[i].fn(opt, cmd[i].line, output, data);
+
+	fflush(stdout);
+}
+
+static void free_cmds(struct queued_cmd *cmd, size_t *nr)
+{
+	size_t i;
+
+	for (i = 0; i < *nr; i++)
+		FREE_AND_NULL(cmd[i].line);
+
+	*nr = 0;
+}
+
+
+static const struct parse_cmd {
+	const char *name;
+	parse_cmd_fn_t fn;
+	unsigned takes_args;
+} commands[] = {
+	{ "contents", parse_cmd_contents, 1},
+	{ "info", parse_cmd_info, 1},
+	{ "flush", NULL, 0},
+};
+
+static void batch_objects_command(struct batch_options *opt,
+				  struct strbuf *output,
+				  struct expand_data *data)
+{
+	struct strbuf input = STRBUF_INIT;
+	struct queued_cmd *queued_cmd = NULL;
+	size_t alloc = 0, nr = 0;
+
+	while (!strbuf_getline(&input, stdin)) {
+		int i;
+		const struct parse_cmd *cmd = NULL;
+		const char *p = NULL, *cmd_end;
+		struct queued_cmd call = {0};
+
+		if (!input.len)
+			die(_("empty command in input"));
+		if (isspace(*input.buf))
+			die(_("whitespace before command: '%s'"), input.buf);
+
+		for (i = 0; i < ARRAY_SIZE(commands); i++) {
+			if (!skip_prefix(input.buf, commands[i].name, &cmd_end))
+				continue;
+
+			cmd = &commands[i];
+			if (cmd->takes_args) {
+				if (*cmd_end != ' ')
+					die(_("%s requires arguments"),
+					    commands[i].name);
+
+				p = cmd_end + 1;
+			} else if (*cmd_end) {
+				die(_("%s takes no arguments"),
+				    commands[i].name);
+			}
+
+			break;
+		}
+
+		if (!cmd)
+			die(_("unknown command: '%s'"), input.buf);
+
+		if (!strcmp(cmd->name, "flush")) {
+			dispatch_calls(opt, output, data, queued_cmd, nr);
+			free_cmds(queued_cmd, &nr);
+		} else if (!opt->buffer_output) {
+			cmd->fn(opt, p, output, data);
+		} else {
+			ALLOC_GROW(queued_cmd, nr + 1, alloc);
+			call.fn = cmd->fn;
+			call.line = xstrdup_or_null(p);
+			queued_cmd[nr++] = call;
+		}
+	}
+
+	if (opt->buffer_output &&
+	    nr &&
+	    !git_env_bool("GIT_TEST_CAT_FILE_NO_FLUSH_ON_EXIT", 0)) {
+		dispatch_calls(opt, output, data, queued_cmd, nr);
+		free_cmds(queued_cmd, &nr);
+	}
+
+	free(queued_cmd);
+	strbuf_release(&input);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
 	struct strbuf input = STRBUF_INIT;
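The command lookup in `batch_objects_command()` can be exercised on its own. Below is a standalone re-creation of that loop. It uses `strncmp()` in place of git's internal `skip_prefix()` and returns hypothetical error codes where git calls `die()`. `parse_line()`, `struct cmd_def` and `enum parse_result` are illustrative names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; cat-file itself calls die() instead. */
enum parse_result {
	PARSE_OK,
	PARSE_EMPTY,         /* "empty command in input" */
	PARSE_LEADING_SPACE, /* "whitespace before command" */
	PARSE_UNKNOWN,       /* "unknown command" */
	PARSE_NEEDS_ARGS,    /* "%s requires arguments" */
	PARSE_NO_ARGS,       /* "%s takes no arguments" */
};

struct cmd_def {
	const char *name;
	int takes_args;
};

/* Same entries and order as the commands[] table in the diff. */
static const struct cmd_def cmd_table[] = {
	{ "contents", 1 },
	{ "info", 1 },
	{ "flush", 0 },
};

/*
 * Split one input line into a command and its argument using prefix
 * matching. On PARSE_OK, *arg points into `line`, or is NULL for
 * commands that take no arguments.
 */
static enum parse_result parse_line(const char *line,
				    const struct cmd_def **cmd,
				    const char **arg)
{
	size_t i;

	assert(line && cmd && arg);
	*cmd = NULL;
	*arg = NULL;
	if (!*line)
		return PARSE_EMPTY;
	if (isspace((unsigned char)*line))
		return PARSE_LEADING_SPACE;

	for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++) {
		size_t len = strlen(cmd_table[i].name);
		const char *end;

		if (strncmp(line, cmd_table[i].name, len))
			continue;
		end = line + len; /* safe: the whole name matched */
		*cmd = &cmd_table[i];
		if (cmd_table[i].takes_args) {
			if (*end != ' ')
				return PARSE_NEEDS_ARGS;
			*arg = end + 1;
		} else if (*end) {
			return PARSE_NO_ARGS;
		}
		return PARSE_OK;
	}
	return PARSE_UNKNOWN;
}
```

Because matching is by prefix, a line such as `infoHEAD` is reported as `info requires arguments` rather than as an unknown command. A bare `contents ` (trailing space) passes the parser with an empty argument.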
@@ -595,6 +725,11 @@ static int batch_objects(struct batch_options *opt)
 	save_warning = warn_on_object_refname_ambiguity;
 	warn_on_object_refname_ambiguity = 0;

+	if (opt->batch_mode == BATCH_MODE_QUEUE_AND_DISPATCH) {
+		batch_objects_command(opt, &output, &data);
+		goto cleanup;
+	}
+
 	while (strbuf_getline(&input, stdin) != EOF) {
 		if (data.split_on_whitespace) {
 			/*
@@ -613,6 +748,7 @@ static int batch_objects(struct batch_options *opt)
 		batch_one_object(input.buf, &output, opt, &data);
 	}

+cleanup:
 	strbuf_release(&input);
 	strbuf_release(&output);
 	warn_on_object_refname_ambiguity = save_warning;
@@ -645,6 +781,8 @@ static int batch_option_callback(const struct option *opt,
 		bo->batch_mode = BATCH_MODE_CONTENTS;
 	else if (!strcmp(opt->long_name, "batch-check"))
 		bo->batch_mode = BATCH_MODE_INFO;
+	else if (!strcmp(opt->long_name, "batch-command"))
+		bo->batch_mode = BATCH_MODE_QUEUE_AND_DISPATCH;
 	else
 		BUG("%s given to batch-option-callback", opt->long_name);

@@ -666,7 +804,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
-		N_("git cat-file (--batch | --batch-check) [--batch-all-objects]\n"
+		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   " [--buffer] [--follow-symlinks] [--unordered]\n"
 		   " [--textconv | --filters]"),
 		N_("git cat-file (--textconv | --filters)\n"
@@ -695,6 +833,10 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			N_("like --batch, but don't emit <contents>"),
 			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
 			batch_option_callback),
+		OPT_CALLBACK_F(0, "batch-command", &batch, N_("format"),
+			N_("read commands from stdin"),
+			PARSE_OPT_OPTARG | PARSE_OPT_NONEG,
+			batch_option_callback),
 		OPT_CMDMODE(0, "batch-all-objects", &opt,
 			    N_("with --batch[-check]: ignores stdin, batches all known objects"), 'b'),
 		/* Batch-specific options */
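The new option reuses `batch_option_callback()`, which maps the option's long name to a `batch_mode`. Below is a table-driven sketch of that mapping. The enum mirrors the one in the diff, the helper name `mode_from_option()` is hypothetical, and the helper returns -1 where git calls `BUG()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the enum in the diff above. */
enum batch_mode {
	BATCH_MODE_CONTENTS,
	BATCH_MODE_INFO,
	BATCH_MODE_QUEUE_AND_DISPATCH,
};

/* Map a long option name to a batch mode; -1 for anything else. */
static int mode_from_option(const char *long_name)
{
	static const struct {
		const char *name;
		enum batch_mode mode;
	} map[] = {
		{ "batch", BATCH_MODE_CONTENTS },
		{ "batch-check", BATCH_MODE_INFO },
		{ "batch-command", BATCH_MODE_QUEUE_AND_DISPATCH },
	};
	size_t i;

	assert(long_name);
	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (!strcmp(long_name, map[i].name))
			return (int)map[i].mode;
	return -1;
}
```

In the diff, `parse_cmd_contents()` and `parse_cmd_info()` overwrite `opt->batch_mode` on every command. As far as the shown code goes, this looks harmless, because `batch_objects()` checks for `BATCH_MODE_QUEUE_AND_DISPATCH` only once, before entering the command loop.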
