Commit 2841e8f

larsxschneider authored and gitster committed
convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout.

Teach the filter process protocol, introduced in edcc858 ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected.

Git has multiple code paths that check out a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed.

Signed-off-by: Lars Schneider <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 1514c8e commit 2841e8f

9 files changed, 575 additions and 90 deletions

Documentation/gitattributes.txt

Lines changed: 65 additions & 4 deletions
@@ -425,8 +425,8 @@ packet: git< capability=clean
 packet: git< capability=smudge
 packet: git< 0000
 ------------------------
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,73 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
 
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.
 
+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "can-delay" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> can-delay=1
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+packet: git< status=delayed
+packet: git< 0000
+------------------------
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of pathnames representing blobs
+that have been delayed earlier and are now available.
+The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list. As soon as the filter responds with an empty
+list, Git stops asking. All blobs that Git has not received at this
+point are considered missing and will result in an error.
+
+------------------------
+packet: git> command=list_available_blobs
+packet: git> 0000
+packet: git< pathname=path/testfile.dat
+packet: git< pathname=path/otherfile.dat
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+------------------------
+
+After Git received the pathnames, it will request the corresponding
+blobs again. These requests contain a pathname and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+------------------------
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> 0000
+packet: git> 0000 # empty content!
+packet: git< status=success
+packet: git< 0000
+packet: git< SMUDGED_CONTENT
+packet: git< 0000
+packet: git< 0000 # empty list, keep "status=success" unchanged!
+------------------------
+
+Example
+^^^^^^^
+
 A long running filter demo implementation can be found in
 `contrib/long-running-filter/example.pl` located in the Git
 core repository. If you develop your own long running filter
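
The "delayed" reply documented above is just two pkt-lines on the wire. The following is a minimal sketch, not taken from the commit, of how a filter written in C might frame it. The helper names `pkt_line_write`, `pkt_flush_write` and `format_delayed_response` are hypothetical, and the sketch writes into a memory buffer rather than to stdout so that the framing is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one pkt-line into `out`.
 * A pkt-line is four lowercase hex digits giving the total length
 * (payload plus the four length bytes), followed by the payload.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
size_t pkt_line_write(char *out, size_t outsz, const char *payload)
{
	size_t total;

	assert(out && payload);
	total = strlen(payload) + 4;
	/* 65520 is the maximum total pkt-line length */
	if (total > 65520 || total + 1 > outsz)
		return 0;
	snprintf(out, outsz, "%04zx%s", total, payload);
	return total;
}

/* Hypothetical helper: write a flush packet ("0000"). */
size_t pkt_flush_write(char *out, size_t outsz)
{
	assert(out);
	if (outsz < 5)
		return 0;
	memcpy(out, "0000", 5); /* includes the terminating NUL */
	return 4;
}

/*
 * Frame the reply a delay-capable filter sends instead of content:
 * "status=delayed" followed by a flush packet.
 */
size_t format_delayed_response(char *out, size_t outsz)
{
	size_t n = pkt_line_write(out, outsz, "status=delayed\n");
	size_t m;

	if (!n)
		return 0;
	m = pkt_flush_write(out + n, outsz - n);
	return m ? n + m : 0;
}
```

A real filter would write these bytes to its standard output and would only reply this way to a request that carried `can-delay=1`.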

builtin/checkout.c

Lines changed: 3 additions & 0 deletions
@@ -376,6 +376,8 @@ static int checkout_paths(const struct checkout_opts *opts,
 	state.force = 1;
 	state.refresh_cache = 1;
 	state.istate = &the_index;
+
+	enable_delayed_checkout(&state);
 	for (pos = 0; pos < active_nr; pos++) {
 		struct cache_entry *ce = active_cache[pos];
 		if (ce->ce_flags & CE_MATCHED) {
@@ -390,6 +392,7 @@ static int checkout_paths(const struct checkout_opts *opts,
 			pos = skip_same_name(ce, pos) - 1;
 		}
 	}
+	errs |= finish_delayed_checkout(&state);
 
 	if (write_locked_index(&the_index, lock_file, COMMIT_LOCK))
 		die(_("unable to write new index file"));
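
The body of `finish_delayed_checkout` is not part of the hunks shown here, so the following is only a toy model of the loop that the commit message and the documentation describe: keep asking the filter for available blobs, stop at the first empty list, and report an error for every delayed path that never arrived. All names (`delayed_model`, `query_fn`, `scripted_filter`, `finish_delayed_model`) are hypothetical. Treating a path that was never delayed as an error is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 8

/* Toy stand-in for the bookkeeping Git keeps about delayed paths. */
struct delayed_model {
	const char *paths[MAX_PATHS]; /* paths the filter delayed */
	int done[MAX_PATHS];          /* 1 once the blob has arrived */
	size_t nr;
};

/*
 * Hypothetical callback standing in for "list_available_blobs":
 * fills `out` with up to `cap` paths and returns how many it wrote.
 * Returning 0 models the empty list ("no more delayed blobs").
 */
typedef size_t (*query_fn)(void *ctx, const char **out, size_t cap);

/* Returns 0 if every delayed path arrived, 1 otherwise. */
int finish_delayed_model(struct delayed_model *dm, query_fn query, void *ctx)
{
	const char *avail[MAX_PATHS];
	size_t n, i, j;
	int errs = 0;

	assert(dm && query);
	while ((n = query(ctx, avail, MAX_PATHS)) > 0) {
		for (i = 0; i < n; i++) {
			int known = 0;
			for (j = 0; j < dm->nr; j++) {
				if (!strcmp(avail[i], dm->paths[j])) {
					dm->done[j] = 1; /* would re-request the blob here */
					known = 1;
				}
			}
			if (!known)
				errs = 1; /* assumption: unknown path is an error */
		}
	}
	/* Empty list seen: anything still pending counts as missing. */
	for (j = 0; j < dm->nr; j++)
		if (!dm->done[j])
			errs = 1;
	return errs;
}

/* Scripted fake filter: hands out `batch` paths per query, then nothing. */
struct scripted_filter {
	const char **ready; /* paths in the order they become available */
	size_t nr, next, batch;
};

size_t scripted_query(void *ctx, const char **out, size_t cap)
{
	struct scripted_filter *f = ctx;
	size_t n = 0;

	while (n < cap && n < f->batch && f->next < f->nr)
		out[n++] = f->ready[f->next++];
	return n;
}
```

In the real protocol the filter blocks until at least one blob is ready. The scripted filter skips that and simply returns the next batch.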

cache.h

Lines changed: 3 additions & 0 deletions
@@ -1544,6 +1544,7 @@ struct checkout {
 	struct index_state *istate;
 	const char *base_dir;
 	int base_dir_len;
+	struct delayed_checkout *delayed_checkout;
 	unsigned force:1,
 		 quiet:1,
 		 not_new:1,
@@ -1553,6 +1554,8 @@ struct checkout {
 
 #define TEMPORARY_FILENAME_LENGTH 25
 extern int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath);
+extern void enable_delayed_checkout(struct checkout *state);
+extern int finish_delayed_checkout(struct checkout *state);
 
 struct cache_def {
 	struct strbuf path;

convert.c

Lines changed: 94 additions & 16 deletions
@@ -496,6 +496,7 @@ static int apply_single_file_filter(const char *path, const char *src, size_t le
 
 #define CAP_CLEAN (1u<<0)
 #define CAP_SMUDGE (1u<<1)
+#define CAP_DELAY (1u<<2)
 
 struct cmd2process {
 	struct subprocess_entry subprocess; /* must be the first member! */
@@ -521,6 +522,7 @@ static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 	} known_caps[] = {
 		{ "clean", CAP_CLEAN },
 		{ "smudge", CAP_SMUDGE },
+		{ "delay", CAP_DELAY },
 	};
 
 	sigchain_push(SIGPIPE, SIG_IGN);
@@ -605,9 +607,11 @@ static void handle_filter_error(const struct strbuf *filter_status,
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t len,
 				   int fd, struct strbuf *dst, const char *cmd,
-				   const unsigned int wanted_capability)
+				   const unsigned int wanted_capability,
+				   struct delayed_checkout *dco)
 {
 	int err;
+	int can_delay = 0;
 	struct cmd2process *entry;
 	struct child_process *process;
 	struct strbuf nbuf = STRBUF_INIT;
@@ -662,6 +666,14 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
+	if ((entry->supported_capabilities & CAP_DELAY) &&
+	    dco && dco->state == CE_CAN_DELAY) {
+		can_delay = 1;
+		err = packet_write_fmt_gently(process->in, "can-delay=1\n");
+		if (err)
+			goto done;
+	}
+
 	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
@@ -677,14 +689,73 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	if (err)
 		goto done;
 
-	err = strcmp(filter_status.buf, "success");
+	if (can_delay && !strcmp(filter_status.buf, "delayed")) {
+		string_list_insert(&dco->filters, cmd);
+		string_list_insert(&dco->paths, path);
+	} else {
+		/* The filter got the blob and wants to send us a response. */
+		err = strcmp(filter_status.buf, "success");
+		if (err)
+			goto done;
+
+		err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+		if (err)
+			goto done;
+
+		err = subprocess_read_status(process->out, &filter_status);
+		if (err)
+			goto done;
+
+		err = strcmp(filter_status.buf, "success");
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err)
+		handle_filter_error(&filter_status, entry, wanted_capability);
+	else
+		strbuf_swap(dst, &nbuf);
+	strbuf_release(&nbuf);
+	return !err;
+}
+
+
+int async_query_available_blobs(const char *cmd, struct string_list *available_paths)
+{
+	int err;
+	char *line;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct strbuf filter_status = STRBUF_INIT;
+
+	assert(subprocess_map_initialized);
+	entry = (struct cmd2process *)subprocess_find_entry(&subprocess_map, cmd);
+	if (!entry) {
+		error("external filter '%s' is not available anymore although "
+		      "not all paths have been filtered", cmd);
+		return 0;
+	}
+	process = &entry->subprocess.process;
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_fmt_gently(
+		process->in, "command=list_available_blobs\n");
 	if (err)
 		goto done;
 
-	err = read_packetized_to_strbuf(process->out, &nbuf) < 0;
+	err = packet_flush_gently(process->in);
 	if (err)
 		goto done;
 
+	while ((line = packet_read_line(process->out, NULL))) {
+		const char *path;
+		if (skip_prefix(line, "pathname=", &path))
+			string_list_insert(available_paths, xstrdup(path));
+		else
+			; /* ignore unknown keys */
+	}
+
 	err = subprocess_read_status(process->out, &filter_status);
 	if (err)
 		goto done;
@@ -695,10 +766,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
 	sigchain_pop(SIGPIPE);
 
 	if (err)
-		handle_filter_error(&filter_status, entry, wanted_capability);
-	else
-		strbuf_swap(dst, &nbuf);
-	strbuf_release(&nbuf);
+		handle_filter_error(&filter_status, entry, 0);
 	return !err;
 }
 
@@ -713,7 +781,8 @@ static struct convert_driver {
 
 static int apply_filter(const char *path, const char *src, size_t len,
 			int fd, struct strbuf *dst, struct convert_driver *drv,
-			const unsigned int wanted_capability)
+			const unsigned int wanted_capability,
+			struct delayed_checkout *dco)
 {
 	const char *cmd = NULL;
 
@@ -731,7 +800,8 @@ static int apply_filter(const char *path, const char *src, size_t len,
 	if (cmd && *cmd)
 		return apply_single_file_filter(path, src, len, fd, dst, cmd);
 	else if (drv->process && *drv->process)
-		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+		return apply_multi_file_filter(path, src, len, fd, dst,
+			drv->process, wanted_capability, dco);
 
 	return 0;
 }
@@ -1072,7 +1142,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -1109,7 +1179,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
 
 	convert_attrs(&ca, path);
 
-	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN, NULL);
 	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
@@ -1134,7 +1204,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	assert(ca.drv);
 	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN, NULL))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -1143,7 +1213,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 
 static int convert_to_working_tree_internal(const char *path, const char *src,
 					    size_t len, struct strbuf *dst,
-					    int normalizing)
+					    int normalizing, struct delayed_checkout *dco)
 {
 	int ret = 0, ret_filter = 0;
 	struct conv_attrs ca;
@@ -1168,21 +1238,29 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	ret_filter = apply_filter(
+		path, src, len, -1, dst, ca.drv, CAP_SMUDGE, dco);
 	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
 }
 
+int async_convert_to_working_tree(const char *path, const char *src,
+				  size_t len, struct strbuf *dst,
+				  void *dco)
+{
+	return convert_to_working_tree_internal(path, src, len, dst, 0, dco);
+}
+
 int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	return convert_to_working_tree_internal(path, src, len, dst, 0);
+	return convert_to_working_tree_internal(path, src, len, dst, 0, NULL);
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst)
 {
-	int ret = convert_to_working_tree_internal(path, src, len, dst, 1);
+	int ret = convert_to_working_tree_internal(path, src, len, dst, 1, NULL);
 	if (ret) {
 		src = dst->buf;
 		len = dst->len;
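
The read loop in `async_query_available_blobs` keeps every `pathname=` line and silently skips anything else until the flush packet. A standalone sketch of that parsing step follows. It assumes the pkt-lines have already been decoded into a NULL-terminated array of strings, where NULL plays the role of the flush packet. `skip_prefix_demo` is a local stand-in for Git's `skip_prefix`, and `collect_available_paths` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Local stand-in for Git's skip_prefix(): if `str` starts with
 * `prefix`, point *out just past the prefix and return 1.
 */
static int skip_prefix_demo(const char *str, const char *prefix,
			    const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical helper: walk decoded lines up to the NULL terminator
 * (the flush packet in the real protocol) and collect the value of
 * every "pathname=" key. Lines with other keys are ignored.
 * Returns the number of paths stored in `paths` (at most `cap`).
 */
size_t collect_available_paths(const char *const *lines,
			       const char **paths, size_t cap)
{
	size_t i, n = 0;

	assert(lines && paths);
	for (i = 0; lines[i]; i++) {
		const char *path;

		if (skip_prefix_demo(lines[i], "pathname=", &path)) {
			if (n < cap)
				paths[n++] = path;
		}
		/* else: ignore unknown keys */
	}
	return n;
}
```

Unlike the patch, which copies each path with `xstrdup` into a `string_list`, this sketch returns pointers into the input lines, so the caller must keep those lines alive.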

convert.h

Lines changed: 26 additions & 0 deletions
@@ -4,6 +4,8 @@
 #ifndef CONVERT_H
 #define CONVERT_H
 
+#include "string-list.h"
+
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
 	SAFE_CRLF_FAIL = 1,
@@ -32,6 +34,26 @@ enum eol {
 #endif
 };
 
+enum ce_delay_state {
+	CE_NO_DELAY = 0,
+	CE_CAN_DELAY = 1,
+	CE_RETRY = 2
+};
+
+struct delayed_checkout {
+	/*
+	 * State of the currently processed cache entry. If the state is
+	 * CE_CAN_DELAY, then the filter can delay the current cache entry.
+	 * If the state is CE_RETRY, then this signals the filter that the
+	 * cache entry was requested before.
+	 */
+	enum ce_delay_state state;
+	/* List of filter drivers that signaled delayed blobs. */
+	struct string_list filters;
+	/* List of delayed blobs identified by their path. */
+	struct string_list paths;
+};
+
 extern enum eol core_eol;
 extern const char *get_cached_convert_stats_ascii(const char *path);
 extern const char *get_wt_convert_stats_ascii(const char *path);
@@ -42,6 +64,10 @@ extern int convert_to_git(const char *path, const char *src, size_t len,
 			  struct strbuf *dst, enum safe_crlf checksafe);
 extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
+extern int async_convert_to_working_tree(const char *path, const char *src,
+					 size_t len, struct strbuf *dst,
+					 void *dco);
+extern int async_query_available_blobs(const char *cmd, struct string_list *available_paths);
 extern int renormalize_buffer(const char *path, const char *src, size_t len,
 			      struct strbuf *dst);
 static inline int would_convert_to_git(const char *path)
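
The `ce_delay_state` enum and the `CAP_DELAY` bit together decide whether Git offers `can-delay=1` and how it reads the status that comes back. The sketch below restates those two decisions from `apply_multi_file_filter` as small pure functions so they can be checked in isolation. `should_offer_delay`, `classify_status` and `enum smudge_outcome` are hypothetical names, and the sketch redefines the constants locally instead of including Git headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Redefined locally with the same values as in the patch. */
#define CAP_DELAY (1u<<2)

enum ce_delay_state {
	CE_NO_DELAY = 0,
	CE_CAN_DELAY = 1,
	CE_RETRY = 2
};

/* Hypothetical result type for this sketch only. */
enum smudge_outcome {
	SMUDGE_DELAYED,         /* remember the path, no content follows */
	SMUDGE_CONTENT_FOLLOWS, /* read content and a second status */
	SMUDGE_ERROR
};

/*
 * Mirrors the condition guarding "can-delay=1": the filter announced
 * the capability, a delayed-checkout state exists, and the current
 * entry is in CE_CAN_DELAY. `state` may be NULL (no delayed checkout).
 */
int should_offer_delay(unsigned int caps, const enum ce_delay_state *state)
{
	return (caps & CAP_DELAY) && state && *state == CE_CAN_DELAY;
}

/*
 * Mirrors the branch on the first status line: "delayed" is accepted
 * only if delay was offered; otherwise anything but "success" fails.
 */
enum smudge_outcome classify_status(int can_delay, const char *status)
{
	assert(status);
	if (can_delay && !strcmp(status, "delayed"))
		return SMUDGE_DELAYED;
	if (!strcmp(status, "success"))
		return SMUDGE_CONTENT_FOLLOWS;
	return SMUDGE_ERROR;
}
```

One consequence worth noting: a filter that answers "delayed" to a request that did not carry `can-delay=1`, for example a `CE_RETRY` request, lands in the error branch.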
