Commit 7a1ec8e

Merge branch 'jt/fetch-cdn-offload' into pu
WIP for allowing a response to "git fetch" to instruct the bulk of the
pack contents to be instead taken from elsewhere (aka CDN).

* jt/fetch-cdn-offload:
  fixup! upload-pack: refactor reading of pack-objects out
  SQUASH???
  upload-pack: send part of packfile response as uri
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http: improve documentation of http_pack_request
  http: use --stdin and --keep when downloading pack
2 parents 8cd2f16 + 774f528 commit 7a1ec8e

13 files changed, +544 -100 lines changed

Documentation/git-http-fetch.txt

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,7 @@ git-http-fetch - Download from a remote Git repository via HTTP
 SYNOPSIS
 --------
 [verse]
-'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin] <commit> <url>
+'git http-fetch' [-c] [-t] [-a] [-d] [-v] [-w filename] [--recover] [--stdin | --packfile | <commit>] <url>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,11 @@ commit-id::
 
 		<commit-id>['\t'<filename-as-in--w>]
 
+--packfile::
+	Instead of a commit id on the command line (which is not expected in
+	this case), 'git http-fetch' fetches the packfile directly at the given
+	URL and generates the corresponding .idx file.
+
 --recover::
 	Verify that everything reachable from target is fetched. Used after
 	an earlier fetch is interrupted.
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+Packfile URIs
+=============
+
+This feature allows servers to serve part of their packfile response as URIs.
+This allows server designs that improve scalability in bandwidth and CPU usage
+(for example, by serving some data through a CDN), and (in the future) provides
+some measure of resumability to clients.
+
+This feature is available only in protocol version 2.
+
+Protocol
+--------
+
+The server advertises `packfile-uris`.
+
+If the client then communicates which protocols (HTTPS, etc.) it supports with
+a `packfile-uris` argument, the server MAY send a `packfile-uris` section
+directly before the `packfile` section (right after `wanted-refs` if it is
+sent) containing URIs of any of the given protocols. The URIs point to
+packfiles that use only features that the client has declared that it supports
+(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of
+this section.
+
+Clients should then understand that the returned packfile could be incomplete,
+and that they need to download all the given URIs before the fetch or clone is
+complete.
+
+Server design
+-------------
+
+The server can be trivially made compatible with the proposed protocol by
+having it advertise `packfile-uris`, tolerating the client sending
+`packfile-uris`, and never sending any `packfile-uris` section. But we should
+include some sort of non-trivial implementation in the Minimum Viable Product,
+at least so that we can test the client.
+
+This is the implementation: a feature, marked experimental, that allows the
+server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
+<uri>` entries. Whenever the list of objects to be sent is assembled, a blob
+with the given sha1 can be replaced by the given URI. This allows, for example,
+servers to delegate serving of large blobs to CDNs.
+
+Client design
+-------------
+
+While fetching, the client needs to remember the list of URIs and cannot
+declare that the fetch is complete until all URIs have been downloaded as
+packfiles.
+
+The division of work (initial fetch + additional URIs) introduces convenient
+points for resumption of an interrupted clone - such resumption can be done
+after the Minimum Viable Product (see "Future work").
+
+The client can inhibit this feature (i.e. refrain from sending the
+`packfile-uris` parameter) by passing --no-packfile-urls to `git fetch`.
+
+Future work
+-----------
+
+The protocol design allows some evolution of the server and client without any
+need for protocol changes, so only a small-scoped design is included here to
+form the MVP. For example, the following can be done:
+
+ * On the server, a long-running process that takes in entire requests and
+   outputs a list of URIs and the corresponding inclusion and exclusion sets of
+   objects. This allows, e.g., signed URIs to be used and packfiles for common
+   requests to be cached.
+ * On the client, resumption of clone. If a clone is interrupted, information
+   could be recorded in the repository's config and a "clone-resume" command
+   can resume the clone in progress. (Resumption of subsequent fetches is more
+   difficult because that must deal with the user wanting to use the repository
+   even after the fetch was interrupted.)
+
+There are some possible features that will require a change in protocol:
+
+ * Additional HTTP headers (e.g. authentication)
+ * Byte range support
+ * Different file formats referenced by URIs (e.g. raw object)
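
The `uploadpack.blobPackfileUri=<sha1> <uri>` value described under "Server design" can be split into its two parts roughly as follows. This is a minimal standalone sketch, not the code from this commit: the helper name `parse_blob_packfile_uri` and the `SHA1_HEX_LEN` constant are hypothetical, and rejecting an empty URI is an extra assumption made here for illustration.

```c
#include <ctype.h>
#include <stddef.h>

#define SHA1_HEX_LEN 40	/* assumption: object names are 40 hex digits */

/*
 * Hypothetical helper: split "<sha1> <uri>" into its two parts.
 * Copies the object name into `hex` and points `uri` at the rest.
 * Returns 0 on success and -1 if the value is malformed.
 */
static int parse_blob_packfile_uri(const char *value,
				   char hex[SHA1_HEX_LEN + 1],
				   const char **uri)
{
	size_t i;

	for (i = 0; i < SHA1_HEX_LEN; i++) {
		/* A NUL is not a hex digit, so short input stops here. */
		if (!isxdigit((unsigned char)value[i]))
			return -1;
		hex[i] = value[i];
	}
	hex[SHA1_HEX_LEN] = '\0';

	/* One space separates the object name from a non-empty URI. */
	if (value[SHA1_HEX_LEN] != ' ' || value[SHA1_HEX_LEN + 1] == '\0')
		return -1;
	*uri = value + SHA1_HEX_LEN + 1;
	return 0;
}
```

A caller would pass the raw config value and, on success, use `hex` as the key and `uri` as the replacement for that blob.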

Documentation/technical/protocol-v2.txt

Lines changed: 12 additions & 10 deletions
@@ -325,11 +325,12 @@ included in the client's request:
 
 The response of `fetch` is broken into a number of sections separated by
 delimiter packets (0001), with each section beginning with its section
-header.
+header. Most sections are sent only when the packfile is sent.
 
-    output = *section
-    section = (acknowledgments | shallow-info | wanted-refs | packfile)
-              (flush-pkt | delim-pkt)
+    output = acknowledgments flush-pkt |
+             [acknowledgments delim-pkt] [shallow-info delim-pkt]
+             [wanted-refs delim-pkt] [packfile-uris delim-pkt]
+             packfile flush-pkt
 
     acknowledgments = PKT-LINE("acknowledgments" LF)
                       (nak | *ack)
@@ -347,13 +348,17 @@ header.
                   *PKT-LINE(wanted-ref LF)
     wanted-ref = obj-id SP refname
 
+    packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
+    packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)
+
     packfile = PKT-LINE("packfile" LF)
                *PKT-LINE(%x01-03 *%x00-ff)
 
     acknowledgments section
-        * If the client determines that it is finished with negotiations
-          by sending a "done" line, the acknowledgments sections MUST be
-          omitted from the server's response.
+        * If the client determines that it is finished with negotiations by
+          sending a "done" line (thus requiring the server to send a packfile),
+          the acknowledgments sections MUST be omitted from the server's
+          response.
 
         * Always begins with the section header "acknowledgments"
 
@@ -404,9 +409,6 @@ header.
           which the client has not indicated was shallow as a part of
           its request.
 
-        * This section is only included if a packfile section is also
-          included in the response.
-
     wanted-refs section
         * This section is only included if the client has requested a
           ref using a 'want-ref' line and if a packfile section is also
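
To make the new `packfile-uris` grammar concrete, the sketch below frames a payload as a pkt-line (four hex digits of total length, header included, followed by the payload) and recognizes the payload of a `packfile-uri` line. It is a standalone illustration under stated assumptions, not code from this commit: `pkt_line_format` and `parse_packfile_uri_line` are hypothetical names, and the 65520-byte cap is the pkt-line maximum from the Git protocol documentation.

```c
#include <stdio.h>
#include <string.h>

#define PKT_LINE_MAX 65520	/* largest pkt-line, length header included */

/*
 * Hypothetical helper: frame `payload` as a pkt-line in `out`.
 * Returns the number of bytes written, or -1 if it does not fit.
 */
static int pkt_line_format(char *out, size_t outsz, const char *payload)
{
	size_t total = strlen(payload) + 4;

	if (total > PKT_LINE_MAX || total + 1 > outsz)
		return -1;
	snprintf(out, outsz, "%04zx%s", total, payload);
	return (int)total;
}

/*
 * Hypothetical helper: match the payload "uri" SP <uri> LF.
 * Returns a pointer to the URI (the trailing LF is cut off in place),
 * or NULL if the payload does not start with "uri ".
 */
static char *parse_packfile_uri_line(char *payload)
{
	size_t len = strlen(payload);

	if (strncmp(payload, "uri ", 4) != 0)
		return NULL;
	if (len > 4 && payload[len - 1] == '\n')
		payload[len - 1] = '\0';
	return payload + 4;
}
```

For the section header, the payload `packfile-uris` plus LF is 14 bytes, so the framed line carries the length prefix `0012`.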

builtin/pack-objects.c

Lines changed: 63 additions & 0 deletions
@@ -111,6 +111,8 @@ static unsigned long window_memory_limit = 0;
 
 static struct list_objects_filter_options filter_options;
 
+static struct string_list uri_protocols = STRING_LIST_INIT_NODUP;
+
 enum missing_action {
 	MA_ERROR = 0,      /* fail if any missing objects are encountered */
 	MA_ALLOW_ANY,      /* silently allow ALL missing objects */
@@ -119,6 +121,14 @@ enum missing_action {
 static enum missing_action arg_missing_action;
 static show_object_fn fn_show_object;
 
+struct configured_exclusion {
+	struct oidmap_entry e;
+	char *uri;
+};
+static struct oidmap configured_exclusions;
+
+static struct oidset excluded_by_config;
+
 /*
  * stats
  */
@@ -833,6 +843,23 @@ static off_t write_reused_pack(struct hashfile *f)
 	return reuse_packfile_offset - sizeof(struct pack_header);
 }
 
+static void write_excluded_by_configs(void)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	oidset_iter_init(&excluded_by_config, &iter);
+	while ((oid = oidset_iter_next(&iter))) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+
+		if (!ex)
+			BUG("configured exclusion wasn't configured");
+		write_in_full(1, ex->uri, strlen(ex->uri));
+		write_in_full(1, "\n", 1);
+	}
+}
+
 static const char no_split_warning[] = N_(
 "disabling bitmap writing, packs are split due to pack.packSizeLimit"
 );
@@ -1126,6 +1153,25 @@ static int want_object_in_pack(const struct object_id *oid,
 		}
 	}
 
+	if (uri_protocols.nr) {
+		struct configured_exclusion *ex =
+			oidmap_get(&configured_exclusions, oid);
+		int i;
+		const char *p;
+
+		if (ex) {
+			for (i = 0; i < uri_protocols.nr; i++) {
+				if (skip_prefix(ex->uri,
+						uri_protocols.items[i].string,
+						&p) &&
+				    *p == ':') {
+					oidset_insert(&excluded_by_config, oid);
+					return 0;
+				}
+			}
+		}
+	}
+
 	return 1;
 }
 
@@ -2728,6 +2774,19 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 			    pack_idx_opts.version);
 		return 0;
 	}
+	if (!strcmp(k, "uploadpack.blobpackfileuri")) {
+		struct configured_exclusion *ex = xmalloc(sizeof(*ex));
+		const char *end;
+
+		if (parse_oid_hex(v, &ex->e.oid, &end) || *end != ' ')
+			die(_("value of uploadpack.blobpackfileuri must be "
+			      "of the form '<sha-1> <uri>' (got '%s')"), v);
+		if (oidmap_get(&configured_exclusions, &ex->e.oid))
+			die(_("object already configured in another "
+			      "uploadpack.blobpackfileuri (got '%s')"), v);
+		ex->uri = xstrdup(end + 1);
+		oidmap_put(&configured_exclusions, ex);
+	}
 	return git_default_config(k, v, cb);
 }
 
@@ -3320,6 +3379,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			 N_("do not pack objects in promisor packfiles")),
 		OPT_BOOL(0, "delta-islands", &use_delta_islands,
 			 N_("respect islands during delta compression")),
+		OPT_STRING_LIST(0, "uri-protocol", &uri_protocols,
+				N_("protocol"),
+				N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
 		OPT_END(),
 	};
 
@@ -3504,6 +3566,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 					    the_repository);
 	}
 
+	write_excluded_by_configs();
 	trace2_region_enter("pack-objects", "write-pack-file", the_repository);
 	write_pack_file();
 	trace2_region_leave("pack-objects", "write-pack-file", the_repository);
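
The exclusion test added to `want_object_in_pack()` boils down to asking whether a configured URI begins with one of the client's protocols immediately followed by `:`. The following standalone sketch reproduces that check with plain libc calls instead of Git's `skip_prefix()`; the function names `uri_has_protocol` and `uri_allowed` are hypothetical and exist only for illustration.

```c
#include <string.h>

/*
 * Return 1 if `uri` starts with `protocol` immediately followed by ':'.
 * Requiring the colon keeps "http" from matching an "https:" URI.
 */
static int uri_has_protocol(const char *uri, const char *protocol)
{
	size_t len = strlen(protocol);

	return strncmp(uri, protocol, len) == 0 && uri[len] == ':';
}

/* Return 1 if any of the `nr` client-supported protocols matches `uri`. */
static int uri_allowed(const char *uri, const char *const *protocols,
		       size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (uri_has_protocol(uri, protocols[i]))
			return 1;
	return 0;
}
```

With this check, a blob configured with an `https:` URI is left in the pack for a client that only offered `http`, which is the behavior the hunk above aims for.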

fetch-pack.c

Lines changed: 58 additions & 0 deletions
@@ -38,6 +38,7 @@ static struct lock_file shallow_lock;
 static const char *alternate_shallow_file;
 static char *negotiation_algorithm;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
+static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE	(1U << 0)
@@ -1149,6 +1150,26 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 		warning("filtering not recognized by server, ignoring");
 	}
 
+	if (server_supports_feature("fetch", "packfile-uris", 0)) {
+		int i;
+		struct strbuf to_send = STRBUF_INIT;
+
+		for (i = 0; i < uri_protocols.nr; i++) {
+			const char *s = uri_protocols.items[i].string;
+
+			if (!strcmp(s, "https") || !strcmp(s, "http")) {
+				if (to_send.len)
+					strbuf_addch(&to_send, ',');
+				strbuf_addstr(&to_send, s);
+			}
+		}
+		if (to_send.len) {
+			packet_buf_write(&req_buf, "packfile-uris %s",
+					 to_send.buf);
+			strbuf_release(&to_send);
+		}
+	}
+
 	/* add wants */
 	add_wants(args->no_dependents, wants, &req_buf);
 
@@ -1325,6 +1346,32 @@ static void receive_wanted_refs(struct packet_reader *reader,
 		die(_("error processing wanted refs: %d"), reader->status);
 }
 
+static void receive_packfile_uris(struct packet_reader *reader)
+{
+	process_section_header(reader, "packfile-uris", 0);
+	while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
+		const char *p;
+		struct child_process cmd = CHILD_PROCESS_INIT;
+
+
+		if (!skip_prefix(reader->line, "uri ", &p))
+			die("expected 'uri <uri>', got: %s\n", reader->line);
+
+		argv_array_push(&cmd.args, "http-fetch");
+		argv_array_push(&cmd.args, "--packfile");
+		argv_array_push(&cmd.args, p);
+		cmd.git_cmd = 1;
+		cmd.no_stdin = 1;
+		cmd.no_stdout = 1;
+		if (start_command(&cmd))
+			die("fetch-pack: unable to spawn");
+		if (finish_command(&cmd))
+			die("fetch-pack: unable to finish");
+	}
+	if (reader->status != PACKET_READ_DELIM)
+		die("expected DELIM");
+}
+
 enum fetch_state {
 	FETCH_CHECK_LOCAL = 0,
 	FETCH_SEND_REQUEST,
@@ -1417,6 +1464,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_wanted_refs(&reader, sought, nr_sought);
 
 			/* get the pack */
+			if (process_section_header(&reader, "packfile-uris", 1)) {
+				receive_packfile_uris(&reader);
+			}
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfile))
 				die(_("git fetch-pack: fetch failed."));
@@ -1467,6 +1517,14 @@ static void fetch_pack_config(void)
 	git_config_get_bool("transfer.fsckobjects", &transfer_fsck_objects);
 	git_config_get_string("fetch.negotiationalgorithm",
 			      &negotiation_algorithm);
+	if (!uri_protocols.nr) {
+		char *str;
+
+		if (!git_config_get_string("fetch.uriprotocols", &str) && str) {
+			string_list_split(&uri_protocols, str, ',', -1);
+			free(str);
+		}
+	}
 
 	git_config(fetch_pack_config_cb, NULL);
 }
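
Taken together, the `fetch.uriprotocols` handling and the loop in `send_fetch_request()` turn a comma-separated config value into a `packfile-uris` argument that lists only `http` and `https`. A rough standalone sketch of that filtering, using a fixed output buffer instead of Git's `strbuf` and `string_list`, could look like this; `filter_uri_protocols` is a hypothetical name, not a function in this commit.

```c
#include <string.h>

/*
 * Copy the "http" and "https" entries of the comma-separated list `csv`
 * into `out`, joined by commas; every other entry is dropped.
 * Returns 0 on success, or -1 if `out` is too small.
 */
static int filter_uri_protocols(const char *csv, char *out, size_t outsz)
{
	size_t used = 0;

	if (!outsz)
		return -1;
	out[0] = '\0';
	while (*csv) {
		size_t len = strcspn(csv, ",");	/* length of this entry */

		if ((len == 4 && !strncmp(csv, "http", 4)) ||
		    (len == 5 && !strncmp(csv, "https", 5))) {
			size_t need = len + (used ? 1 : 0);

			if (used + need + 1 > outsz)
				return -1;
			if (used)
				out[used++] = ',';
			memcpy(out + used, csv, len);
			used += len;
			out[used] = '\0';
		}
		csv += len;
		if (*csv == ',')
			csv++;
	}
	return 0;
}
```

An empty result corresponds to the case in the hunk above where `to_send.len` is zero and no `packfile-uris` argument is sent at all.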
