Skip to content

Commit 2067cba

Browse files
nasamuffingitster
authored andcommitted
doc: propose hooks managed by the config
Begin a design document for config-based hooks, managed via git-hook. Focus on an overview of the implementation and motivation for design decisions. Briefly discuss the alternatives considered before this point. Also, attempt to redefine terms to fit into a multihook world. Signed-off-by: Emily Shaffer <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent ae92ac8 commit 2067cba

File tree

2 files changed

+355
-0
lines changed

2 files changed

+355
-0
lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,7 @@ SP_ARTICLES += $(API_DOCS)
8080
TECH_DOCS += MyFirstContribution
8181
TECH_DOCS += MyFirstObjectWalk
8282
TECH_DOCS += SubmittingPatches
83+
TECH_DOCS += technical/config-based-hooks
8384
TECH_DOCS += technical/hash-function-transition
8485
TECH_DOCS += technical/http-protocol
8586
TECH_DOCS += technical/index-format
Lines changed: 354 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,354 @@
1+
Configuration-based hook management
2+
===================================
3+
:sectanchors:
4+
5+
[[motivation]]
6+
== Motivation
7+
8+
Treat hooks as a first-class citizen by replacing the .git/hook/hookname path as
9+
the only source of hooks to execute, in a way which is friendly to users with
10+
multiple repos which have similar needs.
11+
12+
Redefine "hook" as an event rather than a single script, allowing users to
13+
perform unrelated actions on a single event.
14+
15+
Take a step closer to safety when copying zipped Git repositories from untrusted
16+
users by making it more apparent to users which scripts will be run during
17+
normal Git operations.
18+
19+
Make it easier for users to discover Git's hook feature and automate their
20+
workflows.
21+
22+
[[user-interfaces]]
23+
== User interfaces
24+
25+
[[config-schema]]
26+
=== Config schema
27+
28+
Hooks can be introduced by editing the configuration manually. There are two new
29+
sections added, `hook` and `hookcmd`.
30+
31+
[[config-schema-hook]]
32+
==== `hook`
33+
34+
Primarily contains subsections for each hook event. These order of these
35+
subsections defines the hook command execution order; hook commands can be
36+
specified by setting the value directly to the command if no additional
37+
configuration is needed, or by setting the value as the name of a `hookcmd`. If
38+
Git does not find a `hookcmd` whose subsection matches the value of the given
39+
command string, Git will try to execute the string directly. Hooks are executed
40+
by passing the resolved command string to the shell. Hook event subsections can
41+
also contain per-hook-event settings.
42+
43+
Also contains top-level hook execution settings, for example,
44+
`hook.warnHookDir`, `hook.runHookDir`, or `hook.disableAll`. (These settings are
45+
described more in <<library,Library>>.)
46+
47+
----
48+
[hook "pre-commit"]
49+
command = perl-linter
50+
command = /usr/bin/git-secrets --pre-commit
51+
52+
[hook "pre-applypatch"]
53+
command = perl-linter
54+
error = ignore
55+
56+
[hook]
57+
runHookDir = interactive
58+
----
59+
60+
[[config-schema-hookcmd]]
61+
==== `hookcmd`
62+
63+
Defines a hook command and its attributes, which will be used when a hook event
64+
occurs. Unqualified attributes are assumed to apply to this hook during all hook
65+
events, but event-specific attributes can also be supplied. The example runs
66+
`/usr/bin/lint-it --language=perl <args passed by Git>`, but for repos which
67+
include this config, the hook command will be skipped for all events to which
68+
it's normally subscribed _except_ `pre-commit`.
69+
70+
----
71+
[hookcmd "perl-linter"]
72+
command = /usr/bin/lint-it --language=perl
73+
skip = true
74+
pre-commit-skip = false
75+
----
76+
77+
[[command-line-api]]
78+
=== Command-line API
79+
80+
Users should be able to view, reorder, and create hook commands via the command
81+
line. External tools should be able to view a list of hooks in the correct order
82+
to run.
83+
84+
*`git hook list <hook-event>`*
85+
86+
*`git hook list (--system|--global|--local|--worktree)`*
87+
88+
*`git hook edit <hook-event>`*
89+
90+
*`git hook add <hook-command> <hook-event> <options...>`*
91+
92+
[[hook-editor]]
93+
=== Hook editor
94+
95+
The tool which is presented by `git hook edit <hook-command>`. Ideally, this
96+
tool should be easier to use than manually editing the config, and then produce
97+
a concise config afterwards. It may take a form similar to `git rebase
98+
--interactive`.
99+
100+
[[implementation]]
101+
== Implementation
102+
103+
[[library]]
104+
=== Library
105+
106+
`hook.c` and `hook.h` are responsible for interacting with the config files. In
107+
the case when the code generating a hook event doesn't have special concerns
108+
about how to run the hooks, the hook library will provide a basic API to call
109+
all hooks in config order with an `argv_array` provided by the code which
110+
generates the hook event:
111+
112+
*`int run_hooks(const char *hookname, struct argv_array *args)`*
113+
114+
This call includes the hook command provided by `run-command.h:find_hook()`;
115+
eventually, this legacy hook will be gated by a config `hook.runHookDir`. The
116+
config is checked against a number of cases:
117+
118+
- "no": the legacy hook will not be run
119+
- "interactive": Git will prompt the user before running the legacy hook
120+
- "warn": Git will print a warning to stderr before running the legacy hook
121+
- "yes" (default): Git will silently run the legacy hook
122+
123+
In case this list is expanded in the future, if a value for `hook.runHookDir` is
124+
given which Git does not recognize, Git should discard that config entry. For
125+
example, if "warn" was specified at system level and "junk" was specified at
126+
global level, Git would resolve the value to "warn"; if the only time the config
127+
was set was to "junk", Git would use the default value of "yes".
128+
129+
If the caller wants to do something more complicated, the hook library can also
130+
provide a callback API:
131+
132+
*`int for_each_hookcmd(const char *hookname, hookcmd_function *cb)`*
133+
134+
Finally, to facilitate the builtin, the library will also provide the following
135+
APIs to interact with the config:
136+
137+
----
138+
int set_hook_commands(const char *hookname, struct string_list *commands,
139+
enum config_scope scope);
140+
int set_hookcmd(const char *hookcmd, struct hookcmd options);
141+
142+
int list_hook_commands(const char *hookname, struct string_list *commands);
143+
int list_hooks_in_scope(enum config_scope scope, struct string_list *commands);
144+
----
145+
146+
`struct hookcmd` is expected to grow in size over time as more functionality is
147+
added to hooks; so that other parts of the code don't need to understand the
148+
config schema, `struct hookcmd` should contain logical values instead of string
149+
pairs.
150+
151+
----
152+
struct hookcmd {
153+
const char *name;
154+
const char *command;
155+
156+
/* for illustration only; not planned at present */
157+
int parallelizable;
158+
const char *hookcmd_before;
159+
const char *hookcmd_after;
160+
enum recovery_action on_fail;
161+
}
162+
----
163+
164+
[[builtin]]
165+
=== Builtin
166+
167+
`builtin/hook.c` is responsible for providing the frontend. It's responsible for
168+
formatting user-provided data and then calling the library API to set the
169+
configs as appropriate. The builtin frontend is not responsible for calling the
170+
config directly, so that other areas of Git can rely on the hook library to
171+
understand the most recent config schema for hooks.
172+
173+
[[migration]]
174+
=== Migration path
175+
176+
[[stage-0]]
177+
==== Stage 0
178+
179+
Hooks are called by running `run-command.h:find_hook()` with the hookname and
180+
executing the result. The hook library and builtin do not exist. Hooks only
181+
exist as specially named scripts within `.git/hooks/`.
182+
183+
[[stage-1]]
184+
==== Stage 1
185+
186+
`git hook list --porcelain <hook-event>` is implemented. Users can replace their
187+
`.git/hooks/<hook-event>` scripts with a trampoline based on `git hook list`'s
188+
output. Modifier commands like `git hook add` and `git hook edit` can be
189+
implemented around this time as well.
190+
191+
[[stage-2]]
192+
==== Stage 2
193+
194+
`hook.h:run_hooks()` is taught to include `run-command.h:find_hook()` at the
195+
end; calls to `find_hook()` are replaced with calls to `run_hooks()`. Users can
196+
opt-in to config-based hooks simply by creating some in their config; otherwise
197+
users should remain unaffected by the change.
198+
199+
[[stage-3]]
200+
==== Stage 3
201+
202+
The call to `find_hook()` inside of `run_hooks()` learns to check for a config,
203+
`hook.runHookDir`. Users can opt into managing their hooks completely via the
204+
config this way.
205+
206+
[[stage-4]]
207+
==== Stage 4
208+
209+
`.git/hooks` is removed from the template and the hook directory is considered
210+
deprecated. To avoid breaking older repos, the default of `hook.runHookDir` is
211+
not changed, and `find_hook()` is not removed.
212+
213+
[[caveats]]
214+
== Caveats
215+
216+
[[security]]
217+
=== Security and repo config
218+
219+
Part of the motivation behind this refactor is to mitigate hooks as an attack
220+
vector;footnote:[https://lore.kernel.org/git/[email protected]/]
221+
however, as the design stands, users can still provide hooks in the repo-level
222+
config, which is included when a repo is zipped and sent elsewhere. The
223+
security of the repo-level config is still under discussion; this design
224+
generally assumes the repo-level config is secure, which is not true yet. The
225+
goal is to avoid an overcomplicated design to work around a problem which has
226+
ceased to exist.
227+
228+
[[ease-of-use]]
229+
=== Ease of use
230+
231+
The config schema is nontrivial; that's why it's important for the `git hook`
232+
modifier commands to be usable. Contributors with UX expertise are encouraged to
233+
share their suggestions.
234+
235+
[[alternatives]]
236+
== Alternative approaches
237+
238+
A previous summary of alternatives exists in the
239+
archives.footnote:[https://lore.kernel.org/git/[email protected]]
240+
241+
[[status-quo]]
242+
=== Status quo
243+
244+
Today users can implement multihooks themselves by using a "trampoline script"
245+
as their hook, and pointing that script to a directory or list of other scripts
246+
they wish to run.
247+
248+
[[hook-directories]]
249+
=== Hook directories
250+
251+
Other contributors have suggested Git learn about the existence of a directory
252+
such as `.git/hooks/<hookname>.d` and execute those hooks in alphabetical order.
253+
254+
[[comparison]]
255+
=== Comparison table
256+
257+
.Comparison of alternatives
258+
|===
259+
|Feature |Config-based hooks |Hook directories |Status quo
260+
261+
|Supports multiple hooks
262+
|Natively
263+
|Natively
264+
|With user effort
265+
266+
|Safer for zipped repos
267+
|A little
268+
|No
269+
|No
270+
271+
|Previous hooks just work
272+
|If configured
273+
|Yes
274+
|Yes
275+
276+
|Can install one hook to many repos
277+
|Yes
278+
|No
279+
|No
280+
281+
|Discoverability
282+
|Better (in `git help git`)
283+
|Same as before
284+
|Same as before
285+
286+
|Hard to run unexpected hook
287+
|If configured
288+
|No
289+
|No
290+
|===
291+
292+
[[future-work]]
293+
== Future work
294+
295+
[[execution-ordering]]
296+
=== Execution ordering
297+
298+
We may find that config order is insufficient for some users; for example,
299+
config order makes it difficult to add a new hook to the system or global config
300+
which runs at the end of the hook list. A new ordering schema should be:
301+
302+
1) Specified by a `hook.order` config, so that users will not unexpectedly see
303+
their order change;
304+
305+
2) Either dependency or numerically based.
306+
307+
Dependency-based ordering is prone to classic linked-list problems, like a
308+
cycles and handling of missing dependencies. But, it paves the way for enabling
309+
parallelization if some tasks truly depend on others.
310+
311+
Numerical ordering makes it tricky for Git to generate suggested ordering
312+
numbers for each command, but is easy to determine a definitive order.
313+
314+
[[parallelization]]
315+
=== Parallelization
316+
317+
Users with many hooks might want to run them simultaneously, if the hooks don't
318+
modify state; if one hook depends on another's output, then users will want to
319+
specify those dependencies. If we decide to solve this problem, we may want to
320+
look to modern build systems for inspiration on how to manage dependencies and
321+
parallel tasks.
322+
323+
[[securing-hookdir-hooks]]
324+
=== Securing hookdir hooks
325+
326+
With the design as written in this doc, it's still possible for a malicious user
327+
to modify `.git/config` to include `hook.pre-receive.command = rm -rf /`, then
328+
zip their repo and send it to another user. It may be necessary to teach Git to
329+
only allow inlined hooks like this if they were configured outside of the local
330+
scope (in other words, only run hookcmds, and only allow hookcmds to be
331+
configured in global or system scope); or another approach, like a list of safe
332+
projects, might be useful. It may also be sufficient (or at least useful) to
333+
teach a `hook.disableAll` config or similar flag to the Git executable.
334+
335+
[[submodule-inheritance]]
336+
=== Submodule inheritance
337+
338+
It's possible some submodules may want to run the identical set of hooks that
339+
their superrepo runs. While a globally-configured hook set is helpful, it's not
340+
a great solution for users who have multiple repos-with-submodules under the
341+
same user. It would be useful for submodules to learn how to run hooks from
342+
their superrepo's config, or inherit that hook setting.
343+
344+
[[glossary]]
345+
== Glossary
346+
347+
*hook event*
348+
349+
A point during Git's execution where user scripts may be run, for example,
350+
_prepare-commit-msg_ or _pre-push_.
351+
352+
*hook command*
353+
354+
A user script or executable which will be run on one or more hook events.

0 commit comments

Comments
 (0)