Manual
Aaron Riekenberg edited this page Apr 27, 2024
- Command line options
- Commands from arguments
- Commands from stdin
- Command and initial arguments on command line
- Reading multiple inputs
- Parallelism
- Dry run
- Debug logging
- Error handling
- Timeout
- Path cache
- Progress bar
- Regular Expression
- Shell Commands
- Bash Function
## Command line options
```
$ rust-parallel --help
Execute commands in parallel

By Aaron Riekenberg <[email protected]>

https://github.com/aaronriekenberg/rust-parallel
https://crates.io/crates/rust-parallel

Usage: rust-parallel [OPTIONS] [COMMAND_AND_INITIAL_ARGUMENTS]...

Arguments:
  [COMMAND_AND_INITIAL_ARGUMENTS]...
          Optional command and initial arguments.
          If this contains 1 or more ::: delimiters the cartesian product of arguments from all groups are run.

Options:
  -d, --discard-output <DISCARD_OUTPUT>
          Discard output for commands
          Possible values:
          - stdout: Redirect stdout for commands to /dev/null
          - stderr: Redirect stderr for commands to /dev/null
          - all:    Redirect stdout and stderr for commands to /dev/null
  -i, --input-file <INPUT_FILE>
          Input file or - for stdin.  Defaults to stdin if no inputs are specified
  -j, --jobs <JOBS>
          Maximum number of commands to run in parallel, defauts to num cpus
          [default: 8]
  -0, --null-separator
          Use null separator for reading input files instead of newline
  -p, --progress-bar
          Display progress bar
  -r, --regex <REGEX>
          Apply regex pattern to inputs
  -s, --shell
          Use shell mode for running commands.
          Each command line is passed to "<shell-path> <shell-argument>" as a single argument.
  -t, --timeout-seconds <TIMEOUT_SECONDS>
          Timeout seconds for running commands.  Defaults to infinite timeout if not specified
      --channel-capacity <CHANNEL_CAPACITY>
          Input and output channel capacity, defaults to num cpus * 2
          [default: 16]
      --disable-path-cache
          Disable command path cache
      --dry-run
          Dry run mode
          Do not actually run commands just log.
      --exit-on-error
          Exit on error mode
          Exit immediately when a command fails.
      --no-run-if-empty
          Do not run commands for empty buffered input lines
      --shell-path <SHELL_PATH>
          Path to shell to use for shell mode
          [default: /bin/bash]
      --shell-argument <SHELL_ARGUMENT>
          Argument to shell for shell mode
          [default: -c]
  -h, --help
          Print help (see a summary with '-h')
  -V, --version
          Print version
```
## Commands from arguments
The `:::` separator can be used to run the Cartesian product of command line arguments. This is similar to the `:::` behavior in GNU Parallel.
```
$ rust-parallel echo ::: A B ::: C D ::: E F G
A C F
A C E
A C G
B C E
A D F
A D E
A D G
B C F
B C G
B D E
B D F
B D G
$ rust-parallel echo hello ::: larry curly moe
hello larry
hello curly
hello moe
# run gzip -k on all *.html files in current directory
$ rust-parallel gzip -k ::: *.html
```
Variables `{0}`, `{1}`, etc. are automatically available based on the number of arguments. `{0}` will be replaced by the entire input line, and the other variables match the individual argument groups. This is useful for building more complex command lines. For example:
```
$ rust-parallel echo group0={0} group1={1} group2={2} group3={3} group2again={2} ::: A B ::: C D ::: E F G
group0=A C G group1=A group2=C group3=G group2again=C
group0=A C E group1=A group2=C group3=E group2again=C
group0=A D E group1=A group2=D group3=E group2again=D
group0=A D F group1=A group2=D group3=F group2again=D
group0=B C E group1=B group2=C group3=E group2again=C
group0=A C F group1=A group2=C group3=F group2again=C
group0=A D G group1=A group2=D group3=G group2again=D
group0=B C F group1=B group2=C group3=F group2again=C
group0=B D E group1=B group2=D group3=E group2again=D
group0=B C G group1=B group2=C group3=G group2again=C
group0=B D G group1=B group2=D group3=G group2again=D
group0=B D F group1=B group2=D group3=F group2again=D
```
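As a rough illustration of what the `:::` groups expand to, the nested loops below enumerate the same kind of combinations in plain shell. This is only a sketch: the function name `cartesian_product` is made up for this example, the loops run sequentially rather than in parallel, and it is not meant to describe how rust-parallel is implemented.

```shell
# Plain-shell stand-in for: rust-parallel echo ... ::: A B ::: C D
# Each nested loop plays the role of one ::: argument group.
# cartesian_product is a hypothetical helper, not part of rust-parallel.
cartesian_product() {
  for g1 in A B; do
    for g2 in C D; do
      # {1} corresponds to $g1, {2} to $g2, and {0} to the whole line "$g1 $g2"
      echo "group0=$g1 $g2 group1=$g1 group2=$g2"
    done
  done
}

cartesian_product
```

Two groups of two arguments give four lines. Unlike the parallel run above, the order here is always the same.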
## Commands from stdin
Run complete commands from stdin.
```
$ cat >./test <<EOL
echo hi
echo there
echo how
echo are
echo you
EOL
$ cat test | rust-parallel
there
are
hi
how
you
```
## Command and initial arguments on command line
Here `md5 -s` will be prepended to each input line to form a command like `md5 -s aal`.
```
$ head -100 /usr/share/dict/words | rust-parallel md5 -s | head -10
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("A") = 7fc56270e7a70fa81a5935b72eacbe29
MD5 ("aal") = ff45e881572ca2c987460932660d320c
MD5 ("Aani") = e9b22dd6213c3d29648e8ad7a8642f2f
MD5 ("aa") = 4124bc0a9335c27f086f24ba207a4912
MD5 ("aardvark") = 88571e5d5e13a4a60f82cea7802f6255
MD5 ("aalii") = 0a1ea2a8d75d02ae052f8222e36927a5
MD5 ("aam") = 35c2d90f7c06b623fe763d0a4e5b7ed9
MD5 ("Aaron") = 1c0a11cc4ddc0dbd3fa4d77232a4e22e
MD5 ("Aaronic") = 0390cf1718c4f2d76f770c7c35b40c50
```
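To make the prepending step concrete, the sketch below prints the command lines that would be formed from a few input lines, without running them. The helper name `build_commands` is hypothetical and the loop is sequential; it only mirrors the "command and initial arguments plus input line" rule described above.

```shell
# Print the command formed for each stdin line by prepending "md5 -s".
# build_commands is a made-up helper for illustration only.
build_commands() {
  while IFS= read -r line; do
    echo "md5 -s $line"
  done
}

printf '%s\n' a aa aal | build_commands
```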
## Reading multiple inputs
By default `rust-parallel` reads input from stdin only. The `-i` option can be used 1 or more times to override this behavior. `-i -` means read from stdin, `-i ./test` means read from the file `./test`:
```
$ cat >./test <<EOL
foo
bar
baz
EOL
$ head -5 /usr/share/dict/words | rust-parallel -i - -i ./test echo
A
a
aal
aalii
foo
aa
baz
bar
```
## Parallelism
By default the number of jobs run simultaneously is the number of CPUs detected at run time.
This can be overridden with the `-j`/`--jobs` option.
With `-j5` all echo commands below run in parallel.
With `-j1` all jobs run sequentially.
```
$ rust-parallel -j5 echo ::: hi there how are you
hi
you
are
there
how
$ rust-parallel -j1 echo ::: hi there how are you
hi
there
how
are
you
```
## Dry run
Use option `--dry-run` for dry run mode.
In this mode the commands that would be run are output as info level logs.
No commands are actually run - this is useful for testing before running a job.
```
$ rust-parallel --dry-run echo ::: hi there how are you
2024-04-27T15:43:01.009840Z INFO rust_parallel::command: cmd="/bin/echo",args=["hi"],line=command_line_args:1
2024-04-27T15:43:01.009865Z INFO rust_parallel::command: cmd="/bin/echo",args=["there"],line=command_line_args:2
2024-04-27T15:43:01.009873Z INFO rust_parallel::command: cmd="/bin/echo",args=["how"],line=command_line_args:3
2024-04-27T15:43:01.009879Z INFO rust_parallel::command: cmd="/bin/echo",args=["are"],line=command_line_args:4
2024-04-27T15:43:01.009885Z INFO rust_parallel::command: cmd="/bin/echo",args=["you"],line=command_line_args:5
```
## Debug logging
Set environment variable `RUST_LOG=debug` to see debug output.
This logs structured information about command line arguments and commands being run.
Enabling debug logging is recommended for all examples to understand what is happening in more detail.
```
$ RUST_LOG=debug rust-parallel echo ::: hi there how are you | grep command_line_args | head -1
2024-04-27T15:43:01.012293Z DEBUG try_main: rust_parallel::command_line_args: command_line_args = CommandLineArgs { discard_output: None, input_file: [], jobs: 8, null_separator: false, progress_bar: false, regex: None, shell: false, timeout_seconds: None, channel_capacity: 16, disable_path_cache: false, dry_run: false, exit_on_error: false, no_run_if_empty: false, shell_path: "/bin/bash", shell_argument: "-c", command_and_initial_arguments: ["echo", ":::", "hi", "there", "how", "are", "you"] }
$ RUST_LOG=debug rust-parallel echo ::: hi there how are you | grep command_line_args:1
2024-04-27T15:43:01.017700Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1}: rust_parallel::command: begin run
2024-04-27T15:43:01.017841Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: spawned child process, awaiting completion
2024-04-27T15:43:01.018481Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: command exit status = exit status: 0
2024-04-27T15:43:01.018493Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: end run
```
## Error handling
The following are considered command failures, and an error is logged for each:
* Spawn error
* Timeout
* I/O error
* Command exits with non-0 status
By default rust-parallel runs all commands even if failures occur.
When rust-parallel terminates, if any command failed it logs failure metrics and exits with status 1.
Here we try to use `cat` to show non-existing files `A`, `B`, and `C`, so each command exits with status 1:
```
$ rust-parallel cat ::: A B C
cat: A: No such file or directory
2024-04-27T15:43:01.022848Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["A"],line=command_line_args:1 exit_status=1
cat: C: No such file or directory
2024-04-27T15:43:01.022921Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["C"],line=command_line_args:3 exit_status=1
cat: B: No such file or directory
2024-04-27T15:43:01.023007Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["B"],line=command_line_args:2 exit_status=1
2024-04-27T15:43:01.023038Z ERROR rust_parallel: fatal error in main: command failures: commands_run=3 total_failures=3 spawn_errors=0 timeouts=0 io_errors=0 exit_status_errors=3
$ echo $?
1
```
The `--exit-on-error` option can be used to exit after one command fails.
rust-parallel waits for in-progress commands to finish before exiting and then exits with status 1.
```
$ head -100 /usr/share/dict/words | rust-parallel --exit-on-error cat
cat: a: No such file or directory
2024-04-27T15:43:01.026706Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["a"],line=stdin:2 exit_status=1
cat: aam: No such file or directory
2024-04-27T15:43:01.026824Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aam"],line=stdin:6 exit_status=1
cat: A: No such file or directory
2024-04-27T15:43:01.026932Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["A"],line=stdin:1 exit_status=1
cat: aa: No such file or directory
2024-04-27T15:43:01.027216Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aa"],line=stdin:3 exit_status=1
cat: aal: No such file or directory
2024-04-27T15:43:01.027285Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aal"],line=stdin:4 exit_status=1
cat: aardvark: No such file or directory
2024-04-27T15:43:01.027390Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aardvark"],line=stdin:8 exit_status=1
cat: aalii: No such file or directory
2024-04-27T15:43:01.027460Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aalii"],line=stdin:5 exit_status=1
cat: Aani: No such file or directory
2024-04-27T15:43:01.027537Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["Aani"],line=stdin:7 exit_status=1
cat: aardwolf: No such file or directory
2024-04-27T15:43:01.027850Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aardwolf"],line=stdin:9 exit_status=1
2024-04-27T15:43:01.027893Z ERROR rust_parallel: fatal error in main: command failures: commands_run=9 total_failures=9 spawn_errors=0 timeouts=0 io_errors=0 exit_status_errors=9
$ echo $?
1
```
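Because any failure ends in exit status 1, a calling script can branch on the result of the whole batch. The wrapper below is a minimal sketch of that pattern; `run_batch` is a hypothetical helper name, and the commented usage line assumes `rust-parallel` is on `PATH`.

```shell
# run_batch CMD [ARGS...]: run one command and report whether it failed.
# run_batch is a made-up helper; it works with any command, not only rust-parallel.
run_batch() {
  if "$@"; then
    echo "batch succeeded"
  else
    # In the else branch $? still holds the exit status of the command above.
    echo "batch failed with status $?"
  fi
}

# Intended usage (requires rust-parallel on PATH):
#   run_batch rust-parallel cat ::: A B C
# Stand-in commands so the sketch runs anywhere:
run_batch true
run_batch false
```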
## Timeout
The `-t`/`--timeout-seconds` option can be used to specify a command timeout in seconds. If any command times out this is considered a command failure (see [error handling](#error-handling)).
```
$ rust-parallel -t 0.5 sleep ::: 0 3 5
2024-04-27T15:43:01.533003Z ERROR rust_parallel::command: child process error command: cmd="/bin/sleep",args=["5"],line=command_line_args:3 error: timeout: deadline has elapsed
2024-04-27T15:43:01.533011Z ERROR rust_parallel::command: child process error command: cmd="/bin/sleep",args=["3"],line=command_line_args:2 error: timeout: deadline has elapsed
2024-04-27T15:43:01.533457Z ERROR rust_parallel: fatal error in main: command failures: commands_run=3 total_failures=2 spawn_errors=0 timeouts=2 io_errors=0 exit_status_errors=0
$ echo $?
1
```
## Path Cache
By default as commands are run the full paths are resolved using [which](https://github.com/harryfei/which-rs). Resolved paths are stored in a cache to prevent duplicate resolutions. This is generally [good for performance](https://github.com/aaronriekenberg/rust-parallel/wiki/Benchmarks).
The path cache can be disabled using the `--disable-path-cache` option.
## Progress bar
The `-p`/`--progress-bar` option can be used to enable a graphical progress bar.
This is best used for commands which are running for at least a few seconds, and which do not produce output to stdout or stderr. In the below commands `-d all` is used to discard all output from commands run.
Progress styles can be chosen with the `PROGRESS_STYLE` environment variable. If `PROGRESS_STYLE` is not set it defaults to `light_bg`.
The following progress styles are available:
* `PROGRESS_STYLE=light_bg` good for light terminal background with colors, spinner, and steady tick enabled
* `PROGRESS_STYLE=dark_bg` good for dark terminal background with colors, spinner, and steady tick enabled
* `PROGRESS_STYLE=simple` good for simple or non-ansi terminals/jobs with colors, spinner, and steady tick disabled
## Regular Expression
Regular expressions can be specified by the `-r` or `--regex` command line argument.
[Named or numbered capture groups](https://docs.rs/regex/latest/regex/#grouping-and-flags) are expanded with data values from the current input before the command is executed.
### Named Capture Groups
In these examples, which use command line arguments, `{url}` and `{filename}` are named capture groups. `{0}` is a numbered capture group.
```
$ rust-parallel -r '(?P<url>.*),(?P<filename>.*)' echo got url={url} filename={filename} ::: URL1,filename1 URL2,filename2
got url=URL1 filename=filename1
got url=URL2 filename=filename2
$ rust-parallel -r '(?P<url>.*) (?P<filename>.*)' echo got url={url} filename={filename} full input={0} ::: URL1 URL2 ::: filename1 filename2
got url=URL1 filename=filename1 full input=URL1 filename1
got url=URL1 filename=filename2 full input=URL1 filename2
got url=URL2 filename=filename2 full input=URL2 filename2
got url=URL2 filename=filename1 full input=URL2 filename1
```
### Numbered Capture Groups
In the next example, which reads its input from a CSV file, `{0}`, `{1}`, `{2}`, and `{3}` are numbered capture groups:
```
$ cat >./test <<EOL
foo,bar,baz
foo2,bar2,baz2
foo3,bar3,baz3
EOL
$ cat test | rust-parallel -r '(.*),(.*),(.*)' echo got arg1={1} arg2={2} arg3={3} full input={0}
got arg1=foo arg2=bar arg3=baz full input=foo,bar,baz
got arg1=foo2 arg2=bar2 arg3=baz2 full input=foo2,bar2,baz2
got arg1=foo3 arg2=bar3 arg3=baz3 full input=foo3,bar3,baz3
```
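For readers more familiar with `sed`, the same split can be approximated with back-references: `\1`, `\2`, `\3` play the role of `{1}`, `{2}`, `{3}`, and `&` plays the role of `{0}`. This is an analogy only. `sed -E` uses POSIX extended regular expressions, not the Rust regex syntax that rust-parallel uses, and `expand_csv_line` is a hypothetical helper name.

```shell
# Approximate the numbered capture group expansion for one CSV line using sed.
# expand_csv_line is a made-up helper for illustration only.
expand_csv_line() {
  printf '%s\n' "$1" |
    sed -E 's/(.*),(.*),(.*)/got arg1=\1 arg2=\2 arg3=\3 full input=&/'
}

expand_csv_line 'foo,bar,baz'
```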
### Capture Group Special Characters
All occurrences of capture groups are replaced as exact strings. Surrounding characters have no effect on this.
This means capture groups can be nested with other `{` or `}` characters, such as when building JSON:
```
$ cat >./test <<EOL
1,2,3
4,5,6
7,8,9
EOL
$ cat test | rust-parallel -r '(.*),(.*),(.*)' echo '{"one":{1},"two":{2},"nested_object":{"three":{3}}}'
{"one":1,"two":2,"nested_object":{"three":3}}
{"one":4,"two":5,"nested_object":{"three":6}}
{"one":7,"two":8,"nested_object":{"three":9}}
```
## Shell Commands
Shell commands can be written using `-s` shell mode.
Multiline commands can be written using `;`.
Environment variables, `$` characters, nested commands and much more are possible:
```
$ rust-parallel -s -r '(?P<arg1>.*) (?P<arg2>.*)' 'FOO={arg1}; BAR={arg2}; echo "FOO = ${FOO}, BAR = ${BAR}, shell pid = $$, date = $(date)"' ::: A B ::: CAT DOG
FOO = B, BAR = DOG, shell pid = 85989, date = Sat Apr 27 10:43:01 CDT 2024
FOO = A, BAR = DOG, shell pid = 85991, date = Sat Apr 27 10:43:01 CDT 2024
FOO = B, BAR = CAT, shell pid = 85990, date = Sat Apr 27 10:43:01 CDT 2024
FOO = A, BAR = CAT, shell pid = 85988, date = Sat Apr 27 10:43:01 CDT 2024
```
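According to the help text, shell mode passes each command line to `<shell-path> <shell-argument>` as a single argument. The sketch below writes out by hand what that amounts to for one input, using the documented defaults `/bin/bash` and `-c`. The variable names and the shortened command line (with `{arg1}` and `{arg2}` already replaced by `A` and `CAT`) are assumptions made for this example.

```shell
# Hand-written equivalent of one shell-mode invocation.
# shell_path and shell_argument mirror the --shell-path / --shell-argument defaults.
shell_path=/bin/bash
shell_argument=-c

# Command line after capture groups were substituted ({arg1}=A, {arg2}=CAT).
command_line='FOO=A; BAR=CAT; echo "FOO = ${FOO}, BAR = ${BAR}"'

# The whole command line is one argument, so the shell interprets ; and ${...}.
"$shell_path" "$shell_argument" "$command_line"
```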
## Bash Function
`-s` shell mode can be used to invoke an arbitrary bash function.
Similar to normal commands, bash functions can be called using stdin, input files, or command line arguments.
### Function Setup
Define a bash function `logargs` that logs all arguments, and make it visible to child shells with `export -f`:
```
$ logargs() {
  echo "logargs got $@"
}
$ export -f logargs
```
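The `export -f` step matters because each shell-mode command runs in a new bash process, which only sees functions exported through the environment. The sketch below shows the difference using plain `bash -c` as the child process. It assumes `bash` is on `PATH`; `demo_export` is a hypothetical wrapper that feeds the script to bash so the example behaves the same from any calling shell.

```shell
# demo_export is a made-up wrapper; the here-document body is ordinary bash.
demo_export() {
  bash -s <<'EOF'
logargs() {
  echo "logargs got $@"
}
# Before export: a child bash does not know the function.
bash -c 'logargs before export' 2>/dev/null || echo "child bash cannot see logargs yet"
# After export -f: the function is passed to child bash processes.
export -f logargs
bash -c 'logargs after export'
EOF
}

demo_export
```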
### Demo of command line arguments:
```
$ rust-parallel -s logargs ::: A B C ::: D E F
logargs got A D
logargs got B D
logargs got B E
logargs got A E
logargs got C E
logargs got B F
logargs got C D
logargs got A F
logargs got C F
```
### Demo of function and command line arguments from stdin:
```
$ cat >./test <<EOL
logargs hello alice
logargs hello bob
logargs hello charlie
EOL
$ cat test | rust-parallel -s
logargs got hello alice
logargs got hello bob
logargs got hello charlie
```
### Demo of function and initial arguments on command line, additional arguments from stdin:
```
$ cat >./test <<EOL
alice
bob
charlie
EOL
$ cat test | rust-parallel -s logargs hello
logargs got hello alice
logargs got hello bob
logargs got hello charlie
```