Aaron Riekenberg edited this page Apr 27, 2024 · 79 revisions

Manual for rust-parallel 1.18.0

  1. Command line options
  2. Commands from arguments
  3. Commands from stdin
  4. Command and initial arguments on command line
  5. Reading multiple inputs
  6. Parallelism
  7. Dry run
  8. Debug logging
  9. Error handling
  10. Timeout
  11. Path cache
  12. Progress bar
  13. Regular Expression
    1. Named Capture Groups
    2. Numbered Capture Groups
    3. Capture Group Special Characters
  14. Shell Commands
  15. Bash Function
    1. Function Setup
    2. Demo of command line arguments
    3. Demo of function and command line arguments from stdin
    4. Demo of function and initial arguments on command line, additional arguments from stdin

## Command line options

```
$ rust-parallel --help
Execute commands in parallel

By Aaron Riekenberg <[email protected]>

https://github.com/aaronriekenberg/rust-parallel
https://crates.io/crates/rust-parallel

Usage: rust-parallel [OPTIONS] [COMMAND_AND_INITIAL_ARGUMENTS]...

Arguments:
  [COMMAND_AND_INITIAL_ARGUMENTS]...
          Optional command and initial arguments.

          If this contains 1 or more ::: delimiters the cartesian product of arguments from all groups are run.

Options:
  -d, --discard-output <DISCARD_OUTPUT>
          Discard output for commands

          Possible values:
          - stdout: Redirect stdout for commands to /dev/null
          - stderr: Redirect stderr for commands to /dev/null
          - all:    Redirect stdout and stderr for commands to /dev/null

  -i, --input-file <INPUT_FILE>
          Input file or - for stdin.  Defaults to stdin if no inputs are specified

  -j, --jobs <JOBS>
          Maximum number of commands to run in parallel, defauts to num cpus

          [default: 8]

  -0, --null-separator
          Use null separator for reading input files instead of newline

  -p, --progress-bar
          Display progress bar

  -r, --regex <REGEX>
          Apply regex pattern to inputs

  -s, --shell
          Use shell mode for running commands.

          Each command line is passed to "<shell-path> <shell-argument>" as a single argument.

  -t, --timeout-seconds <TIMEOUT_SECONDS>
          Timeout seconds for running commands.  Defaults to infinite timeout if not specified

      --channel-capacity <CHANNEL_CAPACITY>
          Input and output channel capacity, defaults to num cpus * 2

          [default: 16]

      --disable-path-cache
          Disable command path cache

      --dry-run
          Dry run mode

          Do not actually run commands just log.

      --exit-on-error
          Exit on error mode

          Exit immediately when a command fails.

      --no-run-if-empty
          Do not run commands for empty buffered input lines

      --shell-path <SHELL_PATH>
          Path to shell to use for shell mode

          [default: /bin/bash]

      --shell-argument <SHELL_ARGUMENT>
          Argument to shell for shell mode

          [default: -c]

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```

## Commands from arguments

The `:::` separator can be used to run the Cartesian Product of command line arguments. This is similar to the `:::` behavior in GNU Parallel.

```
$ rust-parallel echo ::: A B ::: C D ::: E F G
A C F
A C E
A C G
B C E
A D F
A D E
A D G
B C F
B C G
B D E
B D F
B D G

$ rust-parallel echo hello ::: larry curly moe
hello larry
hello curly
hello moe

# run gzip -k on all *.html files in current directory
$ rust-parallel gzip -k ::: *.html
```
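
For intuition about which command lines the `:::` groups expand to, the first example can be mimicked sequentially in plain bash. This is only an illustrative sketch, not how rust-parallel is implemented, and the function name `cartesian_echo` is made up for the example. The order is deterministic here, whereas rust-parallel's output order depends on how the parallel commands are scheduled.

```shell
# Sequential sketch of the combinations generated by:
#   rust-parallel echo ::: A B ::: C D ::: E F G
# One nested loop per ::: group; every combination is visited exactly once.
cartesian_echo() {
  local a b c
  for a in A B; do
    for b in C D; do
      for c in E F G; do
        echo "$a $b $c"
      done
    done
  done
}

cartesian_echo   # 2 * 2 * 3 = 12 lines
```

The number of commands is the product of the group sizes, so adding a group multiplies the amount of work.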

Variables `{0}`, `{1}`, etc. are automatically available based on the number of arguments. `{0}` is replaced by the entire input line, and the other variables match individual argument groups. This is useful for building more complex command lines. For example:

```
$ rust-parallel echo group0={0} group1={1} group2={2} group3={3} group2again={2} ::: A B ::: C D ::: E F G
group0=A C G group1=A group2=C group3=G group2again=C
group0=A C E group1=A group2=C group3=E group2again=C
group0=A D E group1=A group2=D group3=E group2again=D
group0=A D F group1=A group2=D group3=F group2again=D
group0=B C E group1=B group2=C group3=E group2again=C
group0=A C F group1=A group2=C group3=F group2again=C
group0=A D G group1=A group2=D group3=G group2again=D
group0=B C F group1=B group2=C group3=F group2again=C
group0=B D E group1=B group2=D group3=E group2again=D
group0=B C G group1=B group2=C group3=G group2again=C
group0=B D G group1=B group2=D group3=G group2again=D
group0=B D F group1=B group2=D group3=F group2again=D
```
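
The substitution of `{0}`, `{1}`, ... can be modelled with plain string replacement. The following is a minimal bash sketch, assuming literal replacement of each placeholder; `expand_template` is a hypothetical helper for illustration and not part of rust-parallel.

```shell
# expand_template TEMPLATE ARG...
# Replaces {0} with all arguments joined by spaces, and {N} with the N-th argument.
expand_template() {
  local template=$1
  shift
  local joined="$*"          # all arguments joined by single spaces
  local pat='{0}'
  local out=${template//"$pat"/"$joined"}
  local i=1 arg
  for arg in "$@"; do
    pat="{$i}"
    out=${out//"$pat"/"$arg"}  # replace every occurrence of {i}
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

expand_template 'group0={0} group1={1} group2={2} group3={3} group2again={2}' A C G
# prints: group0=A C G group1=A group2=C group3=G group2again=C
```

Because every occurrence is replaced, a placeholder such as `{2}` can appear more than once in the command line.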
## Commands from stdin

Run complete commands from stdin.


```
$ cat >./test <<EOL
echo hi
echo there
echo how
echo are
echo you
EOL

$ cat test | rust-parallel
there
are
hi
how
you
```
## Command and initial arguments on command line

Here `md5 -s` is prepended to each input line to form a command like `md5 -s aal`:

```
$ head -100 /usr/share/dict/words | rust-parallel md5 -s | head -10
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("A") = 7fc56270e7a70fa81a5935b72eacbe29
MD5 ("aal") = ff45e881572ca2c987460932660d320c
MD5 ("Aani") = e9b22dd6213c3d29648e8ad7a8642f2f
MD5 ("aa") = 4124bc0a9335c27f086f24ba207a4912
MD5 ("aardvark") = 88571e5d5e13a4a60f82cea7802f6255
MD5 ("aalii") = 0a1ea2a8d75d02ae052f8222e36927a5
MD5 ("aam") = 35c2d90f7c06b623fe763d0a4e5b7ed9
MD5 ("Aaron") = 1c0a11cc4ddc0dbd3fa4d77232a4e22e
MD5 ("Aaronic") = 0390cf1718c4f2d76f770c7c35b40c50
```

## Reading multiple inputs

By default `rust-parallel` reads input from stdin only.  The `-i` option can be used 1 or more times to override this behavior.  `-i -` means read from stdin, `-i ./test` means read from the file `./test`:

```
$ cat >./test <<EOL
foo
bar
baz
EOL

$ head -5 /usr/share/dict/words | rust-parallel -i - -i ./test echo
A
a
aal
aalii
foo
aa
baz
bar
```

## Parallelism

By default the number of parallel jobs to run simultaneously is the number of CPUs detected at run time.

This can be overridden with the `-j`/`--jobs` option.

With `-j5` all echo commands below run in parallel.

With `-j1` all jobs run sequentially.

```
$ rust-parallel -j5 echo ::: hi there how are you
hi
you
are
there
how

$ rust-parallel -j1 echo ::: hi there how are you
hi
there
how
are
you
```
## Dry run

Use option `--dry-run` for dry run mode.

In this mode the commands that would be run are output as info level logs.

No commands are actually run - this is useful for testing before running a job.

```
$ rust-parallel --dry-run echo ::: hi there how are you
2024-04-27T15:43:01.009840Z  INFO rust_parallel::command: cmd="/bin/echo",args=["hi"],line=command_line_args:1
2024-04-27T15:43:01.009865Z  INFO rust_parallel::command: cmd="/bin/echo",args=["there"],line=command_line_args:2
2024-04-27T15:43:01.009873Z  INFO rust_parallel::command: cmd="/bin/echo",args=["how"],line=command_line_args:3
2024-04-27T15:43:01.009879Z  INFO rust_parallel::command: cmd="/bin/echo",args=["are"],line=command_line_args:4
2024-04-27T15:43:01.009885Z  INFO rust_parallel::command: cmd="/bin/echo",args=["you"],line=command_line_args:5
```
## Debug logging

Set environment variable `RUST_LOG=debug` to see debug output.

This logs structured information about command line arguments and commands being run.

Enabling debug logging is recommended for all examples to understand what is happening in more detail.

```
$ RUST_LOG=debug rust-parallel echo ::: hi there how are you | grep command_line_args | head -1
2024-04-27T15:43:01.012293Z DEBUG try_main: rust_parallel::command_line_args: command_line_args = CommandLineArgs { discard_output: None, input_file: [], jobs: 8, null_separator: false, progress_bar: false, regex: None, shell: false, timeout_seconds: None, channel_capacity: 16, disable_path_cache: false, dry_run: false, exit_on_error: false, no_run_if_empty: false, shell_path: "/bin/bash", shell_argument: "-c", command_and_initial_arguments: ["echo", ":::", "hi", "there", "how", "are", "you"] }

$ RUST_LOG=debug rust-parallel echo ::: hi there how are you | grep command_line_args:1
2024-04-27T15:43:01.017700Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1}: rust_parallel::command: begin run
2024-04-27T15:43:01.017841Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: spawned child process, awaiting completion
2024-04-27T15:43:01.018481Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: command exit status = exit status: 0
2024-04-27T15:43:01.018493Z DEBUG Command::run{cmd="/bin/echo" args=["hi"] line=command_line_args:1 child_pid=85941}: rust_parallel::command: end run
```
## Error handling

The following are considered command failures, and an error is logged for each:
* Spawn error
* Timeout
* I/O error
* Command exits with non-0 status

By default rust-parallel runs all commands even if failures occur.

When rust-parallel terminates, if any command failed it logs failure metrics and exits with status 1.

Here we try to use `cat` to show non-existing files `A`, `B`, and `C`, so each command exits with status 1:

```
$ rust-parallel cat ::: A B C
cat: A: No such file or directory
2024-04-27T15:43:01.022848Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["A"],line=command_line_args:1 exit_status=1
cat: C: No such file or directory
2024-04-27T15:43:01.022921Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["C"],line=command_line_args:3 exit_status=1
cat: B: No such file or directory
2024-04-27T15:43:01.023007Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["B"],line=command_line_args:2 exit_status=1
2024-04-27T15:43:01.023038Z ERROR rust_parallel: fatal error in main: command failures: commands_run=3 total_failures=3 spawn_errors=0 timeouts=0 io_errors=0 exit_status_errors=3

$ echo $?
1
```
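
The default policy described above (run every command, then report failures and exit with status 1) can be approximated sequentially in plain bash. This is a sketch under stated assumptions: `run_all` is a hypothetical helper, it only detects non-zero exit statuses, and its summary line merely imitates two of the metric names from the log output rather than reproducing rust-parallel's log format.

```shell
# run_all CMD...
# Runs every command string even if earlier ones fail, then reports failures.
run_all() {
  local failures=0 cmd
  for cmd in "$@"; do
    # Keep going after a failure; just count it.
    bash -c "$cmd" || failures=$((failures + 1))
  done
  echo "commands_run=$# total_failures=$failures"
  [ "$failures" -eq 0 ]   # return non-zero if anything failed
}

run_all 'true' 'false' 'exit 3' || echo "exit status: $?"
# prints: commands_run=3 total_failures=2
# prints: exit status: 1
```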
The `--exit-on-error` option can be used to exit after one command fails.

rust-parallel waits for in-progress commands to finish before exiting and then exits with status 1.
```
$ head -100 /usr/share/dict/words | rust-parallel --exit-on-error cat
cat: a: No such file or directory
2024-04-27T15:43:01.026706Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["a"],line=stdin:2 exit_status=1
cat: aam: No such file or directory
2024-04-27T15:43:01.026824Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aam"],line=stdin:6 exit_status=1
cat: A: No such file or directory
2024-04-27T15:43:01.026932Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["A"],line=stdin:1 exit_status=1
cat: aa: No such file or directory
2024-04-27T15:43:01.027216Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aa"],line=stdin:3 exit_status=1
cat: aal: No such file or directory
2024-04-27T15:43:01.027285Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aal"],line=stdin:4 exit_status=1
cat: aardvark: No such file or directory
2024-04-27T15:43:01.027390Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aardvark"],line=stdin:8 exit_status=1
cat: aalii: No such file or directory
2024-04-27T15:43:01.027460Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aalii"],line=stdin:5 exit_status=1
cat: Aani: No such file or directory
2024-04-27T15:43:01.027537Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["Aani"],line=stdin:7 exit_status=1
cat: aardwolf: No such file or directory
2024-04-27T15:43:01.027850Z ERROR rust_parallel::output::task: command failed: cmd="/bin/cat",args=["aardwolf"],line=stdin:9 exit_status=1
2024-04-27T15:43:01.027893Z ERROR rust_parallel: fatal error in main: command failures: commands_run=9 total_failures=9 spawn_errors=0 timeouts=0 io_errors=0 exit_status_errors=9

$ echo $?
1
```

## Timeout

The `-t`/`--timeout-seconds` option can be used to specify a command timeout in seconds.  If any command times out this is considered a command failure (see [error handling](#error-handling)).

```
$ rust-parallel -t 0.5 sleep ::: 0 3 5
2024-04-27T15:43:01.533003Z ERROR rust_parallel::command: child process error command: cmd="/bin/sleep",args=["5"],line=command_line_args:3 error: timeout: deadline has elapsed
2024-04-27T15:43:01.533011Z ERROR rust_parallel::command: child process error command: cmd="/bin/sleep",args=["3"],line=command_line_args:2 error: timeout: deadline has elapsed
2024-04-27T15:43:01.533457Z ERROR rust_parallel: fatal error in main: command failures: commands_run=3 total_failures=2 spawn_errors=0 timeouts=2 io_errors=0 exit_status_errors=0

$ echo $?
1
```

## Path Cache

By default, as commands are run, their full paths are resolved using [which](https://github.com/harryfei/which-rs).  Resolved paths are stored in a cache to prevent duplicate resolutions.  This is generally [good for performance](https://github.com/aaronriekenberg/rust-parallel/wiki/Benchmarks).

The path cache can be disabled using the `--disable-path-cache` option.

## Progress bar

The `-p`/`--progress-bar` option can be used to enable a graphical progress bar.

This is best used for commands which are running for at least a few seconds, and which do not produce output to stdout or stderr.  In the below commands `-d all` is used to discard all output from commands run.

Progress styles can be chosen with the `PROGRESS_STYLE` environment variable.  If `PROGRESS_STYLE` is not set it defaults to `light_bg`.

The following progress styles are available:
* `PROGRESS_STYLE=light_bg` good for light terminal background with colors, spinner, and steady tick enabled:
![light_bg](https://github.com/aaronriekenberg/rust-parallel/blob/main/screenshots/light_background_progress_bar.png)

* `PROGRESS_STYLE=dark_bg` good for dark terminal background with colors, spinner, and steady tick enabled:
![dark_bg](https://github.com/aaronriekenberg/rust-parallel/blob/main/screenshots/dark_background_progress_bar.png)

* `PROGRESS_STYLE=simple` good for simple or non-ANSI terminals/jobs with colors, spinner, and steady tick disabled:
![simple](https://github.com/aaronriekenberg/rust-parallel/blob/main/screenshots/simple_progress_bar.png)

## Regular Expression

Regular expressions can be specified by the `-r` or `--regex` command line argument.

[Named or numbered capture groups](https://docs.rs/regex/latest/regex/#grouping-and-flags) are expanded with data values from the current input before the command is executed.

### Named Capture Groups

In these examples, which take their input from command line arguments, `{url}` and `{filename}` are named capture groups.  `{0}` is a numbered capture group.

```
$ rust-parallel -r '(?P<url>.*),(?P<filename>.*)' echo got url={url} filename={filename} ::: URL1,filename1 URL2,filename2
got url=URL1 filename=filename1
got url=URL2 filename=filename2

$ rust-parallel -r '(?P<url>.*) (?P<filename>.*)' echo got url={url} filename={filename} full input={0} ::: URL1 URL2 ::: filename1 filename2
got url=URL1 filename=filename1 full input=URL1 filename1
got url=URL1 filename=filename2 full input=URL1 filename2
got url=URL2 filename=filename2 full input=URL2 filename2
got url=URL2 filename=filename1 full input=URL2 filename1
```
### Numbered Capture Groups

In the next example, `{0}`, `{1}`, `{2}`, and `{3}` are numbered capture groups, and the input is a CSV file:
```
$ cat >./test <<EOL
foo,bar,baz
foo2,bar2,baz2
foo3,bar3,baz3
EOL

$ cat test | rust-parallel -r '(.*),(.*),(.*)' echo got arg1={1} arg2={2} arg3={3} full input={0}
got arg1=foo arg2=bar arg3=baz full input=foo,bar,baz
got arg1=foo2 arg2=bar2 arg3=baz2 full input=foo2,bar2,baz2
got arg1=foo3 arg2=bar3 arg3=baz3 full input=foo3,bar3,baz3
```
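
The same idea can be tried out in bash itself, where `BASH_REMATCH` holds the full match at index 0 and the numbered groups after it. This is only an analogy for experimenting with numbered groups: bash uses POSIX extended regular expressions rather than the Rust regex syntax, so named groups such as `(?P<url>...)` are not available, and `expand_line` is a hypothetical helper name.

```shell
# expand_line LINE
# Matches a three-column CSV line and prints its numbered capture groups.
expand_line() {
  local line=$1
  local re='^(.*),(.*),(.*)$'
  if [[ $line =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match, [1]..[3] are the capture groups.
    echo "got arg1=${BASH_REMATCH[1]} arg2=${BASH_REMATCH[2]} arg3=${BASH_REMATCH[3]} full input=${BASH_REMATCH[0]}"
  fi
}

expand_line 'foo,bar,baz'
# prints: got arg1=foo arg2=bar arg3=baz full input=foo,bar,baz
```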
### Capture Group Special Characters

All occurrences of capture groups are replaced as exact strings.  Surrounding characters have no effect on this.

This means capture groups can be nested with other `{` or `}` characters such as when building json:
```
$ cat >./test <<EOL
1,2,3
4,5,6
7,8,9
EOL

$ cat test | rust-parallel -r '(.*),(.*),(.*)' echo '{"one":{1},"two":{2},"nested_object":{"three":{3}}}'
{"one":1,"two":2,"nested_object":{"three":3}}
{"one":4,"two":5,"nested_object":{"three":6}}
{"one":7,"two":8,"nested_object":{"three":9}}
```
## Shell Commands

Shell commands can be written using `-s` shell mode.

Commands made up of several statements can be written on one line by separating the statements with `;`.

Environment variables, `$` characters, nested commands and much more are possible:
```
$ rust-parallel -s -r '(?P<arg1>.*) (?P<arg2>.*)' 'FOO={arg1}; BAR={arg2}; echo "FOO = ${FOO}, BAR = ${BAR}, shell pid = $$, date = $(date)"' ::: A B ::: CAT DOG
FOO = B, BAR = DOG, shell pid = 85989, date = Sat Apr 27 10:43:01 CDT 2024
FOO = A, BAR = DOG, shell pid = 85991, date = Sat Apr 27 10:43:01 CDT 2024
FOO = B, BAR = CAT, shell pid = 85990, date = Sat Apr 27 10:43:01 CDT 2024
FOO = A, BAR = CAT, shell pid = 85988, date = Sat Apr 27 10:43:01 CDT 2024
```
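
According to the `--help` text, shell mode passes each command line to `<shell-path> <shell-argument>` as a single argument. Assuming the defaults `/bin/bash` and `-c`, the command for the input `A CAT` would look roughly like the manual invocation below. The variable names are made up for this sketch, and the template has been shortened to keep the output deterministic.

```shell
# Defaults of --shell-path and --shell-argument, as listed in --help.
shell_path=/bin/bash
shell_argument=-c

# The command line after {arg1} and {arg2} were replaced with A and CAT.
command_line='FOO=A; BAR=CAT; echo "FOO = ${FOO}, BAR = ${BAR}"'

# The whole command line is handed to the shell as ONE argument.
"$shell_path" "$shell_argument" "$command_line"
# prints: FOO = A, BAR = CAT
```

Because the command line is a single argument, the single quotes in the rust-parallel example matter: they keep the calling shell from expanding `${FOO}` or `$$` before the child shell sees them.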
## Bash Function

`-s` shell mode can be used to invoke an arbitrary bash function.

Similar to normal commands, bash functions can be called using stdin, input files, or command line arguments.
### Function Setup

Define a bash function `logargs` that logs all arguments, and make it visible to child shells with `export -f`:

```
$ logargs() {
  echo "logargs got $@"
}

$ export -f logargs
```
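
The `export -f` step matters because shell mode starts a new shell for each command, and a child bash only sees functions that were exported. A quick way to check this without rust-parallel, assuming bash is the shell in use:

```shell
logargs() {
  echo "logargs got $@"
}

# Without this line the child shell below would report "command not found".
export -f logargs

# A child bash, similar to what shell mode starts, can now call the function.
bash -c 'logargs hello alice'
# prints: logargs got hello alice
```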
### Demo of command line arguments:

```
$ rust-parallel -s logargs ::: A B C ::: D E F
logargs got A D
logargs got B D
logargs got B E
logargs got A E
logargs got C E
logargs got B F
logargs got C D
logargs got A F
logargs got C F
```
### Demo of function and command line arguments from stdin:
```
$ cat >./test <<EOL
logargs hello alice
logargs hello bob
logargs hello charlie
EOL

$ cat test | rust-parallel -s
logargs got hello alice
logargs got hello bob
logargs got hello charlie
```

### Demo of function and initial arguments on command line, additional arguments from stdin:
```
$ cat >./test <<EOL
alice
bob
charlie
EOL

$ cat test | rust-parallel -s logargs hello
logargs got hello alice
logargs got hello bob
logargs got hello charlie
```