
Examples

Aaron Riekenberg edited this page Sep 17, 2023 · 38 revisions
  1. Find largest directories
  2. Running diff on all files in 2 directories
  3. Processing CSV inputs using regular expression
  4. Convert CSV input to json with regex
  5. Computation on list of files
  6. Rename files in directory
  7. Compress files from find command

Find largest directories

Run du on each subdirectory at max depth 1 and sort the results from smallest to largest disk usage:

$ find . -maxdepth 1 -type d | rust-parallel du -sh | sort -h
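The final sort -h stage matters because du -sh prints sizes with unit suffixes. A minimal sketch of the difference, using hypothetical du-style lines rather than real output, and assuming a sort that supports -h (GNU coreutils and recent BSD versions do):

```shell
# Hypothetical du -sh style lines (size, tab, path), deliberately out of order.
unsorted=$(printf '2G\t./videos\n200K\t./notes\n3M\t./photos\n')

# sort -h compares the unit suffix first, so 200K sorts below 3M.
sorted=$(printf '%s\n' "$unsorted" | sort -h)

# Plain numeric sort looks only at the leading digits and gets it wrong.
numeric=$(printf '%s\n' "$unsorted" | sort -n)

printf 'sort -h:\n%s\n\nsort -n:\n%s\n' "$sorted" "$numeric"
```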

Running diff on all files in 2 directories

Suppose we have two directories dir1 and dir2 within the current directory.

This command finds all files within dir1 and diffs with the same path in dir2.

Since -s shell mode is used, the entire quoted string is run with bash as a single command. The -r regular expression extracts everything after the first / character into the {1} capture group:

$ find dir1 -type f | rust-parallel -s -r '^[^/]*\/(.*)$' 'echo diffing {1} ; diff --color=always dir1/{1} dir2/{1} ; echo diff {1} returned $? '
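To see what ends up in {1}, the same pattern can be tried on a single path. This is only a sketch: it uses sed as a stand-in for rust-parallel's regex handling, and the sample path is hypothetical.

```shell
# Hypothetical sample path, as `find dir1 -type f` would print it.
path='dir1/sub/file.txt'

# Same pattern as the -r argument: skip everything up to the first "/",
# then capture the rest. sed's \1 plays the role of {1}.
relative=$(printf '%s\n' "$path" | sed -E 's|^[^/]*/(.*)$|\1|')

# The two paths that diff would then compare.
echo "dir1/$relative dir2/$relative"
```

Because the leading directory name is not part of the capture, the same {1} value can be appended to both dir1/ and dir2/.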

Processing CSV inputs using regular expression

Suppose we have an input CSV file of HTTP methods and URLs. It can be used to make parallel calls with curl. Here {method} and {url} are named regular expression capture groups, -j3 allows at most 3 parallel jobs, and -t5 sets a 5 second timeout:

$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1
PUT,http://example.com/endpoint2
POST,http://example.com/endpoint3
EOL

$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*)' -j3 -t5 curl -X {method} {url}
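One thing to keep in mind with this pattern is that .* is greedy, so a row whose URL itself contains a comma would be split at the last comma rather than the first. The sketch below illustrates this with sed and numbered groups as a stand-in for the named groups; the sample row is hypothetical, and it assumes the usual greedy-match behavior carries over to rust-parallel's regex engine.

```shell
# Hypothetical row whose URL contains a comma.
row='GET,http://example.com/search?q=a,b'

# Greedy pattern from the example: the first .* runs to the LAST comma.
greedy_method=$(printf '%s\n' "$row" | sed -E 's/^(.*),(.*)$/\1/')

# Stricter variant: the method may not contain a comma, so the split
# happens at the FIRST comma.
strict_method=$(printf '%s\n' "$row" | sed -E 's/^([^,]+),(.*)$/\1/')
strict_url=$(printf '%s\n' "$row" | sed -E 's/^([^,]+),(.*)$/\2/')

echo "greedy method: $greedy_method"
echo "strict method: $strict_method"
echo "strict url:    $strict_url"
```

If such rows are possible, the corresponding named-group pattern would be something like '(?P<method>[^,]+),(?P<url>.*)'. For the sample test.csv above, both patterns give the same result.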

Convert CSV input to json with regex

Regular expression capture groups may be nested inside json structures, for example:

$ cat >./test.csv <<EOL
GET,http://example.com/endpoint1
PUT,http://example.com/endpoint2
POST,http://example.com/endpoint3
EOL

$ cat test.csv | rust-parallel --regex '(?P<method>.*),(?P<url>.*)' echo '{"method":"{method}","url":"{url}","id":12345}'
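As a way to check what the template should expand to, here is a sequential sketch in plain shell that fills the same JSON template from the same rows. It is not how rust-parallel works internally, and since rust-parallel runs jobs in parallel, its output lines may appear in a different order.

```shell
# Recreate the sample input in a temporary file.
csv=$(mktemp)
cat >"$csv" <<EOL
GET,http://example.com/endpoint1
PUT,http://example.com/endpoint2
POST,http://example.com/endpoint3
EOL

# Split each row at the first comma and fill the same JSON template.
json=$(while IFS=, read -r method url; do
  printf '{"method":"%s","url":"%s","id":12345}\n' "$method" "$url"
done <"$csv")

printf '%s\n' "$json"
rm -f "$csv"
```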

Computation on list of files

Suppose we have a bash function analyze_file and a list of *.txt files to analyze in the current directory. This example uses --jobs 4 to limit the number of parallel jobs, --shell to call a bash function, --progress-bar to display a graphical progress bar, and --timeout-seconds to kill each job that has not finished after 5 minutes.

$ analyze_file() {
  # do some expensive analysis of file $1 parameter
  :  # no-op placeholder: bash rejects a function body that contains only a comment
}

$ export -f analyze_file

$ rust-parallel --jobs 4 --shell --progress-bar --timeout-seconds $((5*60)) analyze_file ::: *.txt
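The export -f step is what makes the function visible to the bash process that runs each job. A minimal bash sketch of that mechanism follows; the line-counting body is a hypothetical stand-in for the real analysis, and the bash -c call only imitates a job started in a fresh shell.

```shell
# Hypothetical stand-in for the expensive analysis: report the line count.
analyze_file() {
  printf '%s: %s lines\n' "$1" "$(wc -l <"$1" | tr -d ' ')"
}

# Export the function so child bash processes inherit it; without this,
# a command started in a new shell would not find analyze_file.
export -f analyze_file

# Imitate a shell-mode job: a fresh bash process invoking the function.
sample=$(mktemp)
printf 'a\nb\nc\n' >"$sample"
result=$(bash -c 'analyze_file "$1"' _ "$sample")
echo "$result"
rm -f "$sample"
```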

Rename files in directory

Rename files in the current directory from *.txt to *.csv.

The {0} capture group is the entire *.txt file name, and the {1} capture group is the part of the file name before .txt:

$ rust-parallel -r '(.*)\.(.*)' mv {0} {1}.csv ::: *.txt
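Before renaming for real, it can help to preview the resulting mv commands. The sketch below does this sequentially in plain shell, using ${f%.txt} to compute the same prefix that {1} captures for these names; the scratch directory and file names are hypothetical.

```shell
# Work in a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
touch "$workdir/report.txt" "$workdir/data.v2.txt"

# ${f%.txt} strips the trailing .txt, which is what {1} holds for these names.
preview=$(cd "$workdir" && for f in *.txt; do
  echo "mv $f ${f%.txt}.csv"
done)

printf '%s\n' "$preview"
rm -rf "$workdir"
```

Note that for a name with several dots, such as data.v2.txt, the greedy (.*) before the final dot keeps data.v2 together, so only the last extension changes.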

Compress files from find command

Use find to list all files in the current directory and its subdirectories. The -0 option pairs with find -print0 to handle file names that may contain whitespace characters. gzip -f -k is called on each file that find reports:

$ find . -type f -print0 | rust-parallel -0 gzip -f -k
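The reason for NUL-delimited input can be shown with a small sketch. It uses xargs as a stand-in to illustrate the general problem of whitespace in file names; it does not claim to reproduce rust-parallel's own default input handling. The scratch directory and file names are hypothetical.

```shell
# Scratch directory with a hypothetical file name containing a space.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt"

# Whitespace-split input breaks "with space.txt" into two arguments.
split_count=$(find "$workdir" -type f | xargs -n 1 echo | wc -l)

# NUL-delimited input keeps each path intact.
intact_count=$(find "$workdir" -type f -print0 | xargs -0 -n 1 echo | wc -l)

echo "whitespace-split: $split_count, NUL-delimited: $intact_count"
rm -rf "$workdir"
```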