tar


Download Large Jupyter Workspace files

Recently, we were working on a Jupyter Workspace at anyscale-training.com/jupyter/lab. As there was no option to download all files of the workspace nor there was a way to create an archive from the GUI, we followed the procedure below (that we also use on Coursera.org and works like a charm):

First, we clicked on the blue button with the + sign in it.
That opened the Launcher tab that is visible on the image above.
From there, we clicked on the Terminal button under the Other category.

In the terminal, we executed the following command to create a compressed archive of all the files we needed to download:

tar -czf Ray-RLLib-Tutorials.tar.gz ray_tutorial/ Ray-Tutorial/ rllib_tutorials/;

After the command completed its execution, we could see our archive on the left list of files. By right-clicking it we we are able to initiate its download. Unfortunately, after the first 20MB the download would always crash! To fix this issue, we split the archive to multiple archives of 10MB each, then downloaded them individually and finally merged them back together on our PC. The command to split the compressed archive to multiple smaller archives of fixed size was the following:

tar -czf - ray_tutorial/ Ray-Tutorial/ rllib_tutorials/ | split --bytes=10MB - Ray-RLLib-Tutorials.tar.gz.;

After downloading those files one by one by right-clicking on them and then selecting the Download option we recreated the original structure on our PC using the following command:

cat Ray-RLLib-Tutorials.tar.gz.* | tar xzvf -;

To clean up both the remote Server and our Local PC, we issued the following command:

rm Ray-RLLib-Tutorials.tar.gz.*;

This is a guide on how to download a very big Jupyter workspace by splitting it to multiple smaller files using the console.


Create a .tar file with different compression methods

The following commands will create .tar archives  and compress them using the different methods that are available. We provide multiple solutions, each one for a different type of .tar archive depending on the compression method that is desired.

For .tar archives

tar -c -f archive.tar $FILES_TO_ARCHIVE;

For .tar.bz2 archives

tar -c -j -f archive.tar.bz2 $FILES_TO_ARCHIVE;

For .tar.xz archives

tar -c -J -f archive.tar.xz $FILES_TO_ARCHIVE;

For .tar.gz and .tgz archives

tar -c -z -f archive.tar.gz $FILES_TO_ARCHIVE;

tar Parameters Legend

  • -z or --gzip instructs tar to filter the archive through gzip
  • -j or --bzip2 filters the archive through bzip2
  • -J or --xz filters the archive through xz
  • -f or --file=OUTPUT uses the archive file OUTPUT
  • -c or --create a new archive

Bonus Example: Create a tar.xz archive using the current date in the archive name

The following command will create an archive out of the folders Folder1 and Folder2 and then it will compress it to the .tar.xz format.
The filename of the archive will contain the current date in the format YYYY-MM-DD.

tar -c -J  -f archive.`date +%F`.tar.xz Folder1 Folder2;

The above command will result in something similar to:

archive.2017-06-04.tar.xz

How to search for specific filenames in .tar archives

The following commands will search in the .tar archives found in the specified folder and print on screen all files that their paths or filenames match our search token. We provide multiple solutions, each one for a different type of .tar archive depending on the compression used.

For .tar archives

find /media/repository/packages/ -type f -iname "*.tar" -exec tar -t -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.bz2 archives

find /media/repository/packages/ -type f -iname "*.tar.bz2" -exec tar -t -j -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.xz archives

find /media/repository/packages/ -type f -iname "*.tar.xz" -exec tar -t -J -f '{}' \; | grep "configurations/arm-cortexa9";

For .tar.gz and .tgz archives

Please note that this commands uses the -o (which is the logical or) parameter on find to search for multiple filename extensions.

find /media/repository/packages/ -type f \( -iname "*.tar.gz" -o -iname "*.tgz" \) -exec tar -t -z -f '{}' \; | grep "configurations/arm-cortexa9";

find Parameters Legend

  • -type f filters out any result which is not a regular file
  • -exec command '{}' \; runs the specified command on the results of find. The string '{}' is replaced by the current file name being processed.
  • -o is the logical Or operator. The second expression  is not evaluated if the first expression is true.

tar Parameters Legend

  • -z or --gzip instructs tar to filter the archive through gzip
  • -j or --bzip2 filters the archive through bzip2
  • -J or --xz filters the archive through xz
  • -t or --list lists the contents of an archive
  • -f or --file=INPUT uses the archive file or device named INPUT

List the contents of a tar or tar.gz file

List the contents of a tar file

<code>tar -tvf file.tar</code>

List the contents of a tar.gz file

<code>tar -ztvf file.tar.gz</code>

List the contents of a tar.bz2 file

<code>tar -jtvf file.tar.bz2</code>

Options:
-t
List the contents of an archive
-v
Verbose mode
-z
Use gzip so that you can process a compressed (.gz) tar file
-j
Use bzip2, use to decompress .bz2 files
-f
filename Use archive file called filename