

Download Large Jupyter Workspace files

Recently, we were working on a Jupyter workspace at anyscale-training.com/jupyter/lab. As there was no option to download all files of the workspace, nor a way to create an archive from the GUI, we followed the procedure below (which we also use on Coursera.org, where it works like a charm):

First, we clicked on the blue button with the + sign in it.
That opened the Launcher tab.
From there, we clicked on the Terminal button under the Other category.

In the terminal, we executed the following command to create a compressed archive of all the files we needed to download:

tar -czf Ray-RLLib-Tutorials.tar.gz ray_tutorial/ Ray-Tutorial/ rllib_tutorials/;

After the command completed, our archive appeared in the file list on the left. By right-clicking it, we were able to initiate its download. Unfortunately, the download would always crash after the first 20MB! To work around this, we split the archive into multiple parts of 10MB each, downloaded them individually, and finally merged them back together on our PC. The command to split the compressed archive into multiple smaller files of fixed size was the following:

tar -czf - ray_tutorial/ Ray-Tutorial/ rllib_tutorials/ | split --bytes=10MB - Ray-RLLib-Tutorials.tar.gz.;

After downloading those files one by one, by right-clicking on them and selecting the Download option, we recreated the original structure on our PC using the following command:

cat Ray-RLLib-Tutorials.tar.gz.* | tar xzvf -;

To clean up both the remote server and our local PC, we issued the following command:

rm Ray-RLLib-Tutorials.tar.gz.*;

This guide showed how to download a very big Jupyter workspace by splitting it into multiple smaller files using the console.
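The whole split/merge procedure can be rehearsed locally before trusting it with real data. The sketch below (directory and file names are placeholders, and the chunk size is shrunk from 10MB to 100KB to keep the demo small) round-trips a sample tree through tar, split, and cat, then compares checksums:

```shell
# A minimal sketch of the split/merge roundtrip using throwaway sample data.
workdir="$(mktemp -d)"
cd "$workdir"

# Sample data standing in for the workspace folders.
mkdir -p ray_tutorial
head -c 1048576 /dev/urandom > ray_tutorial/notebook.ipynb

# Checksum of the original file, for comparison after the roundtrip.
original_sum="$(cd ray_tutorial && sha256sum notebook.ipynb)"

# "Server" side: archive and split into fixed-size chunks.
tar -czf - ray_tutorial/ | split --bytes=100K - Ray-RLLib-Tutorials.tar.gz.

# "Local" side: merge the chunks and extract into a fresh directory.
mkdir restored
cd restored
cat ../Ray-RLLib-Tutorials.tar.gz.* | tar -xzf -

# Verify that the restored file matches the original byte for byte.
restored_sum="$(cd ray_tutorial && sha256sum notebook.ipynb)"
[ "$original_sum" = "$restored_sum" ] && echo "roundtrip OK"
```

If the two checksums differ, one of the chunks was corrupted or truncated during the download and should be fetched again.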


Some amazing IEEE magazines for good reads

ComputingEdge magazine by IEEE Computer Society

At the following link (https://www.computer.org/web/computingedge/current-issue) you will find all of the issues of the ComputingEdge magazine of IEEE.

This publication is produced by the Computer Society, and we believe you will find it very interesting to read.

ComputingEdge curates the hot technology knowledge from the 13 leading technology publications of Computer Society, plus adds unique original content, and makes it available in a single experience. The features and columns in ComputingEdge always emphasize the newest developments and current trends. ComputingEdge keeps you up to date by showing you what’s hot and what you need to know across the technology spectrum. It’s both an informative and an enjoyable read. ComputingEdge caters to your need-to-know key information about all aspects of the technology arena so you can make integrated decisions regarding your areas of specialty.

Modified from: https://www.computer.org/web/computingedge/

We’ve been reading this magazine for some time now, and there are always a couple of articles that catch our attention.

 


Fedora 27: Setup stackskills-dl

A couple of days ago, we were asked to set up stackskills-dl on a Fedora 27 (x64) machine.
Apparently, stackskills-dl is a Ruby script that allows a registered user to download the StackSkills tutorials to which they have access.

Following the instructions at https://github.com/yoonwaiyan/stackskills-dl is not enough to get the application running, as the json gem and the Ruby development files were missing from the filesystem.

Solution: Below are the steps we followed to set up stackskills-dl and make it operational:


sudo dnf install gem ruby-devel youtube-dl wget;
gem install json;
gem install bundler;
git clone https://github.com/yoonwaiyan/stackskills-dl.git;
cd stackskills-dl/;
bundle install;

After the above steps were completed, we were able to use stackskills-dl from the clone/installation folder normally:


[george@banana stackskills-dl]$ ruby stackskills_dl.rb -u "[email protected]" -p "e#rf54HTw3se!fe678f." -s https://stackskills.com/courses/enrolled/007;
Loaded login credentials from environment variables.
Login Successfully.
Finding https://stackskills.com/courses/enrolled/007 from your list of courses
Number of courses found: 1
...

[george@banana stackskills-dl]$ ruby stackskills_dl.rb --help
Usage: ruby stackskills_dl.rb [options]
-u, --email NAME Email
-p, --password PASSWORD Password
-c, --course COURSE_URL Course URL in ID.
-s, --course-slug COURSE_SLUG Course URL in slug.

Without the Ruby json gem, you would get the following error:


[george@banana stackskills-dl]$ ruby stackskills_dl.rb --help;
/usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- json (LoadError)
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:226:in `load_from_json'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:63:in `block in load_json'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:62:in `each'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:62:in `load_json'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:88:in `load'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types/loader.rb:113:in `load'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types.rb:296:in `load_default_mime_types'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types.rb:323:in `<class:Types>'
from /home/george/.gem/ruby/2.4.0/gems/mime-types-2.99.1/lib/mime/types.rb:63:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /home/george/.gem/ruby/2.4.0/gems/mechanize-2.7.4/lib/mechanize/pluggable_parsers.rb:5:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /home/george/.gem/ruby/2.4.0/gems/mechanize-2.7.4/lib/mechanize.rb:1361:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:133:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:133:in `rescue in require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:40:in `require'
from /home/george/Videos/stackskills-dl/lib/course_finder.rb:1:in `<top (required)>'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:55:in `require'
from stackskills_dl.rb:4:in `<main>'


Automatically download a whole public website using wget recursively

wget -r -k -np --user-agent="Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X; en-us) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53" --wait=2 --limit-rate=200K --recursive --no-clobber --page-requisites --convert-links --domains bytefreaks.net https://bytefreaks.net/;

Introduction:

The “wget” command is a powerful tool used to download files and web pages from the internet. It is commonly used in Linux/Unix environments but can also be used on other operating systems. The command comes with various options and parameters that can be customized to suit your specific download requirements. In this post, we will discuss the wget command with a breakdown of its various options, and how to use it to download files and web pages.

Command Explanation:

Here is a detailed explanation of the options used in the command:

  1. “-r” : Short form of “--recursive” (also passed in its long form below); makes the download recursive, meaning wget follows links and downloads the entire website.
  2. “-k” : Short form of “--convert-links” (also passed in its long form below); converts the links in the downloaded files so that they point to the local copies, allowing the files to be viewed offline.
  3. “-np” : Prevents wget from ascending to the parent directory when downloading. This is helpful when you want to limit the download to a specific directory.
  4. “--user-agent” : Specifies the user agent string that wget uses to identify itself to the server. In this case, the user agent string is set to a mobile device (iPhone).
  5. “--wait” : Adds a delay (in seconds) between requests. This is useful to prevent the server from being overloaded with too many requests at once.
  6. “--limit-rate” : Limits the download speed to a specific rate (in this case, 200K).
  7. “--recursive” : The long form of “-r” above; listing both is redundant but harmless.
  8. “--no-clobber” : Prevents wget from overwriting existing files.
  9. “--page-requisites” : Instructs wget to download all the files needed to display the webpage, including images, CSS, and JavaScript files.
  10. “--convert-links” : The long form of “-k” above; again, redundant but harmless.
  11. “--domains” : Restricts the download to the specified domain name(s).
  12. “https://bytefreaks.net/” : The URL of the website that you want to download.

Conclusion:

The wget command is a powerful tool for downloading files and web pages from the internet. By combining the options described above, you can tailor a download to your specific requirements, from polite rate-limited mirroring to offline-readable copies of a whole site.

Same command without setting the user agent:

The following command will try to download a full website with all pages it can find through public links.

wget --wait=2 --limit-rate=200K --recursive --no-clobber --page-requisites --convert-links --domains example.com http://example.com/;

Parameters:

  • --wait Wait the specified number of seconds between the retrievals.  We use this option to lighten the server load by making the requests less frequent.
  • --limit-rate Limit the download speed to amount bytes per second. We use this option to lighten the server load and to reduce the bandwidth we consume on our own network.
  • --recursive Turn on recursive retrieving.
  • --no-clobber If a file would be downloaded more than once into the same directory, prevent Wget from saving multiple versions of it.
  • --page-requisites This option causes Wget to download all the files that are necessary to properly display a given HTML page.
  • --convert-links After the download is complete, convert the links in the document to make them suitable for local viewing.
  • --domains Set domains to be followed.  It accepts a domain-list as a comma-separated list of domains.
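To see the recursive flags in action without hammering a real site, the sketch below serves a tiny throwaway page locally and mirrors it (assumptions: wget and python3 are installed; port 8099 and the file names are arbitrary choices for this demo; --wait and --limit-rate are omitted to keep it fast):

```shell
# Build a two-page throwaway site in a temporary directory.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir site
cat > site/index.html <<'EOF'
<html><body><a href="page2.html">next</a></body></html>
EOF
echo '<html><body>page two</body></html>' > site/page2.html

# Serve the directory in the background.
( cd site && exec python3 -m http.server 8099 ) >/dev/null 2>&1 &
server_pid=$!
sleep 1

# Recursive fetch with link conversion, mirroring the flags explained above.
wget --quiet --recursive --no-clobber --page-requisites --convert-links \
     http://127.0.0.1:8099/

kill "$server_pid"

# wget stores the mirror under a directory named after the host.
ls 127.0.0.1:8099/
```

After the run, the 127.0.0.1:8099/ directory should contain both index.html and page2.html, with the link in index.html rewritten for local viewing.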