Deep Dive into Wget: Mirroring Websites for Offline Access

In the realm of command-line utilities, wget stands out as a versatile tool for downloading files and websites from the internet. Whether you’re a developer, a researcher, or just someone looking to have offline access to web resources, understanding how to use wget effectively can greatly enhance your workflow. Today, we’re exploring a potent combination of flags, -mpEk, applied to mirroring the European Cyber Security Challenge (ECSC) website.

Understanding Wget

wget is a non-interactive network downloader. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. It is designed to be robust against transient network issues and can resume interrupted downloads, making it a reliable tool for comprehensive tasks like mirroring entire websites.
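
For example, if a large download is interrupted, it can be resumed with the -c (--continue) flag; the URL below is just a placeholder:

wget -c https://example.com/large-archive.iso;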

Breaking Down the Command: wget -mpEk https://challenges.ecsc.eu/

Let’s dissect the command wget -mpEk https://challenges.ecsc.eu/ to understand the role of each option:

  • -m (--mirror): This option turns on settings suitable for mirroring websites: infinite recursion depth, timestamping (so repeated runs only fetch files that have changed on the server), and keeping FTP directory listings, among others. It’s designed to make a replica of the site for offline viewing.
  • -p (--page-requisites): This tells wget to download all the files that are necessary to properly display a given HTML page. This includes such things as in-page images, stylesheets, and scripts.
  • -E (--adjust-extension): When saving files, wget appends the .html suffix to downloaded files of type text/html whose names do not already end in .html or .htm. This ensures that locally saved web pages open correctly in a browser.
  • -k (--convert-links): After the download is complete, this option converts the links in the downloaded website, making them suitable for offline viewing. It adjusts links to images, stylesheets, and other web page components to point to local files.
  • https://challenges.ecsc.eu/: This is the URL of the website you want to mirror. In this example, it’s the homepage of the European Cyber Security Challenge, a notable event in the cybersecurity field.
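
For readability, the exact same command can be written with the long-form options spelled out:

wget --mirror --page-requisites --adjust-extension --convert-links https://challenges.ecsc.eu/;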

Practical Applications

Why would someone want to use wget with these specific options? Here are a few scenarios:

  • Offline Viewing: For individuals who want to access the ECSC challenge website without an internet connection, perhaps for educational purposes or to ensure they have access to the content during travel.
  • Web Development: Developers might mirror a website to test website migration, analyze the structure of a website, or archive content before a major update.
  • Research and Archiving: Researchers or archivists may use wget to preserve digital content that’s at risk of being updated or removed.

Conclusion

The wget -mpEk https://challenges.ecsc.eu/ command showcases the power of wget for downloading and mirroring web content for offline use. By understanding and utilizing these options, users can efficiently archive entire websites, ensuring content is accessible regardless of their internet connectivity. Whether for professional use, educational purposes, or personal archiving, mastering wget commands like these opens up a world of possibilities for accessing and preserving online content.




Rough notes on setting up an Ubuntu server with docker

Static IP

First, we assigned a static IP address to our Ubuntu server using netplan. To do so, we created the following file:

/etc/netplan/01-netcfg.yaml

using the following command

sudo nano /etc/netplan/01-netcfg.yaml;

and added the following content to it:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0f0:
      dhcp4: no
      addresses: [192.168.45.13/24]
      gateway4: 192.168.45.1
      nameservers:
          addresses: [1.1.1.1,8.8.8.8]

To apply the changes, we executed the following:

sudo netplan apply;
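
To verify that the address and the default route were applied (enp3s0f0 is the interface name from our configuration; yours may differ), we can run:

ip addr show enp3s0f0;
ip route show default;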

Update everything (the operating system and all packages)

Usually, it is a good idea to update your system before making significant changes to it:

sudo apt update -y; sudo apt upgrade -y; sudo apt autoremove -y;

Install docker

In this setup, we did not use the docker version that is available in the Ubuntu repositories; we went for the official packages from docker.com instead. To install them, we used the following commands:

sudo apt-get install ca-certificates curl gnupg lsb-release;
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg;
echo   "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null;
sudo apt-get update;
sudo apt-get install docker-ce docker-ce-cli containerd.io;
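
A quick way to confirm that the installation works is the standard hello-world smoke test, which pulls a tiny image from Docker Hub and runs it:

sudo docker run --rm hello-world;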

Install docker-compose

Again, we installed the official docker-compose release from github.com instead of the one available in the Ubuntu repositories. At the time this post was written, version 1.29.2 was the recommended one:

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose;
sudo chmod +x /usr/local/bin/docker-compose;
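
A quick sanity check that the binary is in place and executable:

docker-compose --version;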

Increase network pool for docker daemon

To handle the following problem:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

We created the following file,

/etc/docker/daemon.json

using the command:

sudo nano /etc/docker/daemon.json;

and added the following content to it:

{
  "default-address-pools": [
    {
      "base": "172.80.0.0/16",
      "size": 24
    },
    {
      "base": "172.90.0.0/16",
      "size": 24
    }
  ]
}

We executed the following command to restart the docker daemon and apply the network changes:

sudo systemctl restart docker;
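
To verify that new networks are now allocated from the custom pools, one option is to create a throwaway network and inspect its subnet (the name test-pool below is arbitrary); the reported subnet should fall inside 172.80.0.0/16:

docker network create test-pool;
docker network inspect test-pool --format '{{(index .IPAM.Config 0).Subnet}}';
docker network rm test-pool;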

Giving our user access to manage docker

We added our user to the docker group so that we could manage the docker daemon without sudo rights.

sudo usermod -aG docker $USER;
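
Group membership is only evaluated at login, so this change takes effect after logging out and back in. To pick it up in the current shell without re-logging, one workaround is:

newgrp docker;
docker ps;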

Deploying

After we copied everything into place, we executed the following command to create our containers and start them with the appropriate networks and volumes:

export COMPOSE_HTTP_TIMEOUT=120;
docker-compose up -d --remove-orphans;

We had to increase the timeout as we were getting the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
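
Once the stack is up, its state and recent output can be inspected as follows (the --tail value is arbitrary):

docker-compose ps;
docker-compose logs --tail=50;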

Stopping all containers using a filter on the name

docker container stop $(docker container ls -q --filter name=_web);

The above command will find all containers whose names contain _web and stop them. That command is actually two commands where one is nested inside the other.

# The nested command lists all containers whose names contain _web; the -q parameter returns only the container IDs instead of the full details.
docker container ls -q --filter name=_web;
# The outer command takes the IDs produced by the nested command as input and stops each of those containers.
docker container stop $(docker container ls -q --filter name=_web);
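
The same nesting pattern works for other bulk operations; for example, to also remove those containers once they are stopped (the -a flag includes stopped containers in the listing):

docker container rm $(docker container ls -aq --filter name=_web);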

Manually set the CMake output folder

If you want to manually set the global output folders for your whole CMake project, add the following configuration lines to the root CMakeLists.txt file of your project, choosing the ones that match the output types you expect:

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
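
With these lines in the root CMakeLists.txt, a standard out-of-source build places the artifacts under build/lib and build/bin; a minimal sketch of that workflow (the -S and -B options require CMake 3.13 or newer):

cmake -S . -B build;
cmake --build build;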

In case you want to specify those folders per target, you can set the corresponding properties as follows:

set_target_properties( target_or_targets
  PROPERTIES
  ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib"
  LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib"
  RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
)

Please note that we are setting the same properties using different variables.

The CMAKE_ARCHIVE_OUTPUT_DIRECTORY variable is used to initialize the ARCHIVE_OUTPUT_DIRECTORY property on all the targets.
ARCHIVE_OUTPUT_DIRECTORY property specifies the directory into which archive target files should be built.
An archive output artifact of a buildsystem target may be:

  • The static library file (e.g. .lib or .a) of a static library target created by the add_library() command with the STATIC option.
  • On DLL platforms: the import library file (e.g. .lib) of a shared library target created by the add_library() command with the SHARED option.
  • On DLL platforms: the import library file (e.g. .lib) of an executable target created by the add_executable() command when its ENABLE_EXPORTS target property is set.

 

The CMAKE_LIBRARY_OUTPUT_DIRECTORY variable is used to initialize the LIBRARY_OUTPUT_DIRECTORY property on all the targets.
LIBRARY_OUTPUT_DIRECTORY property specifies the directory into which library target files should be built.
A library output artifact of a buildsystem target may be:

  • The loadable module file (e.g. .dll or .so) of a module library target created by the add_library() command with the MODULE option.
  • On non-DLL platforms: the shared library file (e.g. .so or .dylib) of a shared library target created by the add_library() command with the SHARED option.

 

The CMAKE_RUNTIME_OUTPUT_DIRECTORY variable is used to initialize the RUNTIME_OUTPUT_DIRECTORY property on all the targets.
RUNTIME_OUTPUT_DIRECTORY property specifies the directory into which runtime target files should be built.
A runtime output artifact of a buildsystem target may be:

  • The executable file (e.g. .exe) of an executable target created by the add_executable() command.
  • On DLL platforms: the executable file (e.g. .dll) of a shared library target created by the add_library() command with the SHARED option.

From: https://cmake.org/documentation/


CentOS 7: C++: static linking cannot find -lstdc++ -lm and -lc

Recently, we were trying to compile a C++ application on a CentOS 7 64-bit machine using the following compilation command:


g++ -static -O2 -lm -Wall -Wno-unused-result -std=c++11 -DCS_ACADEMY -DONLINE_JUDGE 510152025.cpp -o 510152025;

Unfortunately, we got the following errors:

 /usr/bin/ld: cannot find -lstdc++
 /usr/bin/ld: cannot find -lm
 /usr/bin/ld: cannot find -lc
 collect2: error: ld returned 1 exit status

To resolve the issue, we installed the static versions of the glibc and libstdc++ libraries:


sudo yum install glibc-static libstdc++-static -y;
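
After installing them, the same g++ command should link successfully. Whether the produced binary is really statically linked can be sanity-checked as follows; for a static binary, ldd is expected to report that it is not a dynamic executable:

file 510152025;
ldd 510152025;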