COMPOSE_HTTP_TIMEOUT


Rough notes on setting up an Ubuntu 22.04LTS server with docker and snap 1

IP allocations

First, we set up a static IP on the network device that would handle all external traffic and a DHCP on the network device that would access the management network, which is connected for maintenance.

To do so, we created the following file:

/etc/netplan/01-netcfg.yaml

using the following command:

sudo nano /etc/netplan/01-netcfg.yaml;

and added the following content to it:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
      addresses: [192.168.45.13/24]
      gateway4: 192.168.45.1
      nameservers:
          addresses: [1.1.1.1,8.8.8.8]
    eth1:
      dhcp4: yes

To apply the changes, we executed the following:

sudo netplan apply;

Update everything (the operating system and all packages)

Usually, it is a good idea to update your system before making significant changes to it:

sudo apt update -y; sudo apt upgrade -y; sudo apt autoremove -y;

Install docker via snap

In this setup, we did not use the docker version available on the Ubuntu repositories, we went for the ones from the snap. To install it, we used the following commands:

sudo apt install snapd;
sudo snap install docker;

Increase network pool for docker daemon

To handle the following problem:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

We modified the following file

/var/snap/docker/current/config/daemon.json

using the command:

sudo nano /var/snap/docker/current/config/daemon.json;

and set the content to be as follows:

{
    "log-level":        "error",
    "storage-driver":   "overlay2",
    "default-address-pools": [
        {
            "base": "172.80.0.0/16",
            "size": 24
        },
        {
            "base": "172.90.0.0/16",
            "size": 24
        }
    ]
}

We executed the following command to restart the docker daemon and get the network changes applied:

sudo snap disable docker;
sudo snap enable docker;

Gave access to our user to manage the docker

We added our user to the docker group so that we could manage the docker daemon without sudo rights.

sudo addgroup --system docker;
sudo adduser $USER docker;
newgrp docker;
sudo snap disable docker;
sudo snap enable docker;

After that, we made sure that the access rights to the volumes were correct:

sudo chown -R www-data:www-data /volumes/*
sudo chown -R tux:tux /volumes/letsencrypt/ /volumes/reverse/private/

Deploying

After we copied everything in place, we executed the following command to create our containers and start them with the appropriate networks and volumes:

export COMPOSE_HTTP_TIMEOUT=600;
docker-compose up -d --remove-orphans;

We had to increase the timeout as we were getting the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Updating the databases and performing any repairs

First, we connected to a terminal of the database container using the following command:

docker exec -it mariadb_c1 /bin/bash;

From there, we executed the following commands:

mysql_upgrade --user=root --password;
mysqlcheck -p -o --all-databases;

Bulk / Batch stopping docker containers

The following commands will help you stop many docker containers simultaneously. Of course, you can change the command stop to another, for example rm or whatever suits your needs.

You need to keep in mind that if you have dependencies between containers, you might need to execute the commands below more than once.

Stop all docker containers.

docker container stop $(docker container ls -q);
#This command creates a list of all containers.
#Using the -q parameter, we only get back the container ID and not all information about them.
#Then it will stop each container one by one.

Stop specific docker containers using a filter on their name.

docker container stop $(docker container ls -q --filter name=_web);
#This command finds all containers that their name contains _web.
#Using the -q parameter, we only get back the container ID and not all information about them.
#Then it will stop each container one by one.

A personal note

Check the system for things you might need to configure, like a crontab or other services.

A script that handles privileges on the docker volumes

To avoid access problems with the various external volumes we created the mysql user and group on the host machine as follows:

sudo groupadd -g 999 mysql;
sudo useradd -u 999 mysql -g mysql;

Then we execute the following to repair ownership issues with our containers. Please note that this script is custom to a particular installation and might not meet your needs.

#!/bin/bash

sudo chown -R www-data:www-data ~/volumes/*;
sudo chown -R bob:bob ~/volumes/letsencrypt/ ~/volumes/reverse/private/;
find ~/volumes/ -maxdepth 2 -type d -name mysql -exec sudo chown -R mysql:mysql '{}' \;;

ERROR: for container_a UnixHTTPConnectionPool(host=’localhost’, port=None): Read timed out. (read timeout=60)

There is this docker server that we have access to, which probably due to lousy planning, we put way too many containers on it. The server does not have SSD disks, and for that reason, whenever there are too many IO operations, it becomes unresponsive. When we mass update all containers by updating the images using the following command and then issuing a fresh docker-compose, we get a lot of time-out errors.

The commands we use to update the images and recreate our containers using the new images are the following:
(Please note that these commands need to execute from the folder where the file docker-compose.yml resides)

#Update all docker images that have the 'latest' tag
docker images --format "{{.Repository}}:{{.Tag}}" | grep ':latest' | xargs -L1 docker pull;
#Rebuild all containers using the new images.
docker-compose up -d;

After executing the second command, we often get many copies of the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

This error indicates that the recreate command was waiting for too long for the docker daemon to respond with no success. At the end of the output, we can see that it was waiting for 60 seconds.

At the end of the output, we get the following information and advice:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Following the advice, we used the following command to overwrite the value of the COMPOSE_HTTP_TIMEOUT variable to a more significant number.

#Increase timeout period to 120 seconds.
export COMPOSE_HTTP_TIMEOUT=120;
#Rebuild all containers using the new images.
docker-compose up -d;

Doing so, we were able to rebuild all containers without reissuing many times the up command.

Sidenote

This server really does have a lot of containers, we had to create the file /etc/docker/daemon.json so that we would have enough network addressing space to handle all the bridges and sub-networks.

The contents of /etc/docker/daemon.json are:

{
  "default-address-pools": [
    {
      "base": "172.80.0.0/16",
      "size": 24
    },
    {
      "base": "172.90.0.0/16",
      "size": 24
    }
  ]
}

The above configuration solved the following problem for us:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network