docker-compose


Troubleshooting Access Rights and Memory Allocation for Elastic Enterprise Search using Docker Compose

Introduction

Setting up Elastic Enterprise Search using Docker Compose provides a convenient and scalable solution for deploying Elasticsearch’s Enterprise Search. However, during our implementation of this setup, we encountered issues related to access rights and memory allocation. In this blog post, we will walk you through the steps we took to resolve these challenges and successfully configure the system.

Problem: Access Rights

The initial hurdle we encountered was related to access rights when running Elastic Enterprise Search using Docker Compose. While following the official guide provided by Elastic, we realized that certain permissions were causing problems. As a result, we could not access the files and directories required for proper functioning.

Solution: Modifying the YML and Adjusting Access Rights

We modified the YAML file used in the Docker Compose setup to overcome the access rights issue. Specifically, we relocated all volumes to a folder our user owns, ensuring we had complete control and ownership over these files. Additionally, we adjusted the access rights to 777, granting all users full read, write, and execute permissions. Please note that this is used for a development environment that will be destroyed later.

version: '2.2'

services:

  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        elif [ x${KIBANA_PASSWORD} == x ]; then
          echo "Set the KIBANA_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: es01\n"\
          "    dns:\n"\
          "      - es01\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Setting kibana_system password";
        until curl -s -X POST --cacert config/certs/ca/ca.crt -u elastic:${ELASTIC_PASSWORD} -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
      '
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120

  es01:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/elasticsearch/config/certs
      - /home/tux/docker/volumes/es/esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    environment:
      - node.name=es01
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es01/es01.key
      - xpack.security.http.ssl.certificate=certs/es01/es01.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es01/es01.key
      - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  kibana:
    depends_on:
      es01:
        condition: service_healthy
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/kibana/config/certs
      - /home/tux/docker/volumes/es/kibanadata:/usr/share/kibana/data
    ports:
      - ${KIBANA_PORT}:5601
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=https://es01:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - ENTERPRISESEARCH_HOST=http://enterprisesearch:${ENTERPRISE_SEARCH_PORT}
      - xpack.reporting.kibanaServer.hostname=localhost
    mem_limit: ${MEM_LIMIT}
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  enterprisesearch:
    depends_on:
      es01:
        condition: service_healthy
      kibana:
        condition: service_healthy
    image: docker.elastic.co/enterprise-search/enterprise-search:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/enterprise-search/config/certs
      - /home/tux/docker/volumes/es/enterprisesearchdata:/usr/share/enterprise-search/config
    ports:
      - ${ENTERPRISE_SEARCH_PORT}:3002
    environment:
      - SERVERNAME=enterprisesearch
      - secret_management.encryption_keys=[${ENCRYPTION_KEYS}]
      - allow_es_settings_modification=true
      - elasticsearch.host=https://es01:9200
      - elasticsearch.username=elastic
      - elasticsearch.password=${ELASTIC_PASSWORD}
      - elasticsearch.ssl.enabled=true
      - elasticsearch.ssl.certificate_authority=/usr/share/enterprise-search/config/certs/ca/ca.crt
      - kibana.external_url=http://kibana:5601
    mem_limit: ${MEM_LIMIT}
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s -I http://localhost:3002 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

volumes:
  certs:
    driver: local
  enterprisesearchdata:
    driver: local
  esdata01:
    driver: local
  kibanadata:
    driver: local

Problem: Memory Allocation

In addition to the access rights challenge, we also encountered an issue related to memory allocation. The default settings for the virtual memory map count (vm.max_map_count) were insufficient for our requirements, leading to errors during the deployment.

Solution: Increasing Memory Allocation

To address the memory allocation problem, we followed the guide provided by Elastic on how to increase the vm.max_map_count. By executing the command sysctl -w vm.max_map_count=262144, we set the required value for optimal memory allocation. This adjustment allowed Elastic Enterprise Search to function properly and utilize the necessary resources.

Problem: Version 8 does not deploy correctly and fails to authenticate

When we tried to bring up the docker containers, we got the following problems with access rights:

From ES01:
{"@timestamp":"2023-05-22T05:27:10.294Z", "log.level":"ERROR", "message":"failed to retrieve password hash for reserved user [kibana_system]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][transport_worker][T#5]","log.logger":"org.elasticsearch.xpack.security.authc.esnative.ReservedRealm","trace.id":"8639207dbfa64551d7a302a1df3bda26","elasticsearch.cluster.uuid":"Zxd3v00YQyqJtFTsDw143w","elasticsearch.node.id":"v1WIc0TUReqqZFcTRzsyOw","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"es-cluster","error.type":"org.elasticsearch.action.UnavailableShardsException",

From Kibana:
[2023-05-21T13:56:30.497+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. security_exception
	Root causes:
		security_exception: unable to authenticate user [kibana_system] for REST request [/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip]
[2023-05-21T13:56:30.868+00:00][INFO ][plugins.screenshotting.chromium] Browser executable: /usr/share/kibana/x-pack/plugins/screenshotting/chromium/headless_shell-linux_x64/headless_shell

Solution: Downgrade to version 7

It appears that version 7 handles correctly the access management issues, so we updated our .env file to the following:

STACK_VERSION=7.17.10
ELASTIC_PASSWORD=7XQ2aR7q2g2Q994msd942sLJoAJ7LUvD
KIBANA_PASSWORD=2M7nl2hCH11b7NLht2l8A8ZHBjAKTkxI
ES_PORT=9200
CLUSTER_NAME=es-cluster
LICENSE=basic
MEM_LIMIT=1073741824
KIBANA_PORT=5601
ENTERPRISE_SEARCH_PORT=3002
ENCRYPTION_KEYS=5df1f9f9590ac53b29db6f5eeb9f63be04a43018af3095ef2bfc8a043d38fc7c

Conclusion

While setting up Run Enterprise Search using Docker Compose can be a straightforward process, it is not uncommon to encounter challenges along the way. In this blog post, we discussed the problems we faced regarding access rights and memory allocation and the steps we took to resolve them.

By modifying the YAML file to ensure proper ownership and permissions for the required volumes and adjusting the memory allocation using the recommended command, we successfully overcame these issues. With our configuration now in place, we enjoy the benefits of Elastic Enterprise Search, leveraging its powerful capabilities for our organization’s search and discovery needs.

We hope that sharing our experience and solutions will assist others in overcoming similar obstacles and make their journey toward implementing Elastic Enterprise Search using Docker Compose smoother.


Rough notes on setting up an Ubuntu 22.04LTS server with docker and snap 1

IP allocations

First, we set up a static IP on the network device that would handle all external traffic and a DHCP on the network device that would access the management network, which is connected for maintenance.

To do so, we created the following file:

/etc/netplan/01-netcfg.yaml

using the following command:

sudo nano /etc/netplan/01-netcfg.yaml;

and added the following content to it:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
      addresses: [192.168.45.13/24]
      gateway4: 192.168.45.1
      nameservers:
          addresses: [1.1.1.1,8.8.8.8]
    eth1:
      dhcp4: yes

To apply the changes, we executed the following:

sudo netplan apply;

Update everything (the operating system and all packages)

Usually, it is a good idea to update your system before making significant changes to it:

sudo apt update -y; sudo apt upgrade -y; sudo apt autoremove -y;

Install docker via snap

In this setup, we did not use the docker version available on the Ubuntu repositories, we went for the ones from the snap. To install it, we used the following commands:

sudo apt install snapd;
sudo snap install docker;

Increase network pool for docker daemon

To handle the following problem:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

We modified the following file

/var/snap/docker/current/config/daemon.json

using the command:

sudo nano /var/snap/docker/current/config/daemon.json;

and set the content to be as follows:

{
    "log-level":        "error",
    "storage-driver":   "overlay2",
    "default-address-pools": [
        {
            "base": "172.80.0.0/16",
            "size": 24
        },
        {
            "base": "172.90.0.0/16",
            "size": 24
        }
    ]
}

We executed the following command to restart the docker daemon and get the network changes applied:

sudo snap disable docker;
sudo snap enable docker;

Gave access to our user to manage the docker

We added our user to the docker group so that we could manage the docker daemon without sudo rights.

sudo addgroup --system docker;
sudo adduser $USER docker;
newgrp docker;
sudo snap disable docker;
sudo snap enable docker;

After that, we made sure that the access rights to the volumes were correct:

sudo chown -R www-data:www-data /volumes/*
sudo chown -R tux:tux /volumes/letsencrypt/ /volumes/reverse/private/

Deploying

After we copied everything in place, we executed the following command to create our containers and start them with the appropriate networks and volumes:

export COMPOSE_HTTP_TIMEOUT=600;
docker-compose up -d --remove-orphans;

We had to increase the timeout as we were getting the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Updating the databases and performing any repairs

First, we connected to a terminal of the database container using the following command:

docker exec -it mariadb_c1 /bin/bash;

From there, we executed the following commands:

mysql_upgrade --user=root --password;
mysqlcheck -p -o --all-databases;

Bulk / Batch stopping docker containers

The following commands will help you stop many docker containers simultaneously. Of course, you can change the command stop to another, for example rm or whatever suits your needs.

You need to keep in mind that if you have dependencies between containers, you might need to execute the commands below more than once.

Stop all docker containers.

docker container stop $(docker container ls -q);
#This command creates a list of all containers.
#Using the -q parameter, we only get back the container ID and not all information about them.
#Then it will stop each container one by one.

Stop specific docker containers using a filter on their name.

docker container stop $(docker container ls -q --filter name=_web);
#This command finds all containers that their name contains _web.
#Using the -q parameter, we only get back the container ID and not all information about them.
#Then it will stop each container one by one.

A personal note

Check the system for things you might need to configure, like a crontab or other services.

A script that handles privileges on the docker volumes

To avoid access problems with the various external volumes we created the mysql user and group on the host machine as follows:

sudo groupadd -g 999 mysql;
sudo useradd -u 999 mysql -g mysql;

Then we execute the following to repair ownership issues with our containers. Please note that this script is custom to a particular installation and might not meet your needs.

#!/bin/bash

sudo chown -R www-data:www-data ~/volumes/*;
sudo chown -R bob:bob ~/volumes/letsencrypt/ ~/volumes/reverse/private/;
find ~/volumes/ -maxdepth 2 -type d -name mysql -exec sudo chown -R mysql:mysql '{}' \;;

Bind for 0.0.0.0:443 failed: port is already allocated

On a Docker installation that we have, we updated the image files for our containers using the following command:

docker images --format "{{.Repository}}:{{.Tag}}" | grep ':latest' | xargs -L1 docker pull;

Then we tried to update our container, as usual, using the docker-compose command.

export COMPOSE_HTTP_TIMEOUT=180; # We extend the timeout to ensure there is enough time for all containers to start
docker-compose up -d --remove-orphans;

Unfortunately, we got the following error:

export COMPOSE_HTTP_TIMEOUT=180;
docker-compose up -d --remove-orphans;

Starting entry ... 
Starting entry ... error

ERROR: for entry  Cannot start service entry: driver failed programming external connectivity on endpoint entry (d3a5d95f55c4e872801e92b1f32d9693553bd553c414a371b8ba903cb48c2bd5): Bind for 0.0.0.0:443 failed: port is already allocated

ERROR: for entry  Cannot start service entry: driver failed programming external connectivity on endpoint entry (d3a5d95f55c4e872801e92b1f32d9693553bd553c414a371b8ba903cb48c2bd5): Bind for 0.0.0.0:443 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

We used the docker container ls command to check which container was hoarding port 443, but none was doing so. Because of this, we assumed that docker ran into a bug. The first step we took (and the last) which solved the problem was to restart the docker service as follows:

sudo service docker restart;

This command was enough to fix our problem without messing with docker further.


Rough notes on setting up an Ubuntu server with docker

Static IP

First, we set up a static IP to our Ubuntu server using netplan. To do so, we created the following file:

/etc/netplan/01-netcfg.yaml

using the following command

sudo nano /etc/netplan/01-netcfg.yaml;

and added the following content to it:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0f0:
      dhcp4: no
      addresses: [192.168.45.13/24]
      gateway4: 192.168.45.1
      nameservers:
          addresses: [1.1.1.1,8.8.8.8]

To apply the changes, we executed the following:

sudo netplan apply;

Update everything (the operating system and all packages)

Usually, it is a good idea to update your system before making significant changes to it:

sudo apt update -y; sudo apt upgrade -y; sudo apt autoremove -y;

Install docker

In this setup we did not use the docker version that is available on the Ubuntu repositories, we went for the official ones from docker.com. To install it, we used the following commands:

sudo apt-get install ca-certificates curl gnupg lsb-release;
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg;
echo   "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null;
sudo apt-get update;
sudo apt-get install docker-ce docker-ce-cli containerd.io;

Install docker-compose

Again, we installed the official docker-compose from github.com instead of the one available in the Ubuntu repositories. At the time that this post was created, version 1.29.2 was the recommended one:

sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose;
sudo chmod +x /usr/local/bin/docker-compose;

Increase network pool for docker daemon

To handle the following problem:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

We created the following file,

/etc/docker/daemon.json

using the command:

sudo nano /etc/docker/daemon.json;

and added the following content to it:

{
  "default-address-pools": [
    {
      "base": "172.80.0.0/16",
      "size": 24
    },
    {
      "base": "172.90.0.0/16",
      "size": 24
    }
  ]
}

We executed the following command to restart the docker daemon and get the network changes applied:

sudo systemctl restart docker;

Gave access to our user to manage docker

We added our user to the docker group so that we could manage the docker daemon without sudo rights.

sudo usermod -aG docker $USER;

Deploying

After we copied everything in place, we executed the following command to create our containers and start them with the appropriate networks and volumes:

export COMPOSE_HTTP_TIMEOUT=120;
docker-compose up -d --remove-orphans;

We had to increase the timeout as we were getting the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Stopping all containers using a filter on the name

docker container stop $(docker container ls -q --filter name=_web);

The above command will find all containers whose names contain _web and stop them. That command is actually two commands where one is nested inside the other.

#This command finds all containers that their name contains _web, using the -q parameter, we only get back the container ID and not all information about them.
docker container ls -q --filter name=_web;
#The second command takes as input the output of the nested command and stops all containers that are returned.
docker container stop $(docker container ls -q --filter name=_web);