Docker


Troubleshooting Access Rights and Memory Allocation for Elastic Enterprise Search using Docker Compose

Introduction

Setting up Elastic Enterprise Search using Docker Compose is a convenient and scalable way to deploy Enterprise Search alongside the rest of the Elastic Stack. However, during our implementation of this setup, we ran into issues related to access rights and memory allocation. In this blog post, we walk you through the steps we took to resolve these challenges and successfully configure the system.

Problem: Access Rights

The initial hurdle we encountered was related to access rights when running Elastic Enterprise Search using Docker Compose. While following the official guide provided by Elastic, we realized that certain permissions were causing problems: the containers could not access the files and directories required for proper functioning.

Solution: Modifying the YML and Adjusting Access Rights

To overcome the access rights issue, we modified the YAML file used in the Docker Compose setup. Specifically, we relocated all volumes to a folder owned by our user, ensuring we had complete control and ownership over these files. Additionally, we set the access rights to 777, granting all users full read, write, and execute permissions. Please note that this is acceptable only because it is a development environment that will be destroyed later.
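Below is a minimal sketch of the host-side preparation this implies; the paths and the tux user are the ones that appear in the compose file, and the commands are our reconstruction rather than a copy of our shell history.

# Create the host folders that back the bind mounts
mkdir -p /home/tux/docker/volumes/es/{certs,esdata01,kibanadata,enterprisesearchdata};
# Take ownership with our own user
sudo chown -R tux:tux /home/tux/docker/volumes/es;
# Development-only: open the permissions completely
sudo chmod -R 777 /home/tux/docker/volumes/es;

The full docker-compose.yml we ended up with follows.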

version: '2.2'

services:

  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        elif [ x${KIBANA_PASSWORD} == x ]; then
          echo "Set the KIBANA_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: es01\n"\
          "    dns:\n"\
          "      - es01\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "Setting kibana_system password";
        until curl -s -X POST --cacert config/certs/ca/ca.crt -u elastic:${ELASTIC_PASSWORD} -H "Content-Type: application/json" https://es01:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
        echo "All done!";
      '
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120

  es01:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/elasticsearch/config/certs
      - /home/tux/docker/volumes/es/esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    environment:
      - node.name=es01
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es01/es01.key
      - xpack.security.http.ssl.certificate=certs/es01/es01.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.http.ssl.verification_mode=certificate
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es01/es01.key
      - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  kibana:
    depends_on:
      es01:
        condition: service_healthy
    image: docker.elastic.co/kibana/kibana:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/kibana/config/certs
      - /home/tux/docker/volumes/es/kibanadata:/usr/share/kibana/data
    ports:
      - ${KIBANA_PORT}:5601
    environment:
      - SERVERNAME=kibana
      - ELASTICSEARCH_HOSTS=https://es01:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
      - ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca/ca.crt
      - ENTERPRISESEARCH_HOST=http://enterprisesearch:${ENTERPRISE_SEARCH_PORT}
      - xpack.reporting.kibanaServer.hostname=localhost
    mem_limit: ${MEM_LIMIT}
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s -I http://localhost:5601 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  enterprisesearch:
    depends_on:
      es01:
        condition: service_healthy
      kibana:
        condition: service_healthy
    image: docker.elastic.co/enterprise-search/enterprise-search:${STACK_VERSION}
    volumes:
      - /home/tux/docker/volumes/es/certs:/usr/share/enterprise-search/config/certs
      - /home/tux/docker/volumes/es/enterprisesearchdata:/usr/share/enterprise-search/config
    ports:
      - ${ENTERPRISE_SEARCH_PORT}:3002
    environment:
      - SERVERNAME=enterprisesearch
      - secret_management.encryption_keys=[${ENCRYPTION_KEYS}]
      - allow_es_settings_modification=true
      - elasticsearch.host=https://es01:9200
      - elasticsearch.username=elastic
      - elasticsearch.password=${ELASTIC_PASSWORD}
      - elasticsearch.ssl.enabled=true
      - elasticsearch.ssl.certificate_authority=/usr/share/enterprise-search/config/certs/ca/ca.crt
      - kibana.external_url=http://kibana:5601
    mem_limit: ${MEM_LIMIT}
    healthcheck:
      test:
        [
            "CMD-SHELL",
            "curl -s -I http://localhost:3002 | grep -q 'HTTP/1.1 302 Found'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

volumes:
  certs:
    driver: local
  enterprisesearchdata:
    driver: local
  esdata01:
    driver: local
  kibanadata:
    driver: local
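
With this YAML and a matching .env file in place (an example .env is shown later in the version 7 section), bringing the stack up follows the usual Compose workflow; roughly:

docker-compose up -d;
# Follow the setup container while it generates certificates and sets the kibana_system password
docker-compose logs -f setup;
# All services should eventually report a healthy state
docker-compose ps;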

Problem: Memory Allocation

In addition to the access rights challenge, we also encountered an issue related to memory allocation. The default value of the kernel limit on memory map areas per process (vm.max_map_count) was too low for Elasticsearch, leading to errors during the deployment.

Solution: Increasing Memory Allocation

To address the memory allocation problem, we followed the guide provided by Elastic on how to increase vm.max_map_count. By executing the command sysctl -w vm.max_map_count=262144 as root, we set the value that Elasticsearch requires. This adjustment allowed Elastic Enterprise Search to start properly and utilize the necessary resources.
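
In practice the commands look like the following; the persistence step is our addition rather than part of the Elastic guide.

# Apply immediately (requires root)
sudo sysctl -w vm.max_map_count=262144;
# Persist the setting across reboots
echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf;
sudo sysctl -p;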

Problem: Version 8 does not deploy correctly and fails to authenticate

When we tried to bring up the Docker containers with a version 8 STACK_VERSION, we got the following authentication-related errors:

From ES01:
{"@timestamp":"2023-05-22T05:27:10.294Z", "log.level":"ERROR", "message":"failed to retrieve password hash for reserved user [kibana_system]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[es01][transport_worker][T#5]","log.logger":"org.elasticsearch.xpack.security.authc.esnative.ReservedRealm","trace.id":"8639207dbfa64551d7a302a1df3bda26","elasticsearch.cluster.uuid":"Zxd3v00YQyqJtFTsDw143w","elasticsearch.node.id":"v1WIc0TUReqqZFcTRzsyOw","elasticsearch.node.name":"es01","elasticsearch.cluster.name":"es-cluster","error.type":"org.elasticsearch.action.UnavailableShardsException",

From Kibana:
[2023-05-21T13:56:30.497+00:00][ERROR][elasticsearch-service] Unable to retrieve version information from Elasticsearch nodes. security_exception
	Root causes:
		security_exception: unable to authenticate user [kibana_system] for REST request [/_nodes?filter_path=nodes.*.version%2Cnodes.*.http.publish_address%2Cnodes.*.ip]
[2023-05-21T13:56:30.868+00:00][INFO ][plugins.screenshotting.chromium] Browser executable: /usr/share/kibana/x-pack/plugins/screenshotting/chromium/headless_shell-linux_x64/headless_shell

Solution: Downgrade to version 7

It appears that version 7 handles these access-management issues correctly, so we updated our .env file to the following:

STACK_VERSION=7.17.10
ELASTIC_PASSWORD=7XQ2aR7q2g2Q994msd942sLJoAJ7LUvD
KIBANA_PASSWORD=2M7nl2hCH11b7NLht2l8A8ZHBjAKTkxI
ES_PORT=9200
CLUSTER_NAME=es-cluster
LICENSE=basic
MEM_LIMIT=1073741824
KIBANA_PORT=5601
ENTERPRISE_SEARCH_PORT=3002
ENCRYPTION_KEYS=5df1f9f9590ac53b29db6f5eeb9f63be04a43018af3095ef2bfc8a043d38fc7c
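
After changing STACK_VERSION, we recreated the stack. One caveat (our note, not part of the original troubleshooting session): if the version 8 containers already wrote anything into the bind-mounted data directories, those directories may need to be emptied before the 7.x nodes will start, since Elasticsearch cannot use data written by a newer major version; in our disposable development environment this was not a problem.

# Stop and remove the version 8 containers (bind-mounted data stays on the host)
docker-compose down;
# Pull the 7.17.10 images referenced by the new STACK_VERSION
docker-compose pull;
# Recreate everything with the downgraded version
docker-compose up -d;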

Conclusion

While setting up Elastic Enterprise Search using Docker Compose can be a straightforward process, it is not uncommon to encounter challenges along the way. In this blog post, we discussed the problems we faced regarding access rights and memory allocation and the steps we took to resolve them.

By modifying the YAML file to ensure proper ownership and permissions for the required volumes and adjusting the memory allocation using the recommended command, we successfully overcame these issues. With our configuration now in place, we enjoy the benefits of Elastic Enterprise Search, leveraging its powerful capabilities for our organization’s search and discovery needs.

We hope that sharing our experience and solutions will assist others in overcoming similar obstacles and make their journey toward implementing Elastic Enterprise Search using Docker Compose smoother.


Rough notes on setting up an Ubuntu 22.04 LTS server with docker and snap

IP allocations

First, we set up a static IP on the network interface that handles all external traffic and DHCP on the network interface that connects to the management network, which we keep attached for maintenance.

To do so, we created the following file:

/etc/netplan/01-netcfg.yaml

using the following command:

sudo nano /etc/netplan/01-netcfg.yaml;

and added the following content to it:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: no
      addresses: [192.168.45.13/24]
      gateway4: 192.168.45.1
      nameservers:
          addresses: [1.1.1.1,8.8.8.8]
    eth1:
      dhcp4: yes

To apply the changes, we executed the following:

sudo netplan apply;
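
To confirm that the static address and the default route were applied, a quick check such as the following can be run (our addition to the notes):

ip addr show eth0;
ip route show default;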

Update everything (the operating system and all packages)

Usually, it is a good idea to update your system before making significant changes to it:

sudo apt update -y; sudo apt upgrade -y; sudo apt autoremove -y;

Install docker via snap

In this setup, we did not use the Docker version available in the Ubuntu repositories; we went for the one from snap. To install it, we used the following commands:

sudo apt install snapd;
sudo snap install docker;

Increase network pool for docker daemon

To handle the following problem:

ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network

We modified the following file:

/var/snap/docker/current/config/daemon.json

using the command:

sudo nano /var/snap/docker/current/config/daemon.json;

and set the content to be as follows:

{
    "log-level":        "error",
    "storage-driver":   "overlay2",
    "default-address-pools": [
        {
            "base": "172.80.0.0/16",
            "size": 24
        },
        {
            "base": "172.90.0.0/16",
            "size": 24
        }
    ]
}

We executed the following commands to restart the docker daemon and apply the network changes:

sudo snap disable docker;
sudo snap enable docker;
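
To verify that newly created networks are now drawn from the custom pools, we can create a throwaway network and inspect its subnet (a quick check we are adding here, not part of the original notes):

# Create a test network, print its subnet, and remove it again
docker network create pool-test;
docker network inspect pool-test --format '{{(index .IPAM.Config 0).Subnet}}';
docker network rm pool-test;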

Gave our user access to manage docker

We added our user to the docker group so that we could manage the docker daemon without sudo rights.

sudo addgroup --system docker;
sudo adduser $USER docker;
newgrp docker;
sudo snap disable docker;
sudo snap enable docker;
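
To confirm the group change took effect, docker should now respond without sudo; for example (our addition):

# Both of these should work without sudo after re-logging in or using newgrp
docker version;
docker run --rm hello-world;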

After that, we made sure that the access rights to the volumes were correct:

sudo chown -R www-data:www-data /volumes/*
sudo chown -R tux:tux /volumes/letsencrypt/ /volumes/reverse/private/

Deploying

After we copied everything in place, we executed the following command to create our containers and start them with the appropriate networks and volumes:

export COMPOSE_HTTP_TIMEOUT=600;
docker-compose up -d --remove-orphans;

We had to increase the timeout as we were getting the following error:

ERROR: for container_a  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
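
Rather than exporting the variable in every shell, the timeout can also be kept in the project's .env file, which docker-compose reads automatically (a convenience we are noting here, not something from the original notes):

# .env in the same directory as docker-compose.yml
COMPOSE_HTTP_TIMEOUT=600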

Updating the databases and performing any repairs

First, we connected to a terminal of the database container using the following command:

docker exec -it mariadb_c1 /bin/bash;

From there, we executed the following commands:

mysql_upgrade --user=root --password;
mysqlcheck -p -o --all-databases;

Bulk / Batch stopping docker containers

The following commands will help you stop many docker containers at once. Of course, you can change the stop command to another one, for example rm (see the combined sketch after the examples below), or whatever suits your needs.

You need to keep in mind that if you have dependencies between containers, you might need to execute the commands below more than once.

Stop all docker containers.

docker container stop $(docker container ls -q);
#docker container ls lists all running containers.
#Using the -q parameter, we only get back the container IDs and not all information about them.
#docker container stop then stops each container one by one.

Stop specific docker containers using a filter on their name.

docker container stop $(docker container ls -q --filter name=_web);
#This command finds all running containers whose name contains _web.
#Using the -q parameter, we only get back the container IDs and not all information about them.
#docker container stop then stops each matching container one by one.
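
As mentioned above, the same pattern works for removal. A small sketch that stops the filtered containers and then removes them; note the -a flag on the second listing so that already-stopped containers are included:

docker container stop $(docker container ls -q --filter name=_web);
docker container rm $(docker container ls -aq --filter name=_web);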

A personal note

Check the system for things you might need to configure, like a crontab or other services.

A script that handles privileges on the docker volumes

To avoid access problems with the various external volumes, we created the mysql user and group on the host machine as follows:

sudo groupadd -g 999 mysql;
sudo useradd -u 999 -g mysql mysql;

Then we executed the following to repair ownership issues with our containers. Please note that this script is custom to a particular installation and might not meet your needs.

#!/bin/bash

sudo chown -R www-data:www-data ~/volumes/*;
sudo chown -R bob:bob ~/volumes/letsencrypt/ ~/volumes/reverse/private/;
find ~/volumes/ -maxdepth 2 -type d -name mysql -exec sudo chown -R mysql:mysql '{}' \;;
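
Assuming the script is saved under a name such as fix-volume-ownership.sh (the filename is ours, for illustration), it can be made executable and run like this:

chmod +x fix-volume-ownership.sh;
./fix-volume-ownership.sh;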

Bind for 0.0.0.0:443 failed: port is already allocated

On a Docker installation that we have, we updated the image files for our containers using the following command:

docker images --format "{{.Repository}}:{{.Tag}}" | grep ':latest' | xargs -L1 docker pull;

Then we tried to update our containers, as usual, using the docker-compose command.

export COMPOSE_HTTP_TIMEOUT=180; # We extend the timeout to ensure there is enough time for all containers to start
docker-compose up -d --remove-orphans;

Unfortunately, we got the following error:

export COMPOSE_HTTP_TIMEOUT=180;
docker-compose up -d --remove-orphans;

Starting entry ... 
Starting entry ... error

ERROR: for entry  Cannot start service entry: driver failed programming external connectivity on endpoint entry (d3a5d95f55c4e872801e92b1f32d9693553bd553c414a371b8ba903cb48c2bd5): Bind for 0.0.0.0:443 failed: port is already allocated

ERROR: for entry  Cannot start service entry: driver failed programming external connectivity on endpoint entry (d3a5d95f55c4e872801e92b1f32d9693553bd553c414a371b8ba903cb48c2bd5): Bind for 0.0.0.0:443 failed: port is already allocated
ERROR: Encountered errors while bringing up the project.

We used the docker container ls command to check which container was hoarding port 443, but none was doing so. Because of this, we assumed that docker had run into a bug. The first step we took (and the last, as it solved the problem) was to restart the docker service as follows:

sudo service docker restart;

This command was enough to fix our problem without messing with docker further.
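
For future reference, these are the kinds of checks that can narrow down what is holding a port before resorting to a restart (our addition, not commands from the original incident):

# List containers that publish port 443
docker container ls --filter publish=443;
# Check which process on the host is listening on 443
sudo ss -tlnp | grep ':443';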


Incorrect definition of table mysql.column_stats: expected column 1

2022-06-07  7:17:02 3051 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'hist_type' at position 9 to have type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB','JSON_HB'), found type enum('SINGLE_PREC_HB','DOUBLE_PREC_HB').
2022-06-07  7:17:02 3051 [ERROR] Incorrect definition of table mysql.column_stats: expected column 'histogram' at position 10 to have type longblob, found type varbinary(255).

While checking the logs of a MariaDB docker container, we found the above error lines repeating thousands of times. It appears that there was an issue during the migration of the database to a newer version. The solution was to manually execute the command mysql_upgrade. To execute it, we first had to gain access to a shell inside the container; we did that using docker exec -it CONTAINER_NAME /bin/bash, as shown below:

# Gain shell access to the database container
docker exec -it mariadb_alpha /bin/bash;
# In the shell of the container, we executed the following to automatically fix a variety of problems/errors
mysql_upgrade --user=root --password;
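
The same repair can also be run without opening an interactive shell, assuming the container was started with the usual MYSQL_ROOT_PASSWORD environment variable (an assumption on our part about that particular container):

# Run mysql_upgrade non-interactively; the variable expands inside the container
docker exec mariadb_alpha sh -c 'mysql_upgrade --user=root --password="$MYSQL_ROOT_PASSWORD"';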