Just a bunch of snippets that we used to deploy a local Ray cluster. We couldn’t get the second node to connect to the cluster, even though no error was reported.
From https://www.anaconda.com/products/individual-b
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
bash ./Anaconda3-2021.05-Linux-x86_64.sh
source ~/anaconda3/bin/activate
conda create --name ray.3.7 python=3.7.10;
conda activate ray.3.7;
conda install --name ray.3.7 pip;
pip install ray==1.1.0
# Pin the extras to the same version so this second install does not silently upgrade ray past 1.1.0 (the version the rayproject/ray-ml:1.1.0 image below expects)
pip install gym pandas torch "ray[default,rllib,serve,tune]==1.1.0";
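A quick sanity check we could add at this point (not part of the original run) to confirm the environment really ended up with the pinned Ray version rather than a newer one pulled in by the extras:

# should print 1.1.0
python -c 'import ray; print(ray.__version__)'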
ON WORKERS:
From https://docs.docker.com/engine/install/ubuntu/
sudo apt-get remove docker docker-engine docker.io containerd runc;
sudo apt-get update;
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release;
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
sudo addgroup --system docker
sudo adduser $USER docker
newgrp docker
sudo systemctl restart docker
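A reasonable smoke test to run here (our suggestion, not taken from the original notes) to confirm the non-root user can reach the Docker daemon; logging out and back in may be needed for the group change to take effect:

# should pull and run the hello-world image without sudo
docker run --rm hello-world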
Uninstall Docker Engine
Uninstall the Docker Engine, CLI, and Containerd packages:
sudo apt-get purge docker-ce docker-ce-cli containerd.io
Images, containers, volumes, or customized configuration files on your host are not automatically removed. To delete all images, containers, and volumes:
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
You must delete any edited configuration files manually.
While bringing Ray up we hit the following warning about shared memory; this is where the --shm-size run option in the YAML below comes from:
2021-06-09 03:56:34,453 WARNING services.py:1740 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 7963275264 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
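To see how much shared memory the host actually exposes (a diagnostic we would run, not something from the original notes), compare /dev/shm against the roughly 30% of RAM that Ray asks for:

# /dev/shm was about 7.4 GiB here, hence the warning above
df -h /dev/shm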
ON MASTER:
ray up office.yaml
ray up -vvvvvv office.yaml
ray exec office.yaml 'ray status'
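One way to confirm whether the worker ever registered with the head (a debugging sketch, assuming the cluster is up and office.yaml is the file below) is to attach to the head node and ask Ray which nodes it can see:

ray attach office.yaml
# then, inside the head node/container:
python -c 'import ray; ray.init(address="auto"); print(ray.nodes())'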
The YAML file that we used was the following:
cluster_name: default

docker:
    image: "rayproject/ray-ml:1.1.0"
    container_name: "ray_container"
    pull_before_run: True
    run_options: ["--shm-size=10.24gb"]

provider:
    type: local
    head_ip: 192.168.1.14
    worker_ips: [192.168.1.70]

auth:
    ssh_user: tux
    ssh_private_key: ~/.ssh/id_rsa

min_workers: 20
max_workers: 20
initial_workers: 20

upscaling_speed: 1.0
idle_timeout_minutes: 5

file_mounts: {
}

cluster_synced_files: []

file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

initialization_commands: []

setup_commands: []

head_setup_commands: []

worker_setup_commands: []

head_start_ray_commands:
    - ray stop
    - ulimit -c unlimited && ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379
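Since ray up reported nothing useful, a fallback we could have tried (a sketch only, assuming Ray 1.1.0 is installed directly on both hosts and the IPs above) is starting the processes by hand, mirroring head_start_ray_commands and worker_start_ray_commands:

# on the head node (192.168.1.14)
ray stop
ray start --head --port=6379

# on the worker node (192.168.1.70)
ray stop
ray start --address=192.168.1.14:6379

If the worker still does not show up in ray status after this, the problem is more likely networking (is port 6379 on the head reachable from the worker?) than the cluster YAML itself.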