Troubleshooting Docker (Linux & Windows)

GPU-Enabled Docker Platform Troubleshooting Guide

This guide is designed to assist users in troubleshooting their GPU-enabled Docker platform setup on both Linux and Windows operating systems. It provides a structured approach to ensure the platform is correctly configured and functioning.


Initial Setup for Linux

  • Proper Installation Process:
    • Run the setup from the website for the first time. This step installs the necessary drivers.
    • After the first installation, reboot your Linux system.
    • Perform the setup process a second time, which will install Docker.
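Because the Linux setup runs in two passes, it can help to know which pass has already completed. The sketch below is a hypothetical helper (setup_stage is not part of the platform). It assumes the first pass is done once nvidia-smi is on PATH and the second once docker is, and it only inspects PATH without changing anything.

```shell
# Hypothetical helper: guess which pass of the two-pass Linux setup is done,
# based only on which commands are found on PATH. Nothing is installed or changed.
setup_stage() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "drivers missing: run the setup from the website (first pass)"
  elif ! command -v docker >/dev/null 2>&1; then
    echo "drivers found, docker missing: reboot, then run the setup again (second pass)"
  else
    echo "drivers and docker found: continue with the verification step"
  fi
}

# Example usage:
#   setup_stage
```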

Verifying the Setup on Linux and Windows

  • Confirmation Command:
    • To verify that your setup is functioning correctly, run:
    docker run --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi
    
    • The output should match what nvidia-smi prints when run directly on the host: a table listing your GPU(s) along with the driver and CUDA versions.
    • This command checks whether Docker can access your GPU(s).
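Comparing the number of GPUs visible on the host with the number visible inside a container makes the confirmation check less subjective than comparing two tables by eye. A minimal sketch, assuming nvidia-smi -L prints one line starting with "GPU " per device; count_gpus and compare_gpu_lists are hypothetical helper names, and the example adds --rm so the test container is removed after it exits.

```shell
# Hypothetical helper: count GPUs in `nvidia-smi -L` output (one "GPU <n>: ..." line each).
count_gpus() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that 0.
  printf '%s\n' "$1" | grep -c '^GPU ' || true
}

# Hypothetical helper: compare the host's GPU list with the list seen inside a container.
compare_gpu_lists() {
  local host container
  host=$(count_gpus "$1")
  container=$(count_gpus "$2")
  if [ "$host" -gt 0 ] && [ "$host" -eq "$container" ]; then
    echo "OK: container sees all $host GPU(s)"
  else
    echo "MISMATCH: host has $host GPU(s), container sees $container"
    return 1
  fi
}

# Example usage on a GPU host (same image as the confirmation command):
#   compare_gpu_lists "$(nvidia-smi -L)" \
#     "$(docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi -L)"
```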

Troubleshooting Failed Setup on Linux

  • Using the Reset Script (see the end of this page):

    • If the confirmation command fails, use the reset_drivers_and_docker script:

    • Make the script executable, then run it:
      chmod +x reset_drivers_and_docker.sh
      ./reset_drivers_and_docker.sh
      
    • After running the script, restart your device.

    • Rerun the setup from the website. After an automatic restart, rerun the setup to complete the installation.

    • If the confirmation command fails again after these steps, seek assistance on the community support channel.


Stopping the Platform

  • Windows (Using PowerShell):
    • To stop and remove all containers, run:
      docker ps -a -q | ForEach-Object { docker rm -f $_ }
      
  • Linux (Using Terminal):
    • To stop and remove all containers, run:
      sudo docker stop $(sudo docker ps -a -q); sudo docker rm $(sudo docker ps -a -q)
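If other, unrelated containers run on the same machine, you may prefer to remove only the platform's containers. This sketch assumes, based on the docker ps output shown later on this page, that every platform image lives under the ionetcontainers/ repository. platform_container_ids is a hypothetical helper, and xargs -r is the GNU option that skips the command when no IDs are found.

```shell
# Hypothetical helper: read "ID IMAGE" lines, as printed by
#   docker ps -a --format '{{.ID}} {{.Image}}'
# and print only the IDs of containers whose image is under ionetcontainers/
# (an assumption based on the image names shown on this page).
platform_container_ids() {
  awk '$2 ~ /^ionetcontainers\// { print $1 }'
}

# Example usage (Linux): stop and remove only the platform's containers.
#   sudo docker ps -a --format '{{.ID}} {{.Image}}' \
#     | platform_container_ids | xargs -r sudo docker rm -f
```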
      

Restarting the Platform After Reboot

  • After you reboot your computer or server, you must restart the platform.
  • To do so, rerun the same command the website provided during the initial setup (it looks like docker run -d ...).

NOTE (IMPORTANT):

Make sure you're not running two instances of io-worker-vc

How to check this:

 docker ps

If two containers are running the same io-worker-vc image, the platform will malfunction. In that case, the docker ps output looks like this:

~$ docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS         PORTS     NAMES
87b1b066bdfa   ionetcontainers/io-worker-monitor   "tail -f /dev/null"      3 seconds ago    Up 2 seconds             agitated_hawking
7033c1b8feba   ionetcontainers/io-worker-vc        "sudo -E /srp/invoke…"   8 seconds ago    Up 8 seconds             friendly_ritchie
67f699e12c2e   ionetcontainers/io-worker-vc        "sudo -E /srp/invoke…"   10 seconds ago   Up 8 seconds             sleepy_feynman

How to fix this?

Stop all Docker containers (see "Stopping the Platform" above), then run the docker run -d ... command from the website only ONCE to start the platform normally.
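The duplicate check can also be scripted. A minimal sketch, assuming the image appears as ionetcontainers/io-worker-vc (optionally with a tag) in docker ps --format '{{.Image}}' output; count_worker_vc and check_single_worker_vc are hypothetical helper names.

```shell
# Hypothetical helper: count io-worker-vc containers from image names on stdin
# (one per line). Matches the image with or without a ":tag" suffix.
count_worker_vc() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps that 0.
  grep -cE '^ionetcontainers/io-worker-vc(:|$)' || true
}

# Hypothetical helper: succeed only if exactly one io-worker-vc container is running.
check_single_worker_vc() {
  local n
  n=$(count_worker_vc)
  if [ "$n" -gt 1 ]; then
    echo "DUPLICATE: $n io-worker-vc containers running; stop all containers and rerun the website command once"
    return 1
  elif [ "$n" -eq 0 ]; then
    echo "NOT RUNNING: no io-worker-vc container found"
    return 1
  fi
  echo "OK: exactly one io-worker-vc container running"
}

# Example usage:
#   docker ps --format '{{.Image}}' | check_single_worker_vc
```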


Facing unstable uptime on Windows?

Make sure the DHCP lease time on your router is set to more than 24 hours. In addition, open the Group Policy Editor in Windows (gpedit.msc) and enable the required settings as follows:

  1. Navigate to "Computer Configuration" in the group policy editor.
  2. Within "Computer Configuration", locate the "Administrative Templates" section.
  3. Within the "Administrative Templates" section, navigate to "System".
  4. Within the "System" menu, select "Power Management".
  5. Lastly, access the "Sleep Settings" subsection within "Power Management".
  6. In the "Sleep Settings" submenu, activate both "Allow network connectivity during connected-standby (on battery)" and "Allow network connectivity during connected-standby (plugged in)" options.

Make sure both settings are enabled before you close the Group Policy Editor.

Extra Guides

Which ports must be open on the firewall for the platform to work properly (Linux and Windows)?

  • TCP: 443, 25061, 5432, 80
  • UDP: 80, 443, 41641, 3478
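On a Linux host, firewall rules could be generated from the list above. A sketch assuming the host uses ufw (firewalld or plain iptables need different commands) and that these ports must accept inbound traffic; platform_port_rules is a hypothetical helper that only prints port/protocol specs and changes nothing by itself.

```shell
# Hypothetical helper: print the ports listed above as "port/protocol" specs.
platform_port_rules() {
  local port
  for port in 443 25061 5432 80; do echo "$port/tcp"; done
  for port in 80 443 41641 3478; do echo "$port/udp"; done
}

# Example usage on a host that uses ufw (an assumption; adapt for your firewall):
#   for rule in $(platform_port_rules); do sudo ufw allow "$rule"; done
```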

How can I verify that the platform has started successfully?

  • When you run the following command in PowerShell (Windows) or a terminal (Linux), you should see two Docker containers running at all times:

     docker ps
    
  • If there are no containers, or only one container, running after you run the docker run -d ... command from the website:

    • Stop the platform (see "Stopping the Platform" above for the command), then start it again with the command from the website.
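The two-container check can be scripted as well. A minimal sketch, assuming no unrelated containers run on the same host (otherwise the count will be off); check_platform_running is a hypothetical helper.

```shell
# Hypothetical helper: read container IDs on stdin (one per line, as printed by
# docker ps -q) and succeed only if exactly two containers are running.
check_platform_running() {
  local n
  # Count non-empty lines; grep -c prints 0 and exits 1 on empty input.
  n=$(grep -c . || true)
  if [ "$n" -eq 2 ]; then
    echo "OK: 2 containers running"
  else
    echo "PROBLEM: $n container(s) running, expected 2; stop the platform and rerun the website command"
    return 1
  fi
}

# Example usage (Linux):
#   docker ps -q | check_platform_running
```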

reset_drivers_and_docker.sh:

Create a new file named "reset_drivers_and_docker.sh" and paste the code below into it:

#!/bin/bash

# Stop all Docker containers (xargs -r skips the command if the list is empty)
echo "Stopping all running Docker containers..."
sudo docker ps -a -q | xargs -r sudo docker stop

# Remove all Docker containers
echo "Removing all Docker containers..."
sudo docker ps -a -q | xargs -r sudo docker rm

# Remove all Docker images
echo "Removing all Docker images..."
sudo docker images -q | xargs -r sudo docker rmi -f

# Uninstall Docker Engine, CLI, and containerd.
# Purge one package at a time: apt-get aborts the whole command if it
# cannot locate any one of the listed packages.
echo "Uninstalling Docker..."
for pkg in docker-engine docker docker.io docker-ce docker-ce-cli containerd containerd.io; do
    sudo apt-get purge -y "$pkg"
done

# Remove Docker's storage volumes
echo "Removing Docker storage volumes..."
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd

# Remove the docker group (prints an error if the group no longer exists)
sudo groupdel docker

# Remove Docker's configuration files
echo "Removing Docker configuration files..."
sudo rm -rf /etc/docker

# Remove leftover Docker runtime files and the current user's Docker client config.
# Avoid a filesystem-wide find/rm on '*docker*': it would also delete unrelated
# files whose names happen to contain "docker".
sudo rm -rf /var/run/docker.sock /var/run/docker "$HOME/.docker"

# Uninstall NVIDIA Docker
echo "Uninstalling NVIDIA Docker..."
sudo apt-get purge -y nvidia-docker

# Uninstall NVIDIA drivers
echo "Uninstalling NVIDIA drivers..."
sudo apt-get purge -y '*nvidia*'

# Remove any remaining NVIDIA directories
sudo rm -rf /usr/local/nvidia/

# Update the package lists
echo "Updating package lists..."
sudo apt-get update

# Autoremove any orphaned packages
echo "Removing unused packages and cleaning up..."
sudo apt-get autoremove -y
sudo apt-get autoclean

# Rebuild the kernel module dependencies
echo "Rebuilding kernel module dependencies..."
sudo depmod

# Inform the user that a reboot is required
echo "Uninstallation complete. Please reboot your system."
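Before running the reset script, you may want to see which installed packages its purge steps would roughly target. A sketch for Debian/Ubuntu, assuming dpkg-query is available; reset_targets is a hypothetical helper, and its name filter (docker, nvidia, containerd) approximates, rather than exactly mirrors, the package list in the script.

```shell
# Hypothetical helper: read dpkg-query status lines ("ii  <package>") on stdin and
# print installed packages whose names contain docker, nvidia, or containerd.
# "ii" means installed; other states (e.g. "rc", config files only) are skipped.
reset_targets() {
  awk '$1 == "ii" && $2 ~ /docker|nvidia|containerd/ { print $2 }'
}

# Example usage on Debian/Ubuntu:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | reset_targets
```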

If you're still having trouble connecting your device, contact us or ask on our Discord for further assistance. We are here to help!