GPU-Enabled Docker Platform Troubleshooting Guide

This guide is designed to assist users in troubleshooting their GPU-enabled Docker platform setup on both Linux and Windows operating systems. It provides a structured approach to ensure the platform is correctly configured and functioning.


Initial Setup for Linux

  • Proper Installation Process:
    • Run the setup from the website for the first time. This step installs the necessary drivers.
    • After the first installation, reboot your Linux system.
    • Perform the setup process a second time, which will install Docker.

Verifying the Setup on Linux and Windows

  • Confirmation Command:
    • To verify that your setup is working correctly, run:
    docker run --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi
    
    • The output should match what nvidia-smi prints when run directly on the host: a table listing your GPU, driver version, and CUDA version.
    • This command checks if Docker is properly utilizing your GPU.
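
The check above can be wrapped in a small script so that a missing Docker install is reported separately from a GPU that Docker cannot see. This is a minimal sketch, not part of the official setup: the function names have_cmd and verify_gpu_setup are hypothetical, and the --rm flag is added here only so the test container is cleaned up afterwards.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical wrapper around the confirmation command from this guide.
verify_gpu_setup() {
    if ! have_cmd docker; then
        echo "docker not found: rerun the setup from the website" >&2
        return 1
    fi
    # Same confirmation command as above; --rm (an addition) deletes the
    # test container once nvidia-smi has finished.
    if docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu18.04 nvidia-smi; then
        echo "GPU is visible inside Docker"
    else
        echo "Confirmation command failed: see the Linux reset steps" >&2
        return 1
    fi
}
```

After sourcing the file, run verify_gpu_setup; a non-zero exit status means the setup still needs attention.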

Troubleshooting Failed Setup on Linux

  • Using the Reset Script (listed at the end of this page):

    • If the confirmation command fails, make the reset_drivers_and_docker.sh script executable and run it:

    • chmod +x reset_drivers_and_docker.sh  
      ./reset_drivers_and_docker.sh
      
    • After running the script, restart your device.

    • Rerun the setup from the website. The first run installs the drivers and is followed by an automatic restart; once the system is back up, rerun the setup to install Docker and complete the installation.

    • If the confirmation command fails again after these steps, seek assistance on the community support channel.


Stopping the Platform

  • Windows (Using PowerShell):
    • To stop and remove all containers, execute (the -f flag forces removal of containers that are still running):
      docker ps -a -q | ForEach-Object { docker rm -f $_ }
      
  • Linux (Using Terminal):
    • To stop and remove all containers, use:
      sudo docker stop $(sudo docker ps -a -q); sudo docker rm $(sudo docker ps -a -q)
      

Restarting the Platform After Reboot

  • After rebooting your computer or server, you must restart the platform manually.
  • Rerun the same command provided on the website during the initial setup (it looks like docker run -d ....).

📘

NOTE (IMPORTANT):

Make sure you are not running two instances of io-worker-vc.
How to check this:

 docker ps

If two containers are running the same image, io-worker-vc, the platform will malfunction. In that case the output looks like the following:

~$ docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS         PORTS     NAMES
87b1b066bdfa   ionetcontainers/io-worker-monitor   "tail -f /dev/null"      3 seconds ago    Up 2 seconds             agitated_hawking
7033c1b8feba   ionetcontainers/io-worker-vc        "sudo -E /srp/invoke…"   8 seconds ago    Up 8 seconds             friendly_ritchie
67f699e12c2e   ionetcontainers/io-worker-vc        "sudo -E /srp/invoke…"   10 seconds ago   Up 8 seconds             sleepy_feynman

How to fix this?
Stop and remove all Docker containers (see "Stopping the Platform" above), then run the docker run -d ... command from the website exactly ONCE to start the platform normally.
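
The duplicate check can also be scripted instead of read off the table by eye. The sketch below assumes the image name ionetcontainers/io-worker-vc shown in the sample output; count_image is a hypothetical helper that counts matching image names read from standard input.

```shell
#!/bin/bash
# Hypothetical helper: count stdin lines whose first field is the given
# image name, with or without a ":tag" suffix.
count_image() {
    awk -v img="$1" '$1 == img || index($1, img ":") == 1 { n++ } END { print n + 0 }'
}
```

Usage, assuming Docker is installed: docker ps --format '{{.Image}}' | count_image ionetcontainers/io-worker-vc. A result of 2 or more indicates the duplicate-instance problem described above.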


Extra Guides

Which ports must be open on the firewall for the platform to function properly? (applies to both Linux and Windows)

  • TCP 443 25061 5432 80
  • UDP 80 443 41641 3478
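
On Linux hosts that use ufw, the port list above could be turned into firewall rules as sketched below. This is only an illustration: the guide does not say which firewall is in use or in which direction the ports must be open, so ufw and its default inbound allow rules are assumptions, and print_ufw_rules is a hypothetical helper that only prints the commands for review rather than running them.

```shell
#!/bin/bash
# Port lists copied from this guide.
TCP_PORTS="443 25061 5432 80"
UDP_PORTS="80 443 41641 3478"

# Hypothetical helper: print one "ufw allow" command per port and protocol.
# Nothing is executed, so the rules can be reviewed first.
print_ufw_rules() {
    for port in $TCP_PORTS; do
        echo "sudo ufw allow ${port}/tcp"
    done
    for port in $UDP_PORTS; do
        echo "sudo ufw allow ${port}/udp"
    done
}
```

Calling print_ufw_rules lists eight commands; copy the ones you need into a terminal, or pipe the output to sh once you have checked it.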

How can I verify that the platform has started successfully?

  • Running the following command in PowerShell (Windows) or a terminal (Linux) should show 2 Docker containers running at all times:

     docker ps
    
  • In case no containers or only 1 container is running after the docker run -d ... command from the website:

    • Stop the platform (see "Stopping the Platform" above for the command) and start it again with the command from the website.
  • If this still doesn't work, reach out to our Discord community for help: https://discord.com/invite/kqFzFK7fg2
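
The decision above can be expressed as a small helper that maps the number of running containers to the next step. This is a sketch under stated assumptions: platform_status is a hypothetical name, and treating any count above 2 as a possible duplicate io-worker-vc instance is an inference from the note earlier in this guide, not a documented rule.

```shell
#!/bin/bash
# Hypothetical helper: map a running-container count to a suggested action.
platform_status() {
    case "$1" in
        2)   echo "ok" ;;                 # expected state: two containers
        0|1) echo "restart" ;;            # stop the platform, then rerun the website command
        *)   echo "check-duplicates" ;;   # assumption: extra containers may be duplicates
    esac
}
```

Usage, assuming Docker is installed: platform_status "$(docker ps -q | wc -l | tr -d ' ')".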

reset_drivers_and_docker.sh:

Create a new file called "reset_drivers_and_docker.sh" and paste in the code snippet below. Note that the script removes all Docker containers and images, Docker itself, and the NVIDIA drivers.

#!/bin/bash

# Stop all Docker containers (xargs -r skips the command when the list is empty)
echo "Stopping all running Docker containers..."
docker ps -a -q | xargs -r docker stop

# Remove all Docker containers
echo "Removing all Docker containers..."
docker ps -a -q | xargs -r docker rm

# Remove all Docker images
echo "Removing all Docker images..."
docker images -q | xargs -r docker rmi

# Uninstall Docker Engine, CLI, and Containerd
echo "Uninstalling Docker..."
sudo apt-get purge -y docker-engine docker docker.io docker-ce docker-ce-cli containerd containerd.io

# Remove Docker's storage volumes
echo "Removing Docker storage volumes..."
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd

# Remove Docker group
sudo groupdel docker

# Remove Docker's configuration files
echo "Removing Docker configuration files..."
sudo rm -rf /etc/docker

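# Remove any leftover Docker files on the root filesystem.
# WARNING: this deletes every file or directory whose name contains "docker",
# including this script itself. -xdev keeps find off /proc, /sys, and other mounts.
sudo find / -xdev -name '*docker*' -prune -exec rm -rf {} +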

# Uninstall NVIDIA Docker
echo "Uninstalling NVIDIA Docker..."
sudo apt-get purge -y nvidia-docker

# Uninstall NVIDIA drivers
echo "Uninstalling NVIDIA drivers..."
sudo apt-get purge -y '*nvidia*'

# Remove any remaining NVIDIA directories
sudo rm -rf /usr/local/nvidia/

# Update the package lists
echo "Updating package lists..."
sudo apt-get update

# Autoremove any orphaned packages
echo "Removing unused packages and cleaning up..."
sudo apt-get autoremove -y
sudo apt-get autoclean

# Rebuild the kernel module dependencies
echo "Rebuilding kernel module dependencies..."
sudo depmod

# Inform the user that a reboot is required
echo "Uninstallation complete. Please reboot your system."
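
Because the script above is destructive, a confirmation prompt could guard it. The function below is a hypothetical addition, not part of the original script; it reads one line from standard input and succeeds only on an explicit yes.

```shell
#!/bin/bash
# Hypothetical guard: ask before wiping Docker and the NVIDIA drivers.
confirm_reset() {
    printf 'This removes all Docker data and NVIDIA drivers. Continue? [y/N] ' >&2
    # Treat end-of-input as "no".
    read -r answer || answer=""
    case "$answer" in
        y|Y|yes|YES) return 0 ;;
        *)           return 1 ;;
    esac
}
```

One way to use it is to paste the function directly below the #!/bin/bash line of the script, followed by the line confirm_reset || exit 1.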