Troubleshooting Process

A comprehensive checklist for determining whether a Pliant instance might be unhealthy, working from the ground up. If anything is not as it should be, contact Pliant Support.

Host(s)

This section applies only if you host your Pliant instance on your own infrastructure.

  1. Network Connectivity

    1. Is there connectivity to the Pliant host(s) via the following methods?

      1. ICMP ping (if permitted)

        # Example commands
        ping 172.27.13.4
        ping mypliant.pliant.io
      2. SSH

        # Example commands
        ssh -i private_ssh_key.pem myuser@mypliant.pliant.io
        ssh myuser@172.27.13.4
      3. https://<instance-ip>

      4. https://<instance-hostname>

    2. Can you reach your intended target from the Pliant host (ping, wget, etc.)?

    3. If this is a multi-node cluster, can each node reach every other node?

    4. Can the nodes reach the Internet (unless the environment is intentionally air-gapped)?

  2. Memory Usage - top

    1. Is total memory on each node 16 GB or more?

    2. Is swap usage 0?

    3. Is more than 1 GB of memory free?

  3. Disk Usage - df

    1. Does df -h / show more than 10 GB free on each node?

    2. Are there any other disks to consider (for example, is Pliant mounted on /pliant or another non-standard location)?

    3. Does any non-standard disk have less than 10 GB free?
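The memory and disk checks above can be scripted and run on each node. The following is a minimal bash sketch, not a Pliant-provided tool. It reads /proc/meminfo directly (the source of the figures top displays, with "free" taken as MemFree) and uses df -Pk for disk space. The 16 GB total-memory threshold is treated as 16×10^9 bytes because MemTotal reports slightly less than the installed RAM; the 1 GB and 10 GB thresholds use binary units, as top and df -h display them. PLIANT_MOUNTS is a hypothetical variable for listing non-standard mount points such as /pliant.

```shell
#!/usr/bin/env bash
# Sketch: memory and disk checks from the Host(s) section, run on each node.
# PLIANT_MOUNTS is a hypothetical, space-separated list of extra mounts to check.
set -uo pipefail

# Thresholds from the checklist, in KiB (the unit /proc/meminfo and df -Pk use).
readonly MIN_TOTAL_KB=15625000                    # 16 GB = 16*10^9 bytes / 1024
readonly MIN_FREE_MEM_KB=$((1024 * 1024))         # 1 GiB
readonly MIN_FREE_DISK_KB=$((10 * 1024 * 1024))   # 10 GiB, as df -h reports it

# Print the value (KiB) of one /proc/meminfo field read from stdin.
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }'
}

# Print one OK/FAIL line per memory check; return the number of failures.
evaluate_memory() {  # evaluate_memory <total_kb> <swap_used_kb> <free_kb>
  local fails=0
  if (( $1 >= MIN_TOTAL_KB )); then
    echo "OK   total memory: $(( $1 / 1024 )) MiB"
  else
    echo "FAIL total memory: $(( $1 / 1024 )) MiB (want >= 16 GB)"; fails=$((fails + 1))
  fi
  if (( $2 == 0 )); then
    echo "OK   swap used: 0 KiB"
  else
    echo "FAIL swap used: $2 KiB (want 0)"; fails=$((fails + 1))
  fi
  if (( $3 > MIN_FREE_MEM_KB )); then
    echo "OK   free memory: $(( $3 / 1024 )) MiB"
  else
    echo "FAIL free memory: $(( $3 / 1024 )) MiB (want > 1 GiB)"; fails=$((fails + 1))
  fi
  return "$fails"
}

# Read MemTotal, MemFree and swap usage from a meminfo file (default /proc/meminfo).
check_memory() {  # check_memory [meminfo_file]
  local file=${1:-/proc/meminfo} total free swap_total swap_free
  if [[ ! -r $file ]]; then
    echo "FAIL cannot read $file"; return 1
  fi
  total=$(meminfo_kb MemTotal < "$file")
  free=$(meminfo_kb MemFree < "$file")
  swap_total=$(meminfo_kb SwapTotal < "$file")
  swap_free=$(meminfo_kb SwapFree < "$file")
  evaluate_memory "${total:-0}" "$(( ${swap_total:-0} - ${swap_free:-0} ))" "${free:-0}"
}

# Print available space (KiB) on the filesystem holding a path (POSIX df output).
disk_free_kb() {
  df -Pk -- "$1" 2>/dev/null | awk 'NR == 2 { print $4 }'
}

# Print an OK/FAIL line for one filesystem; return 1 if it has <= 10 GiB free.
evaluate_disk() {  # evaluate_disk <path> <avail_kb>
  if (( $2 > MIN_FREE_DISK_KB )); then
    echo "OK   $1: $(( $2 / 1048576 )) GiB free"; return 0
  fi
  echo "FAIL $1: $(( $2 / 1048576 )) GiB free (want > 10 GiB)"; return 1
}

main() {
  local fails=0 path avail
  echo "== ${HOSTNAME:-this node} =="
  check_memory; fails=$((fails + $?))
  # Word splitting on PLIANT_MOUNTS is intentional (space-separated list).
  for path in / ${PLIANT_MOUNTS:-}; do
    avail=$(disk_free_kb "$path")
    if [[ -z $avail ]]; then
      echo "FAIL $path: df failed"; fails=$((fails + 1)); continue
    fi
    evaluate_disk "$path" "$avail" || fails=$((fails + 1))
  done
  echo "$fails check(s) failed"
  return $(( fails > 0 ))
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```

For example, saved as check-host.sh (a hypothetical name), it could be run on each node as PLIANT_MOUNTS="/pliant" bash check-host.sh. Each FAIL line maps back to one of the questions in the Memory Usage and Disk Usage steps.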

Kubernetes

This section applies only if you host your Pliant instance on your own infrastructure.

  1. Check nodes - kubectl get nodes -o wide

    1. Are they all “Ready”?

  2. Check pods - kubectl get pods

    1. Are all pods in the Running state?

    2. Are all containers in each pod ready (2/2, 1/1, etc.)?

  3. Check the storage on the StatefulSet persistent volumes and make sure none is more than 80% utilized:

    1. kubectl exec mysqldb-0 -- df -h /var/lib/mysql

    2. kubectl exec object-storage-0 -- df -h /data

    3. kubectl exec pliant-stats-0 -- df -h /var/lib/influxdb

  4. Get application and container information

    1. Pods

      1. kubectl get pods -o wide

      2. kubectl describe pod <pod name>

      3. kubectl logs <pod name>

    2. Services

      1. kubectl get services -o wide

      2. kubectl describe service <service name>

    3. Deployments

      1. kubectl get deployments -o wide

      2. kubectl describe deployment <deployment name>
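The node, pod, and storage checks in steps 1 to 3 can also be combined into one pass. This bash sketch assumes kubectl is already configured for the cluster and namespace where Pliant runs, that the pod names and mount paths match those listed above, and that df inside each container supports the POSIX -P flag (used instead of -h so the Use% column can be parsed reliably). It flags any node whose STATUS is not exactly Ready (so a cordoned node showing Ready,SchedulingDisabled is also reported) and any pod that is not Running, including Completed pods, matching the "Are all pods running?" question.

```shell
#!/usr/bin/env bash
# Sketch: node, pod, and StatefulSet volume checks from the Kubernetes section.
# Assumes kubectl points at the cluster/namespace running Pliant and that the
# pod names and mount paths match the ones listed in the checklist.
set -uo pipefail

readonly MAX_USE_PCT=80

# stdin: `kubectl get nodes --no-headers`; print nodes whose STATUS is not "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# stdin: `kubectl get pods --no-headers`; print pods that are not Running or
# whose READY column (e.g. 1/2) shows fewer ready containers than total.
unready_pods() {
  awk '{ split($2, r, "/"); if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3 }'
}

# stdin: `df -P <path>`; print the Capacity (Use%) column without the % sign.
use_pct() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Check one volume inside a pod; return 1 if unreadable or above MAX_USE_PCT.
check_volume() {  # check_volume <pod> <path>
  local pct
  pct=$(kubectl exec "$1" -- df -P "$2" </dev/null | use_pct)
  if [[ -z $pct ]]; then
    echo "FAIL $1 $2: could not read disk usage"; return 1
  elif (( pct > MAX_USE_PCT )); then
    echo "FAIL $1 $2: ${pct}% used (want <= ${MAX_USE_PCT}%)"; return 1
  fi
  echo "OK   $1 $2: ${pct}% used"
}

main() {
  local fails=0 out bad v
  if ! out=$(kubectl get nodes --no-headers) || [[ -z $out ]]; then
    echo "FAIL could not list nodes"; return 1
  fi
  bad=$(not_ready_nodes <<<"$out")
  if [[ -n $bad ]]; then
    printf 'FAIL nodes not Ready:\n%s\n' "$bad"; fails=$((fails + 1))
  else
    echo "OK   all nodes Ready"
  fi
  if ! out=$(kubectl get pods --no-headers) || [[ -z $out ]]; then
    echo "FAIL could not list pods"; return 1
  fi
  bad=$(unready_pods <<<"$out")
  if [[ -n $bad ]]; then
    printf 'FAIL pods not fully ready:\n%s\n' "$bad"; fails=$((fails + 1))
  else
    echo "OK   all pods Running with all containers ready"
  fi
  # Pod:path pairs taken from step 3 of the checklist.
  for v in mysqldb-0:/var/lib/mysql object-storage-0:/data pliant-stats-0:/var/lib/influxdb; do
    check_volume "${v%%:*}" "${v#*:}" || fails=$((fails + 1))
  done
  echo "$fails check(s) failed"
  return $(( fails > 0 ))
}

# Run only when executed directly, and only if kubectl is available.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if command -v kubectl >/dev/null 2>&1; then
    main
  else
    echo "kubectl not found in PATH" >&2
  fi
fi
```

For any pod the script reports, the describe and logs commands in step 4 are the natural next place to look.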

Pliant

  1. Can you log in to Pliant?

  2. Check the workers (admin-level login required).

    1. Does each worker group have the expected number of active workers (the default group typically has 2)?

    2. Does each worker have a recent heartbeat?

  3. Run a test flow. Does it run successfully?

  4. Check the logs. Are any flows completing with errors?