Doggo.Ninja - Pat (API) outage – Incident details

Pat (API) outage

Resolved
Major outage
Started over 3 years ago
Lasted about 9 hours

Affected

Dashboard & Tools

Major outage from 10:27 AM to 7:39 PM

Updates
  • Resolved

    Everything is now operational! The major factor in this incident was a Docker container that hung at 100% CPU usage and started filling up the disk and memory. After a couple of reconfigurations and restarts, everything is nominal.

    We will be setting up better monitoring systems so we're aware of this earlier next time.

  • Monitoring

    We implemented a fix and are currently monitoring the result. All services should be coming back up now.

  • Identified

    The issue seems to be caused by a server misconfiguration. We're currently applying a temporary fix.

  • Investigating

    Dashboard & Tools cannot be accessed at the moment. This incident was created by an automated monitoring service.
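
As a rough illustration of the kind of check the monitoring mentioned in the Resolved update could run, here is a minimal Python sketch. It flags any resource reading that reaches an alert threshold, which is the pattern this incident showed (CPU, disk, and memory all climbing). The threshold values and the names `THRESHOLDS`, `disk_used_percent`, and `find_breaches` are hypothetical and do not describe Doggo.Ninja's actual setup.

```python
import shutil

# Hypothetical alert thresholds, in percent; not taken from the incident report.
THRESHOLDS = {"cpu": 90.0, "memory": 90.0, "disk": 85.0}


def disk_used_percent(path="/"):
    """Return the percentage of disk space in use at `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def find_breaches(sample, thresholds=THRESHOLDS):
    """Return the sorted names of metrics in `sample` at or above their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if sample.get(name, 0.0) >= limit
    )


if __name__ == "__main__":
    # CPU and memory readings would come from another source (for example
    # `docker stats`); only disk usage is measured here.
    sample = {"disk": disk_used_percent("/")}
    breaches = find_breaches(sample)
    if breaches:
        print("ALERT:", ", ".join(breaches))
```

Run on a schedule and wired to a notification channel, a check like this would likely surface a runaway container before the service becomes unreachable, rather than after an external uptime probe fails.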