Currently down

Incident Report for Agency Revolution - old one

Postmortem

The outage was caused by a sequence of events that, happening in this particular order, took the application offline. Fortunately, the underlying problem can be fixed.

At 6:15 AM Pacific time, one of our servers automatically installed Windows Updates and rebooted as part of that process. This is normal behavior and, on its own, should never cause any downtime for us.

However, because of a configuration issue, the failover cluster did not detect that the node was down and did not fail the application over to the other node.

Once we realized that the cluster had not failed over, we recovered quickly by forcing a manual failover.

We will correct the cluster configuration today so that this does not happen again.

Posted Apr 15, 2014 - 15:32 PDT

Resolved

And we're back.

Posted Apr 15, 2014 - 06:42 PDT

Monitoring

We will be back momentarily. Sorry for the interruption.

Posted Apr 15, 2014 - 06:42 PDT

Investigating

We're investigating and will have an update soon.

Posted Apr 15, 2014 - 06:29 PDT