Currently down
Incident Report for Agency Revolution - old one
Postmortem

The outage was caused by two events occurring in sequence: an automatic reboot of one of our servers, followed by a cluster configuration issue that prevented failover. Fortunately, there is a solution.

At 6:15 AM Pacific time, one of our servers automatically installed Windows Updates. During the process, it went down for an automatic reboot. This is normal behavior and should not cause any downtime for us.

However, due to a configuration issue, the failover cluster was unable to detect the down node and roll the application over to the other node.
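
To illustrate the behavior we expect from the cluster, here is a minimal Python sketch of heartbeat-based failure detection. It is not our cluster's actual implementation or configuration: the class names, node names, and the 10-second timeout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time of the last heartbeat received, in seconds


@dataclass
class Cluster:
    nodes: list
    active: str
    heartbeat_timeout: float = 10.0  # hypothetical threshold, in seconds

    def is_down(self, node, now):
        # A node counts as down once it has been silent longer than the timeout.
        return now - node.last_heartbeat > self.heartbeat_timeout

    def check(self, now):
        """Fail over to a healthy node if the active node looks down."""
        by_name = {n.name: n for n in self.nodes}
        if not self.is_down(by_name[self.active], now):
            return self.active
        for n in self.nodes:
            if n.name != self.active and not self.is_down(n, now):
                self.active = n.name
                break
        # If no healthy standby exists, the active node is left unchanged.
        return self.active


# Usage: node-a stops sending heartbeats (for example, during a reboot).
node_a = Node("node-a", last_heartbeat=100.0)
node_b = Node("node-b", last_heartbeat=100.0)
cluster = Cluster([node_a, node_b], active="node-a")

before = cluster.check(now=105.0)   # both nodes healthy
node_b.last_heartbeat = 118.0       # node-b keeps reporting in, node-a does not
after = cluster.check(now=120.0)    # node-a has been silent for 20 seconds
```

In this sketch, failover depends entirely on the check running and on the timeout being set sensibly. If either is misconfigured, a rebooting node is never marked as down and the application stays on it.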

Once we realized that the cluster had not failed over, we were able to recover quickly by manually forcing a failover.

We will correct the cluster configuration today so that this does not happen again.

Posted Apr 15, 2014 - 15:32 PDT

Resolved
And we're back.
Posted Apr 15, 2014 - 06:42 PDT
Monitoring
We will be back momentarily. Sorry for the interruption.
Posted Apr 15, 2014 - 06:42 PDT
Investigating
We're investigating and will have an update soon.
Posted Apr 15, 2014 - 06:29 PDT