Equipment transfer for better performance and uptime
Scheduled Maintenance Report for Agency Revolution - old one
Completed
Success. Over the next 10 days we'll have more maintenance as we continue to implement our plan.
Posted Dec 21, 2013 - 02:09 PST
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Dec 20, 2013 - 22:00 PST
Scheduled
Maximizing Uptime

I will be the first to admit that lately the reliability of our server farm has been a challenge. It has been a combination of issues all happening at the same time. I want you to know that we have an action plan and are implementing it as quickly as possible.

I will address these in order of their impact on our services.

Internet Access
========

In 2013, about 50% of our downtime has been associated with poor internet access.

We've been with the same company for more than 10 years. The service during this time has been quite good. In fact, in all of 2011 our internet was down for only 40 minutes, and for nearly the same amount of time in 2012. Since October of this year, it has become markedly worse. The Thanksgiving-eve outage caused more downtime than when we physically had to move our server 150 miles.

Fortunately, in this area we have options.

Sometime this week we will be switching internet service providers. The new provider peers with 3 other providers. Their service has been tracked for the last 8 years, and in that time they have had 99.996% uptime, which means that they are down for about 21 minutes a year, on average. The last outage they had was in October 2011. Not even Microsoft Azure or Amazon S3 can claim that.
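For readers who want to check the arithmetic, here is a minimal sketch of how an uptime percentage translates into minutes of downtime per year. The function name is made up for illustration, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the average yearly downtime, in minutes, implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


# 99.996% uptime leaves 0.004% of the year as downtime.
print(round(downtime_minutes_per_year(99.996)))  # prints 21
```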

That being said, one provider is a single point of failure. In the case of our previous provider, we tried to bring in others, but they added significant surcharges and complications to the process, so it made sense to part ways.

To avoid any single points of failure, we will be multi-homing with multiple providers. Multi-homing is the structure that makes the internet resilient. If one link goes down, you shouldn't notice, because another link will pick up the traffic.

To become multi-homed, we've applied for our own IP address block with ARIN, the registry that allocates internet addresses in North America. Our application was approved yesterday! This provides our company with the additional security that if we ever need to change service providers, we won't have to update thousands of customer records at the same time. We now have unique real estate that will always follow us around the internet.

The first step happens Friday night at 10:00 PM Pacific, when we will move our equipment to the new facility. In the coming days, we'll have a new IP address that we'll ask customers to switch to. This new IP address will provide a more reliable route to our servers.

To recap:

1) Clients who want a more reliable connection today can switch IP addresses as soon as next week.
2) As time goes on, we'll be adding more service providers to provide a fast, always-on connection.


Application Performance and Uptime
=======


Application performance, meaning your web sites' uptime and speed, has not met our own quality standards this year. We have a Quarter 1 goal to hit 99.9% uptime every week for the first 13 weeks of 2014. That's just 10 minutes down per week.
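As a rough illustration of what hitting that goal every week means, the sketch below flags any week whose downtime exceeds the 99.9% budget. The function names and the sample downtime figures are invented for the example and are not real measurements.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def weekly_uptime_percent(downtime_minutes: float) -> float:
    """Convert a week's downtime in minutes into an uptime percentage."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_WEEK)


def weeks_missing_goal(weekly_downtime, goal_percent=99.9):
    """Return the 1-based numbers of the weeks whose uptime fell below the goal."""
    return [
        week
        for week, minutes in enumerate(weekly_downtime, start=1)
        if weekly_uptime_percent(minutes) < goal_percent
    ]


# Hypothetical downtime, in minutes, for four weeks.
sample_downtime = [3, 0, 12, 9.5]
print(weeks_missing_goal(sample_downtime))  # prints [3]
```

Only week 3 misses the goal here, because 12 minutes of downtime exceeds the roughly 10-minute weekly allowance.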

In order to hit that mark we're going to focus on two strategies.

Maintenance
-----------

One thing that can cause a complex application to go down is the simple updating of software code or addition of hardware to systems. You know those Windows Updates with valuable security fixes? We get those, too.

When we purchased our new equipment in June 2013, we did so with this in mind. We're excited to finally get close to implementing a dual hardware failover system. Anytime we need to perform maintenance, another virtual copy of the same computer will take over. When the maintenance is complete, both systems resync and we're back online.


Redundancy
----------

In addition to eliminating maintenance windows, the dual system also gives us another layer of protection against catastrophic system failure. Should one system completely die, you shouldn't even notice, as the mirror image will be up and running within seconds.

Another problem that we've seen in 2013 is a slow-responding web server. Sometimes a server application would stall as it recompiled the code to load the website. The problem was that we were stuck with a single web server application due to licensing and technology constraints. Fortunately, the pieces have fallen into place and we can now overcome this. In the coming days or weeks, we'll be online with multiple load-balanced web servers serving your web sites. If one should go down for a few minutes, another server will take over automatically. When the first server heals itself, it can start sharing the load again.
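The idea of load balancing with automatic takeover can be sketched as follows. This is a simplified illustration, not the actual balancer being deployed: the class name, the round-robin strategy, and the server names are all assumptions made for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # Called when a health check fails (for example, the server is stalled).
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when the server recovers and can share the load again.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per request before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy web servers available")


balancer = RoundRobinBalancer(["web1", "web2"])  # hypothetical server names
balancer.mark_down("web1")
print(balancer.next_server())  # prints web2
balancer.mark_up("web1")
print(balancer.next_server())  # prints web1
```

While "web1" is down, every request goes to "web2"; once it is marked up again, it rejoins the rotation.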


Speed/Performance
-----------------

In addition to the high availability of your websites provided by the multiple web servers, we'll also have faster performance.

We're switching to an in-memory caching system as well, so that pages load very quickly for website visitors.
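To show why in-memory caching speeds things up, here is a minimal sketch in which a rendered page is kept in memory and reused until it expires. The class name, the 60-second lifetime, and the `render` callback are assumptions for illustration and do not describe the caching product being installed.

```python
import time


class PageCache:
    """Keep rendered pages in memory and reuse them until they expire."""

    def __init__(self, render, ttl_seconds=60.0, clock=time.monotonic):
        self._render = render      # slow function that builds a page
        self._ttl = ttl_seconds    # how long a cached page stays fresh (assumed value)
        self._clock = clock
        self._entries = {}         # url -> (expires_at, html)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no rendering needed
        html = self._render(url)   # cache miss or expired: render and store
        self._entries[url] = (now + self._ttl, html)
        return html


def slow_render(url):
    # Stand-in for the real page-building work.
    return f"<html>content for {url}</html>"


cache = PageCache(slow_render)
first = cache.get("/home")   # rendered once
second = cache.get("/home")  # served from memory
```

Only the first request pays the cost of rendering; later requests within the lifetime are answered from memory.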


Summary
======

I do appreciate everyone's patience in solving these issues. I have worked tirelessly to find the ideal solution for us and will continue to do so this quarter as we achieve our 99.9% mark.

The first maintenance will be Friday, December 20th at 10:00 PM Pacific (1:00 AM Eastern). Our window is 4 hours, but we will try to finish much faster than that!
Posted Dec 19, 2013 - 21:27 PST