Infrastructure Improvements

We’ve had a few outages these past few days, so I just wanted to say (1) I’m sorry if you were affected by them, and more importantly, (2) I’ve addressed the underlying issue.

I’ve doubled the capacity of our underlying server, as it was just running out of memory. We’ve been seeing a lot more traffic lately, and it just couldn’t handle it. A good problem to have, I suppose!

Let’s turn this into an educational moment though, because that is what I do 🙂

Although I teach complex system designs and the use of web services, a site like this can often make do with a single monolithic server. Our current level of traffic doesn’t require anything more, and you should always opt for the simplest solution that meets your needs. This site runs on an Amazon Lightsail instance, running WordPress with Bitnami. The issue wasn’t with that architectural choice; in an attempt to minimize costs, I had chosen an instance type that didn’t keep up with our growth.

I do have automated alerts that fire when the site becomes unresponsive, so as long as it happens while I’m awake, I can respond quickly. I also take automated daily snapshots of the site, so recovering from a hardware failure or a similar problem can be done quickly.
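For anyone curious what an unresponsiveness check can look like, here is a minimal sketch in Python. It is not the monitoring this site actually uses; the function names, the 10-second timeout, and the "three failures in a row" rule are all assumptions I made up for illustration.

```python
import urllib.error
import urllib.request


def site_is_up(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds.

    `opener` is a hypothetical hook that defaults to urllib's urlopen; it only
    exists so the check can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections, and timeouts land here, and so do
        # HTTP 4xx/5xx responses (urlopen raises HTTPError for those).
        return False


def should_alert(recent_checks, failures_needed=3):
    """Alert once the last `failures_needed` checks have all failed.

    `recent_checks` is a list of booleans, oldest first. The threshold of 3 is
    an arbitrary assumption, not a recommendation.
    """
    if len(recent_checks) < failures_needed:
        return False
    return not any(recent_checks[-failures_needed:])
```

You would call site_is_up on a schedule, keep the recent results in a list, and send a notification when should_alert returns True. Requiring several failures in a row cuts down on false alarms, but it also hides exactly the kind of brief outage that turned out to be a warning sign here, so a lower threshold may be the better trade-off.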

In this case, a few brief outages that fixed themselves after a few minutes were a warning sign I should have paid more attention to. I assumed it was some transient hacking attempt that was quickly thwarted by the security software this site runs. But a couple of days ago, the site went down, and stayed down. When I saw the alert, I examined the Apache error logs, which plainly pointed to PHP running out of memory. The memory limit in php.ini didn’t appear to be the problem, but running “top” on the server showed that free memory on our virtual machine was perilously low. A simple reboot freed up some resources, and bought me some time to pursue a longer-term solution.
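The same low-memory condition that “top” shows can also be checked from a script. The sketch below is one way to do it, assuming a Linux server where /proc/meminfo reports MemTotal and MemAvailable; the function names and the 10% threshold are my own assumptions for illustration, not something I actually ran during this incident.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: kB value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not a "Name:   12345 kB" style line.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(text):
    """Fraction of total RAM the kernel estimates is available to new work."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"]


def memory_is_low(text, threshold=0.10):
    """True if less than `threshold` of RAM is available (10% is an assumed cutoff)."""
    return available_fraction(text) < threshold
```

On the server itself, you would read the contents of /proc/meminfo and pass that text to memory_is_low. Running something like this on a schedule could flag a shrinking memory margin before the site actually falls over.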

Fortunately, in Lightsail it’s pretty easy to upgrade a server. I just restored our most recent snapshot to a new, larger instance type, and switched over our DNS entry. Done.

There are still a couple of larger instance types to choose from before we need to think about a more complex architecture. If that day does come, Lightsail offers a way to put a load balancer in front of multiple servers. We’d have to move our database off the server to something centralized at the same time, but that’s not that hard either.

Published by

Frank Kane

Frank spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to hundreds of millions of customers, all the time. Frank holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. In 2012, Frank left to start his own successful company, Sundog Software, which focuses on virtual reality environment technology, and teaching others about big data analysis.
