The Mean, Lean, Green Mocavo Machine

28 Feb 2014

Here’s a riddle for you: What runs on clean-burning natural gas, is cooled by ice-cold mountain air, has 99.9% reliability, and delivers 40 teraflops of processing power? Why, it’s nothing less than Mocavo’s primary datacenter.

Mean

Reading the world’s genealogical records one at a time and making them searchable is no small feat. It requires a fine-tuned infrastructure with plenty of processing power, storage, and redundancy.

With over 500 multi-core, datacenter-grade Dell servers under the hood, we can perform OCR on over 1 million documents per day. In fact, we’re in the final stages of re-engineering our OCR process to raise that number to over 5 million, all without affecting the performance of the website whatsoever!
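To put those OCR targets in per-machine terms, here is a rough budget. It assumes (purely for illustration) that the work is spread evenly across the 500 servers and that each server handles one document at a time; on a multi-core box running several OCR jobs at once, each job gets proportionally more time. The function name per_server_budget is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_server_budget(docs_per_day, servers):
    """Return (documents per server per day, seconds of wall-clock time per document).

    Assumes an even split and one document at a time per server, so this is a
    rough budget rather than a model of a real OCR pipeline.
    """
    docs_per_server = docs_per_day / servers
    return docs_per_server, SECONDS_PER_DAY / docs_per_server


for target in (1_000_000, 5_000_000):
    docs, secs = per_server_budget(target, 500)
    print(f"{target:,} docs/day -> {docs:,.0f} docs/server/day, {secs:.2f} s per document")
```

Going from 1 million to 5 million documents a day on the same hardware shrinks the per-document budget from about 43 seconds to under 9, which is why re-engineering the OCR process, rather than adding servers, is the lever being pulled.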

The processed documents have to go somewhere, and we’re pleased to announce that we have increased our storage capacity to over 1 petabyte! That’s a lot of spinning platters. Read on to see how we keep them all spinning!

What good is all that power and all those processed documents if a fire, flood, or zombie apocalypse destroys them in one fell swoop? We have an off-site datacenter connected via a dedicated 10 Gb/s fiber link that keeps all of our (and your) precious records safe and instantly available for recovery. We like being able to sleep at night, and the backup cluster makes that possible.

Lean

The most expensive part of running a datacenter isn’t power or cooling; it’s the labor to keep it running around the clock. When you’re working with 500 servers, seconds count: spending just 30 seconds per server adds up to more than 4 hours of labor. Out in the wild, server-to-administrator ratios range from about 15:1 to 100:1. So for 500 physical machines, what do we consider lean? Try 500:1, which is plenty if you have the right tools.
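The labor math above is easy to check. This small sketch (the function names are made up for illustration) multiplies out the per-server time and shows how many administrators each of the quoted ratios implies for 500 machines.

```python
def labor_hours(servers, seconds_per_server):
    """Total hands-on time, in hours, to spend a fixed number of seconds on every server."""
    return servers * seconds_per_server / 3600


def admins_needed(servers, servers_per_admin):
    """Administrators required at a given server-to-admin ratio, rounded up."""
    return -(-servers // servers_per_admin)  # integer ceiling division


print(f"{labor_hours(500, 30):.2f} hours")  # 500 x 30 s = 15,000 s
for ratio in (15, 100, 500):
    print(f"{ratio}:1 -> {admins_needed(500, ratio)} admin(s)")
```

At the low end of the typical range, 500 machines would need 34 administrators; at 500:1 it is one person, which only works if almost nothing has to be done by hand.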

Enter Puppet, Icinga, and Fabric.

Puppet is enterprise-level configuration management, and it fits seamlessly into our DevOps workflow. Every 30 minutes, every physical server in our datacenter checks in with the Puppet master for updates. Last week I added a new subnet and needed to add a route to about 100 machines, so I opened up our nodes.pp file and added this:


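# shell provider lets the pipe in "unless" work; path says where route and grep live
exec { "route add -net 10.10.108.0/22 gw 10.10.109.2 dev eth0":
  path     => ["/sbin", "/bin", "/usr/sbin", "/usr/bin"],
  provider => shell,
  unless   => "route -n | grep -q 10.10.108.0",
}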

So in 5 minutes I added a static route to 100 machines. That equates to about 3 seconds per machine. Not too bad, but I can do better.
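The unless guard in the exec above is what makes it safe for Puppet to apply this every 30 minutes: route add only runs when the check fails. A substring grep, though, would also be satisfied by a different route that merely contains the same text, such as 10.10.108.0/24. This Python sketch is not something Puppet itself runs; route_present is a hypothetical helper, and the routing table below is made-up sample output. It shows a stricter version of the same test that compares the destination and netmask columns of route -n.

```python
import ipaddress


def route_present(route_n_output, network):
    """Return True if `network` (e.g. "10.10.108.0/22") appears in `route -n` output.

    Compares destination + netmask as a network, instead of grepping for a
    substring, so 10.10.108.0/24 is not mistaken for 10.10.108.0/22.
    """
    wanted = ipaddress.ip_network(network)
    for line in route_n_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            dest = ipaddress.ip_network(f"{fields[0]}/{fields[2]}", strict=False)
        except ValueError:
            continue  # title and header lines ("Destination Gateway Genmask ...")
        if dest == wanted:
            return True
    return False


sample = """Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.10.100.1     0.0.0.0         UG    0      0        0 eth0
10.10.108.0     10.10.109.2     255.255.252.0   UG    0      0        0 eth0
"""
print(route_present(sample, "10.10.108.0/22"))  # True
print(route_present(sample, "10.10.108.0/24"))  # False
```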

Let’s say I wanted to add that route to all 500 machines and couldn’t wait for the half-hourly Puppet run. Let’s get Fabric involved.

Fabric is a Python library that sends predefined (or on-the-fly) commands over SSH to hosts, either listed individually or grouped into roles. In my fabfile.py I already have a function to restart Puppet:


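from fabric.api import parallel, sudo

@parallel
def kick_puppet():
    # Restart the agent so it checks in with the Puppet master right away
    sudo('service puppet restart')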

So after I add the route in Puppet, I restart the Puppet agent on all machines with:

fab -R All_Machines kick_puppet

That touches all 500 machines in less than 6 minutes, which works out to less than a second per machine. I’m sure you see where this is going… but you can’t automate everything, can you? What if you have to reinstall a server from scratch?
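The @parallel decorator is what turns 500 one-after-another SSH sessions into a short burst: Fabric 1.x runs the task against each host concurrently (in separate processes, optionally capped with a pool_size argument). The stdlib-only sketch below mimics the same fan-out idea with a thread pool so it can run anywhere; run_everywhere, fake_restart, and the host names are all hypothetical, and in a real fabfile Fabric handles the SSH connections itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_everywhere(hosts, task, pool_size=50):
    """Run task(host) on every host concurrently and return {host: result}.

    Wall-clock time is roughly (len(hosts) / pool_size) * time_per_host,
    instead of len(hosts) * time_per_host when done one at a time.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return dict(zip(hosts, pool.map(task, hosts)))


def fake_restart(host):
    """Hypothetical stand-in for sudo('service puppet restart') over SSH."""
    time.sleep(0.01)  # pretend each restart takes 10 ms
    return "ok"


hosts = [f"node{i:03d}" for i in range(500)]  # hypothetical host names
start = time.perf_counter()
results = run_everywhere(hosts, fake_restart)
elapsed = time.perf_counter() - start
print(f"{len(results)} hosts in {elapsed:.2f} s (sequential would be about 5 s)")
```

The pool size is the knob that matters: too small and you are back to serial speed, too large and you can swamp the Puppet master with 500 simultaneous check-ins.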

In case of a corrupted OS drive, or a new server that has never been on the network, (re)building from scratch is quick and easy. Power on the server, press F12 to boot from the network, and PXE takes over. The OS gets installed, the machine reboots, and Puppet takes it from vanilla OS to production-ready, all without being touched again. One-touch installs: try them. You’ll be glad you did.
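The post doesn’t say which network bootloader hands off to the installer, so assume, for illustration only, PXELINUX from the Syslinux project. PXELINUX lets one machine get its own install config simply by file name: after an optional UUID lookup, it tries "01-" plus the client’s MAC address (lowercase, dash-separated), then the client IP in uppercase hex with one trailing digit dropped at a time, and finally "default". The sketch below, with a hypothetical function name and skipping the UUID step, reproduces that search order.

```python
import ipaddress


def pxelinux_config_candidates(mac, ip):
    """File names PXELINUX tries under pxelinux.cfg/, in order (UUID lookup omitted)."""
    names = ["01-" + mac.lower().replace(":", "-")]  # 01 = Ethernet ARP hardware type
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"  # 192.168.2.91 -> C0A8025B
    names += [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]  # drop one digit at a time
    names.append("default")
    return names


for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
    print(name)
```

Because of the hex-prefix fallback, a single file named after a subnet prefix can cover a whole rack, while a MAC-named file pins one specific machine.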

I suppose there are a few things I’ll never be able to automate, like replacing a hard drive or a bad stick of RAM. I don’t have time to run tests on each machine to see how it’s doing, but Icinga has 24 hours a day to do just that, and it never gets bored or tired of it.

Icinga is a fork of Nagios, and right now it runs over 3,000 individual checks for us every 10 minutes without breaking a sweat. We use Puppet to automate the creation of the checks, and Icinga will holler when a hard drive fails, Puppet stops running, a web server stalls, or a machine becomes unresponsive. It can even act on an alert through an event handler (like restarting Puppet if it’s not running, or rebooting an unresponsive machine).
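For scale, 3,000 checks every 10 minutes averages out to 5 checks per second (3,000 / 600 s). Icinga inherits the Nagios plugin convention: a check prints a one-line status and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Event handlers receive macros such as $SERVICESTATE$, $SERVICESTATETYPE$, and $SERVICEATTEMPT$ to decide whether to act. The sketch below shows the "Puppet stopped running" case and a handler’s restart decision; the function names, the idea of feeding it a last-run timestamp, and the 60/120-minute thresholds are assumptions, not Mocavo’s actual checks.

```python
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios/Icinga plugin exit codes


def check_puppet_age(last_run, now=None, warn=3600, crit=7200):
    """Return (exit_code, status_line) based on how long ago Puppet last ran.

    The agent runs every 30 minutes, so these hypothetical thresholds allow
    a missed run or two before alerting.
    """
    if last_run is None:
        return UNKNOWN, "PUPPET UNKNOWN - no last-run timestamp"
    now = time.time() if now is None else now
    minutes = (now - last_run) / 60
    if minutes * 60 >= crit:
        return CRITICAL, f"PUPPET CRITICAL - last run {minutes:.0f} min ago"
    if minutes * 60 >= warn:
        return WARNING, f"PUPPET WARNING - last run {minutes:.0f} min ago"
    return OK, f"PUPPET OK - last run {minutes:.0f} min ago"


def handler_should_restart(state, state_type, attempt):
    """Event-handler decision from $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$:
    restart Puppet only once the check is clearly failing, not on a single blip."""
    return state == "CRITICAL" and (state_type == "HARD" or int(attempt) >= 3)


print(check_puppet_age(last_run=0, now=45 * 60))   # (0, 'PUPPET OK - last run 45 min ago')
print(check_puppet_age(last_run=0, now=150 * 60))  # (2, 'PUPPET CRITICAL - last run 150 min ago')
print(handler_should_restart("CRITICAL", "SOFT", "3"))  # True
```

Wired up as a real plugin, the script would print the status line and call sys.exit() with the code; Icinga reads both.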

So on the occasions when we must physically touch a machine, Icinga narrows the problem down for us so we can get in and get out, because, contrary to what you see in the movies, datacenters are LOUD and generally uncomfortable to work in for long periods.

Green

Mocavo is committed to running efficiently and taking care of our natural resources. Those two goals often work very well together. Here are some of the initiatives we have at Mocavo to lower our footprint while providing an excellent product:

The power we use here at the datacenter comes from clean-burning natural gas, which we like because it’s less expensive and better for the environment.

We don’t run redundant power supplies in each server; instead, we rely on redundant infrastructure. The load is distributed, so if a server drops out, the application continues to run smoothly until that server can be repaired.
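Skipping per-server redundancy works because the redundancy lives one level up: the pool is sized so that losing an entire server still leaves enough capacity, a rule often called N+1. A minimal sketch of that sizing rule follows; the capacity and load figures are made up for illustration, and the function names are hypothetical.

```python
def survives_one_failure(servers, capacity_per_server, load):
    """True if the remaining servers can still carry `load` after any one server fails."""
    return (servers - 1) * capacity_per_server >= load


def min_servers_n_plus_1(capacity_per_server, load):
    """Smallest pool that keeps serving `load` with one server down (N+1)."""
    n = -(-load // capacity_per_server)  # N: servers needed with nothing failed
    return n + 1


# Hypothetical figures: each server handles 100 requests/s, total load is 950 requests/s.
print(min_servers_n_plus_1(100, 950))      # 11
print(survives_one_failure(11, 100, 950))  # True
print(survives_one_failure(10, 100, 950))  # False
```

One spare server’s worth of headroom replaces a second power supply in every box, and it also covers failures a redundant PSU never would, such as a dead motherboard.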

We run the datacenter at a balmy 82°F (about 28°C). With adequate airflow for heat removal, our equipment runs comfortably warm, which saves the energy we would otherwise spend on cooling. To give us extra heat ballast against thermal load changes and to prevent static buildup, we run a humidifier to keep ambient humidity above 30%.

We’ve engineered a free-cooling air exchanger that uses the cold, arid mountain air to cool the datacenter. Running at capacity, it saves 4 tons (about 14 kW) of cooling, which keeps roughly 75 tons of CO2 out of the atmosphere each year and brings our PUE down to around 1.27. According to the Uptime Institute’s 2012 Data Center Survey, respondents’ largest data centers average a PUE between 1.8 and 1.89, so ours is roughly 29% to 33% lower, and it is closing in on Google’s internal datacenter PUE of 1.12.
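The cooling and PUE figures can be checked directly. PUE (Power Usage Effectiveness) is total facility power divided by the power delivered to IT equipment, so 1.0 is the ideal and lower is better; one ton of refrigeration is 12,000 BTU/h, or about 3.517 kW. The sketch below converts the cooling figure and compares 1.27 against both ends of the survey’s 1.8 to 1.89 range. The 127 kW / 100 kW example is a hypothetical facility, not a measurement.

```python
KW_PER_TON = 3.517  # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW


def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def percent_lower(ours, theirs):
    """How much lower `ours` is than `theirs`, as a percentage of `theirs`."""
    return (theirs - ours) / theirs * 100


print(f"4 tons of cooling is about {4 * KW_PER_TON:.1f} kW")          # 14.1 kW
print(f"Hypothetical 127 kW facility, 100 kW IT load: PUE {pue(127, 100):.2f}")  # 1.27
for survey_avg in (1.8, 1.89):
    print(f"1.27 vs {survey_avg}: {percent_lower(1.27, survey_avg):.1f}% lower")  # 29.4, 32.8
```

At a PUE of 1.27, only 27 watts of overhead (cooling, power conversion, lighting) are spent for every 100 watts that reach the servers.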

Technology makes genealogy possible in a lean, mean, green, BIG way!