Data centres can be a little like tribbles. You start with one, then expand to two, then five, then ten. You expand into new data centres for good reasons but soon they can become overwhelming.
Project Arkham is our 2018/2019 data centre consolidation project. A few network-related items remain outstanding, but it is now essentially complete. This gives me an opportunity to reflect on the why, what and how. Here I will cover the 'why' and expand on the 'what' and 'how' in future blog articles.
We've just inherited another two data centres as part of our Silverain acquisition so we may need a successor to Arkham. In reality, we do foresee these types of projects continuing as we acquire more businesses.
(Yes we are IT nerds at Zettagrid. Internal projects seem to be named after something in Star Wars/Gate/Trek, DC or MCU. This project is no different.)
Over the years of operating our ISP and then Zettagrid, our data centre presence ballooned. This comes with a lot of tech baggage that starts to weigh you down. In the heady years of 2013-2016 it felt great to be expanding into new and wonderful data centres. We peaked at a rack count of 140 across 15 DCs.
As we matured, we realised that a high rack count and the associated networking were working against the agility of our business, a huge anchor on making significant change. Bigger doesn't always mean better or faster. In IT circles, egos can feel fulfilled by rows and rows of servers, petabytes of data and the sense of exclusivity afforded by a swipe card giving access to a highly secure data centre.
Our cloud business is infrastructure intensive. We also provide co-location services for customers, which means we will always require a reasonable data centre footprint. However, we began to understand that we needed to streamline our operational overhead and reduce our supporting costs. Simplifying the cloud is what we sell to our customers and partners, so it was really time to take a good look at our own operation and see if we could do the same.
At our peak, we existed in 15 data centres:
- New South Wales: 5 (Equinix SY1 and SY4, Globalswitch, Vocus Doody St, NextDC S1)
- Victoria: 2 (NextDC M1, Pipe/TPG)
- Queensland: 1 (NextDC B1)
- Western Australia: 5 (NextDC P1, Vocus DC1, Vocus DC2, Vocus DC3, Vocus PerthIX)
- Indonesia: 2 (IDC3D, Equinix JK1)
We had good reasons for each of these PoPs to exist. However, over time it became difficult to justify them, especially as operating models and equipment (compute, storage, networking) evolve to let you do things better. At a basic level, having this number of DCs with so many different providers comes with complexities:
- When we created our first zones, way back in 2010, we chose providers with whom we already had relationships rather than dedicated co-location providers. This approach let us leverage our existing spending power but meant we weren't in the best interconnect locations.
- We use A LOT of remote hands data centre services, and with so many providers we found ourselves juggling different processes and charging rates at each one.
- We had to maintain substantial infrastructure in every location, which meant capital expense in networking without much customer benefit. How would you feel putting redundant 10Gb routers and switches into locations with only a couple of customers, just for the sake of operational consistency? :(
- We were seemingly charged for every variation of rack and power, sometimes metered individually, sometimes aggregated.
- Maintaining contracts for lots of data centres and racks is not really difficult if you do the basics of putting them into a contracts system or a spreadsheet. The really important aspect is ensuring the executive, management and projects teams understand the window of opportunity to make a major change in the environment.
- At scale, a major cost input becomes POWER relative to compute density. We had generations of compute that consumed more than four times the power of the latest-generation hosts. Quick back-of-the-napkin maths showed it was much cheaper to move to lower-power hosts. The same power/density equation applied to storage.
- Existing SANs took up 2-3 racks per instance; now we can fit the same storage in half a rack.
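The back-of-the-napkin maths above can be sketched like this. All figures are hypothetical (the wattages, the 4x ratio and the tariff are illustrative assumptions, not our actual numbers), and it assumes a flat tariff and constant draw:

```python
# Illustrative power/density comparison -- all figures are assumptions,
# not Zettagrid's real numbers.

def annual_power_cost(watts: float, cost_per_kwh: float = 0.20) -> float:
    """Annual electricity cost (in dollars) for a constant draw of `watts`."""
    hours_per_year = 24 * 365
    return watts / 1000 * hours_per_year * cost_per_kwh

# Hypothetical: an old-generation host draws 1000 W, while a new host
# delivering the same workload draws 250 W (the "four times" ratio above).
old_host = annual_power_cost(1000)
new_host = annual_power_cost(250)

print(f"old: ${old_host:,.0f}/yr, new: ${new_host:,.0f}/yr, "
      f"saving: ${old_host - new_host:,.0f}/yr per host")
```

Multiply that per-host saving across a few racks of ageing compute and the case for refreshing (and shrinking) the footprint writes itself.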
There are a few more background reasons but I think this illustrates the main problems.
The next part of this blog series will revolve around 'what' and 'how' we consolidated.