Beyond a datacenter and hardware fault-tolerant strategy (part 2/3)

Ok, in my previous note I gave a short overview of how Tier IV datacenters are designed to deliver best-in-class technology to protect your IT equipment against any electrical or physical failure inside the facility.

I also mentioned that if you get instances from any of our clouds, you can bet that every one of our configurations has the redundancy required to survive any hardware failure (more info in the picture below).

[Image: cloud stack design, fault-tolerant storage/compute/network – pinrojas]

All our shared cloud infrastructure is installed in our datacenters with at least the minimal redundancy shown in the previous picture.

Again, starting from the left: we have the disk arrays configured into multiple disk groups. Traditional converged stacks use RAID-6, which protects your data against the simultaneous failure of any two spindles inside the group. When a disk fails, it is replaced by a spare and the storage system starts rebuilding the RAID disk group. The rebuild can last hours or days depending on the storage processing capacity and the density of the disks (1TB ATA disks take almost five times longer to rebuild than 600GB SAS ones). During the rebuild, you are only protected against one more disk failure thanks to RAID-6's double parity. With RAID-5, if another disk fails during this process you lose your data and have to start recovering from the last backup copy.
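To see why double parity tolerates two simultaneous failures, here is a toy sketch of RAID-6-style recovery over GF(2^8): one XOR parity P plus one weighted parity Q let us solve for any two lost data "disks". This is purely didactic (one byte per disk, invented stripe values), not a real storage implementation.

```python
# Toy RAID-6 dual-parity demo: recover any two lost data disks from P and Q.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), polynomial x^8+x^4+x^3+x^2+1 (0x11d)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
        b >>= 1
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, 254)  # a^(2^8-2) is the multiplicative inverse

def parities(data):
    """P = XOR of all stripes; Q = sum of g^i * D_i with generator g = 2."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data, x, y, p, q):
    """Rebuild data disks x and y from the surviving disks plus P and Q."""
    px, qx = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            px ^= d                        # px = D_x ^ D_y
            qx ^= gf_mul(gf_pow(2, i), d)  # qx = g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qx ^ gf_mul(gy, px), gf_inv(gx ^ gy))
    dy = px ^ dx
    return dx, dy

stripe = [0x11, 0x22, 0x33, 0x44]           # one byte per data disk
p, q = parities(stripe)
dx, dy = recover_two(stripe, 1, 3, p, q)    # disks 1 and 3 fail together
print(dx == stripe[1] and dy == stripe[3])  # True
```

A RAID-5 group only stores P, so the same failure of two disks leaves one equation with two unknowns, which is exactly why you would be reaching for the tape backups.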

I'll take advantage of this point to mention that our Koolfit uses only SSDs for block storage, and everything is configured in RAID-1 to be protected against the failure of multiple disks.

There are also uncommon cases where you can suffer multiple disk failures at once. All disks are enclosed in shelves. Shelves, like disks, have redundant connections to the disks and storage controllers, and use purpose-built devices with specialized firmware to handle them. You can get shelf failures caused by a broken connection, a firmware bug, and so on. Disks run firmware too: since 1999, when I started working in this industry, I have seen three cases where a bug or a bad release in disk firmware caused a multiple-disk failure, losing a big portion of data that took days to recover from tape. Any RAID configuration is useless in cases like this, and there is a chance you won't have replacement storage you can bring online as fast as closing a "normally open tie" component in a datacenter's electro-mechanical line.

Nobody can tell you that you are 100% protected against a firmware failure. It's a very uncommon event, but it's software, there are humans behind it, and even if you try to keep everything under control, things fail anyway!

You could use two storage systems from different vendors to avoid correlated firmware issues… but you'll also be adding cost and complexity to the solution…

There are also issues related to how you define your datastores at the hypervisor level. Usually you spread a datastore among different RAID groups in order to get better performance. But if one of those RAID groups fails, multiple datastores are affected and your virtual instances will crash, even if each datastore uses only a small portion of that RAID group. You can keep a one-to-one relation between datastore and RAID group, but then you sacrifice performance and the agility to scale and solve throughput issues.
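The trade-off is easy to see if you map datastores to the RAID groups they touch and compute the blast radius of a single RAID-group failure. This is a hypothetical sketch with invented names, just to illustrate the failure-domain arithmetic:

```python
# Which datastores (and VMs) go down if one RAID group fails?
# All names and mappings below are invented for illustration.

datastore_raid_groups = {
    "ds-web":  ["rg1", "rg2"],   # striped across two groups for performance
    "ds-db":   ["rg2", "rg3"],
    "ds-logs": ["rg3"],          # one-to-one: smaller blast radius
}
vms_on_datastore = {
    "ds-web":  ["web01", "web02"],
    "ds-db":   ["db01"],
    "ds-logs": ["log01"],
}

def blast_radius(failed_rg):
    """Return every datastore and VM affected by one RAID-group failure."""
    ds = [d for d, rgs in datastore_raid_groups.items() if failed_rg in rgs]
    vms = [vm for d in ds for vm in vms_on_datastore[d]]
    return ds, vms

print(blast_radius("rg2"))  # (['ds-web', 'ds-db'], ['web01', 'web02', 'db01'])
```

Note that `rg2` takes out two datastores even though each only borrows a slice of it; the one-to-one `ds-logs` is the only layout whose failure domain stays contained.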

… I'll give you the honor of choosing among price, performance and availability 😉

Moving on to the storage controllers: we need at least two, of course, for redundancy, and there are solutions with more than two. But, like the datacenter itself, you will require at least four if you want to avoid any kind of risk during any kind of maintenance. On the other hand, the full load must be manageable by just one controller. And of course, storage controllers have firmware in their motherboards, PCI cards and memory that needs to be updated every quarter 😉

I have to say that storage is the most critical resource in any cloud hardware stack. The big difference when you manage this sort of redundancy, compared with facility systems like UPSs, generators or chillers, is the terabytes of customer data involved. Facility systems are no exception to uncommon firmware failures, though: facility power/cooling systems use advanced firmware to automate tasks most of the time.

Data means everything to our customers, or at least 99% of everything. You can recover any storage system, switch, server or even any of the facility's heavy-duty systems from a failure, but it's all useless if you can't restore the customer's data.

Sticking to our review: after storage, you have switches and servers. Switches today manage storage, user and application data in a unified-fabric concept. With the new Software Defined Networking solutions, you get much more agility and flexibility. The complexity of the solution becomes clear when you consider that you are managing a lot of critical network information for every customer tenant. Of course, you need at least two appliances for redundancy, and one switch must be able to carry the entire stack's traffic by itself in case the other appliance fails. You can have more than two switches for better protection, but that requires skilled configuration that is not easy to maintain, and a cost that somebody has to pay.

You may be amazed by the beauty of any Tier IV facility's design, but without a strong network core design it is as useless as the most powerful laptop without a robust network connection. Core switching manages hundreds of times more complexity than any cloud unified fabric. You will face tons of configuration lines in the core switches to support the aggregation components connecting every rack, the inter-site and inter-cluster datacenter links, multihomed internet services and private customer links. Changes and maintenance tasks require constant, exhaustive review and cross-authorization between the different interested parties to avoid any mistake that could end in a complete disconnection of the facility, and with it, of thousands of servers.

The hypervisor is another vital component. You build a cluster across different physical servers so that the platform can survive the failure of any one of them and keep most instances up and running. Usually a hypervisor failover means that some instances have to be automatically restarted on another physical server; whatever that means for your applications, you'll have to take precautions and automate your service so it hopefully survives that event. We'll go much deeper into this topic in the next note.
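To make the restart-on-failover idea concrete, here is a hypothetical sketch of the placement decision an HA cluster makes when a host dies: re-place each of its VMs on the surviving host with the most free memory, greedily. Host names, VM names and sizes are invented; real hypervisor HA (admission control, reservations, affinity rules) is far richer than this.

```python
# Greedy HA failover sketch: restart a dead host's VMs on the survivors.
# All names and capacities below are invented for illustration.

hosts = {"hv1": 128, "hv2": 128, "hv3": 128}          # free RAM per host, GB
placement = {"web01": "hv1", "db01": "hv1", "app01": "hv2"}
vm_ram = {"web01": 16, "db01": 64, "app01": 32}

def fail_over(dead_host):
    """Re-place every VM from the dead host; return VMs that could not fit."""
    free = {h: ram for h, ram in hosts.items() if h != dead_host}
    for vm, host in placement.items():
        if host != dead_host:
            free[host] -= vm_ram[vm]                  # account existing load
    stranded = []
    for vm in [v for v, h in placement.items() if h == dead_host]:
        target = max(free, key=free.get)              # most free RAM wins
        if free[target] >= vm_ram[vm]:
            free[target] -= vm_ram[vm]
            placement[vm] = target                    # "restarted" here
        else:
            stranded.append(vm)                       # capacity exhausted
    return stranded

print(fail_over("hv1"), placement)
```

Anything in the `stranded` list is an instance that stays down, which is why HA clusters reserve failover capacity up front instead of hoping space is left over.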

The hypervisor also relies on an external management application to distribute load between the physical compute nodes, and even to detect failures and trigger the associated recovery process. Careful: a failure in this component can turn into a serious issue if you don't take the required precautions and give it enough redundancy of its own.

Well, the next picture shows the most common configuration of instances in a cloud service.

[Image: cloud instances, web app architecture – pinrojas]

Looks ridiculous, right?

But it's the ugly truth: most customers define their application instances on this kind of non-redundant logical architecture.

You may tell me: what about all the redundancy we have in the facility and hardware? And I will ask you: how will you feel when your web service stops working right after one of your most trusted operators deletes the DB instance by mistake?

Don't worry, I'll give you my answer: all the redundancy at the facility/hardware level means nothing! It's like having a safe to secure your most precious things and leaving it unlocked because you simply forgot to lock it.

See you next Tuesday with the last part.
