Beyond a datacenter and hardware fault tolerant strategy (part 3/3)

Finally, we are in the last part.

My previous note ended with this sentence: “Most customers define their application instances based on this non-redundant logical architecture”. The next picture illustrates exactly what that means. All the redundancy at the facility and hardware level means nothing when you lose a virtual instance because of a human mistake. You will have to recover it from the last backup copy, and that could take hours or days depending on how much data you’ve lost.

[Picture: non-redundant web application architecture on cloud instances]

Data is the most important asset and the most difficult to care for. Data is an intangible resource, like a virtual instance. However, being logical and intangible doesn’t mean it can do without a logical solution for redundancy and protection.

Backup copies must be your last resort for recovery. But usually they seem to be the only one, and the slowest and least trusted one.

Let’s figure out how to build a redundant solution for any web application on the cloud. You just need to assume that any of the instances can fail, at the least expected time and day, and that no matter what happens to it, your application must stay online…

Cloud brings some additional risks, but it also brings the agility and flexibility to build a much more robust fault-tolerant application, even when facing failures in the cloud hardware stack (including storage system failures) and issues at the facility itself (like a failure of the datacenter core switching service).

Things fail, my friend; this is why nobody claims their solutions are 100% available.

Please, see the next picture and compare it with the previous one.

[Picture: design-to-fail architecture, a redundant web application on cloud instances]

Oh yeah! It’s so much better, right?

Of course, you don’t need to create three copies of your objects and databases in different HA Zones (High Availability Zones) by yourself. Most providers have already done it, and they just give you the connection interface or the APIs that let the rest of your instances talk to them. They will place every copy in a different HA Zone or site for you. HA Zones are independent clusters or sites, or at least independent cloud hardware stacks with different storage systems.
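To make the idea concrete, here is a minimal sketch in Python that checks which tiers of an application would disappear if a whole HA Zone went down. The tier names, the zone names and the `tiers_lost_if_zone_fails` helper are all made up for this illustration; no provider exposes this function.

```python
from typing import Dict, List, Set


def tiers_lost_if_zone_fails(layout: Dict[str, List[str]], failed_zone: str) -> Set[str]:
    """Return the tiers with no instance left once failed_zone is gone.

    layout maps a tier name to the zone of each of its instances
    (hypothetical names, for illustration only).
    """
    return {
        tier
        for tier, zones in layout.items()
        if all(zone == failed_zone for zone in zones)
    }


# The first picture: everything lives in a single zone.
single_zone = {"web": ["zone-a"], "app": ["zone-a"], "db": ["zone-a"]}

# The second picture: every tier has a copy in another zone.
multi_zone = {
    "web": ["zone-a", "zone-b"],
    "app": ["zone-a", "zone-b"],
    "db": ["zone-a", "zone-b", "zone-c"],
}

print(tiers_lost_if_zone_fails(single_zone, "zone-a"))
print(tiers_lost_if_zone_fails(multi_zone, "zone-a"))
```

With the single-zone layout every tier is lost; with the multi-zone layout the result is empty, which is the whole point of the second picture.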

You will add load balancers with advanced features that help you take failed instances out of rotation until you can recover them. Content Delivery Networks can also help in this area.

You are protected against facility, storage, server and hypervisor failures, and also against human mistakes. In most cases your users won’t notice that you’ve had a failure, and you won’t have them breathing down your neck.
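As a rough illustration of what “removing failed instances” means, the sketch below is a tiny round-robin pool that skips any instance failing a health check. It is not how any particular load balancer is implemented; the `Pool` class, the instance names and the health check are assumptions made for the example, and a real product would probe the instances over the network.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pool:
    """Round-robin pool that skips instances failing a health check."""

    instances: List[str]
    is_healthy: Callable[[str], bool]  # hypothetical probe, e.g. an HTTP check
    _next: int = field(default=0, init=False)

    def pick(self) -> str:
        """Return the next healthy instance, or raise if none is left."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next % len(self.instances)]
            self._next += 1
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy instance available")


# Simulated outage: web-2 is down, so traffic only reaches web-1 and web-3.
down = {"web-2"}
pool = Pool(["web-1", "web-2", "web-3"], lambda name: name not in down)
print([pool.pick() for _ in range(4)])
```

Once `web-2` is recovered and removed from the `down` set, it goes back into rotation without touching the rest of the pool.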

The objective is simple: don’t put all your eggs in just one basket.

And this last statement also applies to your data protection strategy: your application will run on a highly redundant solution, but all this redundancy will not save you from disruption when somebody accidentally deletes a table in your production database. You need more than one option to recover your data: snapshots spread across different sites, and backup copies on external disk systems.

… and don’t forget that you will need to dedicate enough bandwidth between your platform and your backup solution to meet your Recovery Time and Recovery Point Objectives.

Not all applications can support this kind of architecture. However, if you can change or adapt yours to it, you will be rewarded with sweet dreams, free of any concern about getting unwanted calls throughout the night.
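A quick back-of-the-envelope calculation helps here. The sketch below estimates the sustained link speed needed to restore a given amount of data within a Recovery Time Objective. The function name, the decimal gigabytes and the 70% link efficiency are assumptions for the example, so plug in your own numbers.

```python
def required_bandwidth_mbps(data_gb: float, rto_hours: float, efficiency: float = 0.7) -> float:
    """Sustained link speed (Mbit/s) needed to move data_gb within rto_hours.

    efficiency is the assumed usable fraction of the link (protocol
    overhead, competing traffic); 0.7 is only a placeholder value.
    """
    if rto_hours <= 0:
        raise ValueError("rto_hours must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_gb * 8 * 1000**3  # decimal gigabytes to bits
    seconds = rto_hours * 3600
    return bits / seconds / efficiency / 1e6


# Example: restoring 2 TB within a 4 hour objective.
print(round(required_bandwidth_mbps(2000, 4)))
```

If the number that comes out is bigger than the link you actually have to your backup site, then the objective is not realistic, no matter how good the backup copies are.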

Well, thanks very much for reading this note, and I will see you in the next topic.

Happy new 2015!
