#CRUSH: Distributing the low-level block allocation problem

I remember when I started working with block storage arrays (SAN storage systems) and how committed vendors were, back then, to delivering savings through disk consolidation projects. It was a confusing proposition once you realized you had to pay three times more for not that much extra performance and capacity, but consolidation was the trend in those days.

I remember when those storage vendors offered me copy licenses (which, of course, required much more capacity and money) to speed up database clone creation or an image recovery process. Despite having all these important features, we ran into a lot of problems making them work on time when we really needed them, usually because they were so complex to use that we couldn't afford to hire a PhD to take care of them.

I remember how excited we were to use SAN storage replication features between sites. However, they were limited by bandwidth and distance, and they were unstable and difficult to manage: you couldn't miss a single LUN or storage device if you wanted a successful and consistent service recovery. If you failed to replicate one of them, say because the DB admin had extended a tablespace's capacity without telling you, then all those millions spent on software and hardware went straight into a "non-recycle" bin.

Anyway, those SAN storage solutions were the only way to get that level of data management capability in those days.


Before continuing, you should take a look at my previous note to get up to date on how CRUSH, based on our experience, helps you build the storage that fits your business requirements.

Block storage systems usually keep an index for every stored block in their corresponding disk pool. In the old days, this indexing was managed at a low level by purpose-built hardware or controllers. Today it is just pre-loaded software running on more standard hardware, with a reduced and optimized version of some well-known operating system, in order to lower costs and gain flexibility. IMHO, the change from purpose-built to standard hardware was mostly driven by how much cheaper it became to buy far more compute power off the shelf than to keep manufacturing custom controllers.

Object-based storage solutions are commonly condemned to store static data that will probably be archived forever. This data rarely changes, is mostly published via REST (photos, documents, videos…), and is easily cached and accelerated through content delivery networks. This data is non-transactional (I know Chad will like this part) and can be stored on SATA disks and replicated geographically across different sites.

However, #Ceph has come along to break this paradigm of object versus block, of non-transactional versus transactional data. #Ceph uses #CRUSH, and #CRUSH allocates objects instead of blocks in order to simplify the data layout distributed across the nodes. In the case of #Ceph and its block storage layer, RBD (RADOS Block Device), blocks are striped across objects, and CRUSH remains responsible for distributing those objects among the OSDs (Object-based Storage Devices).
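To make the idea concrete, here is a minimal sketch of that chain: a byte offset in a block image maps to an object name, and a deterministic hash then places that object on OSDs. This is not Ceph code; the function names and the toy placement are my own assumptions, and only the 4 MiB default object size reflects RBD's actual behavior.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024  # RBD stripes an image into objects, 4 MiB by default


def rbd_object_for_offset(image_prefix: str, offset: int) -> str:
    """Map a byte offset in a block image to the name of the backing object."""
    object_number = offset // OBJECT_SIZE
    return f"{image_prefix}.{object_number:016x}"


def place_object(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Toy CRUSH-like placement: a deterministic hash of the name picks the OSDs,
    so no central index of blocks or objects is ever consulted."""
    digest = int(hashlib.md5(object_name.encode()).hexdigest(), 16)
    start = digest % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]


osds = [f"osd.{i}" for i in range(8)]
obj = rbd_object_for_offset("rbd_data.1234", offset=10 * 1024 * 1024)
print(obj, "->", place_object(obj, osds))
```

Every client that knows the image prefix and the list of OSDs computes the same answer, which is the whole point of pushing the block allocation problem down to an algorithm instead of a lookup service.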

I strongly suggest you read the note "OpenStack Foundation Survey Cites Ceph as a Leading Distribution for Block Storage" on Ceph's blog. It mentions that 1,780 OpenStack users responded to the survey worldwide and that "20 percent of respondents indicate Ceph is their block storage driver of choice for production clouds"… Amazing! We are not the only ones who believe that!

Despite the fact that there are many block storage options integrated with #OpenStack #Cinder, all of them "facing a similar data distribution challenge" (the quote is from Weil's paper), CRUSH has completely changed the way to deal with it: CRUSH does not use any metadata directory at all to query and locate the data that has been requested, so CRUSH wipes from your mind any concern about how metadata lookups can affect the response time of a storage transaction (again, see my previous note for more details).
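As a rough illustration (an assumption-laden sketch, not Ceph's actual code path), the difference is between asking a directory and computing the answer: a directory-based design keeps a table that grows with every block and sits on the data path, while a CRUSH-style design lets any client calculate the location from the object name and the current cluster map.

```python
import hashlib

# Directory-based design: a lookup table that must be stored, scaled, and queried.
block_directory = {"volume1/block42": ["osd.3", "osd.5"]}  # one entry per block


def lookup_placement(name: str) -> list[str]:
    return block_directory[name]  # an extra hop on every single I/O


# CRUSH-style design: the client simply computes the placement itself.
def compute_placement(name: str, cluster_map: list[str], replicas: int = 2) -> list[str]:
    digest = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    start = digest % len(cluster_map)
    return [cluster_map[(start + i) % len(cluster_map)] for i in range(replicas)]


cluster_map = [f"osd.{i}" for i in range(6)]
print(lookup_placement("volume1/block42"))                 # needs the directory
print(compute_placement("volume1/block42", cluster_map))   # needs only the map
```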

Thanks to CRUSH, block devices are faster and their associated workload is evenly distributed among the drives, which yields better throughput. Also, with the right bucket definition and configuration, and the corresponding replication rules, you can ensure high reliability at a much lower operating cost.
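To give a feel for what "bucket definition and replication rules" buy you, here is a small, hypothetical sketch (not a real CRUSH map) where hosts are the failure domain: the hierarchy groups OSDs under hosts, and the placement logic is forced to pick each replica from a different host, in the spirit of a "chooseleaf firstn 0 type host" style rule. It assumes the replica count does not exceed the number of hosts.

```python
import hashlib

# Hypothetical bucket hierarchy: hosts are the failure domain, OSDs are the leaves.
crush_tree = {
    "host-a": ["osd.0", "osd.1"],
    "host-b": ["osd.2", "osd.3"],
    "host-c": ["osd.4", "osd.5"],
}


def choose_replicas(object_name: str, replicas: int = 3) -> list[str]:
    """Pick one OSD per host so that losing a single host never loses every copy."""
    placement: list[str] = []
    hosts = sorted(crush_tree)
    for r in range(replicas):
        digest = int(hashlib.sha1(f"{object_name}:{r}".encode()).hexdigest(), 16)
        host = hosts[digest % len(hosts)]
        # Skip hosts that already hold a replica of this object.
        while any(osd in crush_tree[host] for osd in placement):
            host = hosts[(hosts.index(host) + 1) % len(hosts)]
        osds = crush_tree[host]
        placement.append(osds[digest % len(osds)])
    return placement


print(choose_replicas("rbd_data.1234.0000000000000002"))
```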

Objects or blocks by themselves don't solve most data management issues; you need a clever low-level algorithm like CRUSH as the foundation to deal with them and turn your storage into a truly scale-out, fast, and simple solution…

However, CRUSH does not bring a solution to every data management requirement, such as storage efficiency features like deduplication, compression, and thin provisioning, or data copy/protection tools like clones and snapshots. These requirements are covered by #Ceph through other pieces of software that I will cover in my next note.

See you around!
