Ceph is not just a JBOD (Just a Bunch of Disks) technology with an amazing algorithm to manage data placement among nodes – see my previous posts for more information about CRUSH. It offers many interesting features that can help you save money and be more productive with your time.
Let’s start with the file system used as the most basic component by the Object Storage Daemons. OSDs are in charge of managing and storing data objects on the file systems of pre-assigned disk devices in every storage node. I’ll also take this opportunity to strongly suggest using separate disks for the OS files and for Ceph’s user data in every storage node.

An OSD’s file system can be xfs, ext4 or btrfs. These file systems are well known in the Linux community and are used for different purposes. Ceph suggests using the first two (xfs, ext4) to store data for production applications because of their stability and maturity. Although btrfs is the new guy on the team, it supports features like transparent compression and writable copy-on-write snapshots, and it will support deduplication and encryption in the near future, making it the right choice for a more compelling storage service – you can start playing with it on non-critical applications. Both btrfs and xfs also bring better and faster data recovery after failures and system crashes thanks to their journal (log) area, which saves changes before committing them to the main file system area.
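The journaling idea above can be sketched in a few lines of Python. This is a toy model of the write-ahead principle (not Ceph’s or any file system’s actual code): every change lands in an append-only journal first, and recovery after a crash just replays the journal instead of scanning all the data.

```python
class JournaledStore:
    """Toy key-value store that, like xfs/btrfs journaling, records every
    change in an append-only log before committing it to the main area."""

    def __init__(self):
        self.journal = []   # write-ahead log of pending changes
        self.main = {}      # the "main file system area"

    def write(self, key, value):
        self.journal.append((key, value))  # 1. record the intent in the journal

    def commit(self):
        for key, value in self.journal:    # 2. apply journaled changes
            self.main[key] = value
        self.journal.clear()               # 3. changes are durable; trim the log

    def recover(self):
        """After a crash, replay the journal instead of checking all data."""
        self.commit()
```

If the machine dies between `write()` and `commit()`, nothing in the main area is half-written; `recover()` simply replays what the journal captured.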
In short, Ceph can extend its features depending on the file system chosen for the OSDs.
As you can see, Ceph is not defined only by the CRUSH algorithm; many components help make Ceph what it is now and what it will become, as it keeps adding users and features over time.
Ceph adds logical components like Placement Groups (PGs) to bring more flexibility when adding, removing or changing hardware, with minimal disruption. The smart move was to avoid any direct relationship or tight coupling between clients and hardware – to be more specific, between clients and each disk’s file system (or OSD daemon). A Placement Group can span more than one OSD, and this extra level of abstraction makes it easier to rebalance, grow, shrink and recover OSDs with little or no impact on the stored objects when a change comes. Therefore, Ceph uses CRUSH to assign client data to Placement Groups, and Placement Groups to OSD daemons.
It’s worth saying that it would be almost impossible – even for CRUSH – to map user data directly to every disk sector and still scale up to petabytes while keeping great performance in a cluster like this.
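The two-step mapping described above can be sketched like this. It’s a toy stand-in for CRUSH, not the real algorithm: the PG count, OSD names and replica count are made-up values, and real CRUSH also accounts for the hardware hierarchy (hosts, racks) and device weights.

```python
import hashlib

def stable_hash(s):
    # Deterministic hash, so placement is the same on every client and restart.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

PG_COUNT = 128                                  # hypothetical number of PGs
OSDS = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4"]
REPLICAS = 3

def object_to_pg(object_name):
    """Step 1: clients hash object names to a PG, never to a disk directly."""
    return stable_hash(object_name) % PG_COUNT

def pg_to_osds(pg_id):
    """Step 2: a CRUSH-like deterministic, pseudo-random map from a PG to a
    set of distinct OSDs. Only this mapping changes when hardware changes;
    the client-side object-to-PG mapping stays the same."""
    start = stable_hash("pg-%d" % pg_id) % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]
```

Because both steps are deterministic functions, any client can compute where an object lives without asking a central lookup table – that is what removes the tight coupling between clients and disks.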
Ceph also brings light and deep scrubbing. Light scrubbing compares replicated data objects’ metadata across the OSDs of each PG, looking for inconsistencies, and usually runs daily. Deep scrubbing reads the data itself and compares checksums, catching physical failures and silent corruption on the disks; since it is more expensive, it runs less often. This process improves availability by identifying failures and fixing them in advance.
Cache tiering is also very important for block storage solutions, and Ceph is no exception: it offers transparent cache tiering to Ceph clients. You can add SSDs or flash cards to cache IO and make an awesome difference for transactional workloads – in our lab tests, adding a cache at every node gave us almost 2x read IO performance for virtual servers connected through OpenStack Cinder. Believe me, it’s totally worth the money.
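Conceptually, a write-back cache tier behaves like this toy sketch. It is not Ceph’s API: the `CacheTier` class is illustrative, with an LRU dict standing in for the fast SSD pool and a plain dict for the slow backing pool, and the tier is invisible to the caller – exactly the “transparent” part.

```python
from collections import OrderedDict

class CacheTier:
    """Toy write-back cache tier: a small fast pool (think SSD) in front of
    a slow backing pool (think HDD), transparent to the client."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.fast = OrderedDict()   # LRU order: most recently used at the end
        self.slow = backing         # the slow backing storage pool

    def read(self, key):
        if key in self.fast:                 # cache hit: no slow-pool access
            self.fast.move_to_end(key)
            return self.fast[key]
        value = self.slow[key]               # cache miss: promote the object
        self._insert(key, value)
        return value

    def write(self, key, value):
        self._insert(key, value)             # write-back: fast pool first...

    def _insert(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.capacity:   # ...flush the coldest object
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value
```

Hot objects are served from the fast pool, and cold ones are flushed down as they age out – the client just calls `read` and `write` and never knows which tier answered.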
Well, see you next time!