#Ceph: The most affordable flash storage to speed up your cloud (our lab results with #OpenStack)

I want to share some results of our work with #Ceph and #OpenStack Cinder. We are working with three servers with SSD drives, plus flash cards for the journal and cache. We ran three write tests of different sizes (1024MB, 2048MB and 4096MB) against Ceph devices from a virtual machine provisioned on KVM through OpenStack (the disk was provisioned through Cinder), using “dd if=/dev/zero bs=1M”. As you can see in the picture below, we got results from 529 MB/s to 782 MB/s, and again, we got them from a virtual server. How much would you have to pay a traditional storage vendor to get such performance? I will let you figure that out for yourself.

[Figure: dd write test performance on a Ceph-backed OpenStack virtual server]
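
For readers who want to try a comparable measurement without dd, here is a minimal Python sketch of the same idea: write zeros in 1 MiB blocks and force them to disk before stopping the clock, similar in spirit to the conv=fdatasync option mentioned in the comments. The function name timed_write and the way throughput is reported (decimal MB/s) are assumptions for illustration, not the exact procedure behind the numbers above.

```python
import os
import time

BLOCK_SIZE = 1024 * 1024  # 1 MiB per write, like dd's bs=1M


def timed_write(path, size_mib):
    """Write size_mib MiB of zeros to path and return throughput in MB/s."""
    block = bytes(BLOCK_SIZE)  # a buffer of zeros, like reading /dev/zero
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(block)
        # Flush Python's buffer and ask the kernel to commit the data to disk
        # before the timer stops, so the page cache does not inflate the result.
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return size_mib * BLOCK_SIZE / 1e6 / elapsed
```

Calling timed_write with a file on the Cinder-backed volume and sizes of 1024, 2048 and 4096 would mirror the three runs. Like the dd test, it only covers large sequential writes of zeros.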

Below you can see the status of our Ceph cluster and the placement group versions, obtained with the command “ceph -w”. We are using 44 OSDs and three monitors for redundancy. An OSD is the object storage daemon for the Ceph distributed file system; it is responsible for storing objects on a local file system and providing access to them over the network (more info at “Ceph-OSD – Ceph Object Storage Daemon”).

[Figure: Ceph cluster status from “ceph -w”]

We have also set two copies for all stored data to provide better protection against hardware failures. What is a placement group? I will share a couple of sentences with its definition and its advantage (more info at “Monitoring OSDs and PGs”):

  • “A Placement Group (PG) aggregates a series of objects into a group, and maps the group to a series of OSDs”.
  • “Placement groups reduce the number of processes and the amount of per-object metadata Ceph must track when storing and retrieving data”
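
To make the two quotes above more concrete, here is a toy Python sketch of the two-step mapping: object name to placement group, then placement group to a set of OSDs. It is only an illustration. Real Ceph uses its own hash function and the CRUSH algorithm for placement, whereas this sketch uses CRC32 and a simple hash ranking, and the pg_num value and object name are hypothetical.

```python
import zlib
from typing import List


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Map an object name to a placement group id by hashing the name."""
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


def pg_to_osds(pg_id: int, osd_ids: List[int], replicas: int) -> List[int]:
    """Pick `replicas` distinct OSDs for a PG (toy stand-in for CRUSH)."""
    # Rank every OSD by a hash of (pg_id, osd_id) and keep the first few.
    ranked = sorted(
        osd_ids,
        key=lambda osd: zlib.crc32(f"{pg_id}:{osd}".encode("utf-8")),
    )
    return ranked[:replicas]


osds = list(range(44))  # 44 OSDs, as in the cluster described in this post
pg = object_to_pg("volume-1234.chunk-0001", pg_num=2048)  # hypothetical values
print(pg, pg_to_osds(pg, osds, replicas=2))  # two copies, as configured here
```

The point of the indirection is that the cluster tracks placement per PG rather than per object: every object that hashes to the same PG shares the same OSD set.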

The next picture shows the OSD tree, with the weight and status of each OSD in the cluster.

[Figure: OSD tree with weight and status of each OSD]

With this speed in your cloud, users will be more than happy.

Well, see you around.

10 replies »

    • 10GbE. What do you mean by local disks? Are you sharing local disks on your compute nodes? Share your architecture; I don’t think this depends on the hypervisor. This is an external storage array. Thanks for the comment.

  1. I’m not sure that 530MB/sec against 44 SSDs should be considered particularly good performance. Also, given that the test is writing all zeros, most storage systems will optimize that to a null operation and be insanely fast… that Ceph isn’t optimizing writing of zeros is surprising.
    For a real VM-type workload, try smaller block sizes – 4K / 8K / 16K. Ceph’s results there aren’t as strong, as it’s really been optimized for large IO / object type workloads.
    As for the comment on the cost… even if you consider the software “free” (which it only is if you don’t pay for support), Ceph is still significantly more expensive when running on SSD than a leading-edge all-flash storage system. Why? Because it lacks inline data reduction, which can give you 4X or more effective capacity on the same hardware.
    Ceph certainly has some good use cases in a cloud environment, but as an SSD-based block-storage system, it really doesn’t have very good performance or economics today.
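
To put the block-size point in numbers, throughput is simply IOPS multiplied by block size, so the same MB/s figure needs far more operations per second at 4K than at 1M. The short Python sketch below works that out for the roughly 530 MB/s result discussed here. The function name is an illustrative assumption, and MB is taken as 10^6 bytes.

```python
def iops_for_throughput(throughput_mb_s: float, block_size_kib: float) -> float:
    """IOPS needed to sustain throughput_mb_s (decimal MB/s) at a block size in KiB."""
    bytes_per_second = throughput_mb_s * 1_000_000
    return bytes_per_second / (block_size_kib * 1024)


# Compare the block sizes suggested above with the 1 MiB blocks used by the dd test.
for block_kib in (4, 8, 16, 1024):
    iops = iops_for_throughput(530, block_kib)
    print(f"{block_kib:>5} KiB blocks: {iops:,.0f} IOPS")
```

At 1 MiB blocks, 530 MB/s is only about 500 operations per second, while the same rate at 4 KiB would need roughly 129,000. That is why a large-block dd run says little about small-block VM workloads.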

    • Hi Dave, it is normal to see fewer MB/s because of conv=fdatasync, which we use to guarantee that the data is synced and written to disk before dd exits; otherwise we obtain values above 800 MB/s because the data is still held in memory.
      I also agree that a real workload and testing procedure need to include smaller block sizes. In a future post we will show you these results, including iozone and sequential reads and writes depending on the number of VMs, and leveraging the journals on the PCI flash cards with the XFS filesystem.
      Regarding the costs, I can guarantee you that the cost per GB of SSD with Ceph, including raw capacity and replicas, is one of the lowest on the market, at least in the Latin American market. I agree that Ceph does not provide compression or data reduction, but by trying btrfs with LZO or zlib compression you might get 80-90% more usable space. As you know, though, btrfs is not ready for production, so it depends on the risks you want to take.
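
The cost argument in this thread comes down to simple arithmetic: each usable GB costs the raw price multiplied by the number of replicas, divided by whatever data-reduction ratio the system achieves. The Python sketch below shows that relationship with a normalized raw price of 1.0, so the results are multipliers, not real prices. The function name and the normalization are assumptions for illustration only.

```python
def cost_per_usable_gb(raw_cost_per_gb: float, replicas: int,
                       reduction_ratio: float = 1.0) -> float:
    """Raw cost times replica count, divided by the data-reduction ratio."""
    if replicas < 1 or reduction_ratio <= 0:
        raise ValueError("replicas must be >= 1 and reduction_ratio must be > 0")
    return raw_cost_per_gb * replicas / reduction_ratio


RAW = 1.0  # normalized raw SSD price per GB (not a real price)

print(cost_per_usable_gb(RAW, replicas=2))                       # two copies, no reduction
print(cost_per_usable_gb(RAW, replicas=2, reduction_ratio=1.8))  # 80% more usable space
print(cost_per_usable_gb(RAW, replicas=2, reduction_ratio=4.0))  # 4X data reduction
```

With two copies and no reduction, every usable GB costs twice the raw price. An 80% gain from compression brings that to about 1.1 times the raw price, and a 4X reduction would bring it to half.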
