DSSD: Don’t try to attach a horse hitch to a Tesla

@sakacc shared a glimpse of what DSSD is doing during the last EMC World in his post “DSSD Tech Preview (EMC World Day 3)”. This post is named after these lines from his: “…and if you REALLY need it to dress up as a storage target (a LUN/filesystem) that’s possible too… but frankly that’s like attaching a horse hitch to a Tesla.” DSSD is optimized to deliver its best performance through a direct HDFS interface for analytics tools. Don’t try to use it as a traditional SAN storage array; you will waste your money.

DSSD pools an amazing amount of NAND in one system and is optimized for the lowest latency (tens of microseconds), while providing as much redundancy as any other storage array.

DSSD is not GA yet; it seems to be in testing with some customers under EMC’s guidance.

One year ago, I shared some notes about DSSD in my post “Commodity hardware it’s not the only option for the future”, which I wrote after @sakacc visited us in Mexico. He let me share the following picture:


HANA vs DRAM failures

Personally, I like SAP HANA and how it has changed the way we do business intelligence in real time (or almost real time): it builds virtual cubes with tens of dimensions in seconds, for almost every user query. Some companies will not survive without assets like these among their most strategic ones. Competition keeps getting harder over time in every industry vertical, and access to the right information at the right time is the most important competitive advantage today.

You can run HANA on a scale-out interconnected cluster architecture. This gives you a reasonably high level of availability (at least better than a single appliance, which you would struggle badly to replace after a hardware failure) while keeping amazing performance.


Shared storage gives you data persistence in case of a compute failure: the failed host is replaced by the stand-by one, and the stand-by host’s memory is loaded with the missing data from storage. Of course, it’s not perfect. You may lose a small piece of data, because storage cannot run at the same speed as memory. However, in most cases you won’t notice that you’ve suffered a failure.
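To make that failover flow concrete, here is a toy Python model of a cluster with one active host and one stand-by host sharing storage. It is a minimal sketch under a stated assumption: writes reach shared storage only at periodic persist points, which is why the last unpersisted write is lost. The names `Storage`, `Host`, `ScaleOutCluster`, `persist` and `fail_over` are hypothetical, and this is not how HANA’s persistence layer is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Storage:
    """Shared persistent storage visible to every host (toy model)."""
    blocks: dict = field(default_factory=dict)


@dataclass
class Host:
    name: str
    memory: dict = field(default_factory=dict)  # stands in for DRAM
    alive: bool = True


class ScaleOutCluster:
    """Hypothetical active host + stand-by host sharing one storage."""

    def __init__(self, storage: Storage, active: Host, standby: Host):
        self.storage = storage
        self.active = active
        self.standby: Optional[Host] = standby

    def write(self, key, value):
        # Fast path: the write lands in the active host's memory only.
        self.active.memory[key] = value

    def persist(self):
        # Slow path (assumed periodic): copy memory contents to storage.
        self.storage.blocks.update(self.active.memory)

    def fail_over(self) -> Host:
        if self.standby is None:
            raise RuntimeError("no stand-by host left")
        # The active host dies and its DRAM contents are gone.
        self.active.alive = False
        self.active.memory.clear()
        # The stand-by host takes over and reloads whatever storage holds.
        self.standby.memory = dict(self.storage.blocks)
        self.active, self.standby = self.standby, None
        return self.active


cluster = ScaleOutCluster(Storage(), Host("node1"), Host("standby"))
cluster.write("a", 1)
cluster.persist()
cluster.write("b", 2)  # written after the last persist, never reaches storage
new_active = cluster.fail_over()
print(new_active.name, new_active.memory)  # standby {'a': 1}
```

Key `b` plays the role of the “small piece of data” that is lost: it only ever lived in the failed host’s memory.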

This architecture is perfect for Agile Datamart or even BW/HANA configurations. You get much better availability than by trusting your business to a single appliance, plus amazing response times on any BI dashboard shared with C-level executives who want to know, for example, real-time churn.

On the other hand, Suite/HANA requires availability over performance. You implement two nodes that replicate memory between them, so you don’t lose any data or stay offline for more than a couple of seconds. That is what you should run your ERP on.


Replicating memory synchronously costs a lot of performance. Every block of data has to be copied over a high-speed network to the other node’s memory, and the acknowledgement has to come back before the application sees the transaction as committed. Sadly, that process turns nanoseconds into milliseconds, and HANA won’t be as fast as in the previous architecture.
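The cost can be sketched as simple arithmetic on commit latency. This is a minimal model, assuming the three steps happen one after another and using illustrative nanosecond costs (100 ns for a DRAM write, 0.5 ms for a network round trip) that are assumptions, not measurements of any HANA or DSSD setup; `LatencyModel` and `commit_latency_ns` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LatencyModel:
    """Illustrative per-commit costs in nanoseconds (assumed, not measured)."""
    local_write_ns: float = 100.0       # update local DRAM
    network_rtt_ns: float = 500_000.0   # round trip to the peer node (assumed 0.5 ms)
    remote_write_ns: float = 100.0      # update the peer's DRAM


def commit_latency_ns(m: LatencyModel, synchronous: bool) -> float:
    """Time until the application sees its transaction committed."""
    if not synchronous:
        # Single node: commit returns as soon as local memory is updated.
        return m.local_write_ns
    # Synchronous replication (steps modeled serially for simplicity):
    # write locally, ship the block to the peer, write it there, and wait
    # for the acknowledgement to come back before returning.
    return m.local_write_ns + m.network_rtt_ns + m.remote_write_ns


m = LatencyModel()
local = commit_latency_ns(m, synchronous=False)
replicated = commit_latency_ns(m, synchronous=True)
print(local, replicated, replicated / local)  # 100.0 500200.0 5002.0
```

Whatever the real numbers are, the network round trip dominates the sum, which is why synchronous replication moves commit latency from the memory scale to the network scale.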

If DRAM fails, all the data in it is lost. That is the main reason you have to choose between different architectures depending on whether availability or performance matters more. Also, memory is an extremely expensive way to store big volumes of data.

DSSD: don’t sacrifice performance for availability

As you have probably guessed, DSSD brings the highest non-volatile storage density at the lowest latency, which opens a new dimension for analyzing big volumes of data. NAND also costs much less than DRAM. And thanks to its HDFS interface, DSSD could be used with less complexity.

Most importantly, you don’t have to sacrifice performance for redundancy by choosing between different architectures. DSSD handles storage redundancy and caching through its own technology and internal architecture.


DSSD is still very new and not GA yet; its specifications have not been published.

According to @sakacc, some work is being done with SAP HANA. Today, HANA can be connected to HDFS to store cold or inactive data. I’m sure customers won’t want to waste an amazing solution like DSSD on that alone; you should be able to run HANA’s primary data on HDFS. That would be awesome! You could run SAP DataMart and SAP Suite on DSSD with the reliability and speed your business needs.
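Offloading cold data to HDFS starts with deciding which data counts as cold. A minimal sketch, assuming a simple age-based rule on a hypothetical `last_access` field; the function name, the records and the 90-day threshold are illustrative only, and this is not HANA’s own data-aging mechanism.

```python
from datetime import date, timedelta


def split_hot_cold(records, today, max_age_days):
    """Split records into hot (keep in memory) and cold (offload to HDFS)."""
    cutoff = today - timedelta(days=max_age_days)
    hot = [r for r in records if r["last_access"] >= cutoff]
    cold = [r for r in records if r["last_access"] < cutoff]
    return hot, cold


# Hypothetical example records.
records = [
    {"id": 1, "last_access": date(2015, 5, 20)},
    {"id": 2, "last_access": date(2014, 1, 10)},
]
hot, cold = split_hot_cold(records, today=date(2015, 6, 1), max_age_days=90)
print([r["id"] for r in hot], [r["id"] for r in cold])  # [1] [2]
```

The hope expressed above is that, with DSSD behind HDFS, the HDFS side would be fast enough to hold primary data as well, not just the cold partition.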

Hopefully, DSSD will run HDFS 2.0 and let applications interact directly with components like YARN. DSSD would then be the state of the art for the next generation of hardware-defined solutions. Maybe it would even start a new cycle of hardware over software. Well, let’s see.

See you next time!

Categories: big data, innovation, storage, technology
