Two pieces of advice: Don’t let Cluster Sprawl happen to you… and don’t drown in the Data Ocean, ride the wave
Another exciting conference week in Las Vegas has come to an end. (Unless you count the flight home… then it still has a few more hours to go.) This time it was the first IBM InterConnect conference, which combined three previous events into a mega event focused on innovations for the Cloud era. From what I saw and heard from others, it was a terrific event – kudos to all the teams involved.
The conference was a great opportunity for me to speak with clients, partners, prospects and IBMers about Spectrum Storage and Platform Computing. It also turned out to be a great chance to brainstorm a bit with our new marketing VP, Eric Herzog, about how to communicate the value of this Software Defined Infrastructure portfolio in a way that is clear and concise.
Ideas crystallized and were tested live with press and analysts, clients and sellers. My top two are condensed in the title of this post.
IBM Software Defined Infrastructure provides a unique set of capabilities that help our clients:
- Avoid cluster sprawl by using all compute and storage resources as a single pool that is efficiently shared among a broad set of large-scale, high-performance applications and analytics
- Safely ride the wave in an ocean of data by deploying highly efficient and deeply integrated storage solutions with unprecedented deployment flexibility: as software, as a service, or as a system – on-premises, in the Cloud, and across hybrid environments
What the heck is Cluster Sprawl, and who cares?
Remember the days when folks put each new application on its own physical server with its own storage? Remember how large those inefficient server and storage “farms” grew? How much money was spent on underutilized resources? And how much more was spent evolving them into a virtualized compute and storage environment where a physical resource could be shared among many apps? (Following the well-proven lead of IBM z Systems, a.k.a. the mainframe.)
Well, it is starting to happen again. New-generation apps and analytics increasingly look like traditional high-performance and supercomputing workloads: they rely on compute and storage clusters to handle large volumes of data at high speed through parallel processing. As each new scale-out application appears, a new cluster appears to run it, with the predictable result: a growing number of underutilized clusters that cost their owners more than they should. What’s worse, the apps and analytics often run slower than they could if unused resources in other clusters were able to pitch in temporarily. The sketch below shows the arithmetic.
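To make that concrete, here is a back-of-the-envelope Python sketch. The workload names and core counts are invented for illustration, and this is not IBM Platform Computing code; it simply contrasts sizing each cluster for its own peak with sharing one pool that only has to absorb one peak at a time.

```python
# Hypothetical illustration of cluster sprawl vs. a shared pool.
# All numbers are made up for the example; nothing here models a real scheduler.

# Peak and average core demand for three scale-out workloads (cores).
workloads = {
    "hadoop_etl":    {"peak": 400, "avg": 120},
    "spark_scoring": {"peak": 300, "avg": 60},
    "risk_sim":      {"peak": 500, "avg": 150},
}

total_avg = sum(w["avg"] for w in workloads.values())

# Siloed model: every workload gets a dedicated cluster sized for its own peak.
siloed_cores = sum(w["peak"] for w in workloads.values())
print(f"Siloed: {siloed_cores} cores, "
      f"{total_avg / siloed_cores:.0%} average utilization")

# Shared pool: if peaks rarely coincide, one pool sized for the single
# largest peak plus the steady-state load of the others can serve them all.
largest = max(workloads.values(), key=lambda w: w["peak"])
shared_cores = largest["peak"] + (total_avg - largest["avg"])
print(f"Shared pool: {shared_cores} cores, "
      f"{total_avg / shared_cores:.0%} average utilization")
```

In this toy example the shared pool serves the same steady-state demand with roughly half the cores (680 vs. 1,200), and a real workload scheduler makes the sharing dynamic, lending idle resources to whichever application spikes.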
One of our clients, a global financial services provider, has prevented cluster sprawl by implementing a software defined infrastructure with IBM Platform Computing and Spectrum Storage – reducing costs and increasing the performance of some workloads by 100x. (That is not a typo: 100 times, not 100%, which would be only a 2x improvement.)
In a world where faster business processes and deeper business insights increasingly run on platforms such as Hadoop, Spark, and Cassandra, as well as on traditional scale-out databases and data warehouses, maybe the better follow-on question to “What the heck is cluster sprawl?” is: “Is there any organization that can afford not to care?”
When did data pools overflow data lakes and become a Data Ocean?
I will save that answer for my next blog entry. If you can’t wait, watch Eric Herzog live from InterConnect 2015.
@Bernie Spang, this blog is a reflection of what we heard every day in the field about customers’ challenges managing clusters. Many customers run their own workload schedulers and operate on the assumption that they are running on their own dedicated infrastructure. The end result is the “cluster sprawl” challenge that you described (multiple underutilized and costly infrastructure silos, each dedicated to its own set of applications). Thanks for the blog.