In the immediate aftermath of an enthralling World Series (go Giants!), it’s not surprising that it’s taking a while to get out of baseball mode.
I did not have the benefit of growing up with America’s pastime, but I do know there’s quite a bit of strategy in the game. The value of getting the matchup part of the strategy right is evidenced by the cutting-edge research in this space, but that’s a topic for another day. Another concept that came up at various points in the World Series, and that I found intriguing, was batters “going with the pitch” to increase their chances of a positive result on the play. This means recognizing the pitch and hitting the ball to right or left field depending on the kind of pitch thrown, sending it in the direction it most naturally wants to go.
It has occurred to me that “going with the pitch” is applicable to the world of big data analytics as well. In other words, watch intently for trends in big data analytics, and adapt your approach based on where analytics is headed. In our conversations with customers and analysts, and as we keep our finger on the pulse of the market, we definitely see a move toward in-memory analytics in the Hadoop® ecosystem.
The Rise of In-Memory Analytics
Hadoop started out targeting batch-oriented processing, such as the construction of web indexes at well-known Internet companies, but more recently we have seen a demand for and an emergence of low-latency processing as Hadoop adoption grows. It is in fact Hadoop’s movement into the mainstream that drives the need for lower latency. For every data scientist who is accustomed to longer analytic cycles, there are potentially hundreds of business analysts who expect short query times, interactive exploration and instantaneous visualizations.
Another factor is the proliferation of streaming data that has to be processed as soon as it arrives. Consider the growth in data from sources such as social media, mobile devices, and sensors. If there is one area as hot as big data right now, it is the Internet of Things, which promises to demand even more complex analytics on vast amounts of high-velocity data. Decisions often have to be made in short timeframes based on data arriving in real time in order to capture opportunities as they occur, driving the need for low-latency performance.
The economics of in-memory computing have certainly helped speed this trend. It is now quite realistic for organizations to purchase multiple terabytes of memory at reasonable cost, a scale at which useful analytics can be performed entirely in memory. Add to that the availability of open-source software predicated on memory-centric processing, and in-memory analytics is no longer for the rarefied few.
Going with the In-Memory Pitch
Recognizing the importance of in-memory analytics early on, Cray formulated a strategy that targets this next generation of analytics. Cray launched the Urika-XA™ extreme analytics platform at Strata+Hadoop World in New York in October. Coming pre-integrated with the Apache Spark™ in-memory cluster computing framework and possessing a beefy 1,500 CPU cores and 6 terabytes of RAM in a single rack, the Urika-XA platform is perfectly set up for the new wave of large-scale data analytics.
In a not entirely unexpected coincidence, Spark featured heavily in product announcements and speaking sessions at the Strata conference. The industry concurs that in-memory processing is the direction in which we are headed. This is the ball bearing down on you at the plate. Will you go with the pitch and build your analytics infrastructure with next-generation in-memory analytics in mind? If you do, we fully expect you to deliver a solid hit.