Cray, the stalwart of supercomputing, is evolving, and it is changing the face of Big Data in the process. Those of us who follow HPC closely have always known Cray for precisely what it is: the Ferrari of computers, builder of high-performance machines that have been instrumental in pushing the boundaries of what’s computationally possible with bits and bytes.
Cray, the company founded by Seymour Cray over three decades ago, has had an interesting evolution but has always excelled at one thing: helping researchers solve some of the world’s most challenging problems. That is no different today; however, society’s computing needs have changed drastically.
What’s interesting is how that change plays out. From simulating massive galaxies to modeling molecules, supercomputing is still employed primarily by researchers on the cutting edge of science, but that doesn’t mean supercomputers aren’t being used to solve interesting problems in other areas.
It’s mind-boggling how much data we generate today. Managing all this data is a challenge in itself, but the more interesting answers lie hidden in the terabytes of data being generated every moment. Fortunately, as a pioneer in its field, Cray has always been at the forefront of managing and making sense of large volumes of data. At a time when the rest of the world was just getting acquainted with email, Cray systems were already storing and analyzing what were massive amounts of data for the time. So as far as Cray is concerned, Big Data is nothing new.
Driven by commercial needs, businesses are now recognizing the Big Data problem. Realizing the potential of data-driven analytics, and recognizing that graph problems in Big Data are different, Cray created YarcData to focus on the needs of this new market segment. (You’ve probably guessed by now that Yarc is Cray spelt backwards.)
YarcData’s main product is the Urika™ graph analytics appliance. What sets Urika apart from almost every other so-called Big Data appliance out there is its architecture, which Cray junkies will recognize as being based on the Cray XMT II (XMT stands for Extreme Multithreading). Those unfamiliar with the XMT are probably asking: why is this relevant?
Well, mostly because Cray designed the XMT architecture to solve a specific class of problem: one characterized by lots of data and highly branched access patterns, and one that needs a very high degree of parallelism to process efficiently. More importantly, the XMT was designed to process this information in real time, rather than in hours, days, or in some cases weeks.
Whereas commercial processor architectures rely on data locality and high-speed caches to hide memory latency, pointer-chasing problems (following chains of references in which each memory address depends on the result of the previous load) have little data locality, so caches offer little benefit. Instead, having many threads fetching data from memory simultaneously offers a better solution: while some threads wait on memory, others keep the processor busy. Each XMT II processor in Urika can handle up to 128 threads of execution, and a fully configured system can have as many as 8,192 processors. That’s over a million threads (128 × 8,192 = 1,048,576) in a full system.
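To see why thread count matters for pointer chasing, here is a back-of-the-envelope sketch in Python. The 128 threads per processor and 8,192 processors come from the text; the 100-cycle memory latency and the simple model in which each thread stalls for the full latency after every memory reference are illustrative assumptions, not published XMT figures.

```python
# Toy model of latency hiding through hardware multithreading.
# Thread and processor counts come from the article; the latency value
# and the utilization model are illustrative assumptions only.

THREADS_PER_PROCESSOR = 128
MAX_PROCESSORS = 8192


def total_threads(processors: int,
                  threads_per_processor: int = THREADS_PER_PROCESSOR) -> int:
    """Hardware thread contexts available across the whole system."""
    return processors * threads_per_processor


def utilization(threads: int, latency_cycles: int) -> float:
    """Fraction of cycles in which a processor can issue work.

    Assumed model: each thread issues one memory reference, then stalls
    for `latency_cycles` while the load (which misses any cache, as in
    pointer chasing) completes. T threads can therefore issue at most T
    references per latency window.
    """
    return min(1.0, threads / latency_cycles)


if __name__ == "__main__":
    print(f"{total_threads(MAX_PROCESSORS):,} threads in a full system")
    # Hypothetical 100-cycle memory latency.
    for t in (1, 16, 64, 128):
        print(f"{t:>3} threads -> {utilization(t, 100):.0%} busy")
```

Under this toy model, a single thread keeps the processor busy only 1% of the time, whereas 128 threads are enough to cover a 100-cycle latency entirely. Real hardware is messier, but the intuition that more outstanding memory requests hide more latency carries over.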
In-memory is another term gaining a lot of traction in the commercial database world nowadays. The principle: RAM (Random Access Memory) is orders of magnitude faster than disk, so if your entire database fits in RAM and never has to access the disk, your application runs faster. It’s a simple premise, and one that Cray designed into the XMT architecture over a decade ago. In fact, a full Urika system offers 512TB of RAM, and the entire address space is globally shared. The memory and processors are interconnected via Cray’s high-speed SeaStar™ network.
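The 512TB figure invites a quick sizing check: will my graph fit? The sketch below multiplies vertex and edge counts by per-item byte costs and compares the result against the shared memory. The 64 bytes per vertex, 32 bytes per edge, and 50% headroom are hypothetical placeholders, not Urika’s actual storage format, and TB is assumed to mean decimal terabytes.

```python
# Rough check of whether a graph fits in a globally shared memory.
# The 512 TB total and 8,192 processors come from the article; the
# per-vertex/per-edge byte costs and the headroom are hypothetical.

TB = 10**12  # assumption: decimal terabytes
SYSTEM_MEMORY_BYTES = 512 * TB
MAX_PROCESSORS = 8192


def graph_bytes(vertices: int, edges: int,
                bytes_per_vertex: int = 64, bytes_per_edge: int = 32) -> int:
    """Estimated in-memory footprint under assumed per-item costs."""
    return vertices * bytes_per_vertex + edges * bytes_per_edge


def fits_in_memory(vertices: int, edges: int, headroom: float = 0.5) -> bool:
    """True if the graph uses at most `headroom` of total memory,
    leaving the rest for indexes and query working space (assumed)."""
    return graph_bytes(vertices, edges) <= headroom * SYSTEM_MEMORY_BYTES


if __name__ == "__main__":
    per_proc = SYSTEM_MEMORY_BYTES / MAX_PROCESSORS
    print(f"~{per_proc / 10**9:.1f} GB of memory per processor")
    # Hypothetical graph: 10 billion vertices, 1 trillion edges.
    v, e = 10**10, 10**12
    print(f"footprint ~{graph_bytes(v, e) / TB:.2f} TB, fits: {fits_in_memory(v, e)}")
```

With these assumptions, the example graph of 10 billion vertices and a trillion edges needs about 32.64 TB and fits comfortably. Spreading 512TB across 8,192 processors works out to 62.5 GB per processor (64 GiB if TB means binary tebibytes).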
As with any appliance, it’s the software that makes for a compelling product. With Urika, it’s the in-memory graph database that truly differentiates it. But why a graph database? The answer is twofold. First, graph databases are becoming increasingly prominent as the basis for the next generation of analytics because of their extreme flexibility, yet commodity hardware still struggles to run them.
Second, the Urika architecture happens to be ideal for a graph database, given its massive multithreading and large shared memory. Urika is so effective at running graph databases that queries that take hours on commodity hardware can complete in seconds.
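The flexibility argument is easiest to see in code. In a relational table, the columns have to be decided up front; in a graph, a relationship is just another edge. The minimal Python class below is a hypothetical illustration of that idea (its name and methods are not Urika’s API): a new data source that introduces a relationship type nobody anticipated requires no schema change.

```python
from __future__ import annotations

from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory graph with labeled edges.

    Hypothetical illustration only; not Urika's interface.
    """

    def __init__(self) -> None:
        # vertex -> list of (relationship label, neighbor vertex)
        self._out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        # No schema to migrate: a new relationship label is just data.
        self._out[src].append((rel, dst))

    def neighbors(self, vertex: str, rel: str | None = None) -> list[str]:
        # Optionally filter outgoing edges by relationship label.
        return [d for r, d in self._out.get(vertex, []) if rel is None or r == rel]


g = PropertyGraph()
g.add_edge("alice", "works_at", "acme")
# Later, a new data source brings a relationship type never seen before.
g.add_edge("alice", "shares_phone_with", "bob")
print(g.neighbors("alice"))
print(g.neighbors("alice", "works_at"))
```

The two calls print `['acme', 'bob']` and `['acme']`. The design choice worth noting is that relationship labels live in the data rather than in a schema, which is what makes adding sources cheap.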
At this point you may be wondering what sort of real-world applications would need such capabilities. Although you could use Urika as a traditional database just to store data, its intended use is quite different. Urika is a data discovery platform; in other words, you would use Urika to find non-obvious patterns in your data. Traditional databases are great if you know what you’re looking for in your data, but they are inflexible and cumbersome otherwise. Data discovery is about finding what you don’t know, so you can’t restrict yourself to a few data sources.
Urika lets you add new data sources, see how data elements are connected in ways you never knew, find new patterns, and validate hypotheses, all iteratively and quickly. Although this sounds elementary, doing it at scale with existing Big Data tools is painfully slow and can take months. Urika lets you do it in minutes.
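Finding how two data elements are connected is, at its core, a path search. As a small, hypothetical illustration (not how Urika executes queries), the sketch below merges edges from two made-up data sources and runs a textbook breadth-first search to find the shortest chain linking two accounts. Neither source alone connects them.

```python
from collections import deque


def find_connection(edges, start, goal):
    """Breadth-first search for the shortest chain of relationships
    linking `start` to `goal`; returns a list of vertices or None."""
    adjacency = {}
    for a, b in edges:  # treat each relationship as undirected
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parent[vertex]
            return path[::-1]
        for nxt in adjacency.get(vertex, []):
            if nxt not in parent:
                parent[nxt] = vertex
                queue.append(nxt)
    return None


# Two hypothetical data sources that say nothing about each other alone.
accounts = [("acct_17", "alice"), ("acct_42", "bob")]
phone_records = [("alice", "555-0100"), ("bob", "555-0100")]

print(find_connection(accounts, "acct_17", "acct_42"))
print(find_connection(accounts + phone_records, "acct_17", "acct_42"))
```

With account ownership alone the search returns `None`; once phone records are added it returns `['acct_17', 'alice', '555-0100', 'bob', 'acct_42']`. On a toy example this is instant; the harder problem is the same kind of traversal over billions of edges, where the cache-unfriendly access pattern described earlier dominates.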
Urika is being used in cancer research, fraud detection in financial services, cybersecurity, web analytics, and many more commercial applications where analysts have the observational data but lack the right tools to process it. Urika is helping researchers uncover activation pathways between proteins and genes and develop the next generation of customized medication.
We’re only at the beginning when it comes to novel applications of graph analytics in Big Data. Although graph-based solutions have existed for a while, a major hurdle to their widespread adoption has been the lack of a scalable hardware platform. Urika addresses this problem, and once again a Cray innovation is leading the way. And so the tradition continues: at YarcData, as at Cray, we’re still helping people solve some of the most challenging problems out there.
If you have any questions, please feel free to comment or reach out to me at firstname.lastname@example.org.
Adnan Khaleel, Business Solutions, YarcData