“LEBM”: Cray Creating New Extension to LUBM Benchmark

I’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language, and extensions that call upon a set of “classical” graph algorithms. CGE has two main advantages. One is that it scales much better than most other graph databases — because the other ones weren’t designed by supercomputer wizards. (Of course I would say that.) The other is that it performs unusually well on complex queries on large, complex graphs. A question typical of many complex graph queries: Where are all the places where the red pattern matches a pattern in ... [ Read More ]
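That kind of pattern-matching question is what SPARQL calls a basic graph pattern: a set of triple patterns whose shared variables are joined against the data. As a minimal sketch of the idea — pure Python over invented toy triples, not CGE or any real RDF engine — here is a two-pattern join:

```python
# Toy illustration of SPARQL-style basic graph pattern matching over
# RDF-like triples. The data and names are invented for this sketch.
# The equivalent SPARQL would be roughly:
#   SELECT ?a ?c WHERE { ?a :knows ?b . ?b :knows ?c . }

triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "knows", "carol"),
}

def match(pattern, triples):
    """Yield variable bindings for a single triple pattern.

    A pattern term starting with '?' is a variable; any other term
    must match the corresponding triple position exactly.
    """
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield binding

def join(pat1, pat2, triples):
    """Join two triple patterns on their shared variables."""
    results = []
    for b1 in match(pat1, triples):
        for b2 in match(pat2, triples):
            shared = set(b1) & set(b2)
            if all(b1[v] == b2[v] for v in shared):
                results.append({**b1, **b2})
    return results

# "Friend of a friend": ?a knows ?b, and ?b knows ?c.
hits = join(("?a", "knows", "?b"), ("?b", "knows", "?c"), triples)
for h in hits:
    print(h["?a"], "->", h["?c"])  # prints: alice -> carol
```

A real engine like CGE evaluates the same kind of join plan, but over billions of triples distributed across many nodes — which is exactly where the scaling story matters.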

Making Sense of 50 Billion Triples: Getting Started

In my last post I outlined some of the reasons why capitalizing on the promise of graph analytics takes thought and planning. Now I’d like to focus on getting started, which appropriately means beginning with an understanding of the data itself. So let’s discuss methods of gaining an initial understanding of our data, so that we can feed that newfound understanding back into our ingestion process. First, we need to be able to query the database for some basic information that tells us not only what is there but also how the relationships are expressed. Ideally, we would also find out whether there is a hierarchy to the relationships we find. By hierarchy, I mean that graphs can use containers, if you will, to organize ... [ Read More ]
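One concrete way to see how relationships are expressed is simply to tally the predicates in the store. The sketch below does this in pure Python over invented toy triples; against a real SPARQL endpoint the same question would be a query along the lines of `SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p`:

```python
from collections import Counter

# Invented toy triples standing in for a loaded RDF dataset.
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "partOf", "megacorp"),
    ("alice", "knows", "bob"),
]

# Count how often each predicate (the middle position) appears.
# This is the quickest census of "what relationships exist here."
predicate_counts = Counter(p for _, p, _ in triples)

for predicate, n in predicate_counts.most_common():
    print(f"{predicate}: {n}")
```

Predicates like `partOf` in this toy data are the hint to look for hierarchy: a predicate that links containers to members suggests the graph has levels worth exploring separately.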

Urika-GX: A Game Changer for Big Data Analytics

There’s a lot of hype around big data in healthcare and the life sciences. But big data is here to stay. Information is what drives the entire industry. When I worked in big pharma, I learned that the product of a pharmaceutical company is not a pill, it’s a label. And a label is just a specific assemblage of information, carefully distilled from terabytes of information collected and analyzed over the course of many years by many intelligent people. To compete, companies have to be very good at turning data into information, and information into knowledge. The stakes couldn’t be higher, because every day millions of patients rely on the quality of this data and the strength of the analyses done by researchers. Analyzing big data is ... [ Read More ]

Graph Databases: Key Thoughts from Online Chat

It’s pretty interesting to see graph analytics gain traction in the world of big data. We’ve been focusing on graph databases to round out Hadoop® and Spark™ ecosystems and allow for more advanced analytics — and enable people to uncover never-before-seen patterns. (Tell me that’s not cool!) From solving real-world problems such as detecting cyberattacks and creating value from IoT sensor data to precisely identifying drug interactions faster than ever before, graph has become a powerhouse in looking at complex, irregular and very large datasets to identify patterns in near real-time. On March 16, we hosted an online chat titled “Graph: The Missing Link in Big Data Analytics” with industry experts from Deloitte and Mphasis. Sixty-one ... [ Read More ]

How CGE Achieves High Performance and Scalability

In our graph series so far, we have explored what graph databases are and when they are valuable to use, as well as the Cray Graph Engine (“CGE”), a robust graph solution. For this last installment, we dive into how hardware affects the performance of a graph database. Cray’s main product line, the XC™ series, is mostly used for scientific computing. From the point of view of an applications programmer, there is an important difference between scientific computing and the kind of computations done on a graph database. Programmers call it spatial locality. In a nutshell, if a computation has a lot of spatial locality, then when it fetches some value from memory, the next value it needs is usually stored nearby in the ... [ Read More ]
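The contrast can be sketched in a few lines. Summing an array in storage order touches neighboring memory locations (high spatial locality, the scientific-computing pattern), while visiting the same elements in a shuffled order hops around arbitrarily, the way a graph traversal follows edges from one vertex to a far-away one. This is a toy illustration with invented data, not a benchmark:

```python
import random

random.seed(0)
n = 1000
values = list(range(n))

# High spatial locality: visit elements in storage order, so each
# access lands right next to the previous one.
sequential_sum = sum(values)

# Low spatial locality: visit the same elements in a random order,
# the way a graph traversal jumps from vertex to arbitrary vertex.
order = list(range(n))
random.shuffle(order)
scattered_sum = sum(values[i] for i in order)

# The arithmetic result is identical either way; on real hardware it
# is the second access pattern that defeats caches and prefetchers.
print(sequential_sum == scattered_sum)  # prints: True
```

The answer is the same in both cases; what differs on real hardware is how often each fetch misses the cache, which is exactly the gap that a machine built for irregular access patterns is designed to close.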