“LEBM”: Cray Creating New Extension to LUBM Benchmark

I’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language, and extensions to SPARQL to call upon a set of “classical” graph algorithms. There are two main advantages of CGE. One is that it scales far better than most other graph databases, because the other ones weren’t designed by supercomputer wizards. (Of course I would say that.) The other is that not only does CGE scale well, it also performs unusually well on complex queries against large, complex graphs. A question typical of a lot of complex graph queries: Where are all the places where the red pattern matches a pattern in ... [ Read More ]
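
As a rough illustration of the kind of pattern-matching question that excerpt alludes to, here is a minimal SPARQL sketch. The prefix and predicates are hypothetical, invented purely for illustration; CGE itself accepts standard SPARQL.

    # Hypothetical vocabulary, for illustration only.
    PREFIX ex: <http://example.org/ontology#>

    # Find every place in the graph where two people who know each other
    # also work for the same organization (a small "red pattern").
    SELECT ?person ?colleague ?org
    WHERE {
      ?person    ex:knows    ?colleague .
      ?person    ex:worksFor ?org .
      ?colleague ex:worksFor ?org .
    }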

50 Billion Triples: Digging a Bit Deeper

In the last two installments of this series (part 1 and part 2), we discussed some higher-level thoughts on striking a balance between easy ingest and more prep work, as well as some initial queries to get a sense of an unknown graph’s structure and other characteristics. The queries we have run to this point were intended to discover structure within our dataset at a macro level. The algorithm that we will run now requires us to consider the dimensionality of our graph. In other words, running algorithms such as centrality or community detection on the entire graph without context is meaningless; we need to run these algorithms on subsets of the data. Prior to delving into the following queries, a quick note: All of the algorithms ... [ Read More ]
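
One common way to carve out such a subset is an ordinary SPARQL CONSTRUCT query, and the resulting subgraph is what an algorithm such as community detection would then operate on. The sketch below is only illustrative; the prefix, predicate and named graph are hypothetical.

    # Hypothetical prefix and named graph, for illustration only.
    PREFIX ex: <http://example.org/ontology#>

    # Extract just the citation relationships from one named graph,
    # producing a much smaller subgraph for a community-detection run.
    CONSTRUCT { ?paper ex:cites ?cited }
    WHERE {
      GRAPH <http://example.org/graphs/publications> {
        ?paper ex:cites ?cited .
      }
    }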

Making Sense of 50 Billion Triples: Getting Started

In my last post I outlined some of the reasons why it takes thought and planning to really capitalize on the promise of graph analytics. Now, I’d like to focus on getting started, appropriately, at the beginning: with understanding the data itself. So let’s discuss methods of gaining an initial understanding of our data such that we can then feed that newfound understanding back into our ingestion process. First, we need to be able to query the database for some basic information that will tell us not only what is there but also how the relationships are expressed. We would also, ideally, find out whether there is a hierarchy to the relationships we find. By hierarchy, I mean that graphs can use containers, if you will, to organize ... [ Read More ]
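
As a minimal sketch of that sort of first query, standard SPARQL 1.1 aggregates are enough to list every predicate in the dataset and how often each occurs, which goes a long way toward showing how the relationships are expressed:

    # Survey an unknown graph: which predicates exist, and how heavily is each used?
    SELECT ?predicate (COUNT(*) AS ?uses)
    WHERE {
      ?s ?predicate ?o .
    }
    GROUP BY ?predicate
    ORDER BY DESC(?uses)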

Big Data Advantage, part 3: “The Dude Abides.”

In my prior two posts about analytics, I highlighted the vast opportunity available in big data and the obstacles that prevent organizations from attaining tangible benefits: complexity across fronts, an onslaught of analytics tools, the difficulty retaining the right skillsets, and slowdowns in getting to insights and decisions. But these hurdles can be overcome. For innovative businesses grappling with the realities of big data, an agile analytics environment provides the best of all approaches. Such a platform enables you to seize your big data advantage with a potent combination of system agility and the pervasive speed needed to deliver high-frequency insights. To address this need, Cray has fused supercomputing technology ... [ Read More ]

Making Sense of 50 Billion Triples: No Free Lunch

A lot of grandiose claims have been made promising that graph databases would allow easy ingest of all manner of disparate data, make sense of it, and uncover hidden relationships and meaning. This is, in fact, possible, but there are a few considerations you need to account for to make your database useful to an analyst charged with making sense of the information. There simply is no free lunch; where time and effort are saved in one place, they must be expended (at least partially) elsewhere. Let’s take a look at the fundamental difference between graph databases and relational databases from which these claims stem: rather than store data in rows and columns, graph databases store data in a simpler format that describes a ... [ Read More ]
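
To make that difference concrete, here is a small sketch of how a fact that would occupy one row of an “employees” table becomes a handful of RDF triples, written as a SPARQL INSERT DATA update. The names and predicates are hypothetical, chosen only for illustration.

    # Hypothetical vocabulary and resources, for illustration only.
    PREFIX ex: <http://example.org/ontology#>

    # One "row" of employee data expressed as individual subject-predicate-object triples.
    INSERT DATA {
      ex:alice  a            ex:Employee ;
                ex:name      "Alice" ;
                ex:worksFor  ex:acme ;
                ex:reportsTo ex:bob .
    }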