“LEBM”: Cray Creating New Extension to LUBM Benchmark

I’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language and extensions to call upon a set of “classical” graph algorithms. There are two main advantages of CGE. One is that it scales a lot better than most other graph databases — because the other ones weren’t designed by supercomputer wizards. (Of course I would say that.) The other advantage is that not only does CGE scale well, it performs unusually well on complex queries on large, complex graphs. Typical of a lot of complex graph queries: Where are all the places where the red pattern matches a pattern in ...

Urika-GX: A Game Changer for Big Data Analytics

There’s a lot of hype around big data in healthcare and the life sciences. But big data is here to stay. Information is what drives the entire industry. When I worked in big pharma, I learned that the product of a pharmaceutical company is not a pill, it’s a label. And a label is just a specific assemblage of information, carefully distilled from terabytes of information collected and analyzed over the course of many years by many intelligent people. To compete, companies have to be very good at turning data into information, and information into knowledge. The stakes couldn’t be higher, because every day millions of patients rely on the quality of this data and the strength of the analyses done by researchers. Analyzing big data is ...

Graph: The Missing Link in Big Data Analytics

Graph analytics is gaining traction in the world of big data and IoT. From solving real-world problems such as detecting cyberattacks and creating value from IoT sensor data to precisely identifying drug interactions faster than ever before, graph has become a powerhouse in detecting never-before-seen connections and emergent patterns. It’s critical to understand how graph can be added to traditional Hadoop® and Spark™ workflows for successful results. Join us Wednesday, March 16, for a live online chat, “Graph: The Missing Link in Big Data Analytics,” to learn and discuss all things graph analytics. You can easily participate using a Twitter, LinkedIn or Facebook account. Hear from industry experts from Deloitte, Mphasis and Cray who ...

Built-in Graph Functions Accelerate Discovery Analytics

We often encounter analytics use cases from customers or prospects where the analyst wants to select a particular facet of the data and deeply analyze it. For instance, in the healthcare world, data includes patient (including genomic information), procedure, provider, billing and outcome data. When trying to discover new insights into the data, the analyst often doesn’t know exactly what she’s looking for, so she needs to answer several high-level analytic questions about the data to guide further exploration. (And, of course, today’s data is usually not just big — it’s Big Data, so performance and scalability are important.) Assuming we’ve confirmed that the data is sensible, the high-level analytic questions often include the ...

Don’t use a hammer to screw in a nail: Alternatives to REGEX in SPARQL

In the past I've talked about some tips for tuning SPARQL performance, but one interesting type of query that I didn't touch on comes up time and again as a SPARQL performance problem. In fact, more often than not, it is actually user naiveté that is the real cause. So what does this horrible query look like? Strangely, it is quite innocuous, and I bet a good number of you have written exactly this at some point in the past:

    SELECT * WHERE { ?s <http://some/predicate> ?o . FILTER (REGEX(?o, "search")) }

While this looks like a perfectly sane query, the fact of the matter is that it is really anything but. Any kind of FILTER in SPARQL involves iterating over all the possible solutions found at the point where the filter ...
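
The per-solution cost of a FILTER described in the excerpt above can be sketched in a few lines of Python. This is only an illustration, not how any particular SPARQL engine is implemented: the `solutions` list and the `filter_solutions` helper are hypothetical stand-ins for the bindings of `?s` and `?o` and for the engine's filter step. It shows that when the pattern is a plain literal such as "search", a simple substring test selects the same solutions as the regular expression; SPARQL 1.1 offers `CONTAINS` as the analogous plain-substring function.

```python
import re

# Hypothetical bindings for (?s, ?o) after the triple pattern has matched;
# the values are invented purely for illustration.
solutions = [
    ("s1", "full text search engine"),
    ("s2", "graph analytics"),
    ("s3", "searching large graphs"),
]


def filter_solutions(solutions, predicate):
    """Apply a FILTER-like test to every candidate solution, one at a time."""
    return [(s, o) for s, o in solutions if predicate(o)]


# REGEX-style filter: the regular expression is run against every ?o value.
pattern = re.compile("search")
by_regex = filter_solutions(solutions, lambda o: pattern.search(o) is not None)

# Substring-style filter: same per-solution loop, but a cheaper test per value.
by_substring = filter_solutions(solutions, lambda o: "search" in o)

print(by_regex == by_substring)
```

Either way, every candidate solution is visited once, so the work grows with the number of solutions reaching the filter; swapping the regular expression for a substring test only reduces the cost of each individual check.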