LEBM: Cray Creating New Extension to LUBM Benchmark

We’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language and extensions to call upon a set of “classical” graph algorithms. There are two main advantages of CGE. One is that it scales a lot better than most other graph databases — because the other ones weren’t designed by supercomputer wizards. (Of course I would say that.) The other advantage is that not only does CGE scale well, it performs unusually well on complex queries on large, complex graphs. Typical of a lot of complex graph queries: Where are all the places where the red pattern matches a pattern ... [ Read More ]

Urika-GX: A Game Changer for Big Data Analytics

There’s a lot of hype around big data in healthcare and the life sciences. But big data is here to stay. Information is what drives the entire industry. When I worked in big pharma, I learned that the product of a pharmaceutical company is not a pill, it's a label. And a label is just a specific assemblage of information, carefully distilled from terabytes of information collected and analyzed over the course of many years by many intelligent people. To compete, companies have to be very good at turning data into information, and information into knowledge. The stakes couldn't be higher, because every day millions of patients rely on the quality of this data and the strength of the analyses done by researchers. Analyzing big data is ... [ Read More ]

Graph: The Missing Link in Big Data Analytics

Graph analytics is gaining traction in the world of big data and IoT. From solving real-world problems such as detecting cyberattacks and creating value from IoT sensor data to precisely identifying drug interactions faster than ever before, graph has become a powerhouse in detecting never-before-seen connections and emergent patterns. It’s critical to understand how graph can be added to traditional Hadoop® and Spark™ workflows for successful results. Join us Wednesday, March 16, for a live online chat, “Graph: The Missing Link in Big Data Analytics,” to learn and discuss all things graph analytics. You can easily participate using a Twitter, LinkedIn or Facebook account. Hear from industry experts from Deloitte, Mphasis and Cray who ... [ Read More ]

Don’t use a hammer to screw in a nail: Alternatives to REGEX in SPARQL

In the past I've talked about some tips for SPARQL implementation and performance, but one interesting type of query that I didn't touch on comes up time and again as a SPARQL performance problem. In fact, more often than not, it is actually user naiveté that is the real cause. So what does this horrible query look like? Strangely, it is quite innocuous, and I bet a good number of you have written exactly this at some point in the past: SELECT * WHERE { ?s <http://some/predicate> ?o . FILTER (REGEX(?o, "search")) } While this looks like a perfectly sane query, the fact of the matter is that it is really anything but. Any kind of FILTER in SPARQL involves iterating over all the possible solutions found at the point where ... [ Read More ]
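
To make the cost of that FILTER concrete, here is a minimal Python sketch (not SPARQL, and not a description of how any particular engine is implemented) that treats a FILTER as a linear scan over candidate solutions. The `solutions` list and both helper names are hypothetical, made up for illustration. The only point is that a fixed string such as "search" does not need a regular expression at all; in SPARQL 1.1 terms, that roughly corresponds to writing CONTAINS(?o, "search") instead of REGEX(?o, "search").

```python
import re

# Hypothetical values bound to ?o by the triple pattern, before the FILTER runs.
solutions = ["full text search", "graph engine", "searchable index", "SPARQL"]

def filter_regex(values, pattern):
    """Mimic FILTER(REGEX(?o, pattern)): test every solution with a regex."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.search(v)]

def filter_contains(values, needle):
    """Mimic FILTER(CONTAINS(?o, needle)): plain substring test, no regex."""
    return [v for v in values if needle in v]

regex_hits = filter_regex(solutions, "search")
contains_hits = filter_contains(solutions, "search")

# Both filters still visit every solution; they differ only in per-item work.
print(regex_hits == contains_hits)
```

Either way the filter still has to visit every solution, so the substring test only reduces the work done per solution; it does not remove the scan itself.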

Equality and Inequality in SPARQL

In previous blog posts I've touched a little on equality and inequality in SPARQL; in this post I'm going to look at some of the more confusing aspects of these (and SPARQL expression semantics in general) that can sometimes surprise even seasoned SPARQL developers like me. Previously, I introduced the fact that the equality operators in SPARQL (= and !=) represent value equality and inequality. This means that non-identical RDF terms can still be considered equal if they represent the same value. Consider the following example: PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT ("1"^^xsd:integer = "01"^^xsd:integer AS ?equals) WHERE { } The two RDF terms are not the same term, but they both represent the value 1 so ... [ Read More ]
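
The difference between term identity and value equality can be sketched in a few lines of Python. This is a simplified illustration, not a SPARQL implementation: the `Literal` tuple and both helper functions are hypothetical, and only xsd:integer is handled. `same_term` plays the role of SPARQL's sameTerm function, while `value_equal` plays the role of the = operator for integer literals.

```python
from collections import namedtuple

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical, simplified stand-in for an RDF typed literal.
Literal = namedtuple("Literal", ["lexical", "datatype"])

def same_term(a, b):
    """Term identity: lexical form and datatype must match exactly."""
    return a == b

def value_equal(a, b):
    """Value equality, handling only xsd:integer (an assumption of this sketch)."""
    if a.datatype == XSD_INTEGER and b.datatype == XSD_INTEGER:
        return int(a.lexical) == int(b.lexical)
    # Fallback for everything else in this sketch: compare the terms themselves.
    return same_term(a, b)

one = Literal("1", XSD_INTEGER)
zero_one = Literal("01", XSD_INTEGER)

print(same_term(one, zero_one))    # the lexical forms "1" and "01" differ
print(value_equal(one, zero_one))  # both parse to the integer 1
```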