Data Analytics Rule at Spark Summit West 2017

Spark Summit West is always well attended, and this year was no exception. Data engineers, data scientists, programmers, architects and technology enthusiasts descended on San Francisco’s Moscone Center earlier this month to learn about the latest developments in Apache Spark™ and its massive ecosystem. The complexity of analytics use cases and data science was a dominant theme throughout this year’s event. The keynote by Ali Ghodsi, CEO and co-founder of Databricks, highlighted some of the challenges of implementing large-scale analytics projects. Ghodsi discussed how the continued growth of Apache Spark has produced myriad innovative use cases, from churn analytics to genome sequencing. These applications are difficult to ...

“LEBM”: Cray Creating New Extension to LUBM Benchmark

I’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language, and extensions to call a set of “classical” graph algorithms. CGE has two main advantages. First, it scales much better than most other graph databases, because the others weren’t designed by supercomputer wizards. (Of course I would say that.) Second, CGE not only scales well but also performs unusually well on complex queries over large, complex graphs. A typical complex graph query asks: where are all the places where the red pattern matches a pattern in ...

Making Sense of 50 Billion Triples: Getting Started

In my last post I outlined some of the reasons why realizing the promise of graph analytics takes thought and planning. Now I’d like to focus on getting started, appropriately enough at the beginning: understanding the data itself. So let’s discuss methods for gaining an initial understanding of our data so that we can feed that newfound understanding back into our ingestion process. First, we need to be able to query the database for some basic information that tells us not only what is there but also how the relationships are expressed. Ideally, we would also find out whether there is a hierarchy to the relationships we find. By hierarchy, I mean that graphs can use containers, if you will, to organize ...
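As a rough illustration of that first profiling pass, here is a minimal sketch, assuming the triples are already available as an in-memory list of (subject, predicate, object) strings. The sample data, the `ex:` prefixes and the `profile_triples` function are hypothetical and are not part of CGE. The sketch counts how often each predicate appears (how relationships are expressed), counts instances per `rdf:type` class (what is there), and collects `rdfs:subClassOf` links as a first look at any hierarchy.

```python
from collections import Counter

# Hypothetical compact IRIs, used only for this illustration.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def profile_triples(triples):
    """Summarize predicates, class membership and declared class hierarchy."""
    # How relationships are expressed: frequency of each predicate.
    predicate_counts = Counter(p for _, p, _ in triples)
    # What is there: number of subjects typed with each class.
    class_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    # First look at hierarchy: each class mapped to its declared parent classes.
    hierarchy = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            hierarchy.setdefault(s, []).append(o)
    return predicate_counts, class_counts, hierarchy


# Hypothetical sample data.
sample = [
    ("ex:alice", RDF_TYPE, "ex:Employee"),
    ("ex:bob", RDF_TYPE, "ex:Manager"),
    ("ex:Manager", SUBCLASS_OF, "ex:Employee"),
    ("ex:bob", "ex:manages", "ex:alice"),
    ("ex:alice", "ex:worksOn", "ex:projectX"),
]

preds, classes, hier = profile_triples(sample)
print(preds.most_common())
print(dict(classes))
print(hier)
```

Against a live SPARQL endpoint, the same counts would normally come from aggregate queries using `GROUP BY` and `COUNT`. The in-memory version only keeps the sketch self-contained.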

Urika-GX: A Game Changer for Big Data Analytics

There’s a lot of hype around big data in healthcare and the life sciences, but big data is here to stay. Information drives the entire industry. When I worked in big pharma, I learned that the product of a pharmaceutical company is not a pill; it’s a label. And a label is just a specific assemblage of information, carefully distilled from terabytes of data collected and analyzed over many years by many intelligent people. To compete, companies have to be very good at turning data into information, and information into knowledge. The stakes couldn’t be higher, because every day millions of patients rely on the quality of this data and the strength of the researchers’ analyses. Analyzing big data is ...

Graph Databases: Key Thoughts from Online Chat

It’s pretty interesting to see graph analytics gain traction in the world of big data. We’ve been focusing on graph databases to round out the Hadoop® and Spark™ ecosystems, allow for more advanced analytics, and enable people to uncover never-before-seen patterns. (Tell me that’s not cool!) From real-world problems such as detecting cyberattacks and creating value from IoT sensor data to identifying drug interactions precisely and faster than ever before, graph has become a powerhouse for analyzing complex, irregular and very large datasets to identify patterns in near real time. On March 16, we hosted an online chat titled “Graph: The Missing Link in Big Data Analytics” with industry experts from Deloitte and Mphasis. Sixty-one ...