Building a Computing Architecture for Drug Discovery

We recently had the pleasure of helping Jason Roszik and his colleagues at the University of Texas MD Anderson Cancer Center develop a high-throughput architecture supporting their work in identifying combination therapies for cancer. This work sits at the intersection of several major technology, processing and clinical trends, and it was quite an eye-opener, as well as a motivation, for us on how to use Cray-developed systems and processing technologies to build a useful and productive high-throughput IT architecture. The first trend, of course, is next-generation sequencing (NGS). Costs are falling and sequencing throughput is rising dramatically, to the point where today’s NGS companies state they can process tens of human genomes a ...
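
To give a rough sense of why those NGS trends demand a high-throughput architecture, here is a back-of-envelope sizing sketch in Python. The figures are assumptions for illustration, not numbers from the post: roughly 100 GB of raw FASTQ per 30x human genome is a common rule of thumb, and "tens of genomes a day" is taken at a mid-range value of 50.

```python
# Back-of-envelope data-volume estimate for an NGS pipeline.
# Both constants below are illustrative assumptions, not figures from the post.
GB_PER_GENOME = 100    # ~100 GB raw FASTQ per 30x human genome (rule of thumb)
GENOMES_PER_DAY = 50   # "tens of genomes a day", taken at mid-range

daily_tb = GB_PER_GENOME * GENOMES_PER_DAY / 1_000
yearly_pb = daily_tb * 365 / 1_000

print(f"Raw data per day:  {daily_tb:.1f} TB")   # 5.0 TB
print(f"Raw data per year: {yearly_pb:.2f} PB")  # 1.83 PB
```

Even under these conservative assumptions, raw sequence data alone approaches two petabytes a year, before any alignment or variant-calling output is stored.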

Data Analytics Rule at Spark Summit West 2017

Spark Summit West is always well attended, and this year was no exception. Data engineers, data scientists, programmers, architects and technology enthusiasts descended on San Francisco’s Moscone Center earlier this month to learn all about the latest developments in Apache Spark™ and its massive ecosystem. The complexity of analytics use cases and data science was a dominant theme throughout this year’s event. The keynote by Databricks CEO and co-founder Ali Ghodsi highlighted some of the challenges of implementing large-scale analytics projects. Ghodsi discussed how the continued growth of Apache Spark has produced myriad innovative use cases, from churn analytics to genome sequencing. These applications are ...
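
To make the churn-analytics example concrete, here is a minimal PySpark sketch. The data, the column names and the "cancel" event are hypothetical, invented for illustration rather than taken from the keynote.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Toy event log: each row is (user, event). All values are made up.
events = spark.createDataFrame(
    [("alice", "login"), ("alice", "cancel"), ("bob", "login")],
    ["user", "event"],
)

# Flag a user as churned if they ever emitted a "cancel" event.
churned = (
    events.groupBy("user")
          .agg(F.max(F.when(F.col("event") == "cancel", 1).otherwise(0))
                .alias("churned"))
)
churned.show()
```

The same DataFrame pattern, grouping event streams by entity and aggregating, scales from a three-row toy like this one to the production workloads discussed at the summit.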

LEBM: Cray Creating New Extension to LUBM Benchmark

We’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language, and extensions to call upon a set of “classical” graph algorithms. CGE has two main advantages. One is that it scales far better than most other graph databases, because the others weren’t designed by supercomputer wizards. (Of course I would say that.) The other is that CGE not only scales well but also performs unusually well on complex queries over large, complex graphs. A question typical of many complex graph queries: Where are all the places where the red pattern matches a pattern ...
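
To show what such a pattern match looks like in SPARQL, here is a small self-contained sketch. It uses the open-source rdflib library rather than CGE itself, and the people-and-employers data is invented for illustration.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

# Toy RDF graph: "knows" edges plus employment facts. All data is invented.
g = Graph()
g.add((EX.alice, EX.knows, EX.bob))
g.add((EX.bob, EX.knows, EX.carol))
g.add((EX.alice, EX.worksAt, EX.acme))
g.add((EX.carol, EX.worksAt, EX.acme))

# Basic graph pattern: pairs of people connected by a chain of "knows"
# edges (SPARQL 1.1 property path) who also share an employer.
q = """
PREFIX ex: <http://example.org/>
SELECT ?a ?b WHERE {
  ?a ex:knows+ ?b .
  ?a ex:worksAt ?org .
  ?b ex:worksAt ?org .
}
"""
for row in g.query(q):
    print(row.a, row.b)   # alice, carol
```

This basic-graph-pattern idea is what a graph engine evaluates at scale, where the pattern may match in millions of places across a graph of billions of edges.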

Making Sense of 50 Billion Triples: Getting Started

In my last post I outlined some of the reasons why capitalizing on the promise of graph analytics takes thought and planning. Now I’d like to focus on getting started, appropriately, at the beginning: with understanding the data itself. So let’s discuss methods of gaining an initial understanding of our data so that we can feed that newfound understanding back into our ingestion process. First, we need to be able to query the database for basic information that tells us not only what is there but also how the relationships are expressed. Ideally, we would also find out whether there is a hierarchy to the relationships we find. By hierarchy, I mean that graphs can use containers, if you will, to organize ...
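
A minimal sketch of that first exploratory step, a census of the predicates in the data, is below. It uses rdflib for illustration; on 50 billion triples you would run the same SPARQL against the graph engine itself rather than an in-memory library, and the filename here is hypothetical.

```python
from rdflib import Graph

g = Graph()
g.parse("triples.nt", format="nt")  # hypothetical sample file of N-Triples

# Predicate census: which relationship types exist, and how often each
# is used. This is SPARQL 1.1 aggregation, which rdflib supports.
q = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
"""
for p, n in g.query(q):
    print(n, p)
```

The sorted counts tell you at a glance which relationships dominate the data and which are rare, which is exactly the understanding you want to feed back into the ingestion process.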

Urika-GX: A Game Changer for Big Data Analytics

There’s a lot of hype around big data in healthcare and the life sciences, but big data is here to stay. Information is what drives the entire industry. When I worked in big pharma, I learned that the product of a pharmaceutical company is not a pill; it’s a label. And a label is just a specific assemblage of information, carefully distilled from terabytes of data collected and analyzed over the course of many years by many intelligent people. To compete, companies have to be very good at turning data into information, and information into knowledge. The stakes couldn’t be higher, because every day millions of patients rely on the quality of this data and the strength of the analyses done by researchers. Analyzing big data is ...