Using HPC Techniques to Accelerate NGS Workflows


Next-generation sequencing (NGS) describes the modern nucleotide sequencing technologies that allow analysis of genetic material with unprecedented speed and efficiency. Its advent is shifting genome assembly from a problem of laboratory-based chemistry to one well suited to high performance computing (HPC). In simple terms, NGS involves breaking up long DNA or RNA molecules into millions of small fragments (50 to 200 nucleotides), called “reads,” which are then assembled into larger fragments called contigs. The process of taking genetic material, processing it on a sequencer, passing it to an HPC system for assembly, and outputting digital information in a form useful for research is contained in a “workflow,” the end-to-end flow of ... [ Read More ]

The Definition of Insanity


Almost ten years ago, when I was working at Pfizer, I wrote a position paper for a W3C Workshop on Semantic Web in Life Sciences. In that paper, I pointed out several vexing problems then faced by pharmaceutical researchers that I thought could be alleviated by use of a powerful knowledge architecture such as that enabled by Semantic Web technologies. Among these problems were those you might classify as knowledge management problems, and they had much to do with effectively sharing information throughout a large research organization where specialized vocabularies and varied purposes can easily get in the way. Other problems I described were just good, old-fashioned informatics problems, in particular the creation of so-called "data ... [ Read More ]

Video Blog: Solutions that Enable Life Sciences Advances

Ted Slater

Advanced IT solutions are transforming the life sciences industry, and there are few better places to learn about how this is happening than the Bio-IT World Conference & Expo in Boston. The conference is coming up soon, and at Cray, we're excited to highlight our computing, storage and analysis solutions. Ted Slater, senior solutions architect for life sciences at YarcData, a Cray company, will present on Wednesday, April 30, and talk about how YarcData’s Urika technology is fueling major advances. Ted's presentation, "Learn How YarcData's Graph Analytics Appliance Makes It Easy to Use Big Data in Life Sciences," covers the rise of data-intensive research in life sciences and how YarcData systems provide the functionality ... [ Read More ]

Optimizing Next Generation Sequencing Environments with Graph Analytics

Last week, I discussed using the right tool in your technological environment, and today I’d like to note the importance of this when it comes to life sciences, specifically bioinformatics and next generation sequencing (NGS). In this video on Cray’s blog, I describe some of the problems that plague pharmaceutical and biotech organizations, and how a graph analytics solution can resolve them. In the video I answer many key questions that I’ve heard while working with scientists, including: How can graph analytics address the data challenges seen in an NGS environment? What power does a graph representation give you that a traditional RDBMS does not? Where does a graph analytics solution fit in an NGS ... [ Read More ]

University of Chicago Researchers Make Strides in Genome Sequencing


When Charles Darwin explored the Galapagos Islands and developed his theory of natural selection, he did so with the help of the H.M.S. Beagle. The University of Chicago is building on his pioneering work using a supercomputer and storage provided by Cray, also called the Beagle. Darwin's ship gained historic status not because it was unique or special, but because it was the vehicle that carried him to the locations where he completed research that redefined the way we look at evolutionary theory. The Beagle became famous because it was a tool that enabled Darwin to do his work. Theories are essential, but tools are the means by which researchers test their hypotheses and make new discoveries. Today, high performance ... [ Read More ]