University of Chicago Researchers Make Strides in Genome Sequencing

When Charles Darwin explored the Galapagos Islands and developed his theory of natural selection, he did so with the help of the H.M.S. Beagle. The University of Chicago is building on his pioneering work with a supercomputer and storage system provided by Cray, a system that is also named Beagle.

Darwin’s ship gained historic status not because it was an especially remarkable vessel, but because it carried him to the locations where he completed research that redefined the way we look at evolutionary theory. The Beagle became famous because it was a tool that enabled Darwin to do his work. Theories are essential, but tools are the means by which researchers test their hypotheses and make new discoveries.

Today, high performance computing (HPC) is the tool biologists use to ask questions that can only be answered by carrying out in silico simulations. Biology has reached a point where experimental work relies on simulation. HPC helps bench biologists make their experiments not only more efficient but also more cost-effective.

Since the advent of the Human Genome Project, DNA sequencing has rapidly outpaced the ability to process and analyze the resulting data. In response, the Computation Institute (CI), a joint initiative between the University of Chicago and Argonne National Laboratory, has partnered with the Department of Internal Medicine at the University of Michigan, the Penn Cardiovascular Institute and Perelman School of Medicine at the University of Pennsylvania, and the Washington University School of Medicine to do transformative research in genome sequencing. To achieve their goal, they turned to Cray. The CI selected a Cray® XE6™ system combined with a Lustre storage solution, also provided by Cray. In honor of Darwin, they’ve dubbed the system Beagle, a clear reference to their pioneering work in bringing HPC to genomics and to bioinformatics in general.

Looking at the advances made through the work completed with Beagle
The CI research team has used Beagle to perform whole-genome sequencing. According to a recent CNET report, the team has submitted a study to the journal Bioinformatics explaining that their work with Beagle has expanded the depth and breadth of genome sequencing.

According to CNET, the genome sequencing sector has long been held back by the sheer amount of data contained in human DNA. As a result, scientists attempting to test how a material, such as an experimental drug, will affect people focus almost exclusively on a small portion of the genome, usually the regions that code for proteins. This practice can be highly informative, but it limits the breadth and depth of the analysis. The researchers at CI have moved past this limitation, and have done so using only one-quarter of Beagle’s full processing power.

The abstract of the Bioinformatics paper explains that the team has made whole-genome sequencing a practical option using software that is publicly available. The software used to fine-tune Beagle’s capabilities enables many whole genomes to be processed in parallel and concurrently. Beagle uses a parallel computation environment and a parallel Lustre file system deployed on shared storage. The solution utilizes a scalable InfiniBand interconnect delivered in an architecture Cray calls Direct-Attached Lustre (DAL).
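
To give a rough sense of what processing many genomes concurrently means, here is a minimal Python sketch. It is an illustration only and not the team’s pipeline: the function names, the sample names and the toy GC-content calculation are all hypothetical stand-ins, and a thread pool stands in for Beagle’s compute nodes. The point is simply that each genome can be analyzed independently, so the work spreads naturally across many workers.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_genome(sample):
    """Hypothetical stand-in for one per-genome analysis job.

    A real job would run alignment and variant calling; this toy version
    just reports the fraction of G and C bases in a short sequence.
    """
    name, sequence = sample
    gc_count = sum(1 for base in sequence if base in "GC")
    return name, gc_count / len(sequence)


def analyze_concurrently(samples, workers=4):
    """Analyze every sample at once and collect the results by name.

    Each genome is independent of the others, so the jobs can simply be
    handed to a pool of workers (threads here, compute nodes in practice).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_genome, samples))


if __name__ == "__main__":
    # Made-up sample names and sequences, for illustration only.
    samples = [("sample_a", "GATTACA"), ("sample_b", "GGCC"), ("sample_c", "ATAT")]
    for name, gc_fraction in analyze_concurrently(samples).items():
        print(name, round(gc_fraction, 2))
```

On a system like Beagle, the same pattern would be handled by a job scheduler and the shared parallel file system rather than by threads in a single process.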

Moving forward, this approach could have a large collaborative impact on genome sequencing: because the supporting applications are publicly accessible, more research teams should be able to take advantage of the capability.

CNET explained that the progress being made with Beagle is bringing the industry closer to routine whole-genome sequencing, which was once thought to be cost-prohibitive. Elizabeth McNally, director of the Cardiovascular Genetics clinic at the University of Chicago Medicine and author of the report submitted to Bioinformatics, explained that the cost per sequenced genome will need to be brought down to approximately $1,000 for the approach to be financially viable for mainstream use.


  1. 1

    Ray Sheppard says

    It would have been great to include the name of which “software that is publicly available” was used.

    • 2

      Carlos P Sosa says

      Hi Ray

      Thank you for your comment. The blog has a reference to the scientific paper where the software is listed.


