Data-Intensive Cluster Supercomputing Contributes to Major Advance in Genetic Research

A person’s genes play a prominent role in shaping who they become. Genetics involves more than determining the color of a person’s eyes or hair: genes also influence how the brain develops and can indicate the likelihood that a person will be born with a mental disorder. The study of genetics has become a priority for many scientists because breakthroughs could help medical professionals identify signs of mental disorders and possibly prevent such conditions from developing. Researchers recently used supercomputing capabilities to identify genetic patterns that could lead to autism and similar disorders, an advance that could eventually help scientists prevent them, the National Science Foundation (NSF) recently explained.

Making strides in genetic research
The project, which was recently detailed in the journal Genes, Brain and Behavior, used data-intensive supercomputing to complete sophisticated analysis that would have been impossible without such computing resources. A team of scientists from the San Diego Supercomputer Center (SDSC) and the Institut Pasteur identified a time-dependent gene-expression process that could help medical professionals treat mental disorders such as schizophrenia and autism.

Within the hierarchical tree of various coherent gene groups, there are transcription-factor networks in which a variety of genetic patterns develop. These patterns typically form during the brain’s development and often dictate various aspects of a person’s mental makeup. As a result, the master transcription factors that are at the top of coherent gene groups often hold the key to understanding how autism, schizophrenia and similar disorders develop.

In the National Science Foundation press release detailing the research, Igor Tsigelny, a research scientist with SDSC and UC San Diego’s Moores Cancer Center, explained that data is central to these types of advances.

“We live in the unique time when huge amounts of data related to genes, DNA, RNA, proteins, and other biological objects have been extracted and stored,” said Tsigelny.

Using large-scale, data-intensive computing to make meaningful progress
Gathering and using genetic data can prove extremely challenging. For this genetic research project, progress was made through the use of SDSC’s Gordon supercomputer, which is a Cray CS300-AC™ Cluster Supercomputer. Gordon’s I/O nodes are specifically designed to relieve the I/O bottlenecks that large, complex, data-intensive workloads create. Researchers and scientists at the SDSC facility have been using the Gordon supercomputer since January 2012, and the system is now part of NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) program, a nationwide partnership including 16 high-performance computers and high-end visualization and data analysis resources.

Gordon’s robust data I/O capabilities have been used by scientists in a variety of fields whose research requires mining, searching or creating large databases for immediate or later use. Examples include mapping genomes for applications in personalized medicine and examining computer automation of stock trading by investment firms on Wall Street.

For more details on how the Gordon supercomputer has impacted genetic research, check out this NSF video.