Life Sciences

Cray Solutions for Life Sciences

Get to the answer faster.


In the life sciences, the next question is already waiting.

You need technology on your team that respects your budget and gets you answers quickly and reliably.

Today, pharmaceutical, biotech, healthcare and university organizations face increasingly complex challenges in a wide range of fields. From next-generation sequencing to molecular modeling to translational research to data integration, making discoveries and staying competitive depend on high performance computing (HPC) and big data analytics.

Cray supercomputing solutions can help you make the most of your research. We have proven expertise in designing, optimizing and supporting large-scale supercomputing and analytics environments for the life sciences field. The result for our customers is shorter discovery cycles, lower R&D costs and stronger overall competitiveness.

We are a complete supercomputing provider with computing, storage and analytics solutions to meet needs of any size.

Adaptable, upgradable Cray® XC™ series supercomputers provide extreme application scalability and excel at large-scale computations. The Cray® CS™ series supercomputer is a highly efficient, scale-out cluster system supporting a broad range of model requirements. Choose Cray's CS-Storm GPU-accelerated system to speed up massively parallel computing workloads. The Urika-GD™ graph analytics appliance, designed for the discovery process, enables new insights in real time, while the Urika-XA™ extreme analytics platform can be used to improve next-generation sequencing workflows, enable a range of healthcare analytics use cases and power complex knowledge management solutions.

Cray storage solutions support your research with expert data management. Cray® Sonexion® is our integrated scale-out storage system for Lustre®. Cray Tiered Adaptive Storage provides a complete and open archiving solution for big data and HPC.

View Bioinformatics Applications
Owner:Open Source

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences and help identify members of gene families.

Owner:Open Source

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at more than 25 million 35bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
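To give a feel for the Burrows-Wheeler idea behind such an index, here is a minimal Python sketch that computes the transform of a tiny sequence by sorting its rotations, then inverts it. It is only an illustration: the function names `bwt` and `inverse_bwt` and the `$` terminator are assumptions made for this example, and a production aligner would rely on a far more compact index rather than this naive construction.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Sort every cyclic rotation, then read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, terminator: str = "$") -> str:
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row that ends with the terminator is the original string.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]
```

For example, `bwt("ACAACG")` groups the three A characters into one run, and `inverse_bwt` restores the input exactly, which is what lets a compressed index stand in for the full text.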

Owner:Open Source

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp.

Owner:Open Source

ClustalW is a general-purpose multiple alignment program for DNA or proteins. ClustalX is a graphical user interface for the ClustalW multiple-sequence alignment program.

Owner:Open Source

FASTA is a more sensitive derivative of the FASTP program. It can search protein or DNA sequence databases and can compare a protein sequence to a DNA sequence database by translating the DNA database as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences.

Owner:Open Source

HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).

Owner:Open Source

MrBayes is a program for Bayesian inference of phylogeny using Markov Chain Monte Carlo methods. MrBayes has a console interface and uses a modified NEXUS format for data and batch files. It handles a wide range of probabilistic models for the evolution of nucleotide and amino acid sequences, restriction sites and standard binary data.

Application:Phred Phrap Consed
Owner:Commercial (Univ. Wash.)

Phred is base-calling software with quality estimation; Phrap performs shotgun sequence assembly; and Consed is a sequence assembly editor companion to Phrap. Also available are Swat and CrossMatch, sequence alignment tools; and Phrapview, a graphical tool that provides a "global" view of the Phrap assembly, complementary to the "local" view provided by Consed.

Owner:Open Source

SOAP provides a full solution to next-generation sequencing data analysis. It consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short read assembler (SOAPdenovo).

Owner:Open Source

Velvet performs de novo short read assembly using de Bruijn graphs. It can be used for Solexa and 454 sequencing data assembly.
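As a rough illustration of the de Bruijn graph idea, the Python sketch below breaks reads into k-mers and links each (k-1)-mer prefix to the (k-1)-mer suffix that follows it. The function name `de_bruijn_graph` and the toy reads are assumptions for this example; it does not reflect Velvet's actual data structures or its error-correction steps.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in a read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix -> suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Overlapping reads contribute edges between the same nodes, so an assembler can reconstruct longer sequences by walking paths through the graph instead of comparing every pair of reads.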

View Drug Discovery Applications
Owner:Open Source

AutoDock is a suite of automated docking tools that predicts how small molecules, such as substrates or drug candidates, bind to a receptor of known 3-D structure.

Owner:Commercial (UCSF)

DOCK addresses the problem of "docking" molecules to each other. In general, "docking" is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known. A compound that interacts strongly with, or binds, a receptor associated with a disease may inhibit its function and thus act as a drug. Solving the docking problem computationally requires an accurate representation of the molecular energetics as well as an efficient algorithm to search the potential binding modes.

Owner:Commercial (BioSolveIT)

Predicts protein-ligand interactions.

Owner:Commercial (Schrödinger)

A complete solution for ligand-receptor docking, from virtual screening of millions of compounds to binding mode predictions.

Owner:Commercial (CCDC)

Gold calculates the docking modes of small molecules into protein binding sites.

Owner:Commercial (Molsoft)

A desktop-modeling environment for molecular structure and function.

Owner:Commercial (Accelrys)

Code for docking ligands into protein active sites. The method employs a cavity detection algorithm for detecting invaginations in the protein as candidate active site regions.

Owner:Commercial (OpenEye)

ROCS is a virtual screening tool that identifies potentially active compounds by shape comparison.

View Materials Science Applications
Owner:Open Source

The ABINIT main program helps users find the total energy, charge density and electronic structure of systems made of electrons and nuclei (molecules and periodic solids) within density functional theory (DFT), using pseudopotentials and a planewave or wavelet basis. ABINIT also includes options to optimize the geometry according to the DFT forces and stresses, to perform molecular dynamics simulations using these forces, or to generate dynamical matrices, Born effective charges, dielectric tensors and many more properties based on density-functional perturbation theory.

Owner:Open Source

BigDFT is a DFT massively parallel electronic structure code (GPL license) using a wavelet basis set. Wavelets form a real space basis set distributed on an adaptive mesh. GTH or HGH pseudopotentials are used to remove the core electrons. With a Poisson solver based on a Green function formalism, periodic systems, surfaces and isolated systems can be simulated with the proper boundary conditions.

Owner:Commercial (Accelrys)

CASTEP is a code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of material properties including energetics, structure at the atomic level, vibrational properties and electronic response properties. It offers a wide range of spectroscopic features that link directly to experiment, such as infrared and Raman spectroscopies, NMR and core level spectra.

Owner:Open Source

CP2K performs atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods such as density functional theory using a mixed Gaussian and plane waves approach (GPW) and classical pair and many-body potentials.

Owner:Open Source

The CPMD code is a parallelized plane wave/pseudopotential implementation of density functional theory, particularly designed for ab-initio molecular dynamics.

Owner:Commercial (Accelrys)

DMol3 is a commercial (and academic) software package that uses density functional theory with a numerical radial function basis set to calculate the electronic properties of molecules, clusters, surfaces and crystalline solid materials from first principles. It can either use gas phase boundary conditions or 3-D periodic boundary conditions for solids or simulations of lower dimensional periodicity.

Owner:Open Source

Large-scale atomic/molecular massively parallel simulator (LAMMPS) is a classical molecular dynamics code.

Owner:Open Source (commercial version from Nanotec)

SIESTA is both a method and its computer program implementation for performing electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids. SIESTA uses strictly localized basis sets and linear-scaling algorithms that can be applied to suitable systems. The code can be used for a wide range of applications, from quick exploratory calculations to simulations that match other approaches, such as plane-wave and all-electron methods.

Owner:Commercial (Univ. of Vienna)

VASP performs ab-initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set. The approach is based on a finite-temperature local-density approximation (with the free energy as variational quantity) and an exact evaluation of the instantaneous electronic ground state at each MD-step using efficient matrix diagonalization schemes and an efficient Pulay mixing.

View Proteomics Applications
Owner:Commercial (Matrix Science)

Mascot is a mass spectral search algorithm that uses mass spectrometry data to identify proteins from primary sequence databases.

Owner:Open Source

Mass spectrometry software used for data acquisition, analysis or representation.

Owner:Open Source

Proteomics tools for mining sequence databases in conjunction with mass spectrometry experiments.

Owner:Open Source

SEQUEST correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases. It will determine the amino acid sequence and thus the protein(s) and organism(s) that correspond to the mass spectrum being analyzed.

Owner:Open Source

X! Tandem can match tandem mass spectra with peptide sequences. It generates theoretical spectra for peptide sequences using information about intensity associated with amino acids. These spectra are compared with experimental data to generate an expectation value as a threshold score.

View Structural Biology Applications

ACEMD is a heavily optimized molecular dynamics engine specially designed to run on NVIDIA GPUs.

Owner:Open Source

Assisted model building with energy refinement (AMBER) refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs), and a package of molecular simulation programs which includes source code and demos.

Owner:Commercial (Accelrys)

CHARMM (Chemistry at HARvard Molecular Mechanics) is the academic version of the CHARMM simulation program available through Harvard. CHARMM uses empirical energy functions to describe the forces on atoms in molecules.

Owner:D.E. Shaw Research

DESMOND performs high-speed molecular dynamics simulations of biological systems on conventional commodity clusters. The code uses novel parallel algorithms and numerical techniques and can run on platforms containing a large number of processors, or on a single computer.

Owner:Open Source

General atomic and molecular electronic structure system (GAMESS) is a general ab initio quantum chemistry package. It can compute wave functions ranging from RHF, ROHF, UHF and GVB to MCSCF, with CI and MP2 energy corrections available for some of these.

Owner:Commercial (Gaussian)

Gaussian 09 provides electronic structure modeling, and is licensed for a wide variety of computer systems.

Owner:Open Source

GROMACS performs molecular dynamics, simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules such as proteins, lipids and nucleic acids that have complicated bonded interactions, but is also being used for research on nonbiological systems, such as polymers.
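To make "simulating the Newtonian equations of motion" concrete, the Python sketch below advances a single particle in one dimension with the velocity Verlet scheme, a common molecular dynamics integrator. It is an assumption-laden toy: the function name `velocity_verlet` and the harmonic force in the usage note are chosen for illustration and say nothing about how any particular package organizes its integrators.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate mass * x'' = force(x) for one particle in one dimension."""
    a = force(x) / mass
    for _ in range(steps):
        # Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration at the new position.
        a_new = force(x) / mass
        # Advance the velocity with the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
    return x, v
```

Calling `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.01, 1000)` follows a unit-mass particle on a spring; the total energy `0.5 * v**2 + 0.5 * x**2` stays close to its starting value of 0.5, which is the property that makes this family of integrators attractive for long simulations.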

Owner:Commercial (Schrödinger)

Jaguar is an ab initio package for both gas and solution phase simulations, with particular strength in treating metal-containing systems.

Owner:Open Source

Large-scale atomic/molecular massively parallel simulator (LAMMPS) is a classical molecular dynamics code.

Owner:Commercial (University College Cardiff Consultants Ltd.)

Molpro is a complete system of ab initio programs for molecular electronic structure calculations. The emphasis is on highly accurate computations, with extensive treatment of the electron correlation problem through the multiconfiguration-reference CI, coupled cluster and associated methods.

Owner:Open Source

NAMD is a parallel molecular dynamics code designed for simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms.

Owner:Open Source

NWChem provides many methods to compute the properties of molecular and periodic systems by using standard quantum mechanical descriptions of the electronic wave function or density. NWChem can perform classical molecular dynamics and free energy simulations. These approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations.

Owner:Commercial (Q-Chem)

Q-Chem is a comprehensive ab initio quantum chemistry package. Its capabilities range from DFT/HF calculations to post-HF correlation methods.

Owner:Open Source

MPACK is a Fortran program package that involves scattering problems of two octet baryons by the quark-model interactions, fss2 and FSS (fssG.f), and their applications to the Faddeev calculations for the triton (triton.f) and the hypertriton (hypt.f

Owner:Commercial (PetaChem)

TeraChem is general-purpose quantum chemistry software designed to run on NVIDIA GPU architectures under a 64-bit Linux operating system.

Case Studies

Blue Waters: Enabling Scientific Breakthroughs at the Petascale

The leaders at NCSA turned to Cray to build the system and come up with an integrated and modular storage solution to meet the staggering requirements of "Blue Waters," one of the most powerful supercomputers in the world.

Boron Nitride and the Nanoribbons of Tomorrow

Materials modeling on ORNL’s "Jaguar" shows a big future for boron nitride.

ORNL and Purdue Explore Technology at the Nanoscale

A team led by Gerhard Klimeck of Purdue University has broken the petascale barrier while addressing a relatively old problem in the young field of computer chip design.

Application, Solution and Technology Briefs

Accelerating Cancer Research: Using a Big Data Approach

The ISB team worked with Cray to develop an innovative, real-time approach to cancer research discovery using the Urika-GD™ graph analytics appliance.

Cray Helps Optimize De Novo Assembler Application Trinity for Use on Massively Parallel Supercomputers

To fully realize the benefits of the Cray XC30 system for NGS, Cray is actively collaborating with leading researchers to improve the performance of NGS workflows.

Cray's Next-Generation Sequencing Solution

Cray’s next-generation sequencing solution helps research and clinical institutions manage datasets throughout their life cycle, from assembling raw data to archiving analyzed data. The Cray NGS solution comprises three core elements — computing, storage and analysis.

Cray Demonstrates Top-Level Performance and Scalability on Very Large Datasets with Velvet

Velvet is a de novo genomic assembler designed for short reads generated by NGS sequencers.

Cray Helps Tune Ray De Novo Genomic Assembler Software

Ray is highly parallel software developed at the Université Laval that performs de novo genome assemblies with next-generation DNA sequencing data.

Cray Solutions for Chemistry & Life Sciences

Cray brings sustained performance, system reliability and decades of supercomputing expertise to the chemistry and life sciences industries’ data-intensive computing tasks.

Cray Storage Solutions for Life Sciences

Built on open systems, Cray’s scalable storage solutions address the data- and I/O-intensive workflows of the life sciences and deliver results faster.

Patient Treatments: Accelerated data analysis finds ideal treatments for individual patients

With the Urika-GD system, a healthcare organization taps the latent value in millions of patient outcome records, improving patient care and saving lives.

Tuning NAMD on the Cray XK6 "Titan" Supercomputer

With Cray support, the NAMD developers are optimizing their code on each new iteration of Cray hardware, achieving scaling to hundreds of thousands of cores.

Speed Your Time to Results Using Galaxy on a Cray System

Galaxy is a widely used web-based platform for data integration and analysis in the life sciences.

Customer Solutions

The Problem with Cellulosic Ethanol

At Oak Ridge National Laboratory, simulation provides a close-up look at the molecule that complicates next-generation biofuels.

Supercomputers Simulate the Molecular Machines That Replicate and Repair DNA

Researchers used "Jaguar," at Oak Ridge National Laboratory, to elucidate the mechanism by which accessory proteins, called sliding clamps, are loaded onto DNA strands and coordinate enzymes that enable gene repair or replication.

Early Molecular Dynamics Research Blazes through Titan’s New GPUs

A look at the transition from CPUs to GPUs when the Oak Ridge Leadership Computing Facility upgraded its Cray "Jaguar" system to the new "Titan." 

Beagle: The CI Supercomputer for Biomedical Simulations & Data Analysis at the University of Chicago

The official website for "Beagle," a Cray XE6 system used primarily for biomedical research at the University of Chicago's Computation Institute.

Computation Institute, University of Chicago, Beagle Newsletter

The Beagle Cray system is one of the world's fastest supercomputers devoted to the life sciences.

Technical Papers

Genomic Applications on Cray Supercomputers: Next Generation Sequencing Workflow

A team of Cray and university researchers discusses its work with Ray, a parallel short-read de novo assembler code. They also present a configuration for an NGS workflow based on a Cray supercomputing system and the Cray Sonexion storage solution.

Human Analysts at Superhuman Scales

Ray, a scalable genome assembler, addresses big data problems by using optimal resources and producing one correct and conservative timely solution.


Personalized Medicine: Technology Needs to Be Ready

Carlos Sosa, high performance computing architect at Cray Inc., says personalized medicine is on the way, but HPC technology must be more robust to answer questions quickly for patients and doctors.