POWERING FASTER R&D ANALYTICS AND BETTER PATIENT OUTCOMES
Advances in affordable sequencing and high-resolution imaging are producing extremely large, varied and complex data sets at ever-higher levels of sample specificity and sensitivity. Combine this with novel analytical methods and new computational tools to explore and collaborate on these data in minutes and at scale, and the stage is set for faster drug discovery, precision disease treatment, improved patient outcomes and more-efficient healthcare.
Cray has leveraged 40 years of dominance in HPC and supercomputing to build a platform that brings together all the core components of precision medicine. It has the power to take your data scientists from image and sequence analysis to computational modeling; your researchers from analysis tools and machine-learning frameworks to big-graph analytics; and your clinicians from big data stores to connectors, query engines and productivity tools. All this while minimizing the impact on your IT environment and protecting your data and your patients.
Areas in which we collaborate with healthcare and life sciences organizations include:
Analyzing medical images at scale
Enabling computational pathology
Simulation for Life Sciences
Improve the resolution of molecular dynamics
Next-Generation Sequencing for Healthcare & Life Sciences
Assemble a large plant or human genome de novo in minutes
Find statistically significant variants in 10,000+ genomes
Leverage big data technologies to contextualize NGS results
AbokiaBLAST is a parallel implementation of NCBI BLAST created by the inventors of the open-source mpiBLAST project. AbokiaBLAST inherits the highly scalable architecture of mpiBLAST but has been refactored and re-engineered for production quality. With intelligent task parallelization and I/O optimization, AbokiaBLAST lets users massively accelerate large-scale BLAST searches on clusters or supercomputers with a single command.
ABySS is a de novo, parallel, paired-end sequence assembler designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
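ABySS is a de Bruijn graph assembler. Reads are cut into overlapping k-mers, each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, and contigs are read off unambiguous paths through the graph. The Python sketch below illustrates that idea on error-free toy reads. It is not ABySS's code: it ignores reverse complements and sequencing errors, and it keeps the whole graph in one dictionary instead of distributing it across MPI processes as the parallel version does. The function names and reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_contig(graph, start):
    """Follow edges from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        contig += nxt[-1]  # each step along the path adds one new base
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical error-free reads covering the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = de_bruijn_graph(reads, k=4)
print(walk_contig(graph, "ATG"))  # ATGGCGTGCA
```

This walk checks only the number of successors. A real assembler also stops where several paths merge, and it prunes error-induced tips and bubbles before reading off contigs.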
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences and help identify members of gene families.
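BLAST's statistical significance is based on Karlin-Altschul statistics. Suppose a query of length m is searched against a database of total length n. The expected number of chance alignments scoring at least S is then E = K·m·n·e^(-λS), where λ and K depend on the scoring matrix and gap costs. The minimal sketch below computes E and the equivalent bit score. The λ and K values are illustrative placeholders, and real BLAST also applies effective (edge-corrected) lengths, which this sketch omits.

```python
import math


def evalue(raw_score, m, n, lam, K):
    """Karlin-Altschul E-value: expected chance alignments with score >= raw_score."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, K):
    """Normalized (bit) score, defined so that E = m * n * 2**(-bit_score)."""
    return (lam * raw_score - math.log(K)) / math.log(2)


# Illustrative placeholders: lam and K depend on the scoring system in use.
LAM, K = 0.267, 0.041
query_len, db_len = 250, 1_000_000
for s in (40, 50, 60):
    print(s, round(bit_score(s, LAM, K), 1), evalue(s, query_len, db_len, LAM, K))
```

E falls exponentially as the raw score rises. This is why a small E-value suggests a match is unlikely to have arisen by chance in a database of that size.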
Bowtie is an ultrafast, memory-efficient short-read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of more than 25 million 35bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
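A Burrows-Wheeler index supports exact-match counting through "backward search", which processes a read from its last character to its first. The toy sketch below builds the transform by sorting rotations, then counts how often a pattern occurs. Production FM indexes such as Bowtie's rely on precomputed rank tables and a sampled suffix array rather than the linear scans used here. The function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs)."""
    text += "$"  # unique end marker; '$' sorts before the letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(bwt_str, pattern):
    """FM-index backward search: count exact occurrences of pattern in the text."""
    # C[ch] = number of characters in the text that sort before ch
    C = {}
    for i, ch in enumerate(sorted(bwt_str)):
        C.setdefault(ch, i)
    lo, hi = 0, len(bwt_str)  # current range of sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        # Occ(ch, i) is a linear scan here; real indexes precompute it.
        lo = C[ch] + bwt_str[:lo].count(ch)
        hi = C[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGTTACG")
print(count_matches(index, "ACG"))  # 3
```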
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are for longer sequences ranging from 70bp to 1Mbp.
FASTA is a more sensitive derivative of the FASTP program that can be used to search protein or DNA sequence databases and can compare a protein sequence to a DNA sequence database by translating the DNA database as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences.
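FASTA's speed comes from its first pass. It looks up short identical words (k-tuples, set by the ktup parameter) that the query shares with each library sequence, and it counts these hits along each alignment diagonal. The sketch below covers only that first pass. Real FASTA then rescores the best regions with a substitution matrix before the joining step described above and a final optimized alignment. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def best_diagonal(query, subject, ktup=2):
    """Return (diagonal, hits) for the diagonal with the most shared k-tuples."""
    # Lookup table: every k-tuple of the query -> its start positions
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    # A hit at query position i and subject position j lies on diagonal j - i
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals.most_common(1)[0] if diagonals else None


print(best_diagonal("ACGTACGT", "TTACGTACGT"))  # (2, 7)
```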
HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
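Scoring a sequence against an HMM relies on dynamic programming such as the forward algorithm. It sums the probability of the sequence over every path through the model's states, and HMMER's reported scores are forward-based. The sketch below is the generic log-space forward algorithm for a small discrete HMM. It is not a profile HMM with match, insert and delete states, and the dictionary-based model layout is an assumption made for illustration.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_prob(obs, states, log_start, log_trans, log_emit):
    """Forward algorithm: log P(obs | model), summed over all state paths."""
    # alpha[s] = log P(first t symbols, state at step t = s)
    alpha = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: log_emit[s][symbol]
            + log_sum_exp([alpha[p] + log_trans[p][s] for p in states])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause for long sequences.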
MrBayes is a program for Bayesian inference of phylogeny using Markov chain Monte Carlo methods. MrBayes has a console interface and uses a modified NEXUS format for data and batch files. It handles a wide range of probabilistic models for the evolution of nucleotide and amino acid sequences, restriction sites and standard binary data.
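At its core, MCMC builds a random walk whose samples follow the posterior distribution. The sketch below is a generic random-walk Metropolis sampler for a single continuous parameter, shown only to illustrate the propose and accept/reject step. MrBayes instead proposes changes to tree topologies, branch lengths and substitution-model parameters, and by default runs Metropolis-coupled chains, none of which is modeled here. The function name and arguments are hypothetical.

```python
import math
import random


def metropolis(log_post, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior(proposal) / posterior(x))
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)  # on rejection the current state is repeated
    return samples


# Target: a standard normal, known only up to a constant; discard burn-in.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50_000)[5_000:]
print(sum(samples) / len(samples))  # should be near 0
```

Only the ratio of posterior values enters the acceptance test, so the normalizing constant never needs to be computed. This property is what makes MCMC practical for phylogenetic posteriors.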
Phred is base-calling software with quality estimation; Phrap performs shotgun sequence assembly; and Consed is a sequence assembly editor that accompanies Phrap. Also available are Swat and CrossMatch, sequence alignment tools; and Phrapview, a graphical tool that provides a "global" view of the Phrap assembly, complementary to the "local" view provided by Consed.
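Phred's quality values are logarithmic error probabilities, Q = -10·log10(P). A base with Q20 has a 1 in 100 chance of being miscalled, and a base with Q30 a 1 in 1,000 chance. A minimal conversion sketch follows; the function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Phred quality Q -> probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Error probability P -> Phred quality: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```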
SHRiMP is a software package for aligning genomic reads against a target genome. It was primarily developed with the multitudinous short reads of next-generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.
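In Applied Biosystems' colourspace (the SOLiD two-base encoding), a read is reported as its first base followed by one of four colors for each pair of adjacent bases. If A, C, G and T are coded as 0 to 3, each color equals the XOR of the two base codes. The sketch below encodes and decodes that representation. It also shows why aligners for this data are careful about decoding: each base is recovered from the previous one, so a single miscalled color changes every downstream base. The function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colorspace(seq):
    """First base, then one color (0-3) per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def from_colorspace(cs):
    """Decode by walking the colors from the known first base."""
    seq = cs[0]
    for c in cs[1:]:
        seq += BASES[CODE[seq[-1]] ^ int(c)]
    return seq


print(to_colorspace("ATGGCA"))     # A31031
print(from_colorspace("A31031"))   # ATGGCA
print(from_colorspace("A32031"))   # ATCCGT: one wrong color, every later base wrong
```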
SOAP provides a full solution to next-generation sequencing data analysis. It consists of a new alignment tool (SOAPaligner/soap2), a re-sequencing consensus sequence builder (SOAPsnp), an indel finder (SOAPindel), a structural variation scanner (SOAPsv) and a de novo short reads assembler (SOAPdenovo).
SSAKE is a de novo assembler for short DNA sequence reads. It is designed to help leverage the information from short-sequence reads by assembling them into contigs and scaffolds that can be used to characterize novel sequencing targets.
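SSAKE grows contigs greedily. It repeatedly looks for reads whose 5' end overlaps the 3' end of the growing sequence and extends with the longest overlap it finds. The sketch below is a simplified greedy version of that idea. It uses a linear scan where SSAKE uses a prefix tree over the reads, and it omits the coverage and consensus checks a real assembler applies. The function name and the min_overlap default are hypothetical, and ties between equally long overlaps are broken arbitrarily.

```python
def greedy_extend(seed, reads, min_overlap=3):
    """Greedily extend `seed` at its 3' end using the read with the longest overlap."""
    contig = seed
    unused = set(reads)
    while True:
        best_read, best_len = None, 0
        for read in unused:
            # Try the longest overlap first; the read must add at least one base.
            for olen in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
                if contig.endswith(read[:olen]):
                    if olen > best_len:
                        best_read, best_len = read, olen
                    break
        if best_read is None:
            return contig
        contig += best_read[best_len:]  # append only the non-overlapping tail
        unused.discard(best_read)


print(greedy_extend("ATGGCG", ["GGCGTG", "CGTGCA"]))  # ATGGCGTGCA
```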
Trinity is a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. It combines three independent software modules (Inchworm, Chrysalis and Butterfly), applied sequentially to process large volumes of RNA-seq reads.
DOCK addresses the problem of "docking" molecules to each other. In general, "docking" is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known. A compound that interacts strongly with, or binds, a receptor associated with a disease may inhibit its function and thus act as a drug. Solving the docking problem computationally requires an accurate representation of the molecular energetics as well as an efficient algorithm to search the potential binding modes.
ABINIT's main program lets users find the total energy, charge density and electronic structure of systems made of electrons and nuclei (molecules and periodic solids) within density functional theory (DFT), using pseudopotentials and a planewave or wavelet basis. ABINIT also includes options to optimize the geometry according to the DFT forces and stresses; to perform molecular dynamics simulations using these forces; to generate dynamical matrices, Born effective charges and dielectric tensors based on density-functional perturbation theory; and to compute many more properties.
BigDFT is a DFT massively parallel electronic structure code (GPL license) using a wavelet basis set. Wavelets form a real space basis set distributed on an adaptive mesh. GTH or HGH pseudopotentials are used to remove the core electrons. With a Poisson solver based on a Green function formalism, periodic systems, surfaces and isolated systems can be simulated with the proper boundary conditions.
CASTEP is a code for calculating the properties of materials from first principles. Using density functional theory, it can simulate a wide range of material properties including energetics, structure at the atomic level, vibrational properties and electronic response properties. It offers a wide range of spectroscopic features that link directly to experiment, such as infrared and Raman spectroscopies, NMR and core level spectra.
CP2K performs atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods such as density functional theory using a mixed Gaussian and plane waves approach (GPW) and classical pair and many-body potentials.
DMol3 is a commercial (and academic) software package that uses density functional theory with a numerical radial function basis set to calculate the electronic properties of molecules, clusters, surfaces and crystalline solid materials from first principles. It can use either gas-phase boundary conditions or 3-D periodic boundary conditions, for solids or for simulations of lower-dimensional periodicity.
Owner: Open Source (commercial version from Nanotec)
SIESTA is both a method and its computer program implementation for performing electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids. SIESTA uses strictly localized basis sets and linear-scaling algorithms that can be applied to suitable systems. The code can be used for a wide range of applications, from quick exploratory calculations to simulations that match other approaches, such as plane-wave and all-electron methods.
VASP performs ab initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane-wave basis set. The approach is based on a finite-temperature local-density approximation (with the free energy as the variational quantity) and an exact evaluation of the instantaneous electronic ground state at each MD step using efficient matrix diagonalization schemes and efficient Pulay mixing.
SEQUEST correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases. It determines the amino acid sequence, and thus the protein(s) and organism(s), that correspond to the mass spectrum being analyzed.
X! Tandem can match tandem mass spectra with peptide sequences. It generates theoretical spectra for peptide sequences using information about intensity associated with amino acids. These spectra are compared with experimental data to generate an expectation value as a threshold score.
Assisted model building with energy refinement (AMBER) refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs), and a package of molecular simulation programs which includes source code and demos.
CHARMM (Chemistry at HARvard Molecular Mechanics) is the academic version of the CHARMM simulation program available through Harvard. CHARMM uses empirical energy functions to describe the forces on atoms in molecules.
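CHARMM's empirical energy function is a sum of bonded terms (bonds, angles, dihedrals) and nonbonded terms (van der Waals and electrostatics). The sketch below shows two of those terms in the CHARMM convention. The first is a harmonic bond written K_b(b - b0)^2, with no factor of 1/2. The second is a Lennard-Jones term written with R_min, so its minimum of -ε lies at r = R_min. Parameter values are hypothetical, units are assumed to be kcal/mol and Å, and ε is taken as a positive well depth.

```python
def bond_energy(b, k_b, b0):
    """Harmonic bond stretch, CHARMM style: K_b * (b - b0)**2."""
    return k_b * (b - b0) ** 2


def lj_energy(r, epsilon, r_min):
    """Lennard-Jones 12-6 term with minimum -epsilon at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2.0 * ratio6)


# Hypothetical parameters for illustration only.
print(bond_energy(1.10, k_b=300.0, b0=1.00))      # stretched bond costs energy
print(lj_energy(3.5, epsilon=0.2, r_min=3.5))     # -0.2, the well minimum
print(lj_energy(2.8, epsilon=0.2, r_min=3.5))     # positive: atoms too close
```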
DESMOND performs high-speed molecular dynamics simulations of biological systems on conventional commodity clusters. The code uses novel parallel algorithms and numerical techniques and can run on platforms containing a large number of processors, or on a single computer.
General atomic and molecular electronic structure system (GAMESS) is a general ab initio quantum chemistry package. It can compute RHF, ROHF, UHF, GVB and MCSCF wave functions, with CI and MP2 energy corrections available for some of these.
GROMACS performs molecular dynamics, simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules such as proteins, lipids and nucleic acids that have complicated bonded interactions, but is also being used for research on nonbiological systems, such as polymers.
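Integrating Newton's equations of motion is the core of every MD step. GROMACS's default integrator is the leap-frog scheme, which keeps velocities at half time steps between position updates. The minimal one-dimensional sketch below applies leap-frog to a hypothetical harmonic spring. An MD engine performs the same update for every particle in three dimensions, with forces computed from a force field.

```python
def leapfrog(x0, v0, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) with leap-frog; returns x at t = 0, dt, 2*dt, ..."""
    x = x0
    # Shift the initial velocity back half a step: v(-dt/2) = v(0) - (dt/2) * F(x0) / m
    v_half = v0 - 0.5 * dt * force(x0) / mass
    positions = [x]
    for _ in range(n_steps):
        v_half += dt * force(x) / mass  # v(t + dt/2) from v(t - dt/2)
        x += dt * v_half                # x(t + dt) from v(t + dt/2)
        positions.append(x)
    return positions


def spring(x):
    """Hypothetical force: unit spring constant, so the period is 2*pi."""
    return -1.0 * x


traj = leapfrog(x0=1.0, v0=0.0, force=spring, mass=1.0, dt=0.01, n_steps=628)
print(traj[-1])  # close to the starting position 1.0 after one period
```

Leap-frog is time-reversible and conserves energy well over long runs, one reason schemes of this family are standard in MD.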
Owner: Commercial (University College Cardiff Consultants Ltd.)
Molpro is a complete system of ab initio programs for molecular electronic structure calculations. The emphasis is on highly accurate computations, with extensive treatment of the electron correlation problem through the multiconfiguration-reference CI, coupled cluster and associated methods.
NAMD is a parallel molecular dynamics code designed for simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms.
NWChem provides many methods to compute the properties of molecular and periodic systems by using standard quantum mechanical descriptions of the electronic wave function or density. NWChem can perform classical molecular dynamics and free energy simulations. These approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations.
MPACK is a Fortran program package for scattering problems of two octet baryons with the quark-model interactions fss2 and FSS (fssG.f), and for their application to Faddeev calculations for the triton (triton.f) and the hypertriton (hypt.f).
Researchers from the Centre for Biomolecular Sciences at the University of Nottingham along with the Edinburgh-based Cray Centre of Excellence team have been carrying out a pioneering study applying big data analysis techniques to simulation-generated DNA data.
Chronic respiratory diseases affect the airways and other lung structures, and they affect hundreds of millions of people worldwide. Researchers created a first-ever 3D model of the lung using "Magnus," Pawsey Supercomputing Centre's Cray XC40 supercomputer, with the goal of improving the delivery of aerosolized medications.
Genomic and genetic research have a big data problem. The field is producing ever-increasing amounts of data. But a lack of sufficiently scalable computational tools prevents researchers from analyzing it adequately. The situation leaves the massive opportunities inherent in this data untapped.
The Cray XC30 system Milner is part of a high performance computing platform for research in neuroinformatics and was funded by an infrastructure grant from the Swedish Research Council (VR). It is named in recognition of significant work in neuropsychology done in the 1950s by Brenda Milner and her then-husband, Peter.
Researchers used "Jaguar," at Oak Ridge National Laboratory, to elucidate the mechanism by which accessory proteins, called sliding clamps, are loaded onto DNA strands and coordinate enzymes that enable gene repair or replication.
A team of Cray and university researchers discusses its work with Ray, a parallel short-read de novo assembler. They also present a configuration for an NGS workflow based on a Cray supercomputing system and the Cray Sonexion storage solution.
Emerging big data analytics techniques hold the promise of accelerating scientific data processing, lowering the cost and complexity of data management and providing new capabilities for genomic interpretation.
Dave Anstey, global head of life sciences at Cray Inc., and Boris Umylny, director of bioinformatics services at the National Center for Genome Resources, talk about processing NGS data with unprecedented speed.