Managing Data from High-Performance Lustre to Deep Tape Archives

On September 17 we will present Cray’s Tiered Adaptive Storage (TAS) solution at the Preservation and Archiving Special Interest Group (PASIG) event in Germany. Session topics will include managing and preserving research data and infrastructures to support preservation of data at scale. Cray has a natural interest in contributing to these areas because a large number of our customers run Lustre® file systems. As of today, more than 120 petabytes of data capacity has been deployed on Cray supercomputers.

Our customers produce enormous amounts of simulation data every day with the numerical models they use in science and engineering. Data protection policies apply to a significant portion of that data, and it frequently needs to be preserved on a long-term basis. Manufacturing companies, for example, keep the simulation data produced during the design, testing and manufacturing process for the entire life of the product. That data supports continuous product improvement and the investigation of accidents or unforeseen incidents.

Another area in which long-term preservation is essential is climate modeling. Simulations rely on observational data provided by instruments such as satellites or sensors in the ocean or the atmosphere. Observational data is highly valuable and cannot be reproduced, so it must remain readable for decades without depending on any single vendor.

While the Lustre file system successfully evolved in demanding HPC environments and enterprise computing, it lacked hierarchical storage management (HSM) functionality. User data had to be moved manually from Lustre to disk or tape archives. With the release of Lustre 2.5, this core data management capability is now natively available in the Lustre file system. Files can now be seamlessly migrated from the fastest performance tiers to an archive system such as Cray’s TAS or similar products offered by other vendors. A versatile tool, the Robinhood Policy Engine, constantly monitors the file system contents and maintains a replica of its metadata in a MySQL database.
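Keeping a metadata replica in a database means policies can be expressed as queries instead of full file system scans. The sketch below illustrates the idea with a hypothetical, heavily simplified table and an "archive files untouched for N days" rule. It uses Python's built-in `sqlite3` only so that it runs standalone; Robinhood's real schema, policy syntax and MySQL backend are different.

```python
import sqlite3
import time

# Hypothetical, simplified metadata replica (NOT Robinhood's real schema).
# Columns: path, size in bytes, last access time (seconds since epoch),
# and a toy HSM state: 'new', 'archived' or 'released'.
SCHEMA = """
CREATE TABLE entries (
    path        TEXT PRIMARY KEY,
    size        INTEGER NOT NULL,
    last_access REAL NOT NULL,
    hsm_state   TEXT NOT NULL
)
"""


def archive_candidates(conn, min_age_days, now):
    """Return paths that are not yet archived and untouched for min_age_days."""
    cutoff = now - min_age_days * 86400
    rows = conn.execute(
        "SELECT path FROM entries "
        "WHERE hsm_state = 'new' AND last_access < ? "
        "ORDER BY last_access",
        (cutoff,),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    now = time.time()
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?, ?)",
        [
            ("/lustre/sim/run1.h5", 10**12, now - 40 * 86400, "new"),
            ("/lustre/sim/run2.h5", 10**12, now - 2 * 86400, "new"),
            ("/lustre/sim/run0.h5", 10**12, now - 90 * 86400, "archived"),
        ],
    )
    print(archive_candidates(conn, min_age_days=30, now=now))
    # prints ['/lustre/sim/run1.h5']
```

Because the query runs against the replica, evaluating a policy does not add metadata load to the live file system; the selected paths would then be handed to the HSM layer for archiving.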

To provide a complete solution, Cray announced the TAS Connector for Lustre File System earlier this year. Through the TAS Connector, all data stored on Lustre can be protected and transparently managed on Cray TAS, then staged back for access through Lustre. The TAS Connector allows data to migrate fluidly across tiers – from Lustre to capacity-optimized nearline disk and tape archives. Files that have been released and are offline are automatically re-staged as soon as a user requests them. Users get a full overview of both online and offline files through native Lustre commands.
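The archive, release and transparent re-stage cycle can be modeled as a small state machine. The sketch below is a toy, in-memory model: the class names, flags and dictionary "archive" are hypothetical stand-ins for Lustre's HSM state and for TAS. On a real Lustre 2.5+ system the corresponding steps are driven by commands such as `lfs hsm_archive`, `lfs hsm_release` and `lfs hsm_state`, with the restore triggered implicitly when a released file is accessed.

```python
from dataclasses import dataclass


@dataclass
class HsmFile:
    """Toy model of one file's HSM state (not the real Lustre flags)."""
    path: str
    data: bytes
    archived: bool = False  # a copy exists in the archive tier
    released: bool = False  # data removed from Lustre; only the stub remains
    dirty: bool = False     # modified since the last archive copy was made


class ToyHsm:
    """Hypothetical stand-in for the Lustre HSM coordinator plus an archive."""

    def __init__(self):
        self.archive = {}   # path -> bytes; plays the role of TAS
        self.restores = 0   # number of implicit re-stages performed

    def archive_file(self, f):
        self.read(f)  # make sure the data is online before copying it
        self.archive[f.path] = f.data
        f.archived, f.dirty = True, False

    def release(self, f):
        # Only a clean, archived file may be released; otherwise data is lost.
        if not f.archived or f.dirty:
            raise ValueError(f"{f.path}: no current archive copy, cannot release")
        f.data, f.released = b"", True

    def read(self, f):
        # Accessing a released file transparently re-stages it from the archive.
        if f.released:
            f.data = self.archive[f.path]
            f.released = False
            self.restores += 1
        return f.data

    def write(self, f, data):
        self.read(f)  # restore first so the file is online
        f.data = data
        if f.archived:
            f.dirty = True  # the archive copy is now stale


hsm = ToyHsm()
f = HsmFile("/lustre/sim/run1.h5", b"results")
hsm.archive_file(f)
hsm.release(f)
print(hsm.read(f), hsm.restores)  # prints b'results' 1
```

The guard in `release` reflects the key safety rule of any HSM: space on the fast tier is only freed once a current copy exists on a lower tier.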

TAS lets users browse files just as they would on a local file system or on a network share exported over industry-standard file sharing protocols such as NFS. TAS includes Versity Storage Manager (VSM) tiered storage software, which runs on Linux® and uses familiar policies that are easy to set and control. Data can be stored and migrated among up to five tiers (composed of SSD, disk or tape) with off-site archiving options for disaster protection. Files and metadata are stored in an open format, so they remain protected and accessible even in the absence of VSM software.

VSM, the underlying software used by TAS, manages the fill levels of the different tiers and takes automated action to either remove data or move it to other tiers so that the primary tier – the first storage location into which a user writes a file – never fills up. This automated management of the primary storage is key to the Cray solution: in a complex system with multiple tiers, it frees administrators from constantly managing where data resides. Another benefit to the user community is that the file system appears to have practically unlimited capacity (bounded, of course, by the total storage in the configuration), because all storage tiers appear to users as one tier – the file system. All of the data is managed around the clock, providing continuous operations and data protection.
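A common way to keep a primary tier from filling up is a pair of watermarks: nothing happens until usage crosses a high threshold, then data is released until usage falls below a lower one. The sketch below shows that general technique with least-recently-used ordering. The threshold values, the `Entry` type and the `select_releases` function are hypothetical illustrations; the post does not describe VSM's actual policy parameters.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    size: int           # bytes occupied on the primary tier
    last_access: float  # seconds since epoch
    archived: bool      # True if a copy already exists on a lower tier


def select_releases(entries, capacity, high_water=0.90, low_water=0.75):
    """Pick files to release from the primary tier, least recently used first.

    Hypothetical watermark policy: nothing happens until usage exceeds
    high_water * capacity; then archived files are released until usage
    drops to low_water * capacity or below.
    """
    used = sum(e.size for e in entries)
    if used <= high_water * capacity:
        return []
    target = low_water * capacity
    chosen = []
    # Only archived files can be released without losing data.
    for e in sorted((e for e in entries if e.archived), key=lambda e: e.last_access):
        if used <= target:
            break
        chosen.append(e.path)
        used -= e.size
    return chosen


files = [
    Entry("a", 30, last_access=1.0, archived=True),
    Entry("b", 30, last_access=3.0, archived=True),
    Entry("c", 35, last_access=2.0, archived=False),
]
print(select_releases(files, capacity=100))  # prints ['a']
```

The gap between the two watermarks avoids thrashing: after one cleanup pass, the tier has headroom before the policy fires again. If too few files are archived to reach the low watermark, a real system would first need to archive more data, which is why archiving and releasing are usually separate policies.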
