We recently held a webinar for the Lustre® community — now available on demand — to address questions about the convergence on a single canonical Lustre release stream and what that means for future feature development.
Some of you asked questions that we ran out of time to cover in our Q&A session. Our storage experts answer them here:
1. Any tips on how to identify usage patterns on previous Lustre versions that would benefit from Lockahead in 2.11?
Our 2017 CUG paper, “Lustre Lockahead: Early Experience and Performance Using Optimized Locking,” addresses that very topic. Lockahead significantly improves write performance for collective, shared-file I/O workloads: initial tests show write performance gains of more than 200% for small transfer sizes and over 100% for larger transfer sizes compared to traditional Lustre locking. Standard Lustre shared-file locking mechanisms limit the scaling of shared-file I/O performance on modern high-performance Lustre servers. The new Lockahead feature provides a mechanism for applications (or libraries) with knowledge of their I/O patterns to overcome this limitation by explicitly requesting locks ahead of time. MPI-IO is able to use this feature to dramatically improve shared-file collective I/O performance, achieving more than 80% of file-per-process performance.
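In practice, MPI-IO applications typically pick up Lockahead through collective-buffering lock-mode hints rather than code changes. A minimal sketch, assuming the Cray MPICH hint convention described in the CUG paper (verify the hint name and value against your MPI release's documentation before relying on them):

```shell
# Hypothetical sketch: request lockahead-style write locking for all files
# opened through Cray MPI-IO. In Cray MPICH, lock mode 2 is the lockahead
# setting; confirm against `man intro_mpi` on your system.
export MPICH_MPIIO_HINTS="*:cray_cb_write_lock_mode=2"

# Confirm the hint is set in the job environment before launching the app.
echo "$MPICH_MPIIO_HINTS"
```

Because the hint is applied per file-name pattern (`*` above matches all files), it can also be scoped to just the shared output files that use collective I/O.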
2. With the MDT redundancy in 2.13, will reads be load balanced for performance?
The Metadata Target (MDT) redundancy feature has not yet been designed, so we can't comment on how it will affect performance.
3. Can you talk a bit about data migration capabilities for when a small file (using DoM) is no longer a small file and needs to move off the MDT? This needs to be an automated process that doesn’t involve re-copying the file.
DoM files typically won’t need to be migrated, because DoM should be combined with the Progressive File Layout (PFL) feature: as the file grows, it simply “breaks out” onto additional data stripes on Object Storage Targets (OSTs). If, however, a user or admin decides to migrate a file, it can be done with a simple copy and need not be automated. That said, Cray will provide lfs migrate and other rich copy tools in the future, and these tools will be driven automatically via expressed policies.
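The combined DoM+PFL layout described above can be sketched with the standard lfs tools. The paths, sizes, and stripe counts here are hypothetical; check the lfs-setstripe and lfs-migrate man pages for the exact syntax your Lustre release supports:

```shell
# Hypothetical example: a composite layout whose first 1 MiB of each file
# lives on the MDT (the DoM component, -L mdt) and whose remainder stripes
# across four OSTs (the PFL component extending to end-of-file).
lfs setstripe -E 1M -L mdt -E -1 -c 4 /mnt/lustre/smallfiles

# If a file must be moved off the MDT anyway, an admin can restripe it in
# place with lfs migrate rather than a user-visible copy:
lfs migrate -c 4 /mnt/lustre/smallfiles/grown_file
```

With this layout in place, files under 1 MiB never touch an OST at all, which is what makes explicit migration the exception rather than the rule.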
4. What are the challenges of Lustre on flash devices?
There are many, and discussing them would make a great topic for a future webinar. In fact, we plan to dive into Lustre on flash at our next webinar. In the meantime, we can say that flash exposes some formerly insignificant latencies in the Lustre software stack that we have addressed, and we’ve been busy improving Lustre to take advantage of the inherent properties of the technology. Also, it seems clear to us that the economics are such that flash won’t fulfill 100% of primary storage requirements, so we’re contemplating hybrid storage architectures. In that world, understanding data placement, data movement, and data management becomes much more important. For a discussion of some of these topics, please see panelist Cory Spitz’s LAD ’17 talk, “In-Board Lustre on a Performance Tier.”
5. What is the best choice of data copy tool to fit into the HSM piece?
Our Connector copy tool technology fits into the Lustre Hierarchical Storage Management (HSM) architecture. This tool can manage parallelized data movement to select HSM or POSIX environments. There are other choices better suited to particular environments: for example, HPSS. We’re evaluating changes to both the Lustre-HSM architecture and the Connector technology that would allow it to optimally manage data movement requests for a variety of services. We’d love to hear more about your Lustre-HSM requirements.