One of our initiatives here at Cray is to provide supercomputer resources so innovators can advance artificial intelligence in its many forms — and, ultimately, human understanding. One way that we do this is through a partnership with the UK innovation center Digital Catapult and its Machine Intelligence Garage program. Machine Intelligence Garage helps businesses access the computation power and expertise they need to develop and build machine learning and artificial intelligence solutions.
As part of this partnership, one of the program participants, Bloomsbury AI, used a Cray® CS-Storm™ system in the Cray Accel AI™ lab to train and optimize the deep learning models within Cape, an open-source technology that can answer questions about information contained in text documents.
The result of the collaboration is a new standard for reading comprehension performance in the TriviaQA competition, where a set of questions is used to measure the system’s ability to accurately generate answers using a large body of text (Wikipedia articles and web pages) as reference. Question answering is a challenging natural language processing (NLP) task and is used as a milestone for artificial intelligence systems. The reading comprehension competition allows participants to measure question answering prowess using a publicly available dataset (TriviaQA: a large-scale dataset for reading comprehension and question answering) and questions unknown to the competitor in advance.
The Bloomsbury AI Cape model achieved first place in the competition standings with scores of 67.32% exact match (EM) and 72.35% F1, setting a new state of the art in Wikipedia open-domain question answering. The model can answer an impressive ~80% of TriviaQA’s questions correctly, which is close to human performance and well above TriviaQA’s two baseline algorithms, a feature-based classifier and a neural network (23% and 40%, respectively).
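For readers unfamiliar with the EM and F1 scores quoted above, here is a minimal sketch of how these two metrics are commonly computed for question answering. It is not the official TriviaQA evaluation script: the function names and the normalization steps (lowercasing, stripping punctuation and English articles) are assumptions made for illustration, and this sketch scores against a single reference answer, whereas official scripts may differ in normalization and in how they handle multiple accepted answers.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("Eiffel Tower in Paris", "Eiffel Tower"))
```

EM rewards only answers that match the reference after normalization, while F1 gives partial credit for overlapping words. That is why a system's F1 score is typically higher than its EM score, as in the figures above.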
While developing the Cape model, Bloomsbury AI used the Cray Accel AI lab, accessed through Digital Catapult’s Machine Intelligence Garage, and its Cray CS-Storm GPU-accelerated systems. Data parallel training on the CS-Storm system cut model training time from over 42 hours on cloud resources to 4 hours, a more than tenfold reduction. That reduction was an important contributor to Bloomsbury AI’s success: model development is iterative, so faster training lets researchers evaluate more model variants in the same amount of time, which tends to lead to more accurate models.
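To illustrate the general idea behind data parallel training, here is a toy sketch in plain Python. It is not Bloomsbury AI's training code and does not use any GPU framework: the one-parameter linear model, the function names, and the sample data are all assumptions chosen to keep the example small. Each simulated worker computes a gradient on its own shard of the batch, and the averaged gradient drives a single shared weight update.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def gradient(w: float, batch: List[Sample]) -> float:
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w: float, batch: List[Sample],
                       num_workers: int, lr: float) -> float:
    """One SGD step with the batch split evenly across simulated workers."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_workers)]
    # On real hardware each shard's gradient is computed on a separate
    # device at the same time; here the workers run one after another.
    local_grads = [gradient(w, shard) for shard in shards]
    # Averaging equal-sized shard gradients reproduces the full-batch gradient.
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # targets follow y = 3x
    print(data_parallel_step(0.0, data, num_workers=1, lr=0.01))
    print(data_parallel_step(0.0, data, num_workers=4, lr=0.01))
```

Because the shards are equal in size, the four-worker step produces the same update as the single-worker step, up to floating-point rounding. The work per worker shrinks as workers are added, which is where the reduction in wall-clock time comes from. In practice the gain is limited by the cost of exchanging gradients between devices.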
Bloomsbury AI’s researchers also made use of the CS-Storm system’s computing power and large GPU memory to train the Cape model with larger text chunks. This approach looks promising and could result in better automated reading comprehension in the future.
You can find out more about artificial intelligence, machine learning, and AI technology on the Cray website.