In my previous blog post, I presented the challenges in extracting business value from big data by adapting to new rules and datasets in a process I like to call fast onboarding. So how can we design a system for fast onboarding?
Technology infrastructure can play a role. Fast onboarding, with or without that name, has been the motivation for a number of technology innovations for decades. One can even argue that fast onboarding was the motivation a half century ago for moving system design from hardware (hard to change) to software (easier to change).
In modern times, we see advanced technologies whose relevance comes largely from their ability to support fast onboarding. NoSQL databases and business rule systems are two such examples.
In contrast to the data warehousing story, what does the software development world look like when fast onboarding is the norm? It starts out much the same – we identify some requirements and some datasets to meet those requirements. We build a system that incorporates those datasets and extracts the required conclusions.
The difference happens when the context changes – a new, follow-on question comes along, requiring a new dataset. Or a competitor has improved on what they can extract from their data, and we have to keep up.
In a fast onboarding world, we determine what new resource is needed – maybe just a new pattern working over the existing data, or maybe bringing in a new data set to support some new questions. We map this resource into the current system, and carry on.
So what can technology provide to support this vision? Graph databases go a long way: graphs provide a flexible way to represent data, describe new datasets, and relate them to old ones. The work done on rendering data as a graph is leveraged with each new query and each new dataset. Adding a new dataset doesn't involve going back to square one.
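To make this concrete, here is a toy sketch in Python of the graph idea: data lives as (subject, predicate, object) edges, so onboarding a new dataset is just adding edges, and a new question is just a new pattern over the combined graph. The data, predicate names, and `match` helper are all invented for illustration; a real system would use an RDF store queried with SPARQL.

```python
# A toy triple store: all data is a set of (subject, predicate, object) edges.
triples = {
    ("alice", "type", "Customer"),
    ("alice", "bought", "widget"),
}

def match(pattern, store):
    """Return variable bindings for a single triple pattern.
    Pattern terms starting with '?' are variables; others must match exactly."""
    results = []
    for triple in store:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break  # constant term doesn't match this triple
        else:
            results.append(binding)
    return results

# Onboarding a new dataset is just adding edges -- no schema migration.
triples.add(("widget", "category", "hardware"))

# A new question spanning old and new data: who bought something in "hardware"?
buyers = [
    b["?who"]
    for b in match(("?who", "bought", "?item"), triples)
    if match((b["?item"], "category", "hardware"), triples)
]
print(buyers)  # ['alice']
```

The point of the sketch is the absence of a fixed schema: the "category" predicate didn't exist when the graph was built, yet the new query joins old and new facts immediately.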
But to really achieve the vision of fast onboarding, we have to exploit the onboarded data quickly and effectively. We need to match these new patterns to the new datasets quickly – often interactively.
Technologies addressing various parts of these dynamic problems have existed as disparate solutions for a while. Only with the advent of a specialized architecture like that of Cray's Urika™ appliance, which combines specialized graph processors with an I/O architecture that can ingest new (streaming) data at up to 350TB/s, do we now have a business-ready infrastructure that brings these facilities together, at scale, in a single system.
Infrastructures that combine the flexibility of a graph-based database with a powerful pattern language, and that can run these queries interactively and at scale, represent a real step forward: technology that can onboard new data as fast as businesses need it.
Dean Allemang, co-author of the bestselling book, Semantic Web for the Working Ontologist, is a consultant, thought-leader, and entrepreneur focusing on industrial applications of distributed data technology. He served nearly a decade as Chief Scientist at TopQuadrant, the world’s leading provider of Semantic Web development tools, producing enterprise solutions for a variety of industries. As part of his drive to see the Semantic Web become an industrial success, he is particularly interested in innovations that move forward the state of the art in distributed data technology. Dean’s current work is concentrated on the life sciences and finance industries, where he currently sees the most promising industrial interest in this technology.