In this two-part blog series, I’m going to first discuss various frustrations of data management professionals and then the solution they need to extract value from the massive volumes of data available to them.
I want to start by telling three different stories of information challenges, spanning decades of information processing history. What do they have in common?
- A credit card company creates a system to detect patterns of card usage that indicate the possibility of fraud. But the fraudsters catch on and change their tactics. So the company catches on and changes its fraud check policies. How can the fraud detection system keep up?
- A contractor provides information systems to a government office. New legislation is enacted regarding what information may and may not be kept for public records, making the policies of these systems obsolete. How can the system keep up?
- A detailed analysis of a market requires sophisticated statistics to react quickly enough to exploit market opportunities. But each analysis raises new, more detailed questions. Whoever can answer those questions quickly enough gains a further market edge. How can the analysis systems keep up?
The oldest story comes from twenty years ago, the most recent from this very year. All three of them point to a problem with using information systems in a dynamic business environment: keeping up with the rapid pace of business change.
This is an old problem, but an ongoing one, and I want to give it a name: I call it fast onboarding. Defined rather broadly, fast onboarding is the ability of a system to bring new rules, patterns, datasets, or other information resources to bear quickly on business problems.
What are the business situations that particularly demand fast onboarding capabilities? We’ve seen a few in the stories above. They are typically highly competitive, with stakeholders responding to some external pressure outside the view of the system.
We see it in security situations of all sorts, where adverse parties (fraudsters, spies, political enemies) are working against your business goals. We also see it in less adversarial situations, as in the example of the fast-breaking legislation above, where competing political pressures make drastic changes to the information context.
What does this mean for the underlying technology? First, let’s think about how onboarding is done using conventional software methods today. For diverse datasets, we have data warehousing approaches that let us combine them into a single resource that can drive the new application.
With the help of powerful query languages, we can define patterns in the data that are of business interest – possible cases of fraud, regulatory violations, or new competitive opportunities. These technologies and methods allow us to develop software that onboards new datasets and new patterns as the application landscape evolves.
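To make the conventional pipeline concrete, here is a minimal sketch of its two steps: combining datasets into a single resource, then running a hand-written pattern query against it. All record fields, values, and the fraud rule itself are invented for illustration; a real warehouse would do this with ETL tooling and SQL rather than in-memory Python.

```python
# Hypothetical sketch of conventional onboarding: join two datasets
# into one analytical view, then query it for a pattern of business
# interest. All fields and the rule are invented for illustration.

transactions = [
    {"card_id": 1, "amount": 40.0, "country": "US"},
    {"card_id": 1, "amount": 950.0, "country": "RU"},
    {"card_id": 2, "amount": 15.0, "country": "US"},
]
cards = {1: {"home_country": "US"}, 2: {"home_country": "US"}}

# "ETL" step: combine the datasets into a single analytical view.
combined = [{**t, **cards[t["card_id"]]} for t in transactions]

# A pattern encoded at design time: large transactions made outside
# the card's home country.
suspicious = [
    r for r in combined
    if r["amount"] > 500 and r["country"] != r["home_country"]
]

print(suspicious)
```

Note that the pattern is fixed at design time: when fraudsters change tactics or a new dataset arrives, the join and the query both have to be rewritten, which is exactly the onboarding cost discussed below.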
How well do these approaches work? In an informal poll of data warehouse professionals, I asked how long, on average, it took to design a warehouse, perform the ETL, design the queries on the new system, and provide business value. The answer was uniformly “about six months.”
Suppose then that once the analysis delivers value, the business line has a follow-on question that recognizes a new pattern or integrates a new dataset. How long does it take to build the follow-on system? The surprising answer was again, “about six months.” In all the times I have told this story, the only objection I have ever received was that the estimate of “about six months” was a bit optimistic.
What are the take-aways? The real lesson is that there is no cumulative value in the system design; onboarding a new dataset or pattern is just as expensive as starting from scratch. We normally think of good technology development as a predictable process. But when a system must both satisfy the requirements known at design time and adapt to a dynamic business, even a technical success in the first sense does not translate into business success.
A fast onboarding system isn’t measured just by how well it answers some business question, but by how quickly it can answer new questions in new information contexts. In my next blog post, I’ll discuss how best to build a system for fast onboarding so that you can learn how to tackle your massive data problems.
Dean Allemang, co-author of the bestselling book, Semantic Web for the Working Ontologist, is a consultant, thought-leader, and entrepreneur focusing on industrial applications of distributed data technology. He served nearly a decade as Chief Scientist at TopQuadrant, the world’s leading provider of Semantic Web development tools, producing enterprise solutions for a variety of industries. As part of his drive to see the Semantic Web become an industrial success, he is particularly interested in innovations that move forward the state of the art in distributed data technology. Dean’s current work is concentrated on the life sciences and finance industries, where he currently sees the most promising industrial interest in this technology.