Almost ten years ago, when I was working at Pfizer, I wrote a position paper for a W3C Workshop on Semantic Web for Life Sciences. In that paper, I pointed out several vexing problems then faced by pharmaceutical researchers that I thought could be alleviated by a powerful knowledge architecture such as the one enabled by Semantic Web technologies. Some of these were what you might classify as knowledge management problems: they had much to do with sharing information effectively throughout a large research organization, where specialized vocabularies and varied purposes can easily get in the way.
Other problems I described were just good, old-fashioned informatics problems, in particular the creation of so-called “data silos.” Data silos are created when data are put into databases or documents in their own unique format, a bespoke schema for example, such that they are not interoperable with any other data. Data silos are destructive to a knowledge-based organization because they prevent the synthesis of disparate information necessary to gain useful insight into data. Data silos kill productivity and innovation, because they limit the kinds of questions researchers can ask of their data and are typically difficult to modify or expand (which means you’re probably not going to get very much help any time soon). Data silos happen because by now we don’t even think about what container we’re going to put our data in; they just reflexively go into a relational database or a document. And so we almost never think ahead to consider how we’ll represent the knowledge so that we can actually put it to use along with other data.
Semantic Web technologies, including RDF(S) graphs, Linked Data, and SPARQL, provide a standard, uniform way to model data and capture their semantics, so they can really help with the problems described in that position paper.
But only if they get used.
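To make the idea a little more concrete, here is a minimal sketch of what "a uniform way to model data" buys you. It is a toy, not real RDF: plain Python tuples stand in for subject-predicate-object triples, and every name in it (the `ex:` identifiers, the column names, the two little "silos") is made up for illustration. The point is only that once two differently shaped data sets are expressed as the same kind of statement, a question that spans both becomes easy to ask.

```python
# Toy illustration only: plain tuples stand in for RDF triples.
# All identifiers (ex:compound42, ex:inhibits, ...) are hypothetical.

# "Silo" 1: an assay table with its own bespoke column names.
assay_rows = [
    {"cmpd_id": "compound42", "target": "kinaseA", "ic50_nm": 12.0},
    {"cmpd_id": "compound7", "target": "kinaseB", "ic50_nm": 850.0},
]

# "Silo" 2: target-to-disease notes kept in a differently shaped record.
target_notes = [
    ("kinaseA", "inflammation"),
    ("kinaseB", "fibrosis"),
]


def assay_to_triples(rows):
    """Express each assay row as (subject, predicate, object) statements."""
    for row in rows:
        compound = "ex:" + row["cmpd_id"]
        yield (compound, "ex:inhibits", "ex:" + row["target"])
        yield (compound, "ex:ic50_nM", row["ic50_nm"])


def notes_to_triples(notes):
    """Express each target/disease note as a single statement."""
    for target, disease in notes:
        yield ("ex:" + target, "ex:implicatedIn", "ex:" + disease)


# Once both sources are triples, combining them is just a set union.
graph = set(assay_to_triples(assay_rows)) | set(notes_to_triples(target_notes))


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def compounds_for_disease(graph, disease):
    """Follow disease <- target <- compound links across both sources."""
    results = []
    for target, _, _ in match(graph, p="ex:implicatedIn", o=disease):
        for compound, _, _ in match(graph, p="ex:inhibits", o=target):
            results.append(compound)
    return sorted(results)


print(compounds_for_disease(graph, "ex:inflammation"))
```

In a real system the tuples would be RDF triples with proper URIs and the pattern matching would be a SPARQL query; the sketch only shows the shape of the idea.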
The interesting thing about that position paper (and here I hasten to remind you that it was written a decade ago) is that I’m pretty sure I could submit it as-is today, and it would be just as correct as it was when I originally wrote it. Every single one of the problems described in that paper is still being experienced today by life sciences researchers in every substantial research organization. Except now there’s a lot more data, so those problems are even worse.
At that Workshop I showed a sort of tongue-in-cheek slide illustrating the current state of knowledge management in large life sciences research organizations. It looked like this:
Yeah, I’d still show that, too.
So, nothing has really changed in ten years. How can that be? These problems are serious, and the goals of life sciences research are simply too important to let anything stand in the way.
Did you ever hear that Albert Einstein said, “The definition of insanity is doing the same thing over and over again and expecting a different result”? Well, it turns out he probably never said that. (I know, I thought he did, too!) Nonetheless, I think he gets the credit because it’s such a smart way to look at it. Now, here’s the kicker: over the last ten years, the way we’ve been using computers to help us solve research problems basically hasn’t changed a bit. To put it another way, we keep putting our data into relational databases or documents like we’ve always done, and hoping this time it’s going to be different.
Well, I hate to be the one to tell you this, but it isn’t. And you just built yourself yet another data silo, didn’t you?
I’ll tell you what Albert Einstein did say: “We can’t solve problems by using the same kind of thinking we used when we created them.”
Part of what makes the life sciences so endlessly fascinating, so fun, is the sheer amount and variety of information, and how all those bits of information are related to each other. We owe it to ourselves to learn how to work with those data, so that we can understand what it all means and make a difference in the lives of the people who need our help. New things are scary, but think of what we could accomplish if we were brave enough to put our energy into the data, not just into the containers for the data. Einstein would be proud.