Cray “Blue Waters” Supercomputer Tackles Gerrymandering

Yan Liu and Wendy K. Tam Cho

Redistricting, the process by which congressional and state legislative district boundaries are drawn, sounds like an unremarkable government chore. And, in theory, it should be. But too often it is subject to “gerrymandering,” or manipulation by the majority political party. Decades ago, University of Illinois political science professor Wendy K. Tam Cho realized that what was needed was a computational tool to help the courts objectively measure the fairness of a legislative map. She developed a tool that could generate hundreds of millions of voter district maps to serve as a “comparison set”: a way to gauge the level of partisanship exhibited by any particular electoral map. But any further work ...

“LEBM”: Cray Creating New Extension to LUBM Benchmark

I’ve written a few posts about the Cray Graph Engine (CGE), a robust, scalable graph database solution. CGE is a graph database that uses Resource Description Framework (RDF) triples to represent the data, SPARQL as the query language, and extensions for calling on a set of “classical” graph algorithms. CGE has two main advantages. One is that it scales a lot better than most other graph databases, because the others weren’t designed by supercomputer wizards. (Of course I would say that.) The other is that, beyond scaling well, CGE performs unusually well on complex queries against large, complex graphs. A typical complex graph query asks: where are all the places where the red pattern matches a pattern in ...
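As a rough sketch of what that kind of pattern matching looks like in SPARQL (the ex: namespace and predicate names below are made up for illustration and are not the actual LUBM or LEBM vocabulary), a query might ask for every spot in the graph where a small triangle-shaped pattern occurs:

    PREFIX ex: <http://example.org/schema#>

    # Find every place where the pattern occurs: a student who takes
    # a course taught by his or her own advisor.
    SELECT ?student ?advisor ?course
    WHERE {
      ?student ex:advisedBy   ?advisor .
      ?student ex:takesCourse ?course .
      ?advisor ex:teaches     ?course .
    }

On a small graph any engine can answer this; the interesting case is when that same three-triple pattern has to be matched against a large, complex graph, which is where the scaling argument matters.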

50 Billion Triples: Digging a Bit Deeper

In the last two installments of this series (part 1 and part 2), we discussed some higher-level thoughts on striking a balance between easy ingest and more prep work, as well as some initial queries to get a sense of an unknown graph’s structure and other characteristics. The queries we have run to this point were intended to discover structure within our dataset at a macro level. The algorithm that we will run now requires us to consider the dimensionality of our graph. In other words, running algorithms such as centrality or community detection on the entire graph, without context, is meaningless; we need to run these algorithms on subsets of the data. Prior to delving into the following queries, a quick note: All of the algorithms ...
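To make “run it on a subset” concrete, here is a minimal sketch, using a hypothetical ex: vocabulary rather than the real dataset’s, of a SPARQL CONSTRUCT that carves out just one slice of the graph, the kind of subgraph you might then hand to a centrality or community-detection algorithm:

    PREFIX ex: <http://example.org/schema#>

    # Build a smaller, focused graph: connect two people directly
    # whenever they authored the same paper.
    CONSTRUCT {
      ?personA ex:coauthorWith ?personB .
    }
    WHERE {
      ?paper ex:hasAuthor ?personA .
      ?paper ex:hasAuthor ?personB .
      FILTER (?personA != ?personB)
    }

A community-detection pass over that co-authorship slice has a meaningful interpretation, whereas the same algorithm run over the raw, mixed-type graph would not.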

Making Sense of 50 Billion Triples: Getting Started

In my last post I outlined some of the reasons why it takes thought and planning to really capitalize on the promise of graph analytics. Now I’d like to focus on getting started where it should start: with understanding the data itself. So let’s discuss methods of gaining an initial understanding of our data, so that we can then feed that newfound understanding back into our ingestion process. First, we need to be able to query the database for some basic information that tells us not only what is there but how the relationships are expressed. Ideally, we would also find out whether there is a hierarchy to the relationships we find. By hierarchy, I mean that graphs can use containers, if you will, to organize ...
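As a sketch of the kind of first-pass survey queries I have in mind (standard SPARQL, nothing CGE-specific; each query is run on its own), the following show which predicates and which rdf:type classes the data actually uses, and how heavily:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # Which predicates appear in the data, and how often is each used?
    SELECT ?predicate (COUNT(*) AS ?uses)
    WHERE { ?s ?predicate ?o . }
    GROUP BY ?predicate
    ORDER BY DESC(?uses)

    # Which classes are present, and how many instances of each?
    SELECT ?type (COUNT(?s) AS ?instances)
    WHERE { ?s rdf:type ?type . }
    GROUP BY ?type
    ORDER BY DESC(?instances)

The answers to those two questions are usually enough to tell you whether the relationships were loaded the way you expected, which is exactly the feedback you want before refining the ingestion process.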

Big Data Advantage, part 3: “The Dude Abides.”

In my prior two posts about analytics, I highlighted the vast opportunity available in big data and the obstacles that prevent organizations from attaining tangible benefits: complexity on multiple fronts, an onslaught of analytics tools, the difficulty of retaining the right skillsets, and slowdowns in getting to insights and decisions. But these hurdles can be overcome. For innovative businesses grappling with the realities of big data, an agile analytics environment provides the best of all approaches. Such a platform enables you to seize your big data advantage with a potent combination of system agility and the pervasive speed needed to deliver high-frequency insights. To address this need, Cray has fused supercomputing technology ...