posted on February 09, 2015 18:59
Big Data is a term that is bandied about quite a bit these days. One sees references to Big Data in television commercials, in pop-ups on web sites, and in print advertisements. Have you ever stopped to think why “Big Data” has become the latest technical buzz word?
Of course, I think Big Data is much more than a buzz word, and so do The Partnership for Public Service and the IBM Center for The Business of Government. They have sponsored our Conversations on Big Data because we think there is value in communicating how public agencies are using quantitative analytics creatively to better deliver services to their constituencies. We have seen several examples of Big Data at work in this blog. What we need to remember is that Big Data is not just a vast sea of facts and figures, pulled together using sophisticated toolsets, and backed up by text and images. Big Data gives us the ability to tie all those data points and documents together to form a picture or snapshot. What’s special about Big Data is integration.
The importance of integration was a key theme of our Conversation with Lori Walsh, Chief of the Center for Risk and Quantitative Analysis at the Securities and Exchange Commission (SEC). Lori described her group as the “center of a hub-and-spoke system” that centralizes information from regional offices across the country. The trading of securities in the US is subject to a set of laws that require transparency. Transparency is achieved by collecting data about each and every transaction, and about each and every participant. According to Lori, the value provided through analytics is “pulling together the pieces of the puzzle. If you just have a big pile of pieces, you can’t tell what the picture is. When you start organizing the data, you can start putting the pieces of the puzzle together and … a picture starts to emerge.” How do you accomplish that organization so that you can integrate all those pieces of the puzzle?
The first step in integrating data is always discovery. What exactly are all these little pieces? In terms of a securities transaction, the pieces are:
1.) The Buyer and the Seller
2.) The Security being traded and how many Units of the Security are being exchanged
3.) The Amount the Buyer is paying the Seller
4.) The Date upon which the transaction occurred
All of these data elements can be precisely defined and documented. For example, Buyers and Sellers are typically financial institutions registered with the SEC that place trades according to instructions received from investors hoping to increase the value of their portfolio of securities. Defining data elements in business terms is critical to providing context to any analysis of the data itself. These business definitions are called business metadata.
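To make the idea of business metadata concrete, here is a minimal sketch of how the four pieces of a trade and their business definitions might be written down side by side. The class name, field names, sample values, and the wording of the definitions are illustrative assumptions on my part, not the SEC's actual data model.

```python
from dataclasses import dataclass, fields
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Trade:
    """One securities transaction (hypothetical layout)."""
    buyer: str
    seller: str
    security: str
    units: int
    amount: Decimal
    trade_date: date


# Business metadata: a plain-language definition for every data element.
# The wording here is illustrative only.
BUSINESS_METADATA = {
    "buyer": "Institution acquiring the security on instructions from an investor",
    "seller": "Institution giving up the security on instructions from an investor",
    "security": "Identifier of the security being traded",
    "units": "Number of units of the security exchanged",
    "amount": "Amount the buyer pays the seller",
    "trade_date": "Date upon which the transaction occurred",
}


def undocumented_fields(record_type, metadata):
    """Return the names of data elements that have no business definition."""
    return [f.name for f in fields(record_type) if f.name not in metadata]


# Made-up sample record.
example = Trade("Firm A", "Firm B", "XYZ", 100, Decimal("2500.00"), date(2015, 1, 5))
```

A check such as `undocumented_fields(Trade, BUSINESS_METADATA)` is one simple way to confirm that discovery is complete, meaning every element we collect also has a definition.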
The second step in integration is collection. Once we have discovered and defined the pertinent data points we are dealing with, we need to collect them into a single repository. The “hub and spoke” structure that Lori referred to provides a metaphor for building a repository of integrated data. Continuing with our example, as securities transactions occur, they are documented by the Seller, the Buyer, the exchange on which they occur, etc. We now have many records of our transaction, all of which are captured in data repositories. To make that data meaningful we need to organize those repositories by determining which of those data elements provide context. Look again at our example of a securities trade. Within our dataset, there are numbers (Amount and Units) and one date. The other data elements are “Who” (Buyer and Seller) and “What” (the Security traded). When we think of our data elements in that sense, we can start thinking of our transaction as “Who is buying What When?” or “Who is selling What When?” Organizing our repository by looking to our business metadata to determine our context enables us to start querying – asking questions – that are relevant to our data consumers.
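As a rough illustration of organizing a repository around Who, What, and When, the sketch below totals the numbers (Units and Amount) for each combination of participant, security, and date. The firm names, security symbols, and figures are made up, and the record layout and the `summarize` function are assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date

# A tiny, made-up repository of collected trades.
trades = [
    {"buyer": "Firm A", "seller": "Firm B", "security": "XYZ",
     "units": 100, "amount": 2500.0, "date": date(2015, 1, 5)},
    {"buyer": "Firm A", "seller": "Firm C", "security": "XYZ",
     "units": 50, "amount": 1300.0, "date": date(2015, 1, 5)},
    {"buyer": "Firm C", "seller": "Firm A", "security": "ABC",
     "units": 10, "amount": 400.0, "date": date(2015, 1, 6)},
]


def summarize(records, who):
    """Total units and amount per (Who, What, When).

    `who` is either "buyer" or "seller", which selects the question
    "Who is buying What When?" or "Who is selling What When?".
    """
    totals = defaultdict(lambda: {"units": 0, "amount": 0.0})
    for record in records:
        key = (record[who], record["security"], record["date"])
        totals[key]["units"] += record["units"]
        totals[key]["amount"] += record["amount"]
    return dict(totals)


buying = summarize(trades, "buyer")    # Who is buying What When?
selling = summarize(trades, "seller")  # Who is selling What When?
```

The context elements become the key and the numbers become the values, so the same small repository can answer both questions just by changing which "Who" we ask about.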
Over time, we have built up many large data repositories with well-defined content that has been organized in order to provide answers to common business questions. That is one level of data integration, but now let’s take a look at the next level. Lori Walsh provided us with an example of how the SEC leverages integration across numerous data repositories.
“… perhaps we have a tip about … insider trading. We have done … an examination of the broker dealer who was named in the tip … We have previously done investigations on that broker but never brought an action against him or her, and by pulling this information together, … we’re able to scan across our databases and see … information bubble up immediately that gives more credibility to the tip. So it is having … the data in place and the pieces organized so that when a new piece comes in you can immediately see where it fits and how it associates with the other information that we already have.”
This is why Big Data is much more than a buzz word. Big Data provides the ability to integrate data from different repositories that are organized differently and have widely varying content, as long as they contain at least one commonly defined data element (in this example, the broker). That shared element enables us to pull all that disparate data out, look at where it intersects, and then connect the dots to reach reasonable, fact-based conclusions.
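A small sketch may help show what integrating on a commonly defined element looks like in practice. Below, three repositories are deliberately organized in three different ways, and the only thing they share is a broker identifier. Everything here is hypothetical: the repository names, the `broker_id` key, the record layouts, and the sample entries are assumptions for illustration and do not describe the SEC's systems.

```python
# Three made-up repositories, each organized differently.
tips = [  # list of records
    {"tip_id": "T-1", "broker_id": "BRK-001", "allegation": "insider trading"},
]
examinations = {  # mapping from broker to examination notes
    "BRK-001": ["2013 routine examination"],
    "BRK-002": ["2014 routine examination"],
}
investigations = [  # rows of (broker, year, outcome)
    ("BRK-001", 2012, "closed, no action"),
    ("BRK-003", 2014, "open"),
]


def profile(broker_id):
    """Pull everything known about one broker out of every repository."""
    return {
        "tips": [t for t in tips if t["broker_id"] == broker_id],
        "examinations": examinations.get(broker_id, []),
        "investigations": [row for row in investigations if row[0] == broker_id],
    }


def sources_with_hits(broker_id):
    """Name the repositories in which this broker appears."""
    return sorted(name for name, hits in profile(broker_id).items() if hits)
```

With this sample data, a new tip about broker `BRK-001` immediately surfaces an earlier examination and a closed investigation, which is the kind of "bubbling up" Lori described. The design choice that matters is that the broker is identified the same way everywhere, not that the repositories look alike.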
Check out past Conversations on Big Data for more tips and insights on using big data to improve your organization’s mission effectiveness.