January 21, 2009. Why in the world would that be a date to recall? Well, for one, it was the day after Barack Obama was inaugurated as the 44th President of the United States. And for another, it was the day President Obama released an arguably game-changing document, his Memorandum on Transparency and Open Government. This one document set the stage for a new era in how government would look at the data it collects and creates. Since that time, the world of data has changed dramatically! Consider this – new analytics tools, new data types, new devices creating data, new storage ideas, new visualization applications, new concepts, new laws – the list of innovations goes on.
But, all these great innovations are not really why I’m writing today. Today, I’d like to call your attention to a news article I read in NextGov, “Amid Open Data Push, Agencies Feel Urge for Analytics”. I have to admit, as I read this article, I found myself getting just a little bit giddy. Why? Great question, thanks for asking. Before going on with my thoughts, please take a moment to read the article. Go ahead, I have time. I’ll wait.
Picking up where I left off…
Since 2009, the notion of “open data” has been discussed primarily from one of two main perspectives:
- Transparency of government to citizens – Accountability
- What the private sector can do – Innovation
No doubt, there have been significant advances on both of these topics. Yet, as important as these concepts are, budget and resource constraints can cause open data efforts to be prioritized lower than, say, a mission-critical program.
Of course, I get this – mission first – but a couple of years ago it hit me: maybe government agencies are not seeing a potential opportunity that’s sitting right in front of them. Along with the mandate to publish open data comes the opportunity to consume open data and get it into their analytics engines, thus supporting the agency’s mission! (A minimal sketch of what that could look like follows the list below.) Just this slight mind shift has the potential to turn open data initiatives into a means to create value. Now do you see why I am excited by the article? (If not, I’ll assume you’ve yet to read it.) I’m thrilled to see agencies adding a third perspective to the open data conversation:
- Consumption of open data – Improving an agency’s ability to deliver on its mission(s)
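To make the "consume" side concrete, here is a minimal sketch of pulling a published open dataset into an agency's own analytics. Everything here is hypothetical – the URL, file names and columns are made up for illustration – but the pattern is the point:

```python
# A minimal sketch, assuming a hypothetical open data portal.
# The URL, file names and columns below are made up for illustration.
import pandas as pd

OPEN_DATA_URL = "https://data.example.gov/building_permits.csv"  # hypothetical

# Pull the published open dataset straight into the analytics engine...
open_df = pd.read_csv(OPEN_DATA_URL)

# ...and join it to the agency's own mission data (also hypothetical).
agency_df = pd.read_csv("internal_caseload.csv")
combined = agency_df.merge(open_df, on="zip_code", how="left")

# The open data now enriches mission analytics instead of only sitting
# on a portal waiting for someone else to download it.
print(combined.groupby("zip_code")["case_count"].sum().head())
```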
I am looking forward to following the success of any agency effort to take advantage of open data as a strategic resource. If you have other examples beyond the cases noted in the NextGov article, please share!
That’s right, Valentine’s Day is upon us, the day that symbolizes the power of love and has the ability to strengthen relationships between people. I’ve personally experienced 53 Valentine’s Days, so I believe I speak with no small measure of authority on the topic of how to make the best of it. Here are my top five suggestions for having a great day:
- Know everything you can about the people you have relationships with
- Quality matters
- ALL your relationships matter
- Uncover your hidden or anonymous relationships
- Treat your relationships with respect all year long
OK, I admit, this is not the most romantic list ever and might get you in more trouble with your significant other than actually forgetting Valentine’s Day altogether! But, what did you expect? I work for a software company, not eHarmony!
Right. Software. Let’s put this list into the context of government agencies.
- Know everything – If your agency’s mission involves delivering services to citizens, you likely have multiple “systems of record”, each with a supposedly accurate record of all the people being tracked by each system. In reality, though, it’s rare that the data about individuals is consistently accurate and complete from system to system. The ability to centralize all the data about individuals into a single, authoritative “record” is key to improving service delivery (see the sketch after this list). Such a record will enable you to ensure the citizens you serve are able to take full advantage of all the services available to them. Further, having a single record for each citizen has the added benefit of reducing fraud, waste and abuse.
- Quality matters – Few things hinder the delivery of services more than bad data, data with errors, inconsistencies and gaps in completeness. It is difficult, at best, to make sound business decisions with bad data. At the individual level and at the macro level, agency decision makers need complete and accurate data to ensure each citizen is fully served.
- All relationships matter – In this context, going beyond having single records to represent people, it’s also important to have single, authoritative views of other entities – programs, services, providers, deliverables, places, etc.
- Uncover hidden relationships – Too often, in the complex ecosystem of government programs and services, the inability to easily recognize relationships between people and the other entities mentioned above creates inefficiencies in the “system”. For example, it can go unnoticed that a single parent is not enrolled in a special program designed for their unique life circumstances. On the flip side, not having a full view of hidden relationships also opens the door for the less scrupulous in society, giving them the ability to hide their fraudulent activities in plain sight.
- Treat relationships respectfully all year – Data hygiene is not a one-time endeavor. Having the right mindset, processes and tools to implement and automate the “mastering” of data on an ongoing basis will better ensure the relationship between your agency and those it serves remains positive and productive.
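Since I work for a software company, here is a minimal sketch of what “know everything” can look like in code – consolidating duplicate person records from two systems into one golden record. The field names and the naive match rule are illustrative; real master data management tools use far more sophisticated probabilistic matching:

```python
# A minimal sketch of building a single "golden record" from duplicate
# citizen records held in two systems. Field names and the naive match
# rule are illustrative, not any particular product's approach.
from collections import defaultdict

records = [
    {"system": "benefits", "name": "Jane Q. Public", "ssn_last4": "6789", "phone": None},
    {"system": "housing",  "name": "Jane Public",    "ssn_last4": "6789", "phone": "555-0100"},
]

def match_key(rec):
    # Naive match rule: last four of SSN plus normalized surname.
    surname = rec["name"].split()[-1].lower()
    return (rec["ssn_last4"], surname)

clusters = defaultdict(list)
for rec in records:
    clusters[match_key(rec)].append(rec)

# Survivorship: keep the first non-empty value seen for each field.
golden_records = []
for cluster in clusters.values():
    golden = {}
    for rec in cluster:
        for field, value in rec.items():
            if field != "system" and value and field not in golden:
                golden[field] = value
    golden_records.append(golden)

print(golden_records)  # one complete record per real-world person
```

The survivorship rule here is simply "first non-empty value wins"; in practice you would rank source systems by trustworthiness, field by field.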
I may not win the “Cupid of the Year” award, but I hope my light-hearted Valentine’s Day message has given you a thing or two to think about. Maybe Lennon and McCartney are right: between people, “Love is All You Need”. But we at Informatica believe that for Government-Citizen relationships, a little of the right software can go a long way.
If you work for or with the government and you care about the cloud, you’ve probably already read the recent MeriTalk report, “Cloud Without the Commitment”. As well, you’ve probably also read numerous opinions about the report. In fact, one of Informatica’s guest bloggers, David Linthicum, just posted his thoughts. As I read the report and the various opinions, I was struck by the perhaps unintentional suggestion that (sticking with MeriTalk’s dating metaphor) the “commitment issues” are a government problem. Mr. Linthicum’s perspective is that “there is really no excuse for the government to delay migration to cloud-based platforms” and “It’s time to see some more progress”, suggesting that the onus is on government to move forward.
I do agree that, leveraged properly, there’s much more value to be extracted from the cloud by government. Further, I agree that cloud technologies have sufficiently matured to the point that it is feasible to consider migrating mission-critical applications. Yet, is it possible that the government’s “fear of commitment” is, in some ways, justified?
Consider this stat from the MeriTalk report – only 53% of respondents rate their experience with the cloud as very successful. That suggests the experience of the rest, as MeriTalk words it, “leave(s) something to be desired.” If I’m a government decision maker and I’m tasked with keeping mission-critical systems up and sensitive data safe, am I going to jump at the opportunity to leverage an approach that only about half of my peers are satisfied with? Maybe, maybe not.
Now factor this in:
- 53% are concerned about being locked into a contract where the average term is 3.6 years
- 58% believe cloud providers do not provide standardized services, thus creating lock in
Back to playing government decision maker, if I do opt to move applications to the cloud, once I get there, I’m bound to that particular provider – contractually and, at least to some extent, technologically. How comfortable am I with the notion of rewriting/rehosting my mission-critical, custom application to run in XYZ cloud? Good question, right?
Inevitably, government agencies will end up with mission-critical systems and sensitive data in the cloud. However, successful “marriages” are hard, making them a bit of a rare commodity.
Do I believe government has a “fear of commitment”? Nah, I just see their behavior as prudent caution on their way to the altar.
To level set, let’s make sure you understand my definition of dark data. I prefer using visualizations when I can so, picture this: the end of the first Indiana Jones movie, Raiders of the Lost Ark. In this scene, we see the Ark of the Covenant, stored in a generic container, being moved down the aisle in a massive warehouse full of other generic containers. What’s in all those containers? It’s pretty much anyone’s guess. There may be a record somewhere, but, for all intents and purposes, the materials stored in those boxes are useless.
Applying this to data, once a piece of data gets shoved into some generic container and is stored away, just like the Ark, the data becomes essentially worthless. This is dark data.
Opening up a government agency to all its dark data can have significant impacts, both positive and negative. Here are a couple of initial tips to get you thinking in the right direction:
- Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
- Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
- Create an extraction plan – determine how you will get access to the data, how often the data updates, and how you will handle varied formats.
- Ingest the data – transform the data if needed, integrate if needed, and capture as much metadata as possible (never assume you won’t need a metadata field; that’s just about the time you will be proven wrong). A minimal sketch of this step follows the list.
- Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here; the impact of bad data can never really be quantified.
- Store it – it’s interesting how often agencies think this is the first step.
- Get the data ready to be useful to people, tools and applications – think about how to minimize the need for users to manipulate data – reformatting, parsing, filtering, etc. – to better enable self-service.
- Make it available – at this point, the data should be easily accessible, easily discoverable, easily used by people, tools and applications.
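As promised in the ingest step above, here is a minimal sketch of capturing metadata at ingest time. The catalog format and field names are my own illustration, not any particular product's:

```python
# A minimal sketch of the ingest step, focused on capturing metadata at
# ingest time. The catalog format and fields are illustrative.
import hashlib
import json
import os
from datetime import datetime, timezone

def ingest(path, catalog_path="catalog.jsonl"):
    with open(path, "rb") as f:
        data = f.read()
    entry = {
        "source_path": os.path.abspath(path),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "modified": datetime.fromtimestamp(
            os.path.getmtime(path), timezone.utc).isoformat(),
        "ingested": datetime.now(timezone.utc).isoformat(),
        "format_guess": os.path.splitext(path)[1].lstrip(".") or "unknown",
    }
    # Append to a simple running catalog; never assume you won't need a field.
    with open(catalog_path, "a") as cat:
        cat.write(json.dumps(entry) + "\n")
    return entry

# e.g. ingest("shared_drive/old_inspection_report.csv")  # hypothetical file
```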
Clearly, there’s more to shining the light on dark data than I can offer in this post. If you’d like to take the next step to learning what is possible, I suggest you download the eBook, The Dark Data Imperative.
What I love about the cloud is that it has something of value to offer practically any government organization, regardless of size, maturity, point of view or approach. Even for the most conservative IT shops, there are use cases that just plain make sense. And with the growing availability of FedRAMP-certified offerings, it’s becoming easier to procure. But, thinking realistically, for reasons of law, budget, time and architecture, we know the cloud will not be the solution for every public sector problem. Some applications, some data, will never leave your agency’s premises. And herein lies the new complexity. You have applications and data on-prem. You have applications and data in the cloud. And you have business requirements that require these apps to work together, to share data.
So, now that you have a hybrid environment, what can you do about it? Let’s face it, we can talk about technology, architecture and approaches all day long, but it always comes down to this: what should be done with the data? You need answers to questions such as: Is it safe? Is it accessible? Is it reliable? How do I know if the integrity has been compromised? What about the quality? How error-prone is the data? How complete is the data? How do we manage it across this new hybrid landscape? How can I get data from a public cloud application to my on-prem data warehouse? How can I leverage the flexibility of public IaaS to build a new application that will need access to data that is also required for an on-prem legacy application?
I know many government IT professionals are wrestling with these questions and seeking solutions. So, here’s an interesting thought: most of these questions are not exactly new; they are just taking on the added context of the cloud. Prior to the cloud, many agencies discovered answers in the form of a data integration platform. The platform is used to ensure every application and every user has access to the data they need to perform their mission or job. I think of it this way: the platform is a “standardized” abstraction layer that ensures all your data gets to where it needs to be, when it needs to be there, in the form it needs to be in. There are hundreds of government IT shops using such an approach.
Here’s the good news. This approach to integrating data can be extended to include the cloud. Imagine placing “agents” in all the places where your data needs to live, the agents capable of communicating with each other to integrate, alter or move data. Now add to this the idea of a cloud-based remote control that allows you to control all the functions of the agents (sketched below). Using such a platform enables your agency to tie on-prem systems to cloud systems, minimizing the effect of having multiple silos of information. Now government workers and warfighters will have the ability to more quickly get complete, accurate data, regardless of where it originates, and citizens will benefit from more effectively delivered services.
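To illustrate the idea (and only the idea – this is a hedged sketch, not how any particular platform, Informatica's included, actually works), imagine the agents and the remote control as simple data structures:

```python
# A minimal sketch of "agents plus a cloud-based remote control".
# Agent names, endpoints and the job definition are all hypothetical;
# a real integration platform handles this declaratively and securely.
AGENTS = {
    "onprem-dw": {"location": "agency data center",
                  "endpoint": "https://10.0.0.5/agent"},
    "cloud-crm": {"location": "public SaaS",
                  "endpoint": "https://crm.example.com/agent"},
}

JOB = {
    "name": "nightly-crm-to-warehouse",
    "source": "cloud-crm",
    "target": "onprem-dw",
    "transforms": ["normalize_dates", "mask_pii"],
    "schedule": "0 2 * * *",  # 2 a.m. nightly
}

def dispatch(job, agents):
    """The 'remote control': tell the source agent to push its data
    through the listed transforms to the target agent."""
    src, tgt = agents[job["source"]], agents[job["target"]]
    print(f"{job['name']}: {src['location']} -> {tgt['location']} "
          f"via {', '.join(job['transforms'])}")

dispatch(JOB, AGENTS)
```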
How would such an approach change your ideas on how to leverage the cloud for your agency? If you live near the Washington, DC area, you may wish to drop in on the Government Cloud Computing and Data Center Conference & Expo. One of my colleagues, Ronen Schwartz, will be discussing this topic. For those not in the vicinity, you can learn more here.
I just finished reading a great article from one of my former colleagues, Bill Franks. He makes a strong argument that Big Data is not inherently good or evil, any more than money is. What makes Big Data (or any data, as I see it) take on a characteristic of good or evil is how it is used. Same as money, right? Here’s the rest of Bill’s article.
Bill framed his thoughts within the context of a discussion with a group of government legislators whom I would characterize, based on his commentary, as a bit skittish of government collecting Big Data. Given many recent headlines, I sincerely do not blame them for being concerned. In fact, I applaud them for being cautious.
At the same time, while Big Data seems to be the “type” of data everyone wants to speak about, the scope of the potential problem extends to ALL data. Just because a particular dataset is highly structured in a 20-year-old schema does not mean it is safe from misuse. I believe structured data has been around for so long that people are comfortable with (or have forgotten about) the associated risks.
Any data can be used for good or ill. Clearly, it does not make sense to take the position that “we” should not collect, store and leverage data based on the notion that someone could do something bad.
I suggest the real conversation should revolve around access to data. Bill touches on this as well. Far too often, data, whether Big Data or “traditional”, is openly accessible to people who truly have no need for it based on their job function.
Consider this example – a contracted application developer in a government IT shop is working on the latest version of an existing application for agency case managers. To test the application and get it successfully through a rigorous quality assurance process, the developer needs a representative dataset. And where does this data come from? It is usually copied from live systems, with personally identifiable information still intact. Not good.
Another example – creating a 360-degree view of the citizens in a jurisdiction, to be shared cross-agency, can certainly be an advantageous situation for citizens and government alike. For instance, citizens can be better served, getting more of what they need, while agencies can better protect against fraud, waste and abuse. Practically any agency serving the public could leverage the data to better serve and protect. However, this is a recognized sticky situation. How much data does a case worker from the Department of Human Services need versus a law enforcement officer or an emergency services worker? The way this has been addressed for years is to create silos of data, which carries its own host of challenges. However, as technology evolves, so too should process and approach.
Stepping back and looking at the problem from a different perspective, both examples above, different as they are, can be addressed by incorporating a layer of data security directly into the architecture of the enterprise. Rather than rely on a hodgepodge of data security mechanisms built into point applications and siloed systems, create a layer through which all data, Big or otherwise, is accessed.
Through such a layer, data can be persistently and/or dynamically masked based on the needs and role of the user. In the first example of the developer, this person does not need access to a live system to do their work. However, the ability to replicate the working environment of the live system is crucial. So, in this case, live data could be masked or altered in a permanent fashion as it is moved from production to development. Personally identifiable information could be scrambled or replaced with XXXXs. Now developers can do their work, and the enterprise can rest assured that no harm can come from anyone seeing this data. A minimal sketch of the idea follows.
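```python
# A minimal sketch of persistent masking on the copy from production to
# development. Field names are illustrative; real masking tools preserve
# formats and referential integrity across whole schemas.
import random

def mask_ssn(_):
    return "XXX-XX-XXXX"                     # replace the value outright

def scramble_name(name):
    letters = list(name.replace(" ", ""))
    random.shuffle(letters)
    return "".join(letters).title()          # same characters, no identity

production_row = {"name": "Jane Public", "ssn": "123-45-6789", "case_id": 42}

dev_row = {
    "name": scramble_name(production_row["name"]),
    "ssn": mask_ssn(production_row["ssn"]),
    "case_id": production_row["case_id"],    # non-sensitive fields copy as-is
}
print(dev_row)  # realistic shape, no real PII
```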
Further, through this data security layer, data can be dynamically masked based on a user’s role, leaving the original data unaltered for those who do require it. There are plenty of examples of how this looks in practice – think credit card numbers being displayed as xxxx-xxxx-xxxx-3153 (sketched below). However, this is usually implemented at the application layer and considered a “best practice”, rather than governed from a consistent layer in the enterprise.
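And a minimal sketch of the dynamic case, where a shared access layer decides per role what a caller sees and the stored data is never altered (the roles and field names are illustrative):

```python
# A minimal sketch of dynamic masking in a shared access layer. Roles
# and field names are illustrative; the stored data is never altered.
SENSITIVE_FIELDS = {"card_number"}

def mask_card(value):
    return "xxxx-xxxx-xxxx-" + value[-4:]

def fetch(record, role):
    """Return a view of the record appropriate to the caller's role."""
    if role == "fraud_investigator":         # a role that needs real values
        return dict(record)
    view = dict(record)
    for field in SENSITIVE_FIELDS & view.keys():
        view[field] = mask_card(view[field])
    return view

record = {"citizen": "Jane Public", "card_number": "4111-1111-1111-3153"}
print(fetch(record, "case_worker"))          # shows xxxx-xxxx-xxxx-3153
print(fetch(record, "fraud_investigator"))   # full number, unmasked
```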
The time to re-think the enterprise approach to data security is here. Properly implemented and deployed, such a layer addresses many of the arguments against collecting, integrating and analyzing data from anywhere. No doubt, having an active discussion on the merits and risks of data is prudent and useful. Yet, perhaps it should not be a conversation about whether or not to save data; it should be a conversation about access.