Category Archives: Public Sector
I just finished reading a great article from one of my former colleagues, Bill Franks. He makes a strong argument that Big Data is not inherently good or evil anymore than money is. What makes Big Data (or any data as I see it) take on a characteristic of good or evil is how it is used. Same as money, right? Here’s the rest of Bill’s article.
Bill framed his thoughts within the context of a discussion with a group of government legislators who I would characterize based on his commentary as a bit skittish of government collecting Big Data. Given many recent headlines, I sincerely do not blame them for being concerned. In fact, I applaud them for being cautious.
At the same time, while Big Data seems to be the “type” of data everyone wants to speak about, the scope of the potential problem extends to ALL data. Just because a particular dataset is highly structured into a 20 year old schema that does not exclude it from misuse. I believe structured data has been around for so long people are comfortable with (or have forgotten about) the associated risks.
Any data can be used for good or ill. Clearly, it does not make sense to take the position that “we” should not collect, store and leverage data based on the notion someone could do something bad.
I suggest the real conversation should revolve around access to data. Bill touches on this as well. Far too often, data, whether Big Data or “traditional”, is openly accessible to some people who truly have no need based on job function.
Consider this example – a contracted application developer in a government IT shop is working on the latest version of an existing application for agency case managers. To test the application and get it successfully through a rigorous quality assurance process the IT developer needs a representative dataset. And where does this data come from? It is usually copied from live systems, with personally identifiable information still intact. Not good.
Another example – Creating a 360 degree view of the citizens in a jurisdiction to be shared cross-agency can certainly be an advantageous situation for citizens and government alike. For instance, citizens can be better served, getting more of what they need, while agencies can better protect from fraud, waste and abuse. Practically any agency serving the public could leverage the data to better serve and protect. However, this is a recognized sticky situation. How much data does a case worker from the Department of Human Services need versus that of a law enforcement officer or an emergency services worker need? The way this has been addressed for years is to create silos of data, carrying with it, its own host of challenges. However, as technology evolves, so too should process and approach.
Stepping back and looking at the problem from a different perspective, both examples above, different as they are, can be addressed by incorporating a layer of data security directly into the architecture of the enterprise. Rather than rely on a hodgepodge of data security mechanisms built into point applications and silo’d systems, create a layer through which all data, Big or otherwise, is accessed.
Through such a layer, data can be persistently and/or dynamically masked based on the needs and role of the user. In the first example of the developer, this person would not want access to a live system to do their work. However, the ability to replicate the working environment of the live system is crucial. So, in this case, live data could be masked or altered in a permanent fashion as it is moved from production to development. Personally identifiable information could be scrambled or replaced with XXXXs. Now developers can do their work and the enterprise can rest assured that no harm can come from anyone seeing this data.
Further, through this data security layer, data can be dynamically masked based on a user’s role, leaving the original data unaltered for those who do require it. There are plenty of examples of how this looks in practice, think credit card numbers being displayed as xxxx-xxxx-xxxx-3153. However, this is usually implemented at the application layer and considered to be a “best practice” rather than governed from a consistent layer in the enterprise.
The time to re-think the enterprise approach to data security is here. Properly implemented and deployed, many of the arguments against collecting, integrating and analyzing data from anywhere are addressed. No doubt, having an active discussion on the merits and risks of data is prudent and useful. Yet, perhaps it should not be a conversation to save or not save data, it should be a conversation about access
Recently, my US-based job led me to a South African hotel room, where I watched Germany play Brazil in the World Cup. The global nature of the event was familiar to me. My work covers countries like Malaysia, Thailand, Singapore, South Africa and Costa Rica. And as I pondered the stunning score (Germany won, 7 to 1), my mind was drawn to emerging markets. What defines an emerging market? In particular, what are the data-related themes common to emerging markets? Because I work with global clients in the banking, oil and gas, telecommunications, and retail industries, I have learned a great deal about this. As a result, I wanted to share my top 5 observations about data in Emerging Markets.
1) Communication Infrastructure Matters
Many of the emerging markets, particularly in Africa, jumped from one or two generations of telco infrastructure directly into 3G and fiber within a decade. However, this truth only applies to large, cosmopolitan areas. International diversification of fiber connectivity is only starting to take shape. (For example, in Southern Africa, BRICS terrestrial fiber is coming online soon.) What does this mean for data management? First, global connectivity influences domestic last mile fiber deployment to households and businesses. This, in turn, will create additional adoption of new devices. This adoption will create critical mass for higher productivity services, such as eCommerce. As web based transactions take off, better data management practices will follow. Secondly, European and South American data centers become viable legal and performance options for African organizations. This could be a game changer for software vendors dealing in cloud services for BI, CRM, HCM, BPM and ETL.
2) Competition in Telecommunication Matters
If you compare basic wireless and broadband bundle prices between the US, the UK and South Africa, for example, the lack of true competition makes further coverage upgrades, like 4G and higher broadband bandwidths, easy to digest for operators. These upgrades make telecommuting, constant social media engagement possible. Keeping prices low, like in the UK, is the flipside achieving the same result. The worst case is high prices and low bandwidth from the last mile to global nodes. This also creates low infrastructure investment and thus, fewer consumers online for fewer hours. This is often the case in geographically vast countries (Africa, Latin America) with vast rural areas. Here, data management is an afterthought for the most part. Data is intentionally kept in application silos as these are the value creators. Hand coding is pervasive to string data together to make small moves to enhance the view of a product, location, consumer or supplier.
3) A Nation’s Judicial System Matters
If you do business in nations with a long, often British judicial tradition, chances are investment will happen. If you have such a history but it is undermined by a parallel history of graft from the highest to the lowest levels because of the importance of tribal traditions, only natural resources will save your economy. Why does it matter if one of my regional markets is “linked up” but shipping logistics are burdened by this excess cost and delay? The impact on data management is a lack of use cases supporting an enterprise-wide strategy across all territories. Why invest if profits are unpredictable or too meager? This is why small Zambia or Botswana are ahead of the largest African economy, Nigeria.
4) Expertise Location Matters
Anybody can have the most advanced vision on a data-driven, event-based architecture supporting the fanciest data movement and persistence standards. Without the skill to make the case to the business it is a lost cause unless your local culture still has IT in charge of specifying requirements, running the evaluation, selecting and implementing a new technology. It is also done for if there are no leaders who have experienced how other leading firms in the same or different sector went about it (un)successfully. Lastly, if you don’t pay for skill, your project failure risk just tripled. Duh!
5) Denial is Universal
No matter if you are an Asian oil company, a regional North American bank, a Central American National Bank or an African retail conglomerate. If finance or IT invested in any technologies prior and they saw a lack of adoption, for whatever reason, they will deny data management challenges despite other departments complaining. Moreover, if system integrators or internal client staff (mis)understand data management as fixing processes (which it is not) instead of supporting transactional integrity (which it is), clients are on the wrong track. Here, data management undeservedly becomes a philosophical battleground.
This is definitely not a complete list or super-thorough analysis but I think it covers the most crucial observations from my engagements. I would love to hear about your findings in emerging markets.
Stay tuned for part 2 of this series where I will talk about the denial and embrace of corporate data challenges as it pertains to an organization’s location.
What springs to mind when you think about old applications? What happens to them when they outlived their usefulness? Do they finally get to retire and have their day in the sun, or do they tenaciously hang on to life?
Think for a moment about your situation and of those around you. From the time work started you have been encouraged and sometimes forced to think about, plan for and fund your own retirement. Now consider the portfolio your organization has built up over the years; hundreds or maybe thousands of apps, spread across numerous platforms and locations – A mix of home-grown with the best-in-breed tools or acquired from the leading application vendors.
Evaluating Your Current Situation
- Do you know how many of those “legacy” systems are still running?
- Do you know how much these apps are costing?
- Is there a plan to retire them?
- How is the execution tracking to plan?
Truth is, even if you have a plan, it probably isn’t going well.
Providing better citizen service at a lower cost
This is something every state and local organization aspires to do by reducing costs. Many organizations are spending 75% or more of their budgets on just keeping the lights on – maintaining existing applications and infrastructure. Being able to fully retire some, or many of these applications saves significant money. Do you know how much these applications are costing your organization? Don’t forget to include the whole range of costs that applications incur – including the physical infrastructure costs such as mainframes, networks and storage, as well as the required software licenses and of course the time of the people that actually keep them running. What happens when those with with Cobol and CICS experience retire? Usually the answer is not good news. There is a lot to consider and many benefits to be gained through an effective application retirement strategy.
August 2011 report by ESG Global shows that some 68% of organizations had over six or more legacy applications running and that 50% planned to retire at least one of those over the following 12-18 months. It would be interesting to see today’s situation and be able evaluate how successful these application retirement plans have been.
A common problem is knowing where to start. You know there are applications that you should be able to retire, but planning, building and executing an effective and success plan can be tough. To help this process we have developed a strategy, framework and solution for effective and efficient application retirement. This is a good starting point on your application retirement journey.
To get a speedy overview, take six minutes to watch this video on application retirement.
We have created a community specifically for application managers in our ‘Potential At Work’ site. If you haven’t already signed up, take a moment and join this group of like-minded individuals from across the globe.
In the other, they hear administrative talk of smaller budgets and scarcer resources.
As stringent requirements for both transparency and accountability grow, this paradox of pressure increases.
Sometimes, the best way to cope is to TALK to somebody.
What if you could ask other data technologists candid questions like:
- Do you think government regulation helps or hurts the sharing of data?
- Do you think government regulators balance the privacy needs of the public with commercial needs?
- What are the implications of big data government regulation, especially for users?
- How can businesses expedite the government adoption of the cloud?
- How can businesses aid in the government overcoming the security risks associated with the cloud?
- How should the policy frameworks for handling big data differ between the government and the private sector?
What if you could tell someone who understood? What if they had sweet suggestions, terrific tips, stellar strategies for success? We think you can. We think they will.
That’s why Twitter needs a #DataChat.
What on earth is a #DataChat?
Good question. It’s a Twitter Chat – A public dialog, at a set time, on a set topic. It’s something like a crowd-sourced discussion. Any Twitter user can participate simply by including the applicable hashtag in each tweet. Our hashtag is #DataChat. We’ll connect on Twitter, on the third Thursday of each month to share struggles, victories and advice about data governance. We’re going to begin this week, Thursday April 17, at 3:00 PM Eastern Time. For our first chat, we are going to discuss topics that relate to data technologies in government organizations.
What don’t you join us? Tell us about it. Mark your calendar. Bring a friend.
Because, sometimes, you just need someone to talk to.
Loraine Lawson does an outstanding job of covering the issues around government use of “data heavy” projects. This includes a report by the government IT site, MeriTalk.
“The report identifies five factors, which it calls the Big Five of IT, that will significantly affect the flow of data into and out of organizations: Big data, data center consolidation, mobility, security and cloud computing.”
MeriTalk surveyed 201 state and local government IT professionals, and found that, while the majority of organizations plan to deploy the Big Five, 94 percent of IT pros say their agency is not fully prepared. “In fact, if Big Data, mobile, cloud, security and data center consolidation all took place today, 89 percent say they’d need additional network capacity to maintain service levels. Sixty-three percent said they’d face network bottleneck risks, according to the report.”
This report states what most who work with the government already know; the government is not ready for the influx of data. Nor is the government ready for the different uses of data, and thus there is a large amount of risk as the amount of data under management within the government explodes.
Add issues with the approaches and technologies leveraged for data integration to the list. As cloud computing and mobile computing continue to rise in popularity, there is not a clear strategy and technology for syncing data in the cloud, or on mobile devices, with data that exists within government agencies. Consolidation won’t be possible without a sound data integration strategy, nor will the proper use of big data technology.
The government sees a huge wave of data heading for it, as well as opportunities with new technology such as big data, cloud, and mobile. However, there doesn’t seem to be an overall plan to surf this wave. According to the report, if they do wade into the big data wave, they are likely to face much larger risks.
The answer to this problem is really rather simple. As the government moves to take advantage of the rising tide of data, as well as new technologies, they need to be funded to get the infrastructure and the technology they need to be successful. The use of data integration approaches and technologies, for example, will return the investment ten-fold, if properly introduced into the government problem domains. This includes integration with big data systems, mobile devices, and, of course, the rising use of cloud-based platforms.
While data integration is not a magic bullet for the government, nor any other organization, the proper and planned use of this technology goes a long way toward reducing the inherent risks that the report identified. Lacking that plan, I don’t think the government will get very far, very fast.
In the first two issues I spent time looking at the need for states to pay attention to the digital health and safety of their citizens, followed by the oft forgotten need to understand and protect the non-production data. This is data than has often proliferated and also ignored or forgotten about.
In many ways, non-production data is simpler to protect. Development and test systems can usually work effectively with realistic but not real PII data and realistic but not real volumes of data. On the other hand, production systems need the real production data complete with the wealth of information that enables individuals to be identified – and therefore presents a huge risk. If and when that data is compromised either deliberately or accidentally the consequences can be enormous; in the impact on the individual citizens and also the cost of remediation on the state. Many will remember the massive South Carolina data breach of late 2012 when over the course of 2 days a 74 GB database was downloaded and stolen, around 3.8 million payers and 1.9 million dependents had their social security information stolen and 3.3 million “lost” bank account details. The citizens’ pain didn’t end there, as the company South Carolina picked to help its citizens seems to have tried to exploit the situation.
The biggest problem with securing production data is that there are numerous legitimate users and uses of that data, and most often just a small number of potentially malicious or accidental attempts of inappropriate or dangerous access. So the question is… how does a state agency protect its citizens’ sensitive data while at the same time ensuring that legitimate uses and users continues – without performance impacts or any disruption of access? Obviously each state needs to make its own determination as to what approach works best for them.
This video does a good job at explaining the scope of the overall data privacy/security problems and also reviews a number of successful approaches to protecting sensitive data in both production and non-production environments. What you’ll find is that database encryption is just the start and is fine if the database is “stolen” (unless of course the key is stolen along with the data! Encryption locks the data away in the same way that a safe protects physical assets – but the same problem exists. If the key is stolen with the safe then all bets are off. Legitimate users are usually easily able deliberately breach and steal the sensitive contents, and it’s these latter occasions we need to understand and protect against. Given that the majority of data breaches are “inside jobs” we need to ensure that authorized users (end-users, DBAs, system administrators and so on) that have legitimate access only have access to the data they absolutely need, no more and no less.
So we have reached the end of the first series. In the first blog we looked at the need for states to place the same emphasis on the digital health and welfare of their citizens as they do on their physical and mental health. In the second we looked at the oft-forgotten area of non-production (development, testing, QA etc.) data. In this third and final piece we looked at the need to and some options for providing the complete protection of non-production data.
In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.
Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.
Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue), on the other hand non-production systems of every sort do not. Non-production systems most often need realistic, but not real data and realistic, but not real data volumes (except maybe for the performance/stress/throughput testing system). What need to be done? Well to start with, a three point risk remediation plan would be a good place to start.
1. Understand the non-production data using sophisticated data and schema profiling combined with NLP (Natural Language Processing) techniques help to identify previously unrealized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is realistic enough for non-production uses and make sure that the same masking is applied to the attribute values wherever they appear in multiple tables/files.
3. Subset the data to reduce data volumes, this limits the size of the risk and also has positive effects on performance, run-times, backups etc.
Gartner has just published their 2013 magic quadrant for data masking this covers both what they call static (i.e. permanent or persistent masking) and dynamic (more on this in the next issue) masking. As usual the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.
It is (or at least should be) an imperative that from the top down state governments realize the importance and vulnerability of their citizens data and put in place a non-partisan plan to prevent any future breaches. As the reader might imagine, for any such plan to success needs a combination of cultural and organizational change (getting people to care) and putting the right technology – together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.
This is the first in a series of articles where I will take an in-depth look at how state and local governments are affected by data breaches and what they should be considering as part of their compliance, risk-avoidance and remediation plans.
Each state has one or more agencies that are focused on the lives, physical and mental health and overall welfare of their citizens. The mission statement of the Department of Public Welfare of Pennsylvania, my home state is typical, it reads “Our vision is to see Pennsylvanians living safe, healthy and independent lives. Our mission is to improve the quality of life for Pennsylvania’s individuals and families. We promote opportunities for independence through services and supports while demonstrating accountability for taxpayer resources.”
Just as in the enterprise, over the last couple of decades the way an agency deals with citizens has changed dramatically. No longer is everything paper-based and manually intensive – each state has made enormous efforts not just to automate more and more of their processes but more lately to put everything online. The combination of these two factors has led to the situation where just about everything a state knows about each citizen is stored in numerous databases, data warehouses and of course accessed through the Web.
It’s interesting that in the PA mission statement two of the three focus areas are safety and health– I am sure when written these were meant in the physical sense. We now have to consider what each state is doing to safeguard and promote the digital safety and health of its citizens. You might ask what digital safety and health means – at the highest level this is quite straightforward – it means that each state must ensure the data it holds about its’ citizens is safe from inadvertent or deliberate exposure or disclosure. It seems that each week we read about another data breach – high profile data breach infographic - either accidental (a stolen laptop for instance) or deliberate (hacking as an example) losses of data about people – the citizens. Often that includes data contents that can be used to identify the individuals, and once an individual citizen is identified they are at risk of identity theft, credit card fraud or worse.
Of the 50 states, 46 now have a series of laws and regulations in place about when and how they need to report on data breaches or losses – this is all well and good, but is a bit like shutting the stable door after the horse has bolted – but with higher stakes as there are potentially dire consequences to the digital safety and health of their citizens.
In the next article I will look at the numerous areas that are often overlooked when states establish and execute their data protection and data privacy plans.
Personally Identifiable Information is under attack like never before. In the news recently two prominent organizations—institutions—were attacked. What happened:
- A data breach at a major U.S. Insurance company exposed over a million of their policyholders to identity fraud. The data stolen included Personally Identifiable information such as names, Social Security numbers, driver’s license numbers and birth dates. In addition to Nationwide paying million dollar identity fraud protection to policyholders, this breach is creating fears that class action lawsuits will follow. (more…)