Informatica announced Secure@Source last week, unveiling the industry’s first data security intelligence offering. According to The Ponemon Institute, ‘not knowing where sensitive and confidential data reside’ has been the number one thing keeping security professionals up at night for two years in a row. The timing seems right for a capability such as Data Security Intelligence, which gives line of sight to obscured threats.
Neuralytix conducted market research, entitled The Future State of Data Security Intelligence, in which it defines data security intelligence (DSI) as a framework for understanding the risk of sensitive or confidential data and recommending the optimal set of controls to mitigate that risk. DSI comprises technology that provides the definition, classification, discovery, and assessment phases of a data-centric security approach. The report states:
By deploying data security intelligence in combination with data security controls, enterprises can gain active insight into where risks exist and proactively set controls to mitigate the impact in the event of a data breach.
The Enterprise Strategy Group further commented in the report “Data-centric Security: A New Information Security Perimeter,” authored by industry expert Jon Oltsik:
To address modern threats and IT mobility, CISOs must adopt two new security perimeters around identity attributes and data-centric security. In this regard, sensitive data must be continuously monitored for situational awareness and risk management.
This launch comes just before RSA Conference, the security industry’s equivalent of the NFL’s Super Bowl, where the world talks security. Informatica will be there, debuting Secure@Source, its first Data Security Intelligence offering. The team should be proud; this is by far one of the coolest products I have had the opportunity to be a part of. Here is a brief blurb on what Secure@Source is and does:
Secure@Source discovers, analyzes and visualizes data relationships, proliferation and sensitivity. It details data risks and vulnerabilities so that data protection and monitoring can be focused on securing data from external breaches and insider abuse. Secure@Source leverages proven data integration and quality capabilities to provide integrated views of data, independent of platform, across legacy, cloud, big data and mobile environments.
Secure@Source provides granular detail on what data has value, where the data resides, how it traverses the enterprise and how it should be protected. Informatica leverages market-leading technology for data discovery and profiling, protection and retirement, and innovative analysis and visualizations for monitoring data security in real time.
At a conference where the world talks security, I’m looking forward to talking with you about getting smarter with Data Security Intelligence and eliminating blind spots. See you at the venue April 20-24, South Hall, Booth No. 2626.
Original article is posted at techcrunch.com
It’s probably no surprise to the security professional community that once again, identity theft is among the IRS’s Dirty Dozen tax scams. Criminals use stolen Social Security numbers and other personally identifiable information to file tax claims illegally, deposit the tax refunds to rechargeable debit cards, and vanish before the average citizen gets around to filing.
The IRS publishes its “Dirty Dozen” list to alert filers to the worst tax scams, and identity theft has topped that list every year since 2011. In 2012, the IRS implemented a preventive measure to catch fraud before refunds are issued, and it took more than 2,400 enforcement actions against identity thieves. With an aggressive campaign to fight identity theft, the IRS saved over $1.4 billion in 2011 and over $63 billion since October 2014.
That’s great progress. But consider that of the 117 million taxpayers who filed electronically in 2014, 80 million received an average refund of $2,851 by direct deposit into their bank accounts. That works out to roughly $228 billion (80 million × $2,851) changing hands electronically. The pessimist in me has to believe that cyber criminals are already plotting how to nab more Social Security numbers and e-filing logins to tap into that big pot of gold.
So where are criminals getting the data to begin with? Any organization that has employees and a human resources department collects and possibly stores Social Security numbers, birthdays, addresses and income either on-premises or in a cloud HR application. This information is everything a criminal would need to fraudulently file taxes. Any time a common business process is digitally transformed, or moved to the cloud, the potential risk of exposure increases.
As the healthcare industry moves to electronic health and patient records, it creates another abundant source of Social Security numbers and personally identifiable information, increasing criminals’ surface area of opportunity. When you look at the sheer number of Social Security numbers stolen in major data breaches, as in the case of Anthem, you start to connect the dots.
One of my favorite dynamic infographics is ‘World’s Biggest Data Breaches’ from the website Information is Beautiful. When you filter the data by number of records versus sensitivity, the size of each bubble indicates severity. Even though the sensitivity score appears somewhat arbitrary, it does provide one way to assess severity based on the type of information that was breached:
| Type of information breached | Sensitivity score |
|---|---|
| Just email address/online information | 1 |
| Credit card information | 300 |
| Email password/health records | 4000 |
| Full bank account details | 50000 |
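The infographic’s exact formula is not given here. As an illustrative assumption, the two dimensions could be combined by multiplying the number of records breached by the sensitivity score. The Python sketch below uses the four scores from the table. The function names and the breaches in the usage example are hypothetical.

```python
# Sensitivity scores from the table above (Information is Beautiful's scale).
SENSITIVITY = {
    "Just email address/online information": 1,
    "Credit card information": 300,
    "Email password/health records": 4000,
    "Full bank account details": 50000,
}

def severity(records, data_type):
    """Hypothetical severity index: records breached multiplied by sensitivity score."""
    return records * SENSITIVITY[data_type]

def rank_breaches(breaches):
    """Sort (name, records, data_type) tuples from most to least severe."""
    return sorted(breaches, key=lambda b: severity(b[1], b[2]), reverse=True)

# Hypothetical breaches: a large email-only leak versus a small card breach.
example = [
    ("Breach A", 10_000_000, "Just email address/online information"),
    ("Breach B", 50_000, "Credit card information"),
]
for name, records, kind in rank_breaches(example):
    print(name, severity(records, kind))  # Breach B 15000000, then Breach A 10000000
```

Under this assumption, 50,000 card records outrank 10 million email addresses. That is the kind of trade-off a sensitivity axis is meant to surface.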
An interesting addition would be how many of those records were sold on the black market and resulted in tax or insurance fraud.
Cybersecurity expert Brian Krebs, who was personally affected by a criminal tax return filing last year, says we will likely see “more phony tax refund claims than last year.” With credentials for TurboTax and H&R Block marketed on black market websites for about 4 cents per identity, it is hard to disagree.
Last year, The Ponemon Institute published a survey entitled The State of Data Centric Security. One finding stands out: when security professionals were asked what keeps them up at night, more than 50 percent said “not knowing where sensitive and confidential data reside.” As tax season gets into full swing, what should security professionals be thinking about?
Data Security Intelligence promises to be the next big thing that provides a more automated and data-centric view into sensitive data discovery, classification and risk assessment. If you don’t know where the data is or its risk, how can you protect it? Maybe with a little more insight, we can at least reduce the surface area of exposed sensitive data.
The International Association of Privacy Professionals (IAPP) held its Global Privacy Summit in Washington, DC, March 4-6. Informatica’s Robert Shields, Product Marketing, Data Security Group, presented on Data-Centric Security. Here is a quick recap of the conversation in case you missed it.
In the age of the massive data breach, security and privacy professionals agree that we must redefine privacy policies and controls. What we are doing is just not working effectively. Network, host and endpoint security need to be strengthened by Data-Centric Security approaches. The focus needs to be on data security controls that can be enforced no matter where sensitive or confidential data proliferates.
Data-Centric Security does not mean ‘encrypt it all.’ That is completely impractical and introduces unnecessary cost and complexity. The approach can be simplified into four steps: 1. Classify it, 2. Find it, 3. Assess its risk, 4. Protect it.
1. Classify it.
The idea behind Data-Centric Security is that an enterprise first uses policy to define what it classifies as sensitive and confidential, and then applies controls to that set of data. For example, if the only classified and sensitive data you store is employee data, then focus on just employee data. No need to boil the ocean in that case. However, if you have several domains of sensitive and confidential data, you need to know where each resides and assess its risk to help prioritize your moves.
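Here is a minimal Python sketch of the classification step. It assumes the policy is expressed as one regular expression per class, and that a column belongs to a class when most of its values match. The class names, patterns and 80 percent threshold are hypothetical. Real classification policies are usually much richer and may use dictionaries, checksums and context.

```python
import re

# Hypothetical classification policy: one pattern per sensitive class.
POLICY = {
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "DATE_OF_BIRTH": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def classify_column(values, policy=POLICY, threshold=0.8):
    """Return the classes whose pattern fully matches at least `threshold` of non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    labels = []
    for label, pattern in policy.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            labels.append(label)
    return labels

print(classify_column(["123-45-6789", "987-65-4321", ""]))  # ['SSN']
```

If your policy says only employee data is in scope, you would run this only against the HR sources, which keeps the effort proportional.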
2. Find it.
Discover where in your enterprise sensitive and classified data reside. This means looking at how data is proliferating from its source to multiple targets – and not just copies made for backup and disaster recovery purposes.
For example, if sensitive and confidential data is loaded into a data warehouse through a transformation process, the data is still considered classified or sensitive, even though its shape or form may have changed. You also need to know when data leaves the firewall and becomes viewable on a mobile device or accessible to a remote team, such as offshore development and support teams.
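Finding copies comes down to following data lineage from a source to every downstream target. The sketch below assumes the lineage is already known as a simple mapping from each store to the stores it feeds. All store names are hypothetical. It walks that mapping breadth-first, so it finds derived copies, such as a transformed warehouse table or an offshore development extract, as well as direct ones.

```python
from collections import deque

# Hypothetical lineage graph: each data store lists the stores it copies data to.
LINEAGE = {
    "hr_app": ["payroll_db", "dw_staging"],
    "dw_staging": ["data_warehouse"],
    "data_warehouse": ["bi_extracts", "offshore_dev"],
    "payroll_db": ["dr_replica"],
}

def find_copies(source, lineage):
    """Breadth-first search for every store that receives data from `source`."""
    seen = {source}          # guards against cycles in the lineage graph
    queue = deque([source])
    found = []
    while queue:
        store = queue.popleft()
        for target in lineage.get(store, []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found

print(find_copies("hr_app", LINEAGE))
# ['payroll_db', 'dw_staging', 'dr_replica', 'data_warehouse', 'bi_extracts', 'offshore_dev']
```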
3. Assess its risk.
Next, you need to assess data risk based on three factors: the number of users who may have access to the data, where those users are physically located, and the security controls that already exist. Suppose large volumes of sensitive data are potentially exposed to a large population in another country. You might consider that data more at risk than a small number of encrypted records in your protected data center. That helps you prioritize where to start implementing controls to maximize the return on your efforts.
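One way to turn these risk factors into a ranking is a simple score. This sketch assumes the score is sensitive records multiplied by users. Users outside the home country count double, and encrypted stores are discounted to 10 percent. Those weights and the field names are hypothetical and would need to come from your own policy.

```python
from dataclasses import dataclass

@dataclass
class DataStore:
    name: str
    sensitive_records: int
    users_with_access: int
    offshore_users: int   # subset of users located outside the home country
    encrypted: bool

def risk_score(store, offshore_weight=2.0, encryption_discount=0.1):
    """Hypothetical score: records x weighted user count, discounted if encrypted."""
    onshore = store.users_with_access - store.offshore_users
    effective_users = onshore + offshore_weight * store.offshore_users
    score = store.sensitive_records * effective_users
    if store.encrypted:
        score *= encryption_discount
    return score

def prioritize(stores):
    """Highest-risk stores first, i.e. where controls should go first."""
    return sorted(stores, key=risk_score, reverse=True)

test_env = DataStore("test_env", 1_000_000, 500, 400, False)
prod = DataStore("prod_dc", 5_000, 10, 0, True)
print([s.name for s in prioritize([prod, test_env])])  # ['test_env', 'prod_dc']
```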
4. Protect it.
Once you have a sense of priority, you can apply appropriate, cost-effective controls that match each data set’s level of risk. Place monitoring tools around the sensitive data and detect when usage patterns become unusual. First learn what normal user behavior looks like. Then raise an alert that recommends changing a control whenever behavior deviates from that baseline.
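The ‘learn normal, then alert’ idea can be sketched with a per-user baseline and a z-score test. This assumes you can count, for each user, the sensitive records they read each day. It flags days more than 3 standard deviations above that user’s average. The metric, the threshold and the alert text are illustrative choices. Commercial monitoring tools use more sophisticated behavioral models.

```python
from statistics import mean, stdev

def build_baseline(daily_reads):
    """Learn 'normal' for one user: mean and sample std dev of daily sensitive-record reads."""
    return mean(daily_reads), stdev(daily_reads)  # needs at least two days of history

def is_unusual(reads, baseline, z_threshold=3.0):
    """True if today's reads sit more than z_threshold std devs above the user's mean."""
    mu, sigma = baseline
    if sigma == 0:
        return reads > mu
    return (reads - mu) / sigma > z_threshold

def review(user, reads, baseline):
    """Return an alert recommending a control review, or None if activity looks normal."""
    if is_unusual(reads, baseline):
        return f"ALERT: {user} read {reads} sensitive records today; review access controls"
    return None

baseline = build_baseline([120, 135, 110, 128, 140, 125, 118])  # hypothetical history
print(review("jdoe", 5000, baseline))
```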
In a world where policies are defined and enforced based on data privacy regulations and standards, it only makes sense to align the right intelligence and controls to ensure proper enforcement. In reality, these four steps are complex, and they require cross-functional teams to come together and agree on a strategy.
Who remembers their first game of Pong? After more than 40 years of innovation, gaming is no longer limited to monochromatic screens and dedicated, proprietary platforms. The PC gaming industry is expected to exceed $35bn by 2018. Phone and handheld games are estimated to reach $34bn within 5 years and are quickly closing the gap. According to EEDAR, North America alone had more than 141 million mobile gamers in 2014, generating $4.6B in revenue for mobile game vendors.
This growth has spawned a growing list of conferences specifically targeting gamers, game developers, the gaming industry and, more recently, gaming analytics! This past weekend in Boston, for example, was PAX East, where people of all ages and walks of life played games on consoles, PCs and handhelds, along with good old-fashioned board games. With my own children in attendance, the debate of commercial games versus indie favorites such as Minecraft dominates the dinner table.
Online games are where people congregate, collaborate and generate petabytes of data daily. Add geospatial data from smartphones, and the opportunity for more advanced analytics grows. According to Ninja Metrics, some of the basic metrics that determine whether a game is successful include:
- New Users, Daily Active Users, Retention
- Revenue per user
- Session length and number of sessions per user
Ninja Metrics additionally provides predictive analytics, customer lifetime value and cohort analysis. If this is your gig, there’s a conference for that as well: the Gaming Analytics Summit!
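The basic metrics listed above can be computed from a session log. This Python sketch assumes a hypothetical log of (user, date, minutes, revenue) tuples. It defines day-1 retention as the share of users who play again the day after their first session. Only users whose following day falls inside the log are counted. Ninja Metrics’ own definitions may differ.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical session log: (user_id, play_date, session_minutes, revenue_usd)
SESSIONS = [
    ("u1", date(2015, 3, 1), 30, 0.0),
    ("u2", date(2015, 3, 1), 12, 1.99),
    ("u1", date(2015, 3, 2), 45, 0.0),
    ("u3", date(2015, 3, 2), 20, 4.99),
    ("u1", date(2015, 3, 2), 15, 0.0),
]

def game_metrics(sessions):
    sessions = list(sessions)
    if not sessions:
        return {}
    days_played = defaultdict(set)     # user -> dates on which they played
    players_by_day = defaultdict(set)  # date -> users who played that day
    total_minutes, total_revenue = 0, 0.0
    for user, day, minutes, revenue in sessions:
        days_played[user].add(day)
        players_by_day[day].add(user)
        total_minutes += minutes
        total_revenue += revenue

    n_users = len(days_played)
    first_day = {u: min(days) for u, days in days_played.items()}
    new_users = defaultdict(int)
    for day in first_day.values():
        new_users[day] += 1

    # Only users whose "next day" is inside the log can be judged for retention.
    last_day = max(players_by_day)
    eligible = [u for u, d in first_day.items() if d < last_day]
    returned = sum(1 for u in eligible if first_day[u] + timedelta(days=1) in days_played[u])

    return {
        "daily_active_users": {d: len(us) for d, us in sorted(players_by_day.items())},
        "new_users": dict(sorted(new_users.items())),
        "day1_retention": returned / len(eligible) if eligible else 0.0,
        "revenue_per_user": total_revenue / n_users,
        "avg_session_minutes": total_minutes / len(sessions),
        "sessions_per_user": len(sessions) / n_users,
    }

print(game_metrics(SESSIONS))
```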
The focus of the Game Developers Conference, recently held in San Francisco, has shifted over the years from computer games to new gaming platforms that incorporate mobile, smartphone and online components. To be successful, a game now needs to:
- Connect to a variety of devices and platforms
- Use data to drive decisions and improve the user experience
- Adhere to privacy laws
Developers can quickly access online gaming data and tweak their sprites’ attributes dynamically to maximize the player experience.
When you look at what is happening in the gaming industry, you can start to see why colleges and universities like my alma mater, WPI, now offer degrees such as Interactive Media and Game Development (IMGD). The IMGD curriculum includes heavy coursework in data science, game theory, artificial intelligence and storyboarding. When I asked a WPI IMGD student what they were working on, they said they were mapping out decision trees that dictate which adversary pops up based on the player’s history (sounds a lot like what we do in digital marketing…).
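A decision tree like the one the student described can be as simple as nested conditions on the player’s history. In this sketch, the fields, thresholds and adversary names are all hypothetical. In practice, designers might author such a tree by hand or learn it from gameplay data.

```python
from dataclasses import dataclass

@dataclass
class PlayerHistory:
    deaths_last_level: int         # times the player died on the previous level
    avg_completion_seconds: float  # average time taken to clear a level
    prefers_ranged: bool           # True if most of the player's kills used ranged weapons

def choose_adversary(h: PlayerHistory) -> str:
    """Walk a hand-built decision tree; thresholds and adversary names are hypothetical."""
    if h.deaths_last_level >= 5:
        return "slow_grunt"  # struggling player: ease off
    if h.avg_completion_seconds < 60:
        # Skilled player: counter their preferred play style.
        return "rusher" if h.prefers_ranged else "sniper"
    return "standard_soldier"

print(choose_adversary(PlayerHistory(0, 45.0, True)))  # rusher
```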
As the Millennial Generation enters the workforce, maybe we should look at our own recruiting efforts and consider game designers. They are masters of analytics and creativity, with an appreciation for the importance of great data. Combining the magic and the math makes a great gaming experience. Who wouldn’t want that for their customers?
Informatica, over the last two years, successfully transformed from running 80% of its application portfolio on premises to 80% in the cloud. Success was based on two key criteria:
- Ensuring the SaaS-based processes are integrated with no disruption
- Data in the cloud continues to be available and accessible for analytics
Industry analysts predict that the majority of new application deployments will be SaaS-based by 2017, so connected data should be non-negotiable. It is a must-have. Most SaaS applications let businesses keep processes integrated by sharing data through application programming interfaces (APIs).
If you are a consumer of SaaS applications, you probably know the importance of having clean, connected and secure data from the cloud. The promise of SaaS is improved agility. When data is not easily accessible, that promise is broken. With the plethora of options available in the SaaS ecosystem and marketplace, not having clean, connected and safe data is a compelling event for switching SaaS vendors.
If you are in the SaaS application development industry, you probably know that building these APIs and connectors is a critical requirement for success. However, how do you decide which applications you should build connectors for when the ecosystem keeps changing? Investment in developing connectors and interfaces consumes resources and competes with developing competitive and differentiating features.
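One common way to contain connector costs is to define a single interface that every connector implements, so integration logic never depends on a specific application. The Python sketch below is a hypothetical illustration of that pattern, not Informatica Cloud’s API. `InMemoryConnector` stands in for a real SaaS API client so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

Record = Dict[str, object]

class Connector(ABC):
    """Common contract every connector implements, whatever SaaS app sits behind it."""

    @abstractmethod
    def read(self, entity: str) -> Iterator[Record]: ...

    @abstractmethod
    def write(self, entity: str, records: List[Record]) -> int: ...

class InMemoryConnector(Connector):
    """Stand-in for a real SaaS API client, so the example runs without a network."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Record]] = {}

    def read(self, entity: str) -> Iterator[Record]:
        yield from self.tables.get(entity, [])

    def write(self, entity: str, records: List[Record]) -> int:
        self.tables.setdefault(entity, []).extend(records)
        return len(records)

def sync(source: Connector, target: Connector, entity: str) -> int:
    """Copy one entity between any two connectors; returns the number of records written."""
    return target.write(entity, list(source.read(entity)))

crm, erp = InMemoryConnector(), InMemoryConnector()
crm.tables["contacts"] = [{"id": 1, "email": "a@example.com"}]
print(sync(crm, erp, "contacts"))  # 1
```

With this design, supporting a new application means writing one new `Connector` subclass. The `sync` logic stays untouched.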
This week, Informatica launched its inaugural DataMania event in San Francisco, where the leading topic was SaaS application and data integration. Speakers from AWS, Adobe, AppDynamics, Dun & Bradstreet and Marketo, to name a few, contributed to the discussion and confirmed that we are entering the era of the Data Ready Enterprise. Also during the event, Informatica announced the Connect-a-thon, a hackathon-like event where SaaS vendors can get connected to hundreds of cloud and on-premises apps.
Without a doubt, transitioning to a cloud and SaaS-based application architecture can only be successful if the applications are easily connectable with shared data. Here at Informatica, this was absolutely the case. Whether you build SaaS applications or consume them, consider the benefits of using a standard library of connectors, such as the one Informatica Cloud offers. That frees your time and energy for innovation and the more strategic parts of your business.
The 2011 film Moneyball showed the sports industry how Billy Beane used data analytics to acquire statistically optimal players for the Oakland A’s. In the last 4 years, advancements in data collection, preparation, aggregation and advanced analytics technology have made it possible to apply analytics beyond the game and the player. The result is a drastic change in the shape of an industry with a long history built on tradition.
Last week, MIT Sloan held its 9th annual Sports Analytics Conference in Boston, MA. Amid the 6-foot snowbanks, sports fanatics and data scientists came together at this sold-out event to discuss the increasing role of analytics in the sports industry. This year’s agenda ranged from game statistics and modeling, player contract and salary negotiations, dynamic ticket pricing and referee calls to improving fan experiences.
This last topic, improving fan experiences, has seen a boost in technology innovation, so data is more readily available for analytics. For example, newer NFL stadiums have Wi-Fi throughout, so fans can watch replays on their devices, tweet and share selfies during the game. With fans’ mobile devices connected to the stadium’s Wi-Fi, franchises can run revenue-generating marketing campaigns aimed at their home fan base throughout the game.
More important, however, is the need to keep the Millennial Generation interested in watching games live. According to an article posted by TechRepublic, college students are more likely to leave a game at halftime if they cannot connect to the internet or use social media. Teams need to keep fans in the stadiums, so the goal must be to make the live-venue experience match what fans can get at home.
Innovation in advanced analytics and Big Data platforms such as Hadoop gives sports analysts access to significant volumes of detailed data, resulting in greater modeling accuracy. Streamlined data preparation tools speed the process from receiving raw data to delivering insight. Advanced analytics offered as a cloud service gives team owners and managers access to predictive analytics tools without having to manage and staff large data centers. Better visualization applications offer an effective way to communicate what the data means to people without a math degree.
Applying these innovations to new data sources, combined with advances in sports analytics, will produce results that are game changing far beyond what Billy Beane accomplished with the Oakland A’s.
Our congratulations to the winners of the top research papers submitted at the MIT Sloan Sports Analytics Conference: Who Is Responsible For A Called Strike? and Counterpoints: Advanced Defensive Metrics for NBA Basketball. With Spring Training and March Madness just around the corner, it will be interesting to see what impact these models make. Maybe next year we will see a submission on how atmospheric conditions affect football pressure and the NFL playoffs (PV = nRT), and finally get a data-driven explanation of Deflategate.
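For the curious, the physics can be worked through. At fixed volume and a fixed amount of air, PV = nRT reduces to P1/T1 = P2/T2, with pressure and temperature both in absolute units. The sketch below makes three assumptions: a ball inflated to 12.5 psi (gauge) in a 75°F room, a reading taken after it cools to 50°F, and atmospheric pressure of 14.7 psi. The temperatures are hypothetical. The model ignores factors such as wet leather, ball stretch and gauge error.

```python
def kelvin_from_fahrenheit(f):
    """Convert degrees Fahrenheit to kelvin."""
    return (f - 32) * 5 / 9 + 273.15

def cooled_gauge_pressure(p_gauge_psi, t_start_f, t_end_f, p_atm_psi=14.7):
    """Apply P1/T1 = P2/T2 (constant n and V) using absolute pressure and temperature."""
    p1_abs = p_gauge_psi + p_atm_psi
    p2_abs = p1_abs * kelvin_from_fahrenheit(t_end_f) / kelvin_from_fahrenheit(t_start_f)
    return p2_abs - p_atm_psi  # convert back to gauge pressure

print(round(cooled_gauge_pressure(12.5, 75, 50), 1))  # 11.2
```

Under these assumptions, temperature alone accounts for a drop of a bit more than 1 psi.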
How do you know if you have found ‘true love’?
Biologists and psychologists tell us that when we are struck by cupid’s arrow, our body is reacting to a set of chemicals that are released in the brain that evoke emotions and feelings of lust, attraction and attachment. When those chemicals are released, our bodies respond. Our hearts race, blood pumps through our veins, faces flush, body temperatures rise. Some say it feels like electricity is conducting all over the skin. It releases a flood of emotions that may cloud our judgment and may even cause us to make a choice considered unreasonable to others. Sound familiar?
But what causes our brains to react to one person and not another? Are we predisposed to how certain people look or smell? Do our genes play a role in determining an affinity toward a body type or shape?
Pheromone research has shown that sensors in our nose can detect, from the scent of urine and sweat, whether someone’s immune system complements our own. In other words, if someone’s immune system is similar to ours, that person won’t smell as good to us; we are more likely to prefer the scent of someone whose immune system is different. Is our genetic code programming our instincts to preselect who we should mate with so our offspring have a higher chance of surviving?
It is probably not surprising that most men are attracted to women with symmetrical faces and hourglass figures. Genetic research hints that men’s predispositions are also rooted in genetic code. There is a correlation between asymmetric facial characteristics and genetic disorders, as well as between waist-to-hip ratios and fertility. Depending on your stage in life, these characteristics could carry a weighting factor in how your brain responds to the smell of the perfect pheromone and to how someone looks. And some argue it is all influenced by body language, tone of voice and the actual words used in dialogue.
Psychologists report it takes only two to four minutes to decide if you are falling in love with someone. Whether you dismiss some or accept all of the possibilities I am presenting, the experience of love combines a variety of senses, interpretations and emotions of varying intensity in a short period of time. If you are a data nerd like me, variety, volume and velocity of ‘signals’ begin to sound like a Big Data marketing pitch. This really is an application of predictive analytics using different data types, large volumes of data and real-time decision-making algorithms. But I’m actually more interested in how affective computing, wearable devices and analytics could help determine whether what you feel is actually ‘true love’ or just a bad case of indigestion.
Affective computing, according to researcher Rosalind Picard, gives a computer the ability to recognize and express emotions, develop that ability and enable it to regulate and utilize emotions. When applied to wearable devices that can listen to how you talk, measure blood pressure, detect changes in heart and respiration rate and even measure electro-dermal responses, is it possible that technology could sense when your body is responding to the chemicals of love?
What about mood rings, you may ask? Mood rings, the original affective wearable device, grew in popularity in the 1970s and supposedly changed color based on your mood. Unfortunately, mood rings respond only to body temperature. Through data collection and research, researchers have shown that emotional states cannot be determined from body temperature alone. To truly identify an emotion such as, let’s say, ‘true love,’ you need to collect multiple physiological signals and detect a pattern using multivariate pattern recognition algorithms. And if you only have 2-4 minutes, the device pretty much needs to calculate the chances of ‘true love’ in real time to keep you from making a life-altering mistake.
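One of the simplest multivariate pattern recognition techniques, a nearest-centroid classifier, illustrates the idea. It standardizes each physiological signal, averages the training examples for each label, and assigns a new reading to the closest average. The signal names, readings and the labels ‘calm’ and ‘excited’ below are hypothetical. Nothing here detects love, and this is not the method used in Picard’s research.

```python
from math import sqrt
from statistics import mean, pstdev

FEATURES = ("heart_rate", "respiration", "skin_conductance", "skin_temp")

def fit(samples):
    """samples: list of (feature_dict, label). Returns per-feature scaling and per-label centroids."""
    cols = {f: [s[f] for s, _ in samples] for f in FEATURES}
    scale = {f: (mean(v), pstdev(v) or 1.0) for f, v in cols.items()}  # avoid divide-by-zero

    def standardize(s):
        return [(s[f] - scale[f][0]) / scale[f][1] for f in FEATURES]

    by_label = {}
    for s, label in samples:
        by_label.setdefault(label, []).append(standardize(s))
    centroids = {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in by_label.items()}
    return scale, centroids

def predict(model, s):
    """Label of the centroid closest (Euclidean distance) to the standardized reading."""
    scale, centroids = model
    x = [(s[f] - scale[f][0]) / scale[f][1] for f in FEATURES]

    def dist(c):
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

    return min(centroids, key=lambda lab: dist(centroids[lab]))

TRAINING = [  # hypothetical wearable readings
    ({"heart_rate": 65, "respiration": 12, "skin_conductance": 2.0, "skin_temp": 33.0}, "calm"),
    ({"heart_rate": 70, "respiration": 13, "skin_conductance": 2.5, "skin_temp": 33.2}, "calm"),
    ({"heart_rate": 105, "respiration": 20, "skin_conductance": 8.0, "skin_temp": 34.0}, "excited"),
    ({"heart_rate": 110, "respiration": 22, "skin_conductance": 9.0, "skin_temp": 34.3}, "excited"),
]
model = fit(TRAINING)
```

Calling `predict(model, reading)` on a new reading returns the closer label. Because every signal contributes, a change in skin temperature alone, which is all a mood ring sees, carries only a fraction of the weight.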
Wearable technology has evolved to medical grade, allowing parents to detect when their children are about to have an epileptic seizure or are experiencing acute levels of stress. If this same technology were tuned to love-seekers’ cues, could it send an audio or visual signal to your smartphone telling you whether this person is a ‘true love’ candidate? Or glow red when you are near someone experiencing similar physiological changes? Maybe this is the next application for matchmaking companies such as eHarmony or Match.com?
The reality is this: assume the data is clean and accurate, free of data privacy violations and truly connected to your physiological signals. Even then, wearable technology that could detect ‘true love’ nearby is probably five years out. It is more likely to show up in a popular science fiction film than at an Apple store in the near term. But when it does, imagine your smartphone signaling that a potential candidate is close by. It would point you to a local flower shop and pull in facial recognition, Facebook photos and ‘status’ (assuming it is true). It might even add an iTunes recommendation of ‘Love Is In The Air’ by John Paul Young. ‘True love’ would be only 2-4 minutes away.
R. Picard (2000). Affective Computing, pp. 227-239. MIT Press.
J. Cacioppo and L. Tassinary (1990).
Data proliferation has traditionally been measured by the number of copies of data residing on different media. For example, if data on an enterprise storage device was backed up to tape, proliferation was measured by the number of tapes on which the same piece of data resided. Now that backups are no longer restricted to the data center and data is no longer constrained by the originating application, this definition is due for an update.
I propose two things. First, data proliferation should be measured by the number of users who can access or view the data. Second, proliferation is a primary factor in measuring the risk of a data breach. My argument is that as sensitive, confidential or private data proliferates beyond the original copy, its surface area grows and its risk of a data breach increases proportionally.
Using the original definition of data proliferation and an example of data storage shown below, data proliferation would include production, production copies used for disaster recovery purposes and all physical backup copies. But as you can see, data is also copied to test environments for development purposes. When factoring in the number of privileged users with access to those copies, you have a different view of proliferation and potential risk.
In the example, there are potentially thousands of copies of sensitive data but only a small number of users who are authorized to access the data.
In the case of test and development, this image highlights a potentially high area of risk because the number of users who could see the sensitive data is high.
Online advertising offers a parallel: each time someone sees an online ad counts as an impression. If an ad was seen by 100 online users, it would have 100 impressions.
Apply that same principle to data security, and data proliferation becomes the number of copies of a data element multiplied by the potential number of users who could physically view it; in other words, its ‘impressions.’ In the second image below, rather than considering the total number of copies, what if we measured risk based on the total number of impressions?
In this case, the measure of risk is independent of the physical media the data reside on. You could take this a few steps further and add a factor based on security controls in place to prevent unauthorized access.
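The impressions measure can be written down directly. This sketch assumes that impressions per location equal copies multiplied by potential viewers, summed across locations. An optional control factor between 0 and 1 stands in for the security-controls adjustment suggested above. The factor’s scale, the location names and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Location:
    name: str
    copies: int                  # copies of the data element held here
    viewers: int                 # users who could physically view those copies
    control_factor: float = 1.0  # hypothetical: 1.0 = no controls, 0.0 = access fully prevented

def impressions(loc: Location) -> int:
    """Copies multiplied by potential viewers, following the advertising analogy."""
    return loc.copies * loc.viewers

def proliferation_risk(locations: List[Location], apply_controls: bool = True) -> float:
    """Sum impressions across locations, optionally scaled by each location's control factor."""
    return sum(
        impressions(loc) * (loc.control_factor if apply_controls else 1.0)
        for loc in locations
    )

# Hypothetical estate: many backup copies with few viewers; few test copies with many viewers.
estate = [
    Location("production + DR", copies=2, viewers=10, control_factor=0.2),
    Location("tape backups", copies=1000, viewers=2, control_factor=0.1),
    Location("test/dev", copies=3, viewers=1500),
]
print(max(estate, key=impressions).name)  # test/dev
```

In this example, test/development dominates even though the backups hold far more copies. That matches the point about test and development environments being a high area of risk.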
This week, another reputable organization, Anthem, Inc., reported that it was ‘the target of a very sophisticated external cyber attack.’ But rather than be upset with Anthem, I respect its responsible data breach reporting.
In this post, Joseph R. Swedish, President and CEO of Anthem, Inc., does something that I believe all CEOs should do in this situation. He is straight up about what happened, what information was breached, the actions Anthem took to plug the security hole, and the services available to those affected.
When it comes to a data breach, the worst thing you can do is ignore it or hope it will go away. This was not the case with Anthem. Mr Swedish did the right thing and I appreciate it.
You only have one corporate reputation – and it is typically aligned with the CEO’s reputation. When the CEO talks about the details of a data breach and empathizes with those impacted, he establishes a dialogue based on transparency and accountability.
Research tells us that 44% of healthcare and pharmaceutical organizations experienced a breach in 2014. And we know that personal information combined with health information is worth more on the black market, because it can be used for insurance fraud. I expect more healthcare providers will be on the defensive this year, and I only hope they follow Mr Swedish’s example when facing the music.
I understand that fighting for budget and time to implement analytics is a challenge with all the changes happening in healthcare (ICD-10, M&A, etc.). But hospitals using analytics to drive value-based care are leading healthcare reform and setting a higher bar for quality of service. Value-based care promises quicker recoveries, fewer readmissions, lower infection rates and fewer medical errors, something we all want as consumers.
In order to truly achieve value-based care, analytics is a must have. If you are looking for the business case or inspiration for the business driver, here are a few ideas:
- In surgery, do you have the data to show how many patients had lower complication rates and higher long-term survival rates? Do you have that data across the different surgical procedures you offer?
- Do you have data to benchmark your practice quality? How do you compare to other practices in terms of infection rates? Can you use that data to promote your services from a marketing perspective?
- Do you know how much a readmission is costing your hospital?
- From a finance perspective, have you adopted best practices from other industries with respect to supply-chain management or cost optimization strategies?
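As a starting point for the readmission question, you can compute a 30-day readmission rate and its cost from discharge records. This sketch uses a simplified rule, not the CMS methodology: a stay counts as a readmission if it starts within 30 days of the same patient’s previous discharge. The records shown are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical discharge records: (patient_id, admit_date, discharge_date, cost_usd)
STAYS = [
    ("p1", date(2015, 1, 2), date(2015, 1, 6), 12000.0),
    ("p1", date(2015, 1, 20), date(2015, 1, 25), 9000.0),  # 14 days after discharge
    ("p2", date(2015, 2, 1), date(2015, 2, 3), 7000.0),
    ("p2", date(2015, 4, 15), date(2015, 4, 18), 6500.0),  # 71 days later: not counted
    ("p3", date(2015, 3, 10), date(2015, 3, 12), 5000.0),
]

def readmission_stats(stays, window_days=30):
    """Count stays starting within window_days of the same patient's previous discharge."""
    stays = list(stays)
    by_patient = defaultdict(list)
    for pid, admit, discharge, cost in stays:
        by_patient[pid].append((admit, discharge, cost))
    readmissions, readmission_cost = 0, 0.0
    for visits in by_patient.values():
        visits.sort()  # chronological by admit date
        for prev, curr in zip(visits, visits[1:]):
            gap_days = (curr[0] - prev[1]).days  # next admit minus previous discharge
            if gap_days <= window_days:
                readmissions += 1
                readmission_cost += curr[2]
    return {
        "readmissions": readmissions,
        "readmission_rate": readmissions / len(stays) if stays else 0.0,
        "readmission_cost": readmission_cost,
    }

print(readmission_stats(STAYS))
```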
If you don’t have the expertise, there are plenty of consulting organizations that specialize in implementing analytics. They can provide the insight you need to make the transition to value-based care and pricing.
We are always going to face limited budgets, the day will always have 24 hours in it, and organizations are constantly changing as new leaders take over with different agendas. But one thing is certain: a decision without data is just someone’s opinion. In healthcare, where only half of executives make decisions based on analytics, maybe we should all be asking for a second opinion, one based on data.