Tag Archives: Data Services
Working on IDS
Informatica Data Services (IDS) is a data virtualization software stack built on a new platform created from the shared internals of the venerable PowerCenter. A significant reason for the shift to the new platform (code-named ‘Mercury’) was to make it easy to quickly write new ‘plug-ins’ for the integration platform, improving time to market for new business initiatives while providing scalability and extensibility; IDS was one of these initiatives.
The IDS suite is a set of “end-points” within the Data Integration Service (DIS) of Mercury. The DIS, which does resource and engine management, works with the Model Repository Service (MRS) for persistence on the Informatica Services Platform. Old hands at PowerCenter will recognize that the DIS and MRS are counterparts of PowerCenter’s LM (Load Manager) and the C++ Repository Service (CRS).
IDS allows users to access Informatica data objects, data sources, and mapping operations via standard interfaces such as SQL, JDBC/ODBC and Web Services. I had the opportunity to design and engineer various parts of the new platform in the DIS and the SQL Data Service, colloquially termed the SQL End-point (SQLEP). Exposing traditional Informatica PowerCenter objects as SQL-consumable tables brings a host of benefits: a homogeneous view of disparate sources and targets, rapid sharing and prototyping with complex data quality and cleansing transformations, lower development and maintenance costs, and seamless analytical integration with the wide variety of tools supporting JDBC/ODBC, to name a few.
Over the course of developing SQLEP, I dealt with a variety of technical challenges, some more interesting than others. SQLEP’s initial intended use-case was rich transformation support for data aggregation, to be quickly consumed in a variety of BI tools. An enterprise could possess data across any number of data stores and appliances, and SQLEP would provide a homogeneous and holistic view by utilizing the well-known and well-adopted Informatica Mapping Language (IML). SQLEP is responsible for the maintenance and book-keeping of client requests, along with the translation and validation of each SQL query into an equivalent Informatica Mapping.
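The translation step can be pictured with a toy sketch. Everything below is hypothetical and simplified for illustration; the real SQLEP translator validates the SQL and emits an actual Informatica Mapping in IML, not a list of strings.

```python
import re

def sql_to_mapping(query):
    """Toy stand-in for SQLEP's SQL-to-mapping translation.

    A trivial SELECT becomes a pipeline of (hypothetical) mapping
    operations: read the source object, apply the predicate as a
    filter, then project the requested columns.
    """
    m = re.match(
        r"SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<src>\w+)"
        r"(?:\s+WHERE\s+(?P<pred>.+))?$",
        query.strip(), re.IGNORECASE)
    if not m:
        raise ValueError("unsupported query")
    ops = [("Source", m.group("src"))]            # read the data object
    if m.group("pred"):
        ops.append(("Filter", m.group("pred")))   # push the predicate down
    cols = [c.strip() for c in m.group("cols").split(",")]
    ops.append(("Projection", cols))              # keep requested columns
    return ops

print(sql_to_mapping("SELECT id, name FROM customers WHERE region = 'EU'"))
```

The point of the exercise is only the shape of the problem: a declarative query on one side, a dataflow pipeline of transformations on the other, with validation in between.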
Our engine, the Data Transformation Machine (DTM), is written in C++ and operates only on the IML, while the DIS, which manages the DTM as a resource within its own process, is written in Java. Initially, the communication and the serialization/deserialization (SerDe) of objects between the engine and the DIS used JNI. This worked quite well in terms of throughput and performance, but one problem with housing the DTM within the DIS process was that a catastrophic failure in the execution of any mapping would terminate the DIS process along with all the other DTM runs. A secondary problem was unaddressed memory leaks, which would eventually leave the DIS practically unusable.
This led to the development of a new execution paradigm within Mercury, termed Out-of-Process (OOP), in which each DTM was housed in its own process “outside of” the DIS. This was great for isolation and memory leaks (the new process exited after the mapping executed) but wait, we were spawning a new process for each run! And yes, it suffered tremendously from high latency compared to the JNI-based “in-process” (IP) model. I was tasked with rewriting OOP to a) improve latency, b) preserve concurrency and affinity, and c) provide some amount of isolation from the DIS process. Essentially: get everything that IP offers without the drawbacks. We termed it OOP++!
Request Isolation while Maintaining Throughput
We addressed the problems with a multi-pronged approach. To absorb any initial bursts of requests, a configurable number of processes now start up along with the DIS. We also introduced ‘process-sunset’ (a configurable amount of time before a process is retired) and ‘process-affinity’ (grouping mappings belonging to a specific application). Process-sunset ensured that any unaddressed resource leaks in a process were eventually cleaned out, while process-affinity allowed multiple concurrent executions within one process, provided they shared the same application affinity. This improved latency by reducing and delaying the spawning of new processes, and still prevented a catastrophic failure in one mapping from affecting other clients.
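The scheduling rules above can be sketched in a few lines. This is a minimal illustration with invented names and shapes, not the DIS implementation: a warm pool absorbs bursts, affinity co-locates runs from the same application, and sunset retires aged processes so leaks cannot accumulate.

```python
import time

class DtmProcessPool:
    """Sketch of OOP++-style scheduling (hypothetical names and shape)."""

    def __init__(self, warm=2, max_age_s=3600.0):
        self.max_age_s = max_age_s
        self.next_pid = 0
        # Warm processes are started eagerly and have no affinity yet.
        self.procs = [self._spawn(None) for _ in range(warm)]

    def _spawn(self, app):
        self.next_pid += 1
        return {"pid": self.next_pid, "app": app, "started": time.monotonic()}

    def acquire(self, app):
        """Return a process for `app`, reusing by affinity when possible."""
        now = time.monotonic()
        # Sunset: drop processes past their maximum age (a real service
        # would drain in-flight mappings first).
        self.procs = [p for p in self.procs
                      if now - p["started"] < self.max_age_s]
        for p in self.procs:
            if p["app"] == app:        # affinity hit: co-locate the run
                return p
        for p in self.procs:
            if p["app"] is None:       # claim a warm process
                p["app"] = app
                return p
        p = self._spawn(app)           # last resort: pay the spawn cost
        self.procs.append(p)
        return p

pool = DtmProcessPool(warm=2)
a = pool.acquire("sales_app")
b = pool.acquire("sales_app")   # same application: shares a's process
c = pool.acquire("hr_app")      # different application: separate process
print(a["pid"] == b["pid"], a["pid"] == c["pid"])
```

Isolation is preserved because a crash only takes down the one process and whatever shares its affinity, never the DIS itself.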
Fast-forward a bit: Informatica’s Information Lifecycle Management (ILM) group wanted to leverage SQLEP for replication. But replication sits at the opposite end of the spectrum from aggregation: it needs the entire data set loaded. This handed me a new task: improving high-volume throughput.
At first glance, many areas for improvement were revealed. For one, we were transmitting everything back as verbose XML. We quickly switched to a faster encoding, Fast Infoset XML, which, while better, was still verbose. We then went all out with binary SerDe, which yielded substantially better results. It was still not enough, however: a ~30% overhead remained compared to the IP counterpart. A colleague and I, puzzled by the outcome, analyzed a sample request very closely and discovered that the communication was no longer the bottleneck. The problem was in the DIS itself, where another layer of inefficient, extraneous SerDe was being performed. Within a matter of days we altered that and ran another test; the results soared! Well, OK, they didn’t soar, but they came within ~8-10% of the IP counterpart. Catching the offending piece of code in a completely different area was very satisfying.
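The size gap between the two wire formats is easy to demonstrate with a stdlib-only sketch. This is not the actual Fast Infoset or DTM wire format, just an illustration of why fixed-width binary records beat per-value XML markup for bulk result sets.

```python
import struct
import xml.etree.ElementTree as ET

rows = [(i, i * 1.5) for i in range(1000)]  # (id, amount) result rows

# Verbose XML: one element per row, with tag and attribute overhead
# repeated for every single value.
root = ET.Element("rows")
for rid, amount in rows:
    ET.SubElement(root, "row", id=str(rid), amount=repr(amount))
xml_bytes = ET.tostring(root)

# Binary SerDe: fixed-width little-endian records (4-byte int +
# 8-byte double = 12 bytes per row), no markup at all.
binary = b"".join(struct.pack("<id", rid, amount) for rid, amount in rows)

print(len(xml_bytes), len(binary), len(binary) < len(xml_bytes))
```

For a million-row replication, that per-row markup overhead is the difference between a pipe that streams and one that crawls, which is why the binary switch bought so much before the DIS-side SerDe became the new bottleneck.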
Among those changes: Marketing will take a bigger role in customer experience, directly impacting an organization’s competitiveness. If marketing is driving customer experience, they will need the right technology – with IT as a partner for success.
The trouble is: That partnership is lagging. A 2013 Accenture study found that 90% of CIOs and CMOs do not believe collaboration between their areas is sufficient. It’s time for IT to rapidly align with marketing.
What can IT expect from the marketing side of the shop in the next five years? Three things emerge: The rise of the customer experience, the increase in marketing analytics, and personalized customer messaging.
Put customer experience first. If, as Gartner predicts, all organizations will soon compete primarily on customer experience, marketing needs to align with IT to ensure competitiveness and growth. In one Gartner survey, 50% of respondents said marketing controlled the biggest chunk of the customer experience budget. Driving success by exceeding customer expectations requires a single view of customers and valid data. Many marketing technology tools are available to provide a great customer journey, and IT can find value in understanding and suggesting these tools.
Marketing is more analytical than ever. The perception that marketing dollars don’t yield measurable results is in the past. Technology tools create a wealth of ‘ah-hah’ moments. Marketers question “Why?” at every move. Marketers can measure their activities, so instead of a murky budget with unknown ROI, marketers try new things, measure them, see what is effective, and invest where things work. The reason is technology. Marketers are moving away from branding and creative activities to be data driven. When asked what skills they need to develop, marketers put ‘advertising/branding’ and ‘creative/graphic arts’ at the bottom of the list in a 2015 survey from The Economist.
Messaging is more personalized and segmented. Organizations put a premium on the data they have, but what about the data they could have? Enriching customer profiles with third-party data and marrying that with interactions allows organizations to personalize the experience. Marketing leaders use sophisticated personalization tools for a one-to-one conversation. This drives customer messaging and engagement, while eliminating a messaging strategy based on guesswork. Without this kind of targeting, enabled by technology and accurate data, organizations can’t stay competitive.
The common thread with all of these marketing trends and the need for IT to help achieve them comes back to data. IT can relate to marketing activities through the data they acquire and retain, and soon both the marketing and IT areas will find alignment by being highly focused on the quality of their data.
Co-authored by Thomas Brence, Director of Product Marketing at Informatica.
Security technologies that focus on securing the network and perimeter require additional safeguards when sensitive and confidential data traverse beyond these protective controls. Data proliferates to cloud-based applications and mobile devices. Application security and identity access management tools may lack visibility and granular control when data is replicated to Big Data and advanced analytics platforms.
Informatica is filling this need with its data-centric security portfolio, which now includes Secure@Source. Informatica Secure@Source is the industry’s first data security intelligence solution that delivers insight into where sensitive and confidential data reside, as well as the data’s risk profile.
To hear more from Informatica and an esteemed panel of security experts about where the future of security is going and why Informatica Secure@Source is an ideal solution for the data challenges around security, please register here. Panelists include:
- Security Industry leader Anil Chakravarthy, CPO and EVP Informatica and myself, Amit Walia, GM and SVP Informatica
- Luminaries Larry Ponemon, Founder Ponemon Institute and Jeff Northrop, CTO IAPP
- CISOs Bill Burns of Informatica and Arnold Federbaum, former CISO and cybersecurity professor, NYU
- Enterprise Security Architect, Linda Hewlett, Santander Holdings USA.
The opportunity for Data Security Intelligence is extensive. In a recently published report, Neuralytix defined Data-Centric Security as “an approach to security that focuses on the data itself; to cover the gaps of traditional network, host and application security solutions.” A critical element for successful data security is collecting intelligence required to prioritize where to focus security controls and efforts that mitigate risk. This is precisely what Informatica Secure@Source was designed to achieve.
Having emerged from a predominantly manual practice, the data security intelligence software market is expected to reach $800M by 2018, with a CAGR of 27.8%. We are excited about this opportunity! As a leader in data management software, we are uniquely qualified to take an active role in shaping this emerging market category.
Informatica Secure@Source addresses the need to get smarter about where our sensitive and private data reside and who is accessing it, to prioritize which controls to implement, and to work harmoniously with existing security architectures, policies and procedures. Our customers are asking us for data security intelligence, and the industry deserves it. With more than 60% of security professionals stating that their biggest challenge is not knowing where their sensitive and confidential data reside, the need for Data Security Intelligence has never been greater.
Neuralytix says “data security is about protecting individual data objects that traverse across networks, in and out of a public or private cloud, from source applications to targets such as partner systems, to back office SaaS applications to data warehouses and analytics platforms”. We couldn’t agree more. We believe that the best way to incorporate a data-centric security approach is to begin with data security intelligence.
 “The State of Data Centric Security,” Ponemon Institute, sponsored by Informatica, June 2014
Informatica recently released the findings of a survey (entitled “Data is Holding You Back from Analytics Success”) in which 85% of respondents revealed that they are effective at putting financial data to use to inform decision making. However, it also discovered that many are less confident about putting data to use for patient engagement initiatives that require access to external data and big data, which they note is more challenging.
The idea is that data unto itself does not carry that much value. For example, I’ve been gathering data with my Fitbit for over 90 days. A use of that data could be looking at patterns that might indicate I’m more likely to have a heart attack. However, this can only be determined if we compare my data with external historical patient data that exists in a large analytical database (big data).
The external data provides the known patterns that lead to known outcomes. Thus, when compared with my data, predictive analytics can occur. In other words, we can use data integration as a way to mash up and analyze the data so it has more meaning and value. In this case, perhaps having me avoid a future heart attack.
Inter-organizational transformational business processes require information sharing between data sources, and yet, according to the Informatica survey, over 65% of respondents say data integration and data quality are significantly challenging. Thus, healthcare providers collect data, but many have yet to integrate these data silos to realize their full potential.
Indeed, the International Institute of Analytics offered a view of healthcare analytics maturity by looking at more than 20 healthcare provider organizations. The study validated the fact that, while healthcare providers are indeed gathering the EMR data, they are not acting upon the data in meaningful ways.
The core problem is a lack of understanding of the value that this data can bring. Or, perhaps the lack of a budget for the right technology. Much as my Fitbit could help me prevent a future heart attack by tracking my activity data, healthcare providers can use their data to become more proactive around health issues.
Better utilization of this data will reduce costs by leveraging predictive analytics to take more preventative measures. For instance, automatically culling through the family tree of a patient to determine risks for cancer, heart disease, etc., and automatically scheduling specific kinds of tests that are not normally given unless the patient is symptomatic.
Of course, putting your data to work is not free. It’s going to take some level of effort to create strategies, and acquire and deploy data integration technology. However, the benefits are easy to define, thus the business case is easy to create as well.
For myself, I’ll keep gathering data. Hopefully it will have some use, someday.
What a day! This should go down as one of the best days in this entire year.
At the sold out Informatica World 2015 All Things Data conference today, Sohaib Abbasi, CEO of Informatica outlined the move to the “age of engagement”. The main highlight for the MDMer in me was the announcement of Social360 for Internet of Master Data.
Informatica MDM now offers Social360 to view master data, relationships and associated interactions, including social media feeds and media mentions. By matching transactions with social relationships, Informatica MDM can identify the top influencers among the most valuable customers. Designed to run natively on Hadoop, Informatica Big Data Relationship Management identifies household and social relationships across billions of records representing millions of people.
Suresh Menon, VP of Product Management at Informatica, opened the MDM Day sessions with a one-liner – “we make great data ready to use”.
Companies are trying to master the digital transformation. In order to accelerate it, Suresh talked about three key aspects that will drive the new age –
New Age of Relationships: Relationships between parties, places, things, households and families, and their interactions in a socially rich world, will continue to grow at a fast pace.
New Age of Bimodal Governance: Information governance has become a democratic exercise that works both top-down and bottom-up. Bimodal is an ENABLER for business and IT, managing how we collaborate and how processes are executed to do the right things right.
New Age of Trusted Master Data Fueled Apps: We are entering a new era of master-data-fueled applications that deliver clean, consistent data. For example: Product 360 for retail, Total Supplier Relationship for supply chain optimization, and Big Data Relationship Management for use cases such as effective campaign management to increase your marketing ROI. These apps run on Informatica’s powerful multidomain platform to deliver a wide range of use-case- and industry-specific solutions.
Following this was Devon Energy and Noah Consulting keynote where Devon shared how they could deliver fast, accurate Big Data to make smart and quick decisions. Devon uses Informatica MDM and Big Data solutions to provide authoritative and trusted data as it relates to wells, suppliers, and other key master and reference data. This allows them to build data pipelines in Hadoop that transform and prepare data for Big Data analytics.
I attended a number of sessions that were part of the MDM Day agenda. These included –
- Symantec & EMC using Informatica MDM and Informatica Data Quality to deliver trusted customer views.
- eBay, one of the world’s largest online marketplaces, led a financial transformation with data to maximize the value of finance and drive functional effectiveness.
- A large global company utilizing MDM to enable a successful global Workday implementation for HR across 60 countries, with 100+ HR applications and varying levels of data governance across 275 companies.
- GE Aviation shared their new approach to scaling MDM to the enterprise, architecture and concepts on data lakes, and how MDM plays into their large data volumes.
A great day overall, with a lot of discussion on topics that are near and dear to me. I also got a chance to sit (and have a drink) with some amazing folks who focus on MDM, data quality and data governance.
With that I will sign-off as the clock hits the midnight. I will bring you more news from Informatica World, specifically the Information Quality and Governance track sessions happening tomorrow. Stay tuned and as always keep an eye on @MDMGeek on twitter.
Enabling ISVs to Connect to More Data
Data is critical to application growth. Bringing additional data into your application is costly, and time spent on point-to-point integration takes time away from introducing new features.
Today, Informatica is releasing the Informatica Technology Partner Network (TPN) – an online developer portal that makes it easy for Independent Software Vendors (ISVs) to build connectors and access more application data. The Technology Partner Network provides ISVs with everything they need to fast-track cloud and hybrid connectivity with Informatica, including access to the following:
• Informatica development environment and connector toolkit
• Interactive REST API, instant API mock server and automated testing
• Technical resources, samples and adapter tester
• Developer community forum
The TPN provides developers with a development environment to load a connector toolkit (SDK) and immediately begin building their connector. The open and interactive REST API provides a space to learn, share and experience functionality without writing any code. A debugging proxy provides more detail on the request and response of the API call and can point to a mock server. These tools enable ISVs to build a prototype in a day and complete their connector development in just a couple of weeks.
The Informatica Vibe™ platform – a virtual data machine (VDM) – provides the underlying data management engine that allows ISVs to transform application data. Exclusively for ISVs, Informatica is introducing the Vibe Ready Partner Program.
ISVs who join the Informatica Vibe Ready Partner Program gain access – at no cost – to the following:
• Pre-built Informatica connectors, mappings and end-user starter kit
• Informatica cloud sandbox multi-user instance
• Informatica software development not-for-resale (NFR) license Pack
• Connector developer support and certification
• Vibe Ready partner and certification logos
• 1-click activation of the connector on the Informatica Marketplace
ISVs that complete the Vibe Ready Certification can provide their customers with a 1-click trial or paid edition on the Informatica Marketplace. The Informatica Marketplace enables customers to search by application, connector or bundle. The Vibe Ready logo provides a simple way for customers to identify solutions that Informatica has certified.
Enabling a successful ISV ecosystem around the Vibe platform is a cornerstone of our business strategy. The Technology Partner Network and Informatica Vibe Ready Partner Program will enable our ISVs to make their data clean, connected and safe.
Here are some additional resources to get started developing with Informatica:
• Explore the Technology Partner Network
• Register for the Informatica Vibe Ready Partner Program
• Technology Partner Network questions? Email us
This week we kick off our 16th Informatica World – the conference about All Things Data. The 2015 show is literally bigger and better than ever, with a record number of attendees, a record number of breakout sessions, a record number of sponsors and beyond. I’m proud to report that Informatica World 2015 is sold out! This clearly shows me that our data message is striking a chord with customers, prospects, partners and beyond.
One of the aspects that I’m most excited about at the show is unveiling the Data Ready Enterprise campaign.
Read more about Data Ready Enterprise campaign here – www.informatica.com/ready.
In my keynote, I’ll be sharing some thoughts regarding how “great data” can help you advance your career by enabling you to make substantial improvements to your company’s business processes and operations. We will be discussing why it’s critical for you – as a data integration or data management professional – to ensure that, with great data, your company is ready for everything.
Today’s business imperatives of mobility, cloud computing and big data analytics are driven by data from multiple sources – both inside and outside the four walls of your enterprise.
Data is the fuel powering your business processes, machines, sensors and customized user experiences.
As a result, your enterprise today is confronted with an unparalleled opportunity to put massive amounts of data, structured and unstructured, to work in optimizing your business. You can streamline business processes through consolidated applications, build efficiencies through real-time collaboration and decision-making, and improve customer relationships and your total customer experience through data-enabled interactions and 24/7 support.
To succeed as a Data Ready Enterprise, you are faced with the imperative of putting massive amounts of both structured and unstructured data to work quickly and effectively. But not just any old data. You need great data.
But what do I mean by this? What is “great data” all about? What is so important about the term “great”?
Simply put, great data is clean, connected and safe. Great data can make an executive a success, and bad data can ruin an executive, literally.
You’re either smarter about what’s happening with your competition, smarter about what’s happening with your business, smarter about what’s happening in real-time with your portfolio, and in your go-to-market strategy vis-à-vis your margins, et cetera — or you’re guessing.
Worse yet, if you have bad data you’re bound to make horrible decisions – no matter how big the data is. The bigger the crummy data, the crummier the big decisions.
If the data’s not connected, if it’s not up to date, if it’s not connected with the systems that actually make for a complete answer, the data can’t be great.
This is why you absolutely must have great data that’s ready to use – and ready for everything – to be a Data Ready Enterprise.
Data Ready Enterprises today will be Decision Ready because they are able to take advantage of data analytics. They will be Customer Ready and prepared to comprehensively manage their total customer relationships. They will be Application Ready through application consolidation and optimization – making sure the right applications access the right data at the right time. They will be Cloud Ready and able to accelerate the transition to real-time, data-driven collaboration. And, not least, they will be Regulation Ready to lift the burden of compliance with industry-specific and other regulations.
Every organization has its own unique data signature with the potential to build smarter systems, more intuitive services and better products. Our aim at Informatica is to empower organizations by delivering them great data that is ready for everything — enabling them to be ready in the ways that matter most.
At Informatica, we are dedicated to making data ready to use. We enable great data. Data that’s connected, clean, safe and intelligent. These are the four business-critical Data Ready Enterprise benefits delivered by our Intelligent Data Platform. We look forward to discussing at Informatica World 2015, and beyond, how our Data Ready approach can help you and your company to succeed.
Check out how you and your organization can become “Data Ready” here – www.informatica.com/ready.
There are lots of really fascinating applications coming out of the big data space as of late, and I recently came across one that may be the coolest of them all. There’s a UK-based firm that is employing big data to help predict earthquakes.
Unfortunately, predicting earthquakes has thus far been almost impossible. Imagine if people living in an earthquake zone could get at least several hours’ notice, maybe even several days, just as those in the paths of hurricanes get advance warning and can flee or prepare. Hurricane and storm modeling is one of the earliest examples of big data in action, going back decades. The big data revolution may now be on the verge of earthquake prediction modeling as well.
Bernard Marr, in a recent Forbes post, explains how Terra Seismic employs satellite data to sense impending shakers:
“The systems use data from US, European and Asian satellite services, as well as ground based instruments, to measure abnormalities in the atmosphere caused by the release of energy and the release of gases, which are often detectable well before the physical quake happens. Large volumes of satellite data are taken each day from regions where seismic activity is ongoing or seems imminent. Custom algorithms analyze the satellite images and sensor data to extrapolate risk, based on historical facts of which combinations of circumstances have previously led to dangerous quakes.”
So far, Marr reports, Terra Seismic has been able to predict major earthquakes anywhere in the world with 90% accuracy. Among them is a prediction, issued on February 22nd, that a 6.5-magnitude quake would hit the Indonesian island of Sumatra. The island was hit by a 6.4-magnitude quake on March 3rd.
There’s no question that the ability to accurately forecast earthquakes – at least as closely as hurricanes and major blizzards can be predicted – will not only save many human lives, but also be invaluable to government agencies and businesses as well.
At the same time, such creative – and potentially game-changing – applications of big data provide vivid examples of how data is converted into insights that were never possible before. Many business leaders are looking for ways to shine a light on potential events within their organizations and markets, and examples such as Terra Seismic accentuate the positive benefits big data can deliver.
Terra Seismic’s forecasts are available through a public website: http://quakehunters.com/
Everyone nods deferentially when that old adage is uttered about people being the most valuable asset in the business. Being one myself, I’d like to agree. The problem is I don’t think it’s the case anymore. In many recent conversations with business leaders, there’s growing agreement that ‘people’ have been knocked off the ‘number one importance’ pedestal – the most valuable asset any business has is now its data. The Chief Data Officer (CDO) is in the ascendant.
The rise of the CDO highlights the ever-sharpening focus that organisations, particularly in the Financial Services sector, are placing on their data. CDOs are on the increase. There is wide recognition that data is indeed an asset. CDOs are redefining ways in which the organisation drives itself, its relationships with the outside world, its products, services and future success. A recent Capgemini Report[i] suggests that ‘FS firms need to become information-centric enterprises’ not just because of proliferating regulatory reporting demands, but also due to ‘unparalleled competition for customer assets and allegiance.’
Given that data is an asset, it must be treated like one. It has to be protected, maintained, and sustained. Adhering to these disciplines will increase its value and produce the return for the business that well curated and managed data is eminently capable of doing. The only challenge then to be overcome is how quickly the business can unlock the return. The CDO has control of the most valuable asset in the business. At Informatica we believe that he or she will only be in control if a governance and management ecosystem is built on three fundamental pillars:
- Securing the data: This is about more than simply the value of the information asset, and the statutes and legal responsibilities that surround it – it’s about the immeasurable damage to an organisation that can come about if its data is accessed by outsiders;
- Knowing where the data is: It sounds like an obvious requirement but I encounter many organisations that cannot locate all their data, all the time. It has often been accumulated by different departments in numerous locations and divergent formats. It is often widely dispersed with no central view or control because it has grown organically. As a result it is under-utilised. Retrieving it wastes time and money and creates inefficiencies;
- Refining it: Data needs careful and diligent attention to ensure it is always up-to-date and relevant. It needs always to tell the full story. Users must have confidence and trust in the data – knowing they can simply draw down on it and put it to profitable use, in the knowledge that it is optimal, constantly. Like many other assets, data is a raw material which must be refined to be of value.
In a world where technology simplification is the rite of passage for kings, how will CDOs rise above the status quo and seize this opportunity with a firm grasp? The question is, how evolved are your foundational pillars?
I mentioned that the CDO is in the ascendancy. Watch this space. I predict the rise in significance of the role within firms is going to be stratospheric. It will rise above all others as organisations recognise that they are data-driven, data-defined, data-centric and ultimately data-dependent. I see no valid reason why CDOs will not only be at the top table but will get the comfiest seat, alongside their CEO.
In my last blog, The New Insurance Model, I argued the case for a new design point for company systems and capabilities, where IT architecture should be wrapped around Data First principles. I suggest now that this approach needs to go further, deeper, and wider. The winning FS firm of the future will be one that adopts a Data First Business Architecture – full recognition that the business thrives on the basis of its data. The CDO has arrived, and long may he or she reign.
[i] The Role of the Chief Data Officer in Financial Services
Original article is posted at techcrunch.com
It’s probably no surprise to the security professional community that once again, identity theft is among the IRS’s Dirty Dozen tax scams. Criminals use stolen Social Security numbers and other personally identifiable information to file tax claims illegally, deposit the tax refunds to rechargeable debit cards, and vanish before the average citizen gets around to filing.
Since the IRS began publishing its “Dirty Dozen” list to alert filers to the worst tax scams, identity theft has topped the list every year since 2011. In 2012, the IRS implemented a preventive measure to catch fraud before actually issuing refunds, and issued more than 2,400 enforcement actions against identity thieves. With an aggressive campaign to fight identity theft, the IRS saved over $1.4 billion in 2011 and over $63 billion since October 2014.
That’s great progress – but consider that of the 117 million taxpayers who filed electronically in 2014, 80 million received an average refund of $2,851 deposited directly into their bank accounts: more than $229 billion changing hands electronically. The pessimist in me has to believe that cybercriminals are already plotting how to nab more Social Security numbers and e-filing logins to tap into that big pot of gold.
So where are criminals getting the data to begin with? Any organization that has employees and a human resources department collects and possibly stores Social Security numbers, birthdays, addresses and income either on-premises or in a cloud HR application. This information is everything a criminal would need to fraudulently file taxes. Any time a common business process is digitally transformed, or moved to the cloud, the potential risk of exposure increases.
As the healthcare industry moves to electronic health and patient records, another abundant source of Social Security numbers and personally identifiable information expands the surface area of opportunity. When you look at the abundance of Social Security numbers stolen in major data breaches, such as the case with Anthem, you start to connect the dots.
One of my favorite dynamic infographics comes from the website Information is Beautiful, entitled ‘World’s Biggest Data Breaches.’ When you filter the data based on number of records versus sensitivity, the size of the bubbles indicates the severity. Even though the sensitivity score appears to be somewhat arbitrary, it does provide one way to assess the severity based on the type of information that was breached:
- Just email address/online information: 1
- Credit card information: 300
- Email password/health records: 4,000
- Full bank account details: 50,000
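To make the bubble-sizing idea concrete, here is a minimal sketch that scales a breach's record count by the sensitivity weights in the table above. Treating severity as records × weight is my own assumption about how the infographic combines the two axes, and the category labels are shorthand, not the infographic's exact wording.

```python
# Sensitivity weights taken from the table above; the keys are my own
# shorthand labels for the breached data types.
SENSITIVITY_WEIGHTS = {
    "email/online info": 1,
    "credit card": 300,
    "email password/health records": 4_000,
    "full bank account details": 50_000,
}

def breach_severity(records: int, data_type: str) -> int:
    """Scale the record count by the sensitivity weight of the data type."""
    return records * SENSITIVITY_WEIGHTS[data_type]

# A health-records breach on the scale of Anthem (~78.8 million records)
# far outweighs an even larger breach of plain email addresses:
print(breach_severity(78_800_000, "email password/health records"))  # 315200000000
print(breach_severity(150_000_000, "email/online info"))             # 150000000
```

The multiplicative scoring is just one plausible reading; any monotonic combination of record count and sensitivity would produce a similar ranking.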
An interesting addition would be how many of those records were sold on the black market and resulted in tax or insurance fraud.
Cyber-security expert Brian Krebs, who was personally impacted by a criminal tax return filing last year, says we will likely see “more phony tax refund claims than last year.” With credentials for TurboTax and H&R Block marketed on black market websites for about 4 cents per identity, it is hard to disagree.
The Ponemon Institute published a survey last year entitled The State of Data Centric Security. One finding stands out: when security professionals were asked what keeps them up at night, more than 50 percent said “not knowing where sensitive and confidential data reside.” As tax season hits full swing, what should security professionals be thinking about?
Data Security Intelligence promises to be the next big thing, providing a more automated and data-centric view into sensitive data discovery, classification and risk assessment. If you don’t know where the data is or what its risk is, how can you protect it? Maybe with a little more insight, we can at least reduce the surface area of exposed sensitive data.
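As a minimal sketch of what sensitive-data discovery looks like at its simplest, the snippet below scans text for patterns resembling U.S. Social Security numbers. Real Data Security Intelligence products use far richer classifiers and context analysis; the regex and the validity checks here are assumptions based on published SSA numbering rules, and `find_possible_ssns` is a hypothetical helper name.

```python
import re

# Pattern for the common XXX-XX-XXXX formatting of a Social Security number.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def find_possible_ssns(text: str) -> list[str]:
    """Return candidate SSNs, filtering out a few values the SSA never issues."""
    hits = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Area 000, 666, and 900-999 are never assigned; neither are
        # group 00 or serial 0000.
        if area in ("000", "666") or area >= "900":
            continue
        if group == "00" or serial == "0000":
            continue
        hits.append(match.group(0))
    return hits

sample = "Payroll export: 123-45-6789, 000-12-3456, 666-00-1234"
print(find_possible_ssns(sample))  # ['123-45-6789']
```

Even a crude scanner like this, run against file shares and HR exports, shrinks the "we don't know where the data is" problem the Ponemon respondents described; classification and risk scoring build on the same discovery step.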