Category Archives: Data Services
You probably know this already, but I’m going to say it anyway: It’s time you changed your infrastructure. I say this because most companies are still running infrastructure optimized for ERP, CRM and other transactional systems. That’s all well and good for running IT-intensive, back-office tasks. Unfortunately, this sort of infrastructure isn’t great for today’s business imperatives of mobility, cloud computing and Big Data analytics.
Virtually all of these imperatives are fueled by information gleaned from potentially dozens of sources that reveal our users’ and customers’ activities, relationships and likes. Forward-thinking companies are using such data to find new customers, retain existing ones and increase their market share. The trick lies in translating all this disparate data into meaningful insight. To do that, IT needs to move beyond focusing solely on transactions and instead shine a light on the interactions that matter to the company’s customers, products and business processes.
They need what we at Informatica call a “Data First” perspective. You can read more in my first blog post about being Data First.
A Data First POV changes everything: product development, business processes, how IT organizes itself and, most especially, the impact IT has on your company’s business. That’s because cloud computing, Big Data and mobile app development shift IT’s responsibilities away from running and administering equipment and toward aggregating, organizing and improving myriad data types pulled in from internal and external databases, online posts and public sources. That shift makes IT a more powerful force for business change. Think about it: the ability to connect the dots across data from multiple sources finally gives you real power to improve entire business processes, departments and organizations.
I like to say that the role of IT is now “big I, little t,” with that lowercase “t” representing both technology and transactions. But that role requires a new set of priorities. They are:
- Think about information infrastructure first and application infrastructure second.
- Create great data by design. Architect for connectivity, cleanliness and security. For more, see the eBook Data Integration for Dummies.
- Optimize for speed and ease of use, because SaaS and mobile applications change often. Informatica Cloud is available as a free 30-day trial.
- Make data a team sport. Get tools into your users’ hands so they can prepare and interact with it.
I never said this would be easy, and there’s no blueprint for how to go about doing it. Still, I recognize that a little guidance will be helpful. In a few weeks, Informatica’s CIO Eric Johnson and I will talk about how we at Informatica practice what we preach.
Configuring your Oracle environment to use PowerExchange CDC can be challenging, but following a few best practices will greatly simplify the process. There are two major factors to consider: the latency requirements for your data and your ability to restart the environment.
Data Latency Requirements
The first factor that affects the latency of your data is the location of your PowerExchange CDC installation. As a best practice, install the PowerExchange Listener on the source database server. This eliminates the need to pass data across the network and provides the lowest latency from source to target.
The volume of data that PowerExchange CDC has to process can also have a significant impact on performance. In addition to the changed data itself, several items can affect performance, including but not limited to Oracle catalog dumps, Oracle workload monitor customizations and other non-Oracle tools that use the redo logs. Review all the processes that access the Oracle redo logs and try to minimize them in terms of both volume and frequency. For example, you could monitor redo log switches and the creation of archived log files to see how busy the source database is. Knowing the size of your production archive logs and how often they are created gives you the information you need to configure PowerExchange CDC properly.
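To make the redo-log review concrete, here is a minimal sketch that summarizes how busy a source database is from a list of archived-log records. It assumes you have already exported each archived log's completion time and size (for example, from Oracle's V$ARCHIVED_LOG view, where size is BLOCKS multiplied by BLOCK_SIZE). The `ArchivedLog` record and `redo_activity` function are hypothetical helpers for illustration, not part of PowerExchange.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ArchivedLog:
    """One archived redo log (hypothetical record exported from the source DB)."""
    completed: datetime
    size_bytes: int


def redo_activity(logs, window=timedelta(hours=1)):
    """Return (log switches per window, redo bytes per window) over the observed span."""
    if len(logs) < 2:
        raise ValueError("need at least two archived logs to measure a rate")
    ordered = sorted(logs, key=lambda log: log.completed)
    span = ordered[-1].completed - ordered[0].completed
    if span <= timedelta(0):
        raise ValueError("archived logs must span a positive time interval")
    windows = span / window  # timedelta / timedelta gives a float
    # n completions bound n - 1 intervals; the first log's redo was written
    # before the observed span starts, so its size is excluded from the volume.
    switches = (len(ordered) - 1) / windows
    volume = sum(log.size_bytes for log in ordered[1:]) / windows
    return switches, volume
```

Running this over a day of records and comparing peak hours with quiet ones gives a rough sense of the redo volume PowerExchange CDC must keep up with.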
Environment Restart Ability
When certain changes are made to the source database environment, the PowerExchange CDC process must be stopped and restarted, so factor the time each restart takes into your change planning. PowerExchange CDC must be restarted when any of the following changes occur:
- A change is made to the schema or a table that is part of the CDC process
- An existing Capture Registration is changed
- A change is made to the PowerExchange configuration files
- An Oracle patch is applied
- An Operating System patch or upgrade is applied
- A PowerExchange version upgrade or service pack is applied
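The restart triggers above lend themselves to a simple change-management gate. The sketch below is a hypothetical illustration, assuming your change tickets can be tagged with one of these categories; the `Change` enum and `restart_reasons` function are invented names, not PowerExchange features.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical change categories, one per restart trigger in the list above."""
    CAPTURED_SCHEMA_OR_TABLE = "schema or captured table changed"
    CAPTURE_REGISTRATION = "capture registration changed"
    POWEREXCHANGE_CONFIG = "PowerExchange configuration file changed"
    ORACLE_PATCH = "Oracle patch applied"
    OS_PATCH_OR_UPGRADE = "operating system patch or upgrade applied"
    POWEREXCHANGE_UPGRADE = "PowerExchange upgrade or service pack applied"
    OTHER = "change not on the restart list"


# Every category except OTHER forces a CDC stop/restart.
RESTART_TRIGGERS = frozenset(Change) - {Change.OTHER}


def restart_reasons(changes):
    """Return descriptions of the planned changes that require a CDC restart."""
    return [change.value for change in changes if change in RESTART_TRIGGERS]
```

An empty result means the change window can proceed without scheduling a CDC restart; anything else should trigger the restart-time planning described here.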
If you use CDC with LogMiner, a copy of the Oracle catalog must be written to the archive logs for CDC to function properly. How often these copies are taken is site-specific, and that frequency affects how long it takes to restart the CDC process.
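One way to see why catalog-copy frequency matters is a simplified model, assuming that on restart the LogMiner reader must begin at the most recent catalog copy taken at or before the restart point and re-read redo from there. The `restart_replay_window` function below is a hypothetical illustration of that assumption, not a description of PowerExchange internals.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def restart_replay_window(catalog_copies, restart_point):
    """Return how far before restart_point reading must begin (as a timedelta).

    Simplified model (an assumption): reading resumes from the latest
    catalog copy taken at or before restart_point.
    """
    copies = sorted(catalog_copies)
    idx = bisect_right(copies, restart_point)  # count of copies <= restart_point
    if idx == 0:
        raise ValueError("no catalog copy precedes the restart point")
    return restart_point - copies[idx - 1]
```

Under this model, a once-a-day catalog copy can leave up to a day of redo to re-read on restart, while copies every four hours cap the window at four hours, at the cost of more catalog data in the archive logs.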
Once your PowerExchange CDC process is in production, every change to the environment requires extensive impact analysis to ensure that data and transaction integrity remain intact on restart. It is essential to understand the configurable parameters in the PowerExchange configuration files that affect restart performance.
Even with the challenges of configuring PowerExchange CDC for Oracle, there are trusted and proven methods that can significantly improve your ability to complete this process and gain real-time or near-real-time access to your data. At SSG, we’re committed to following best practice methodology in our PowerExchange Baseline Deployments. We also provide in-depth knowledge transfer that gives end users a solid foundation for optimizing PowerExchange functionality.
Visit the Informatica Marketplace to learn more about SSG’s Baseline Deployment offerings.
How are they accomplishing this? A new generation of hackers has learned to reverse engineer popular software programs (e.g., Windows, Outlook, Java) to find so-called “holes”. Once they find those holes, the hackers develop “bugs” that exploit them to infiltrate computer systems, search for sensitive data and return it to the bad guys. These bugs are then sold on the black market to the highest bidder. When successful, these hackers can wreak havoc across the globe.
I recently read a Time Magazine article titled “World War Zero: How Hackers Fight to Steal Your Secrets.” The article discussed a new generation of software companies made up of former hackers. These firms help other software companies by identifying potential security holes before they can be used in malicious exploits.
This constant battle between good (data and software security firms) and bad (smart, young programmers looking to make a quick, big buck) is happening every day. Unfortunately, average consumers (you and me) are the innocent victims of this crazy and costly war. As a consumer in today’s digital and data-centric age, I worry when I see headlines about ongoing data breaches, from the Targets of the world to my local bank down the street. I wonder not “if” but “when” I will become the next victim. According to the Ponemon Institute, the average cost of a data breach to a company was US$3.5 million, 15 percent more than the year before.
As a 20-year software industry veteran, I’ve worked with many firms across the global financial services industry. As a result, my concerns about data security exceed those of the average consumer. Here is why:
- Everything is Digital: I remember the days when ATMs were introduced, eliminating the need to wait in long teller lines. Nowadays, most of what we do with our financial institutions is digital and online, whether on mobile devices or desktop browsers. As such, every interaction and transaction creates sensitive data that gets dispersed across tens, hundreds, sometimes thousands of databases and systems in these firms.
- The Big Data Phenomenon: I’m not talking about sexy next generation analytic applications that promise to provide the best answer to run your business. What I am talking about is the volume of data that is being generated and collected from the countless number of computer systems (on-premise and in the cloud) that run today’s global financial services industry.
- Increased Use of Offshore and Onshore Development: Financial services firms increasingly leverage offshore development partners to offset the operational and technology costs of new technology initiatives.
Now here is the hard part. Given these trends and heightened threats, do the companies I do business with know where the data they need to protect resides? How do they protect sensitive data when using it to support new IT projects, whether in-house or with offshore development partners? You’d be amazed at the truth.
According to the recent Ponemon Institute study “State of Data Centric Security,” which surveyed 1,587 global IT and IT security practitioners in 16 countries:
- Only 16 percent of the respondents believe they know where all sensitive structured data is located and a very small percentage (7 percent) know where unstructured data resides.
- Fifty-seven percent of respondents say not knowing where the organization’s sensitive or confidential data is located keeps them up at night.
- Only 19 percent say their organizations use centralized access control management and entitlements and 14 percent use file system and access audits.
Even worse, there is a gap between the percentage of respondents who say that not knowing where sensitive and confidential information resides is a serious threat and the percentage who believe protecting that data is a high priority in their organizations. Seventy-nine percent of respondents agree it is a significant security risk facing their organizations, but a much smaller percentage (51 percent) believe that securing and/or protecting data is a high priority in their organizations.
I don’t know about you, but this is alarming and worrisome to me. I think I am ready to reach out to my banker and my local retailer, let them know about my concerns and ask them to communicate those concerns to the top of their organizations. In today’s globally and socially connected world, news travels fast. Given how hard it is to build trusted customer relationships, one would think every business, from the local mall to Wall St, would be asking whether it is doing what it needs to identify and protect its number one digital asset: its data.
“Enterprise Architecture needs to be the forward, business facing component of IT. Architects need to create a regular structure for IT based on the service and product line functions/capabilities. They need to be connected to their business counterparts. They need to be so tied to the product and service road map that they can tie changes directly to the IT roadmap. Often times, I like to pair a Chief Business Strategist with a Chief Enterprise Architect”.
To get there, Enterprise Architects are going to have to think differently about enterprise architecture. Specifically, they need to think “data first” to break through the productivity barrier and deliver business value in the time frame the business requires.
IT is Not Meeting the Needs of the Business
A study by McKinsey and Company has found that IT is not delivering in the time frame that business requires. Even worse, the performance ratings have been dropping over the past three years. And even worse than that, 20% of the survey respondents are calling for a change in IT leadership.
Our talks with CIOs and Enterprise Architects tell us that the ability to access, manage and deliver data on a timely basis is the biggest bottleneck in the process of delivering business initiatives. Gartner predicts that by 2018, more than half the cost of implementing new large systems will be spent on integration.
The Causes: It’s Only Going to Get Worse
Data needs to be easily discoverable and sharable across multiple uses. Today’s application-centric architectures do not provide that flexibility. This means any new business initiative is going to be slowed by issues relating to finding, accessing, and managing data. Causes of these problems include:
- Data Silos: Decades of applications-focused architecture have left us with unconnected “silos of data.”
- Lack of Data Management Standards: The fact is that most organizations do not manage data as a single system. This means that they are dealing with a classic “spaghetti diagram” of data integration and data management technologies that are difficult to manage and change.
- Growth of Data Complexity: There is a coming explosion of data complexity: partner data, social data, mobile data, big data, Internet of Things data.
- Growth of Data Users: There is also a coming explosion of new data users, who will be looking to self-service.
- Increasing Technology Disruption: Gartner predicts that we are entering a period of increased technology disruption.
Looking forward, organizations are increasingly running on the same few enterprise applications and those applications are rapidly commoditizing. The point is that there is little competitive differentiation to be had from applications. The only meaningful and sustainable competitive differentiation will come from your data and how you use it.
Recommendations for Enterprise Architects
- Think “data first” to accelerate business value delivery and to drive data as your competitive advantage. Designing data as a sharable resource will dramatically accelerate your organization’s ability to produce useful insights and deliver business initiatives.
- Think about enterprise data management as a single system. It should not be a series of one-off, custom, “works of art.” You will reduce complexity, save money, and most importantly speed the delivery of business initiatives.
- Design your data architecture for speed first. Do not buy into the belief that you must accept trade-offs between speed, cost, or quality. It can be done, but you have to design your enterprise data architecture to accomplish that goal from the start.
- Design to know everything about your data. Specifically, gather and carefully manage all relevant metadata. It will speed up data discovery, reduce errors, and provide critical business context. A full complement of business and technical metadata will enable the next recommendation.
- Design for machine learning and automation. Your data platform should be able to automate routine tasks and accelerate more complex tasks with intelligent recommendations. This is the only way you will be able to meet the demands of the business and deal with growing data complexity and technology disruption.
Technology disruption will bring challenges and opportunities. For more on this subject, see the Informatica eBook, Think ‘Data First’ to Drive Business Value.
Within every corporation there are lines of business, like Finance, Sales, Logistics and Marketing. And within those lines of business are business users who are either non-technical or choose to be non-technical.
These business users are increasingly using Next-Generation Business Intelligence Tools like Tableau, Qliktech, MicroStrategy Visual Insight, Spotfire or even Excel. A distinctive capability of these tools is that they allow non-technical Business Users to prepare data themselves before ingesting it into the tool for subsequent analysis.
Initially, the data preparation involved is quite simple: perhaps combining two Excel files via a join on a common field. Over time, however, the operations a non-technical user wants to perform on the data become more complex. They want to do things like join two files of differing grain, validate and complete addresses, or enrich company or customer profile data. When non-technical users reach this point, they require either coding or advanced tooling, neither of which they have access to. So they pick up the phone, call their brethren in IT and ask nicely for help with combining, improving the quality of and enriching the data. Often they need the resulting dataset back in a tight timeframe, perhaps a couple of hours. IT will initially be very happy to oblige. They will get the dataset back to the business user in the timeframe requested and at the quality levels expected. No issues.
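To illustrate why joining files of differing grain is harder than a simple join on a common field, here is a minimal sketch using hypothetical data: daily sales rows and monthly sales targets. Joining them naively on region would repeat each monthly target once per day, so the sketch first rolls daily sales up to the monthly grain and then joins on the shared (region, month) key. The function name and data layout are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def monthly_sales_vs_target(daily_sales, monthly_targets):
    """Join daily sales to monthly targets after rolling sales up to (region, month).

    daily_sales: iterable of (region, date, amount) rows (hypothetical layout)
    monthly_targets: dict mapping (region, "YYYY-MM") -> target amount
    Returns a dict mapping (region, "YYYY-MM") -> (actual, target).
    """
    actuals = defaultdict(float)
    for region, day, amount in daily_sales:
        # Aggregate to the coarser grain so the join key is unique on both sides.
        actuals[(region, day.strftime("%Y-%m"))] += amount
    # Inner join: keep only keys present in both datasets.
    return {
        key: (actuals[key], target)
        for key, target in monthly_targets.items()
        if key in actuals
    }
```

The roll-up step is exactly the kind of operation that moves a user from simple spreadsheet work into territory that needs coding or advanced tooling.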
However, as the number of non-technical Business Users using Next-Generation Business Intelligence tools increases, the number of requests to IT for datasets also increases. So while IT was initially able to meet the “quick hit dataset” requests from the Business, over time, and despite its best efforts, IT becomes increasingly unable to do so.
The reality is that, over time, the business will see a gradual decrease in the quality of the datasets returned, as well as an increase in the time IT needs to provide the data. At some point the business will reach a decision point, where it determines that to meet its business commitments, it will have to find other ways to put together its “quick hit datasets.” It is precisely at this point that the business may do things like hire an IT contractor to sit next to them and do nothing but put together these “quick hit” datasets. It is also when IT begins to feel marginalized and will likely begin to see a drop in funding.
This dynamic is one that has been around for decades and has continued to worsen due to the increase in the pace of data driven business decision making. I feel that we at Informatica have a truly unique opportunity to innovate a technology solution that focuses on two related constituents, specifically, the Non-Technical Business User and the IT Data Provisioner.
The specific value this technology will provide to the Non-Technical Business User is the ability to rapidly put together datasets for subsequent analysis in their Next-Generation BI tool of choice. Without it, they might spend a week or two putting together a dataset or waiting for someone else to put it together. I feel we can improve this division of labor and allow business users to spend 15 minutes putting the dataset together themselves and then 1-2 weeks performing meaningful analysis. In doing so, we allow non-technical business users to dramatically decrease their decision-making time.
The specific value this technology will provide to the IT Data Provisioner is the ability to scale data provisioning effectively as the number of requests for “quick hit datasets” rapidly increases. Most importantly, they will be able to scale proactively.
As a result, the Business and IT relationship can become a match made in heaven.
At the Informatica World 2014 pre-conference, the “ILM Day” sessions were packed, with over 100 people in attendance. This attendance reflects the strong interest in data archive, test data management and data security. Customers were the focus of the panel sessions today, taking center stage to share their experiences, best practices and lessons learned from successful deployments.
Both the test data management and data archive panels drew strong audience interest and interaction. For Test Data Management, the panel topic was “Agile Development by Streamlining Test Data Management”; for data archive, the session tackled “Managing Data Growth in the Era of Application Consolidation and Modernization”. The panels provided practical tactics and strategies for managing data growth and for provisioning test data efficiently and safely. Thank you to the customers, partners and analysts who served on the panels; participants included EMC, Visteon, Comcast, Lowes, Tata Consultancy Services and Neuralytix.
The day concluded with a most excellent presentation from the ILM General Manager, Amit Walia, and the CTO of the International Association of Privacy Professionals, Jeff Northrop. Amit provided an executive summary preview of Tuesday’s Secure@Source(TM) announcement, while Jeff Northrop provided a thought-provoking market backdrop on the issues and challenges for data privacy and security, and on how the focus of information security needs to shift to a ‘data-centric’ approach.
A very successful event for all involved!
In the Information Age we live and work in, where it’s hard to go even one day without a Google search, where do you turn for insights that can help you solve work challenges and progress your career? This is a tough question. How can we deal with the challenges of information overload – which some have called information pollution?
Data is everywhere. It’s in databases and applications spread across your enterprise. It’s in the hands of your customers and partners. It’s in cloud applications and cloud servers. It’s in spreadsheets and documents on your employees’ laptops and tablets. It’s in smartphones, sensors and GPS devices. It’s in the blogosphere, the twittersphere and your friends’ Facebook timelines.
On a recent visit to a client, three people asked me to autograph their copies of Integration Competency Center: An Implementation Guidebook. David Lyle and I published the book in 2005, but it was clear from the dog-eared corners and bookmark tabs that it is still relevant and actively being used today. Much has changed in the last seven years, including the emergence of Big Data, Data Virtualization, Cloud Integration, Self-Service Business Intelligence, Lean and Agile practices, Data Privacy, Data Archiving (the “death” part of the information life-cycle), and Data Governance. These areas were not mainstream concerns in 2005 like they are today. The original ICC (Integration Competency Center) book concepts and advice are still valid in this new context, but the question I’d like readers to comment on is whether we should write a new book that explicitly provides guidance for these new capabilities in a shared services environment.