Category Archives: Data Services
Fulfilling its duty after a 10-year journey, Rosetta dropped its lander, Philae, to gather data from the comet below.
Everything about the comet so far is both challenging and fascinating, from its advanced age – 4.6 billion years old, to its hard-to- pronounce name, Churyumov-Gerasimenko.
The size of Gerasimenko? Roughly that of lower Manhattan. The shape wasn’t the potato-like image some anticipated of a typical comet. Instead it had a form that was compared to that of a “rubber-duck,” making landing trickier than expected.
To add one more challenging feature, the comet was flying at 38,000 mph. The feat of landing the probe onto the comet has been compared to hitting a speeding bullet with another speeding bullet.
All of this would have been impossible if the ESA didn’t have serious data on the lofty goal they set forth.
As this was happening, on the same day there was a quieter landing: Salesforce and LinkedIn paired up to publish research they conducted on marketing strategies by surveying 900+ senior-level B2B and B2C marketers through their social networks about their roles, marketing trends, and challenges they face.
This one finding stood out to me: “Only 17% of respondents said their company had fully integrated their customer data across all areas of the organization. However, 97% of those ‘fully integrated’ marketing leaders said they were at least somewhat effective at creating a cohesive customer journey across all touchpoints and channels.”
While not as many companies were implementing customer data like they should, those who did felt the strong impact of the benefits. It’s like knowing the difference between interacting with a potato-shaped company, or a B2C, vs. interacting with a rubber-duck-shaped company, or a B2B, for example.
Efficient customer data could help you learn how to land each one properly. While the methods for dealing with both might be similar, they’re not identical, and taking the wrong approach could mean a failed landing. One of the conclusions from the survey showed there is a “direct link between how well a marketer integrated customer data and the self-reported successes of that brand’s strategy.”
When interviewed by MSNBC on the comet landing, Bill Nye, also known as “the Science Guy,” had many positive things to say on the historic event. One question he answered was why do programs like the ESA exist – or basically, why do we go to space?
Nye had two replies: “It raises the expectations of your society,” and “You’re going to make discoveries.”
Marketers armed with insights from powerful customer data can have their own “landing on a comet” moment. Properly integrated customer data means you’ll be making new discoveries about your own clientele while simultaneously raising the expectations of your business.
The world couldn’t progress forward without quality data, whether in the realm of retail or planetary science. We put a strong emphasis on validating and cleanse your customer data at the point of entry or the point of collection.
This article was originally published on www.federaltimes.com.
November – that time of the year. This year, November 1 was the start of Election Day weekend and the associated endless barrage of political ads. It also marked the end of Daylight Savings Time. But, perhaps more prominently, it marked the beginning of the holiday shopping season. Winter holiday decorations erupted in stores even before Halloween decorations were taken down. There were commercials and ads, free shipping on this, sales on that, singing, and even the first appearance of Santa Claus.
However, it’s not all joy and jingle bells. The kickoff to this holiday shopping season may also remind many of the countless credit card breaches at retailers that plagued last year’s shopping season and beyond. The breaches at Target, where almost 100 million credit cards were compromised, Neiman Marcus, Home Depot and Michael’s exemplify the urgent need for retailers to aggressively protect customer information.
In addition to the holiday shopping season, November also marks the next round of open enrollment for the ACA healthcare exchanges. Therefore, to avoid falling victim to the next data breach, government organizations as much as retailers, need to have data security top of mind.
According to the New York Times (Sept. 4, 2014), “for months, cyber security professionals have been warning that the healthcare site was a ripe target for hackers eager to gain access to personal data that could be sold on the black market. A week before federal officials discovered the breach at HealthCare.gov, a hospital operator in Tennessee said that Chinese hackers had stolen personal data for 4.5 million patients.”
Acknowledging the inevitability of further attacks, companies and organizations are taking action. For example, the National Retail Federation created the NRF IT Council, which is made up of 130 technology-security experts focused on safeguarding personal and company data.
Is government doing enough to protect personal, financial and health data in light of these increasing and persistent threats? The quick answer: no. The federal government as a whole is not meeting the data privacy and security challenge. Reports of cyber attacks and breaches are becoming commonplace, and warnings of new privacy concerns in many federal agencies and programs are being discussed in Congress, Inspector General reports and the media. According to a recent Government Accountability Office report, 18 out of 24 major federal agencies in the United States reported inadequate information security controls. Further, FISMA and HIPAA are falling short and antiquated security protocols, such as encryption, are also not keeping up with the sophistication of attacks. Government must follow the lead of industry and look for new and advanced data protection technologies, such as dynamic data masking and continuous data monitoring to prevent and thwart potential attacks.
These five principles can be implemented by any agency to curb the likelihood of a breach:
1. Expand the appointment and authority of CSOs and CISOs at the agency level.
3. Protect all environments from development to production, including backups and archives.
4. Data and application security must be prioritized at the same level as network and perimeter security.
5. Data security should follow data through downstream systems and reporting.
So, as the season of voting, rollbacks, on-line shopping events, free shipping, Black Friday, Cyber Monday and healthcare enrollment begins, so does the time for protecting personal identifiable information, financial information, credit cards and health information. Individuals, retailers, industry and government need to think about data first and stay vigilant and focused.
Recent published research shows that “faster” is better than “slower.” The point, ladies and gentlemen, is that speed, for lack of a better word, is good. But granted, you won’t always have the need for speed. My Lamborghini is handy when I need to elude the Bakersfield fuzz on I-5, but it does nothing for my Costco trips. There, I go with capacity and haul home my 30-gallon tubs of ketchup with my Ford F150. (Note: this is a fictitious example, I don’t actually own an F150.)
But if speed is critical, like in your data streaming application, then Informatica Vibe Data Stream and the MapR Distribution including Apache™ Hadoop® are the technologies to use together. But since Vibe Data Stream works with any Hadoop distribution, my discussion here is more broadly applicable. I first discussed this topic earlier this year during my presentation at Informatica World 2014. In that talk, I also briefly described architectures that include streaming components, like the Lambda Architecture and enterprise data hubs. I recommend that any enterprise architect should become familiar with these high-level architectures.
Data streaming deals with a continuous flow of data, often at a fast rate. As you might’ve suspected by now, Vibe Data Stream, based on the Informatica Ultra Messaging technology, is great for that. With its roots in high speed trading in capital markets, Ultra Messaging quickly and reliably gets high value data from point A to point B. Vibe Data Stream adds management features to make it consumable by the rest of us, beyond stock trading. Not surprisingly, Vibe Data Stream can be used anywhere you need to quickly and reliably deliver data (just don’t use it for sharing your cat photos, please), and that’s what I discussed at Informatica World. Let me discuss two examples I gave.
Large Query Support. Let’s first look at “large queries.” I don’t mean the stuff you type on search engines, which are typically no more than 20 characters. I’m referring to an environment where the query is a huge block of data. For example, what if I have an image of an unidentified face, and I want to send it to a remote facial recognition service and immediately get the identity? The image would be the query, the facial recognition system could be run on Hadoop for fast divide-and-conquer processing, and the result would be the person’s name. There are many similar use cases that could leverage a high speed, reliable data delivery system along with a fast processing platform, to get immediate answers to a data-heavy question.
Data Warehouse Onload. For another example, we turn to our old friend the data warehouse. If you’ve been following all the industry talk about data warehouse optimization, you know pumping high speed data directly into your data warehouse is not an efficient use of your high value system. So instead, pipe your fast data streams into Hadoop, run some complex aggregations, then load that processed data into your warehouse. And you might consider freeing up large processing jobs from your data warehouse onto Hadoop. As you process and aggregate that data, you create a data flow cycle where you return enriched data back to the warehouse. This gives your end users efficient analysis on comprehensive data sets.
Hopefully this stirs up ideas on how you might deploy high speed streaming in your enterprise architecture. Expect to see many new stories of interesting streaming applications in the coming months and years, especially with the anticipated proliferation of internet-of-things and sensor data.
To learn more about Vibe Data Stream you can find it on the Informatica Marketplace .
The Informatica Cloud team has been busy updating connectivity to Hadoop using the Cloud Connector SDK. Updated connectors are available now for Cloudera and Hortonworks and new connectivity has been added for MapR, Pivotal HD and Amazon EMR (Elastic Map Reduce).
Informatica Cloud’s Hadoop connectivity brings a new level of ease of use to Hadoop data loading and integration. Informatica Cloud provides a quick way to load data from popular on premise data sources and apps such as SAP and Oracle E-Business, as well as SaaS apps, such as Salesforce.com, NetSuite, and Workday, into Hadoop clusters for pilots and POCs. Less technical users are empowered to contribute to enterprise data lakes through the easy-to-use Informatica Cloud web user interface.
Informatica Cloud’s rich connectivity to a multitude of SaaS apps can now be leveraged with Hadoop. Data from SaaS apps for CRM, ERP and other lines of business are becoming increasingly important to enterprises. Bringing this data into Hadoop for analytics is now easier than ever.
Users of Amazon Web Services (AWS) can leverage Informatica Cloud to load data from SaaS apps and on premise sources into EMR directly. Combined with connectivity to Amazon Redshift, Informatica Cloud can be used to move data into EMR for processing and then onto Redshift for analytics.
Self service data loading and basic integration can be done by less technical users through Informatica Cloud’s drag and drop web-based user interface. This enables more of the team to contribute to and collaborate on data lakes without having to learn Hadoop.
Bringing the cloud and Big Data together to put the potential of data to work – that’s the power of Informatica in action.
Free trials of the Informatica Cloud Connector for Hadoop are available here: http://www.informaticacloud.com/connectivity/hadoop-connector.html
I recently participated on an EDM Council panel on BCBS 239 earlier this month in London and New York. The panel consisted of Chief Risk Officers, Chief Data Officers, and information management experts from the financial industry. BCBS 239 set out 14 key principles requiring banks aggregate their risk data to allow banking regulators to avoid another 2008 crisis, with a deadline of Jan 1, 2016. Earlier this year, the Basel Committee on Banking Supervision released the findings from a self-assessment from the Globally Systemically Important Banks (GISB’s) in their readiness to 11 out of the 14 principles related to BCBS 239.
Given all of the investments made by the banking industry to improve data management and governance practices to improve ongoing risk measurement and management, I was expecting to hear signs of significant process. Unfortunately, there is still much work to be done to satisfy BCBS 239 as evidenced from my findings. Here is what we discussed in London and New York.
- It was clear that the “Data Agenda” has shifted quite considerably from IT to the Business as evidenced by the number of risk, compliance, and data governance executives in the room. Though it’s a good sign that business is taking more ownership of data requirements, there was limited discussions on the importance of having capable data management technology, infrastructure, and architecture to support a successful data governance practice. Specifically capable data integration, data quality and validation, master and reference data management, metadata to support data lineage and transparency, and business glossary and data ontology solutions to govern the terms and definitions of required data across the enterprise.
- With regard to accessing, aggregating, and streamlining the delivery of risk data from disparate systems across the enterprise and simplifying the complexity that exists today from point to point integrations accessing the same data from the same systems over and over again creating points of failure and increasing the maintenance costs of supporting the current state. The idea of replacing those point to point integrations via a centralized, scalable, and flexible data hub approach was clearly recognized as a need however, difficult to envision given the enormous work to modernize the current state.
- Data accuracy and integrity continues to be a concern to generate accurate and reliable risk data to meet normal and stress/crisis reporting accuracy requirements. Many in the room acknowledged heavy reliance on manual methods implemented over the years and investing in Automating data integration and onboarding risk data from disparate systems across the enterprise is important as part of Principle 3 however, much of what’s in place today was built as one off projects against the same systems accessing the same data delivering it to hundreds if not thousands of downstream applications in an inconsistent and costly way.
- Data transparency and auditability was a popular conversation point in the room as the need to provide comprehensive data lineage reports to help explain how data is captured, from where, how it’s transformed, and used remains a concern despite advancements in technical metadata solutions that are not integrated with their existing risk management data infrastructure
- Lastly, big concerns regarding the ability to capture and aggregate all material risk data across the banking group to deliver data by business line, legal entity, asset type, industry, region and other groupings, to support identifying and reporting risk exposures, concentrations and emerging risks. This master and reference data challenge unfortunately cannot be solved by external data utility providers due to the fact the banks have legal entity, client, counterparty, and securities instrument data residing in existing systems that require the ability to cross reference any external identifier for consistent reporting and risk measurement.
To sum it up, most banks admit they have a lot of work to do. Specifically, they must work to address gaps across their data governance and technology infrastructure.BCBS 239 is the latest and biggest data challenge facing the banking industry and not just for the GSIB’s but also for the next level down as mid-size firms will also be required to provide similar transparency to regional regulators who are adopting BCBS 239 as a framework for their local markets. BCBS 239 is not just a deadline but the principles set forth are a key requirement for banks to ensure they have the right data to manage risk and ensure transparency to industry regulators to monitor system risk across the global markets. How ready are you?
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part II
- Do you need to protect data at rest (in storage), during transmission, and/or when accessed?
- Do some privileged users still need the ability to view the original sensitive data or does sensitive data need to be obfuscated at all levels?
- What is the granularity of controls that you need?
- Datafile level
- Table level
- Row level
- Field / column level
- Cell level
- Do you need to be able to control viewing vs. modification of sensitive data?
- Do you need to maintain the original characteristics / format of the data (e.g. for testing, demo, development purposes)?
- Is response time latency / performance of high importance for the application? This can be the case for mission critical production applications that need to maintain response times in the order of seconds or sub-seconds.
In order to help you determine which method of control is appropriate for your requirements, the following table provides a comparison of the different methods and their characteristics.
A combination of protection method may be appropriate based on your requirements. For example, to protect data in non-production environments, you may want to use persistent data masking to ensure that no one has access to the original production data, since they don’t need to. This is especially true if your development and testing is outsourced to third parties. In addition, persistent data masking allows you to maintain the original characteristics of the data to ensure test data quality.
In production environments, you may want to use a combination of encryption and dynamic data masking. This is the case if you would like to ensure that all data at rest is protected against unauthorized users, yet you need to protect sensitive fields only for certain sets of authorized or privileged users, but the rest of your users should be able to view the data in the clear.
The best method or combination of methods will depend on each scenario and set of requirements for your environment and organization. As with any technology and solution, there is no one size fits all.
Which Method of Controls Should You Use to Protect Sensitive Data in Databases and Enterprise Applications? Part I
- Which types of data should be protected?
- Which data should be classified as “sensitive?”
- Where is this sensitive data located?
- Which groups of users should have access to this data?
Because these questions come up frequently, it seems ideal to share a few guidelines on this topic.
When protecting the confidentiality and integrity of data, the first level of defense is Authentication and access control. However, data with higher levels of sensitivity or confidentiality may require additional levels of protection, beyond regular authentication and authorization methods.
There are a number of control methods for securing sensitive data available in the market today, including:
- Persistent (Static) Data Masking
- Dynamic Data Masking
- Retention management and purging
Encryption is a cryptographic method of encoding data. There are generally, two methods of encryption: symmetric (using single secret key) and asymmetric (using public and private keys). Although there are methods of deciphering encrypted information without possessing the key, a good encryption algorithm makes it very difficult to decode the encrypted data without knowledge of the key. Key management is usually a key concern with this method of control. Encryption is ideal for mass protection of data (e.g. an entire data file, table, partition, etc.) against unauthorized users.
Persistent or static data masking obfuscates data at rest in storage. There is usually no way to retrieve the original data – the data is permanently masked. There are multiple techniques for masking data, including: shuffling, substitution, aging, encryption, domain-specific masking (e.g. email address, IP address, credit card, etc.), dictionary lookup, randomization, etc. Depending on the technique, there may be ways to perform reverse masking - this should be used sparingly. Persistent masking is ideal for cases where all users should not see the original sensitive data (e.g. for test / development environments) and field level data protection is required.
Dynamic data masking de-identifies data when it is accessed. The original data is still stored in the database. Dynamic data masking (DDM) acts as a proxy between the application and database and rewrites the user / application request against the database depending on whether the user has the privilege to view the data or not. If the requested data is not sensitive or the user is a privileged user who has the permission to access the sensitive data, then the DDM proxy passes the request to the database without modification, and the result set is returned to the user in the clear. If the data is sensitive and the user does not have the privilege to view the data, then the DDM proxy rewrites the request to include a masking function and passes the request to the database to execute. The result is returned to the user with the sensitive data masked. Dynamic data masking is ideal for protecting sensitive fields in production systems where application changes are difficult or disruptive to implement and performance / response time is of high importance.
Tokenization substitutes a sensitive data element with a non-sensitive data element or token. The first generation tokenization system requires a token server and a database to store the original sensitive data. The mapping from the clear text to the token makes it very difficult to reverse the token back to the original data without the token system. The existence of a token server and database storing the original sensitive data renders the token server and mapping database as a potential point of security vulnerability, bottleneck for scalability, and single point of failure. Next generation tokenization systems have addressed these weaknesses. However, tokenization does require changes to the application layer to tokenize and detokenize when the sensitive data is accessed. Tokenization can be used in production systems to protect sensitive data at rest in the database store, when changes to the application layer can be made relatively easily to perform the tokenization / detokenization operations.
Retention management and purging is more of a data management method to ensure that data is retained only as long as necessary. The best method of reducing data privacy risk is to eliminate the sensitive data. Therefore, appropriate retention, archiving, and purging policies should be applied to reduce the privacy and legal risks of holding on to sensitive data for too long. Retention management and purging is a data management best practices that should always be put to use.
In 2012, Forbes published an article predicting an upcoming problem.
The Need for Scalable Enterprise Analytics
Specifically, increased exploration in Big Data opportunities would place pressure on the typical corporate infrastructure. The generic hardware used to run most tech industry enterprise applications was not designed to handle real-time data processing. As a result, the explosion of mobile usages, and the proliferation of social networks, was increasing the strain on the system. Most companies now faced real-time processing requirements beyond what the traditional model was designed to handle.
In the past two years, the volume of data and speed of data growth has grown significantly. As a result, the problem has become more severe. It is now clear that these challenges can’t be overcome by simply doubling or tripling their IT spending on infrastructure sprawl. Today, enterprises seek consolidated solutions that offer scalability, performance and ease of administration. The present need is for scalable enterprise analytics.
A Clear Solution Is Available
Informatica PowerCenter and Data Quality is the market leading data integration and data quality platform. This platform has now been certified by Oracle as an optimal solution for both the Oracle Exadata Database Machine and the Oracle SuperCluster.
As the high-speed on-ramp for data into Oracle Exadata, PowerCenter and Data Quality deliver up-to five times faster performance on data load, query, profiling and cleansing tasks. Informatica’s data integration customers can now easily reuse data integration code, skills and resources to access and transform any data from any data source and load it into Exadata, with the highest throughput and scalability.
Customers adopting Oracle Exadata for high-volume, high-speed analytics can now be confident with Informatica PowerCenter and Data Quality. With these products, they can ingest, cleanse and transform all types of data into Exadata with the highest performance and scale required to maximize the value of their Exadata investment.
Proving the Value of Scalable Enterprise Analytics
In order to demonstrate the efficacy of their partnership, the two companies worked together on a Proof Of Value (POV) project. The goal is to prove that using PowerCenter with Exadata would improve both performance and scalability. The project involved PowerCenter and Data Quality 9.6.1 and x4-2 Exadata Machine. Oracle 11g was considered for both standard Oracle and Exadata versions.
The first test conducted a 1TB load test to Exadata and standard Oracle in a typical PowerCenter use case. The second test consisted of querying 1TB profiling warehouse database in Data Quality use case scenario. Performance data was collected for both tests. The scalability factor was also captured. A variant of the TPCH dataset was used to generate the test data. The results were significantly higher than prior Exabyte 1TB test. In particular:
- The data query tests achieved 5x performance.
- The data load tests achieved a 3x-5x speed increase.
- Linear scalability was achieved with read/write tests on Exadata.
What Business Benefits Could You Expect?
Informatica PowerCenter and Data Quality, along-with Oracle Exadata, now provide the best-of-breed combination of software and hardware, optimized to deliver the highest possible total system performance. These comprehensive tools drive agile reporting and analytics, while empowering IT organizations to meet SLAs and quality goals like never before.
- Extend Oracle Exadata’s access to even more business critical data sources. Utilize optimized out-of-the-box Informatica connectivity to easily access hundreds of data sources, including all the major databases, on-premise and cloud applications, mainframe, social data and Hadoop.
- Get more data, more quickly into Oracle Exadata. Move higher volumes of trusted data quickly into Exadata to support timely reporting with up-to-date information (i.e. up to 5x performance improvement compared to Oracle database).
- Centralize management and improve insight into large scale data warehouses. Deliver the necessary insights to stakeholders with intuitive data lineage and a collaborative business glossary. Contribute to high quality business analytics, in a timely manner across the enterprise.
- Instantly re-direct workloads and resources to Oracle Exadata without compromising performance. Leverage existing code and programming skills to execute high-performance data integration directly on Exadata by performing push down optimization.
- Roll-out data integration projects faster and more cost-effectively. Customers can now leverage thousands of Informatica certified developers to execute existing data integration and quality transformations directly on Oracle Exadata, without any additional coding.
- Efficiently scale-up and scale-out. Customers can now maximize performance and lower the costs of data integration and quality operations of any scale by performing Informatica workload and push down optimization on Oracle Exadata.
- Save significant costs involved in administration and expansion. Customers can now easily and economically manage large-scale analytics data warehousing environments with a single point of administration and control, and consolidate a multitude of servers on one rack.
- Reduce risk. Customers can now leverage Informatica’s data integration and quality platform to overcome the typical performance and scalability limitations seen in databases and data storage systems. This will help reduce quality-of-service risks as data volumes rise.
Oracle Exadata is a well-engineered system that offers customers out-of-box scalability and performance on demand. Informatica PowerCenter and Data Quality are optimized to run on Exadata, offering customers business benefits that speed up data integration and data quality tasks like never before. Informatica’s certified, optimized, and purpose-built solutions for Oracle can help you enable more timely and trustworthy reporting. You can now benefit from Informatica’s optimized solutions for Oracle Exadata to make better business decisions by unlocking the full potential of the most current and complete enterprise data available. As shown in our test results, you can attain up to 5x performance by scaling Exadata. Informatica Data Quality customers can perform profiling 1TB datasets, which is unheard before. We urge you to deploy the combined solution to solve your data integration and quality problems today while achieving high speed business analytics in these days of big data exploration and Internet Of Things.
Listen to what Ash Kulkarni, SVP, at OOW14 has to say on how @InformaticaCORP PowerCenter and Data Quality certified by Oracle as optimized for Exadata can deliver up-to five times faster performance improvement on data load, query, profiling, cleansing and mastering tasks, for Exadata.
You probably know this already, but I’m going to say it anyway: It’s time you changed your infrastructure. I say this because most companies are still running infrastructure optimized for ERP, CRM and other transactional systems. That’s all well and good for running IT-intensive, back-office tasks. Unfortunately, this sort of infrastructure isn’t great for today’s business imperatives of mobility, cloud computing and Big Data analytics.
Virtually all of these imperatives are fueled by information gleaned from potentially dozens of sources to reveal our users’ and customers’ activities, relationships and likes. Forward-thinking companies are using such data to find new customers, retain existing ones and increase their market share. The trick lies in translating all this disparate data into useful meaning. And to do that, IT needs to move beyond focusing solely on transactions, and instead shine a light on the interactions that matter to their customers, their products and their business processes.
They need what we at Informatica call a “Data First” perspective. You can check out my first blog first about being Data First here.
A Data First POV changes everything from product development, to business processes, to how IT organizes itself and —most especially — the impact IT has on your company’s business. That’s because cloud computing, Big Data and mobile app development shift IT’s responsibilities away from running and administering equipment, onto aggregating, organizing and improving myriad data types pulled in from internal and external databases, online posts and public sources. And that shift makes IT a more-empowering force for business change. Think about it: The ability to connect and relate the dots across data from multiple sources finally gives you real power to improve entire business processes, departments and organizations.
I like to say that the role of IT is now “big I, little t,” with that lowercase “t” representing both technology and transactions. But that role requires a new set of priorities. They are:
- Think about information infrastructure first and application infrastructure second.
- Create great data by design. Architect for connectivity, cleanliness and security. Check out the eBook Data Integration for Dummies.
- Optimize for speed and ease of use – SaaS and mobile applications change often. Click here to try Informatica Cloud for free for 30 days.
- Make data a team sport. Get tools into your users’ hands so they can prepare and interact with it.
I never said this would be easy, and there’s no blueprint for how to go about doing it. Still, I recognize that a little guidance will be helpful. In a few weeks, Informatica’s CIO Eric Johnson and I will talk about how we at Informatica practice what we preach.
Configuring your Oracle environment for using PowerExchange CDC can be challenging, but there are some best practices you can follow that will greatly simplify the process. There are two major factors to consider when approaching this: latency requirements for your data and the ability to restart your environment.
Data Latency Requirements
The first factor that will effect latency of your data is the location of your PowerExchange CDC installation. From a best practice perspective, it is optimal to install the PowerExchange Listener on the source database server as this eliminates the need to pass data across the network and will provide the smallest amount of latency from source to target.
The volume of data that PowerExchange CDC has to process can also have a significant impact on performance. There are several items in addition to the changed data that can effect performance, including, but are not limited to, Oracle catalog dumps, Oracle workload monitor customizations and other non-Oracle tools that use the redo logs. You should conduct a review of all the processes that access Oracle redo logs, and make an effort to minimize them in terms both volume and frequency. For example, you could monitor the redo log switches and the creation of archived log files to see how busy the source database is. The size of your production archive logs and knowing how often they are being created will provide the information necessary to properly configure PowerExchange CDC.
Environment Restart Ability
When certain changes are made to the source database environment, the PowerExchange CDC process will need to be stopped and restarted. The amount of time this restart takes should be considered whenever this needs to occur. PowerExchange CDC must be restarted when any of the following changes occur:
- A change is made to the schema or a table that is part of the CDC process
- An existing Capture Registration is changed
- A change is made to the PowerExchange configuration files
- An Oracle patch is applied
- An Operating System patch or upgrade is applied
- A PowerExchange version upgrade or service pack is applied
If using the CDC with LogMiner, a copy of the Oracle catalog must be placed on the archive log in order to function properly. The frequency of these copies is site-specific and will have an impact on the amount of time it will take to restart the CDC process.
Once your PowerExchange CDC process is in production, any changes to the environment must have extensive impact analysis performed to ensure the integrity of the data and the transactions remains intact upon restart. Understanding the configurable parameters in the PowerExchange configuration files that will assist restart performance is of the utmost importance.
Even with the challenges presented when configuring PowerExchange CDC for Oracle, there are trusted and proven methods that can significantly improve your ability to complete this process and have real time or near real time access to your data. At SSG, we’re committed to always utilizing best practice methodology with our PowerExchange Baseline Deployments. In addition, we provide in-depth knowledge transfer to set end users up with a solid foundation for optimizing PowerExchange functionality.
Visit the Informatica Marketplace to learn more about SSG’s Baseline Deployment offerings.