Category Archives: Data Synchronization
This got me thinking: What is the biggest bottleneck in the delivery of business value today? I know I look at things from a data perspective, but I am convinced that data is the biggest bottleneck. Consider this prediction from Gartner:
“Gartner predicts organizations will spend one-third more on app integration in 2016 than they did in 2013. What’s more, by 2018, more than half the cost of implementing new large systems will be spent on integration.”
When we talk about application integration, we’re talking about moving data, synchronizing data, cleansing data, transforming data, and testing data. The question for architects and senior management is this: Do you have the Data Foundation for Execution you need to drive the business results you require to compete? The answer, unfortunately, for most companies is no.
All too often data management is an add-on to larger application-based projects. The result is unconnected and non-interoperable islands of data across the organization. That simply is not going to work in the coming competitive environment. Here are a couple of quick examples:
- Many companies are looking to compete on their use of analytics. That requires collecting, managing, and analyzing data from multiple internal and external sources.
- Many companies are focusing on a better customer experience to drive their business. This again requires data from many internal sources, plus social, mobile and location-based data to be effective.
When I talk to architects about the business risks of not having a shared data architecture, and common tools and practices for enterprise data management, they “get” the problem. So why aren’t they addressing it? The issue is that they are funded only for the project they are working on, under very demanding timeframes. They have no funding or mandate to solve the larger enterprise data management problem, which grows more complex and brittle with each new unconnected project or initiative added to the pile.
Studies such as “The Data Directive” by The Economist show that organizations that actively manage their data are more successful. But, if that is the desired future state, how do you get there?
Changing an organization to look at data as the fuel that drives strategy takes hard work and leadership. It also takes a strong enterprise data architecture vision and strategy. For fresh thinking on the subject of building a data foundation for execution, see “Think Data-First to Drive Business Value” from Informatica.
* By the way, Informatica is proud to announce that we are now a sponsor of the MIT Center for Information Systems Research.
Configuring your Oracle environment for PowerExchange CDC can be challenging, but there are best practices you can follow that greatly simplify the process. There are two major factors to consider: the latency requirements for your data and the ability to restart your environment.
Data Latency Requirements
The first factor that will affect the latency of your data is the location of your PowerExchange CDC installation. As a best practice, it is optimal to install the PowerExchange Listener on the source database server: this eliminates the need to pass data across the network and provides the lowest latency from source to target.
The volume of data that PowerExchange CDC has to process can also have a significant impact on performance. Several items in addition to the changed data can affect performance, including, but not limited to, Oracle catalog dumps, Oracle workload monitor customizations, and other non-Oracle tools that use the redo logs. You should review all the processes that access the Oracle redo logs and make an effort to minimize them in terms of both volume and frequency. For example, you could monitor the redo log switches and the creation of archived log files to see how busy the source database is (see the queries sketched below). Knowing the size of your production archive logs and how often they are created will provide the information necessary to properly configure PowerExchange CDC.
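As a rough illustration of that kind of review, the queries below sketch one way to measure redo activity on the source. This is only a sketch: it assumes you can query the standard Oracle dynamic performance views V$LOG_HISTORY and V$ARCHIVED_LOG, and the grouping by day and hour is just an example, not a PowerExchange requirement.

```sql
-- Redo log switches per day and hour: a rough measure of how busy the source database is.
SELECT TRUNC(first_time)           AS log_day,
       TO_CHAR(first_time, 'HH24') AS log_hour,
       COUNT(*)                    AS switches
FROM   v$log_history
GROUP  BY TRUNC(first_time), TO_CHAR(first_time, 'HH24')
ORDER  BY log_day, log_hour;

-- Archived log count and volume per day: helps size the redo stream that CDC must read.
SELECT TRUNC(completion_time)                        AS arch_day,
       COUNT(*)                                      AS logs_created,
       ROUND(SUM(blocks * block_size) / 1024 / 1024) AS total_mb
FROM   v$archived_log
GROUP  BY TRUNC(completion_time)
ORDER  BY arch_day;
```

If the switch rate or archived log volume spikes at predictable times (batch windows, catalog dumps), that is where to focus the minimization effort.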
Environment Restart Ability
When certain changes are made to the source database environment, the PowerExchange CDC process must be stopped and restarted, and the time this restart takes should be factored into your planning. PowerExchange CDC must be restarted when any of the following occurs:
- A change is made to the schema or a table that is part of the CDC process
- An existing Capture Registration is changed
- A change is made to the PowerExchange configuration files
- An Oracle patch is applied
- An Operating System patch or upgrade is applied
- A PowerExchange version upgrade or service pack is applied
If you are using CDC with LogMiner, a copy of the Oracle catalog must be placed in the archive logs for capture to function properly (a sketch of the standard Oracle call for this appears below). The frequency of these copies is site-specific and will affect the amount of time it takes to restart the CDC process.
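For reference, the standard Oracle mechanism for writing a copy of the LogMiner dictionary (catalog) into the redo stream is the DBMS_LOGMNR_D.BUILD procedure. The block below is a sketch of that Oracle step only; it assumes a suitably privileged user and a database running in ARCHIVELOG mode, and how often you schedule it remains site-specific, per your PowerExchange documentation.

```sql
-- Place a copy of the LogMiner dictionary (catalog) into the online redo logs,
-- from where it is archived along with the change data.
BEGIN
  DBMS_LOGMNR_D.BUILD(options => DBMS_LOGMNR_D.STORE_IN_REDO_LOGS);
END;
/
```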
Once your PowerExchange CDC process is in production, any change to the environment must undergo extensive impact analysis to ensure that the integrity of the data and transactions remains intact upon restart. Understanding the configurable parameters in the PowerExchange configuration files that affect restart performance is of the utmost importance.
Even with the challenges presented when configuring PowerExchange CDC for Oracle, there are trusted and proven methods that can significantly improve your ability to complete this process and gain real-time or near-real-time access to your data. At SSG, we’re committed to always utilizing best practice methodology with our PowerExchange Baseline Deployments. In addition, we provide in-depth knowledge transfer to set end users up with a solid foundation for optimizing PowerExchange functionality.
Visit the Informatica Marketplace to learn more about SSG’s Baseline Deployment offerings.
Ah yes, the Old Mainframe. It just won’t go away. Which means there is still valuable data sitting in it. And that leads to a question I have been asked repeatedly in the past few weeks: why should an organization use a tool like Informatica PowerExchange to extract data from a mainframe when a script can extract the data as a flat file?
So below, thanks to Phil Line, Informatica’s Product Manager for Mainframe connectivity, are the top ten reasons to use PowerExchange over hand-coding a flat-file extraction.
1) Data will be “fresh” as of the time the data is needed – not already old based on when the extraction was run.
2) Data extracted directly from the mainframe files arrives exactly as the file held it; any additional processes needed to extract or transfer the data to LUW could alter the original formats.
3) The consuming application can get the data when it needs it; there wouldn’t be any scheduling issues between creating the extract file and then being able to use it.
4) There is less work to do if PowerExchange reads the data directly from the mainframe; data type processing and potential code page issues are all handled by PowerExchange.
5) Unlike files created with FTP-type processes, where problems can cut short the expected data transfer, PowerExchange and PowerCenter provide log messages to ensure that all data has been processed.
6) The consumer can select only the data that the consuming application needs; filtering reduces the amount of data being transferred as well as the potential security exposure.
7) Any access to mainframe-based data can be secured by the security tools already in place on the mainframe; PowerExchange is fully compliant with the RACF, ACF2, and Top Secret security products.
8) Using Informatica’s PowerExchange along with Informatica consuming tools (PowerCenter, Mercury, etc.) provides a much simpler and cleaner architecture. The simpler the architecture, the easier it is to find problems and to audit the processes that touch the data.
9) PowerExchange helps avoid the usual bottlenecks associated with getting data off the mainframe: programmers are not needed to create the extract processes, new schedules don’t need to be created to ensure that the extracts run, and when changes are necessary they can be controlled by the business group consuming the data.
10) It helps you rein in mainframe extraction processes that are still running even though no one uses the data they generate, because the system that originally requested the data has become obsolete.
I have a little fable to tell you…
This fable has nothing to do with Big Data, but instead deals with an Overabundance of Food and how to better digest it to make it useful.
And it all started when this SEO copywriter from IT Corporation walked into a bar, pub, grill, restaurant, liquor establishment, and noticed 2 large crowded tables. After what seemed like an endless loop, an SQL programmer sauntered in and contemplated the table problem. “Mind if I join you?” he said. Since the tables were partially occupied and there were no virtual tables available, the host looked on the patio of the restaurant at 2 open tables. “Shall I do an outside join instead?” asked the programmer. The host considered their schema and assigned 2 seats to the space.
The writer told the programmer to look at the menu, bill of fare, blackboard – there were so many choices but not enough real nutrition. “Hmmm, I’m hungry for the right combination of food, grub, chow, to help me train for a triathlon,” he said. With that contextual information, they thought about forgoing the menu items and instead getting in the all-you-can-eat buffer line. But there was too much food available, and despite its appealing looks in its neat rows and columns, it seemed to be mostly empty calories. They both realized they had no idea what important elements were in the food, but came to the conclusion that this restaurant had a “Big Food” problem.
They scoped it out for a moment, and then the writer did an about-face, reversal, change in direction and the SQL programmer did a commit and quick pivot toward the buffer line, where they did a batch insert of all of the food, even the BLOBs of spaghetti, mashed potatoes and jello. There was far too much and it was far too rich for their tastes and needs, but they binged and consumed it all. You should have seen all the empty dishes at the end – they even caused a stack overflow. Because it was a batch binge, their digestive tracts didn’t know how to process all of the food, so they got a stomach ache from “big food” ingestion – and it nearly caused a core dump – in which case the restaurant host would have assigned his most dedicated servers to perform a thorough cleansing and scrubbing. There was no way to do a rollback at this point.
It was clear they needed relief. The programmer did an ad hoc query to JSON, their Server who they thought was Active, for a response about why they were having such “big food” indigestion, and did they have packets of relief available. No response. Then they asked again. There was still no response. So the programmer said to the writer, “Gee, the Quality Of Service here is terrible!”
Just then, the programmer remembered a remedy he had heard about previously, and so he spoke up. “Oh, it’s very easy: just <SELECT>Vibe.Data.Stream from INFORMATICA where REAL-TIME is NOT NULL.”
Informatica’s Vibe Data Stream enables streaming food collection for real-time Big food analytics, operational intelligence, and traditional enterprise food warehousing from a variety of distributed food sources at high scale and low latency. It enables the right food ingested at the right time when nutrition is needed without any need for binge or batch ingestion.
And so they all lived happily ever after and all was good in the IT Corporation once again.
Download Now and take your first steps to rapidly developing applications that sense and respond to streaming food (or data) in real-time.
When the average person hears of cloning, my bet is that they think of the controversy and ethical issues surrounding cloning, such as the cloning of Dolly the sheep, or the possible cloning of humans by a mad geneticist in a rogue nation state. I would also put money down that when an Informatica blog reader thinks of cloning they think of “The Matrix” or “Star Wars” (that dreadful Episode II: Attack of the Clones). I did. Unfortunately.
But my pragmatic expectation is that when Informatica customers think of cloning, they also think of Data Cloning software. Data Cloning software clones terabytes of database data into a host of other databases, data warehouses, analytical appliances, and Big Data stores such as Hadoop. And just for hoots and hollers, you should know that almost half of all Data Integration efforts involve replication, be it snapshot or real-time, according to TDWI survey data. Survey also says… replication is the second most popular — or second most used — data integration tool, behind ETL.
Do your company’s cloning tools work with non-standard types? Know that Informatica cloning tools can reproduce Oracle data to just about anything on 2 tuples (or more). We do non-discriminatory duplication, so it’s no wonder we especially fancy cloning the Oracle! (a thousand apologies for the bad “Matrix” pun)
Just remember that data clones are an important and natural component of business continuity, and the use cases span both operational and analytic applications. So if you’re not cloning your Oracle data safely and securely with the quality results that you need and deserve, it’s high time that you get some better tools.
Send in the Clones
With that in mind, if you haven’t tried cloning before, for a limited time Informatica is making the Fast Clone database cloning software available as a free trial download. Click here to get it now.
This week at Informatica World 2013, the Data Integration Hub (DIH) was announced. It is the first out-of-the-box application for managing data access and distribution across large and complex infrastructures. It simplifies application data integration projects through the innovative use of publish-and-subscribe methods to decouple source and destination applications. Of course we are very excited about the DIH, but why should you be? Well, let me tell you a story…
This week we got the news that for the fifth year in a row, Informatica Cloud has won the 2012 Salesforce.com AppExchange Customer Choice Award. Informatica Cloud Integration for Salesforce was recognized as the winner in the very crowded IT and Administration category, which includes administration and IT, data cleansing, integration, IT management and other applications. These awards are based on the number and quality of customer reviews on the AppExchange.
The Informatica Winter 2013 announcement included the following customer quote:
“The Winter 2013 release will accelerate the time it takes to access, integrate and deliver valuable data in order to meet our business imperatives.”
It was also noted that, “the new Informatica Cloud user interface will make the cloud integration solution even more user friendly.” There are a number of user experience enhancements with this upgrade, so I sat down with Joshua Vaughn, Principal User Experience Designer for Informatica Cloud, to learn more about the impetus behind the new design and features, what’s on the horizon for future releases, and why user interface (UI) design is so important for cloud applications.
Informatica Cloud Winter 2013 has arrived. This is the fourteenth release of the company’s award-winning family of cloud integration applications and integration platform as a service (iPaaS), which has now expanded to include Informatica Cloud Master Data Management (MDM). In this post I’ll provide an overview of the new cloud integration and cloud data quality capabilities. Be sure to register for a 30-day trial and/or attend the release webinar on Thursday to see Informatica Cloud in action.