There are many agile development methods and practices including Extreme Programming (XP), Scrum, Test Driven Development, Continuous Integration, Lean Software Development and Pair Programming to name a few. Core principles that most agile methods prescribe include incremental iterations with short time frames, minimal planning, teamwork, collaboration, and process adaptability throughout the life-cycle of the project.
While the concepts are easy enough to grasp and make a lot of sense, putting the principles into practice is hard, as evidenced by more than one failed agile project. So what are some of the best practices that enable success? To narrow down the list, this article focuses on data integration scenarios. And to move beyond concepts and make the advice more practical, I will focus even more specifically on the Informatica platform.
The question this article will try to answer is “what are the top recommendations for agile methods in order to accelerate time-to-implementation for integration solutions based on the Informatica platform?” In no particular order or priority, there are seven techniques which come to mind.
First, perform data profiling of data sources early in the life-cycle – even during the requirements phase. Data profiling is in fact a form of prototyping since it supports rapid analysis of data sources and identifies common patterns of inconsistent or missing data. For example, if the integration scenario is to build a common view of customer, a quick data profiling exercise can help determine which data fields have sufficient quality for matching purposes.
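To make the idea concrete, here is a minimal sketch of the kind of check a profiling exercise performs. It is plain Python rather than an Informatica profiling tool, and the `profile` function, the column names and the sample records are all hypothetical. It computes, per column, how often a value is present and how many distinct values occur, which is often enough to judge whether a field is a plausible match key.

```python
from collections import Counter


def profile(rows, columns):
    """Return per-column fill rate, distinct count and most common value.

    Hypothetical helper for illustration; empty strings and None both
    count as missing.
    """
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        filled = [v for v in values if v not in (None, "")]
        report[col] = {
            "fill_rate": len(filled) / total if total else 0.0,
            "distinct": len(set(filled)),
            "top_value": Counter(filled).most_common(1)[0][0] if filled else None,
        }
    return report


# Made-up sample records standing in for a customer source table.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": ""},
    {"name": "Bob Roy", "email": None, "phone": ""},
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Cy Diaz", "email": "cy@example.com", "phone": None},
]

report = profile(customers, ["name", "email", "phone"])
for column, stats in report.items():
    print(column, stats)
```

In this made-up sample the phone column is populated in only one of four records, so it would be a poor candidate for matching, while name and email look more promising.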
Second, leverage metadata. This probably goes without saying for mature integration teams, but the reality is that this obvious step is often overlooked. For example, many application packages such as SAP or Oracle provide a wealth of metadata which is directly accessible from PowerCenter. Any source or target database that the integration team has ever worked with is also available in the metadata repository (if they’ve been doing it right). Combine that metadata with PowerCenter, and integration mappings can be generated almost at the push of a button.
Third, stub out complex elements in early iterations. This applies to source or target databases, where generic tables can be used initially, as well as to complex business rules and table lookups. In subsequent iterations, the stubs can gradually be replaced with the real tables and with the actual callouts to external modules that perform complex transformations or data enrichment.
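The stubbing idea can be sketched in a few lines. This is a generic Python illustration, not PowerCenter code, and every name in it (`StubRegionLookup`, `TableRegionLookup`, `enrich`, the postal-code data) is hypothetical. The point is that the rest of the flow depends only on a small interface, so the stub and the real lookup are interchangeable.

```python
class StubRegionLookup:
    """Early-iteration stand-in: returns a fixed default for every key."""

    def lookup(self, postal_code):
        return "UNKNOWN"


class TableRegionLookup:
    """Later-iteration replacement backed by real reference data."""

    def __init__(self, table):
        self._table = table

    def lookup(self, postal_code):
        return self._table.get(postal_code, "UNKNOWN")


def enrich(records, region_lookup):
    """Add a region to each record using whichever lookup is supplied."""
    return [
        dict(record, region=region_lookup.lookup(record["postal_code"]))
        for record in records
    ]


records = [
    {"id": 1, "postal_code": "10001"},
    {"id": 2, "postal_code": "94105"},
]

# Iteration 1: the flow runs end to end with the stub in place.
early = enrich(records, StubRegionLookup())

# Later iteration: swap in the real lookup without touching enrich().
later = enrich(records, TableRegionLookup({"10001": "EAST"}))
```

Because `enrich` never changes, the early iterations can be demonstrated and tested end to end long before the real reference data is available.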
Fourth, automate the change-management, configuration-management and software-deployment tasks. Lean Integration suggests that you should build in small increments and deploy changes frequently. Traditional manual processes are not conducive to this approach. The task of moving integration applications from the development environment to test and then to production should essentially be a “push button” process. The Team Based Development option for PowerCenter is a key prerequisite. With a bit of investment in custom routines and integrated workflows to make the process seamless, the resultant benefits include:
- A detailed history of all code configurations (releases)
- Traceable evolution of integration objects, so that any release of an integration application can be reproduced (point-in-time recovery)
- Reliable code promotion through environments without the errors inherent in manual processes
- Automated (push-button) backup, restore and rollback capabilities
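As a rough illustration of what those benefits amount to, the sketch below models a release registry in plain Python. It does not call any Informatica tooling; the `ReleaseRegistry` class, the release labels and the mapping names are all hypothetical, and a real implementation would sit on top of the versioned repository rather than an in-memory dictionary.

```python
import copy


class ReleaseRegistry:
    """Hypothetical in-memory model of release history and promotion."""

    def __init__(self):
        self._snapshots = {}  # release label -> frozen copy of objects
        self._deployed = {}   # environment -> labels in deployment order

    def record(self, label, objects):
        """Store an immutable snapshot of a release."""
        if label in self._snapshots:
            raise ValueError(f"release {label!r} already recorded")
        self._snapshots[label] = copy.deepcopy(objects)

    def reproduce(self, label):
        """Return the exact objects that made up a past release."""
        return copy.deepcopy(self._snapshots[label])

    def promote(self, label, env):
        """Deploy a recorded release to an environment."""
        if label not in self._snapshots:
            raise KeyError(label)
        self._deployed.setdefault(env, []).append(label)

    def rollback(self, env):
        """Revert an environment to the release deployed before the current one."""
        history = self._deployed.get(env, [])
        if len(history) < 2:
            raise RuntimeError("nothing to roll back to")
        history.pop()
        return history[-1]

    def current(self, env):
        history = self._deployed.get(env, [])
        return history[-1] if history else None


registry = ReleaseRegistry()
registry.record("r1", {"m_load_customer": "v1"})
registry.record("r2", {"m_load_customer": "v2"})

registry.promote("r1", "test")
registry.promote("r2", "test")
restored = registry.rollback("test")  # test goes back to r1
```

The design choice worth noting is that promotion only ever refers to a recorded label, never to loose objects, which is what makes every deployment reproducible.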
Fifth, use Mapping Architect for Visio. This is yet another best-kept secret of PowerCenter. It’s not really a secret, but it just isn’t promoted as much as it could be and the documentation could be better. The name is also a bit of a misnomer: it’s really more of a tool for integration developers, and many data architects who aren’t developers will find it hard to learn and work with. That said, for those teams that do make the effort to learn and use it, it is a tremendous tool for creating reusable designs. In essence, it is a PowerCenter code generator for common integration patterns. For example:
- Straight source to staging – this is a 1:1 copy of the data without any transformations and can be 100% automatically generated from Visio templates.
- Change Data Capture (CDC) – this pattern creates a snapshot table that stores the prior state; extracted data is joined with the snapshot table and columns are compared to find the changed data. Typically about 90% of the logic can be generated, and the remaining 10% or so is then modified manually.
- Synchronization of CDC – this pattern involves synchronizing changes back into the snapshot table and is a common pattern that can often be 100% generated.
- Warehouse update – this pattern is the final step where data in the staging tables are loaded into the warehouse. The dynamic lookup cannot be handled automatically, but about 80% of the mappings can be generated and then modified manually to include dynamic lookup and other expressions.
The list could go on and on, but hopefully the point is clear. Mapping Architect for Visio is a great tool for automatically generating complex integrations quickly, easily and consistently – and it comes as a standard option with PowerCenter Standard Edition.
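For readers unfamiliar with the snapshot-based CDC pattern described in the list above, here is a minimal sketch of its logic. It is plain Python, not what Mapping Architect for Visio generates, and the function names and sample rows are hypothetical. It also assumes each extract is a full extract keyed by primary key, which is what allows the synchronization step to simply replace the snapshot and deletes to be inferred from missing keys.

```python
def detect_changes(snapshot, extract, compare_columns):
    """Join the extract to the snapshot by key and compare columns.

    Returns (inserts, updates, deletes) as lists of keys. Treating a key
    that is missing from the extract as a delete is an assumption that
    only holds for full extracts.
    """
    inserts, updates = [], []
    for key, row in extract.items():
        prior = snapshot.get(key)
        if prior is None:
            inserts.append(key)
        elif any(prior.get(col) != row.get(col) for col in compare_columns):
            updates.append(key)
    deletes = [key for key in snapshot if key not in extract]
    return inserts, updates, deletes


def synchronize(extract):
    """Build the new snapshot so the next run compares against this state."""
    return {key: dict(row) for key, row in extract.items()}


snapshot = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Rome"},
}
extract = {
    1: {"name": "Ann", "city": "Bergen"},  # city changed
    2: {"name": "Bob", "city": "Rome"},    # unchanged
    3: {"name": "Cy", "city": "Lima"},     # new row
}

inserts, updates, deletes = detect_changes(snapshot, extract, ["name", "city"])
snapshot = synchronize(extract)
second_run = detect_changes(snapshot, extract, ["name", "city"])
```

Running the comparison a second time against the synchronized snapshot finds no changes, which is the property the synchronization pattern exists to guarantee.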
Sixth, use a standard platform. This also goes without saying, but it’s worth repeating. Much time can be wasted just getting a project started if the assembled team members come from different backgrounds, each with their own favorite tools. If a standard platform is mandated, including a standard configuration of the development environment, it can easily save two to four weeks of project start-up effort.
Finally, use a common project methodology. Once again, people don’t often think of a defined methodology as being an “agile” method, but the fact is that once you have a defined sequence of steps, deliverables, communication methods, defined roles, common terminology, etc., everything speeds up. Just consider the pit stop in a NASCAR race. The teams can change tires, refuel, wash windows, and even make some mechanical adjustments in literally seconds. This comes from having a defined and repeatable methodology (and lots of practice).
Putting all seven practices to work results in an Integration Factory – a highly efficient and repeatable process for quickly building custom integrations using an assembly-line approach.