In my previous post, To Successfully Service-Orient, Data-Orient First!, I shared the input I received from architects and IT managers, intended as a handy checklist for building a solid foundation for success in service-oriented infrastructures.
The following are the data-orientation capabilities they recommended as a first step in successfully service-orienting an infrastructure:
- Easy access to all relevant data, including new or rapidly changing data sources
- Seamless processing of data in batch, change data capture, or real time, including handling large volumes of large data sets
- Proactive identification and resolution of data inaccuracies
- Application of complex data transformations on the data
- Delivery of data, exactly when it is needed, as a standards-based data service
We also discussed scenarios across industry verticals where there was a compelling need for timely and trusted data delivered as a service. However, what we did not discuss was how to data-orient a service-oriented infrastructure. In order to bring the best real-world information to you, I went back to the source – the same architects and IT managers who deal with complex data-centric issues on a day-to-day basis.
The recommendations were prescriptive in nature, centered on solving the problem holistically rather than reaching for quick-and-dirty fixes, as it should be. So let’s take a look at each of these recommendations; I will leave it to you to deduce what the effective solution should look like…
Recommendation #1: Start with a data integration platform that enables universal access to all data sources, be they structured, semi-structured, unstructured, cloud, partner, master, Web services…whatever they are. Also consider whether you need to quickly on-board new data sources or take advantage of data that changes rapidly. The bottom line is that there needs to be a way to include all the data you can possibly get to, in the fastest, most secure, and most reliable manner possible.
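To make this idea of universal access concrete, here is a minimal, purely illustrative sketch in Python. No particular product or API is implied; all the names are made up. It shows how a single entry point over per-format adapters lets a new source be on-boarded by registering one more reader:

```python
import json

# Hypothetical adapters: each turns one source format into plain records.
def read_csv_source(text):
    header, *rows = [line.split(",") for line in text.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]

def read_json_source(text):
    return json.loads(text)

# Registry of adapters; on-boarding a new source means adding one entry.
READERS = {"csv": read_csv_source, "json": read_json_source}

def read_source(kind, payload):
    """One uniform entry point, whatever the underlying source format."""
    return READERS[kind](payload)

csv_recs = read_source("csv", "id,name\n1,Acme")
json_recs = read_source("json", '[{"id": "1", "name": "Acme"}]')
```

The consumer of `read_source` never changes when a new format arrives; that is the "universal access" property in miniature.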
Recommendation #2: Make sure that the platform you choose can effectively support any latency of data processing, be it batch, near real-time, change data capture, or real-time. Why? Because in the real world, data is processed at different latencies at different points in the enterprise. Without a single environment that can address all the nuances involved, you will be forced to use and maintain a tangle of integration technologies.
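A small sketch, again purely illustrative with hypothetical names, shows why a single environment matters: the same transformation logic can be reused unchanged at batch latency and at CDC-style record-at-a-time latency, instead of being re-implemented in two different tools:

```python
def normalize(record):
    # The business transformation itself, written once.
    return {**record, "country": record["country"].upper()}

def run_batch(records):
    """Batch latency: process a whole extract at once."""
    return [normalize(r) for r in records]

def run_stream(change_events):
    """CDC/real-time latency: process one change event at a time."""
    for event in change_events:
        yield normalize(event)

batch_out = run_batch([{"id": 1, "country": "us"}, {"id": 2, "country": "de"}])
stream_out = list(run_stream(iter([{"id": 3, "country": "fr"}])))
```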
Recommendation #3: Understand that large data volumes are different from large volumes of large data sets, and plan accordingly. Technologies that handle large volumes of small messages typically do not do a good job of handling large volumes of large data sets. Bear in mind that a technology that handles large volumes of small messages has its own place in an SOA, so complement it with a technology that processes large volumes of large data sets efficiently.
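The difference can be sketched in a few lines. A message-oriented technology implicitly assumes each unit of work fits comfortably in memory; a large data set has to be streamed through in bounded chunks instead. This is an illustrative sketch only, not any vendor's implementation:

```python
def read_in_chunks(rows, chunk_size=1000):
    """Yield successive fixed-size chunks so memory use stays bounded."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# A 10,000-row "large" data set processed 1,000 rows at a time,
# never holding more than one chunk in memory.
total = 0
for chunk in read_in_chunks(range(10_000), chunk_size=1000):
    total += sum(chunk)
```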
Recommendation #4: Place a value on the trustworthiness of the data consumed by your applications and business processes. If you feel it is inefficient and expensive to find out after the fact about inconsistencies and inaccuracies across your data, then you definitely need a more proactive approach. Look for a platform that provides integrated data profiling to proactively identify issues, and that can also fix those issues, regardless of their complexity.
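What "proactive profiling" means in practice can be sketched simply: scan the data for missing and out-of-pattern values before any consumer sees them, rather than discovering them downstream. This is a toy illustration with made-up field names, not a real profiling engine:

```python
import re

def profile(records, column, pattern):
    """Flag missing or mis-formatted values in one column, up front."""
    issues = []
    for i, rec in enumerate(records):
        value = rec.get(column)
        if value is None:
            issues.append((i, "missing"))
        elif not re.fullmatch(pattern, value):
            issues.append((i, "bad format"))
    return issues

records = [
    {"zip": "94105"},
    {"zip": None},      # missing value
    {"zip": "9410"},    # too short to be a valid ZIP
]
issues = profile(records, "zip", r"\d{5}")
```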
Recommendation #5: Make a list of all the complex transformations that you need to perform on your enterprise data. Any integration technology can typically handle simple format conversions. However, if you need more complex transformations such as aggregation, joins, and lookups, or need to enable structure conversions, industry-format conversions, or even masking or obfuscation of specific portions of the data, you probably need something more sophisticated.
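To show what lies beyond simple format conversion, here is a minimal sketch of three of the transformations named above: aggregation, a lookup join, and masking. All names and data are hypothetical, and a real platform would do this declaratively rather than in hand-written code:

```python
def aggregate_by(records, key, field):
    """Aggregation: sum `field` per distinct value of `key`."""
    totals = {}
    for r in records:
        totals[r[key]] = totals.get(r[key], 0) + r[field]
    return totals

def enrich(records, lookup, key, new_field):
    """Lookup join: attach a value from a reference table."""
    return [{**r, new_field: lookup[r[key]]} for r in records]

def mask(value, visible=4):
    """Masking: obfuscate all but the last `visible` characters."""
    return "*" * (len(value) - visible) + value[-visible:]

orders = [{"cust": "A", "amt": 10}, {"cust": "A", "amt": 5}, {"cust": "B", "amt": 7}]
totals = aggregate_by(orders, "cust", "amt")
regions = {"A": "west", "B": "east"}
enriched = enrich(orders, regions, "cust", "region")
card = mask("4111111111111111")
```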
Recommendation #6: Take a hard look at all your consuming applications and take an inventory of the way each of them consumes data. Is it through Web services, or do they only understand SQL? Ideally, this should not matter to the data integration platform you choose: it should be able to deliver timely and trusted data in exactly the way each application needs it.
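The idea that delivery format should not matter can be sketched as a thin delivery layer that hands the same trusted records to a service-style consumer as JSON and to an SQL-style consumer as rows. Again, this is an illustration with invented names, not a real product interface:

```python
import json

def deliver(records, fmt):
    """Deliver the same data in whatever form the consumer understands."""
    if fmt == "json":
        # e.g. a REST/Web-service consumer
        return json.dumps(records)
    if fmt == "rows":
        # e.g. an SQL-style tabular consumer
        cols = sorted(records[0])
        return [tuple(r[c] for c in cols) for r in records]
    raise ValueError(f"unsupported format: {fmt}")

records = [{"id": 1, "name": "Acme"}]
as_service = deliver(records, "json")
as_rows = deliver(records, "rows")
```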
Recommendation #7: And finally, make sure that your applications are insulated from the underlying data sources. What this means is that if the underlying data sources change, you do not have to rebuild the integrations. A solid data integration foundation provides standards-based data abstraction over all the underlying data, so that if and when things change in the data layer, the applications are adequately insulated.
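The insulation the architects describe can be sketched as a logical view between consumers and sources: the consumer depends on one stable shape, and the physical source behind it can be swapped without the consumer noticing. Everything below is hypothetical, a sketch of the pattern rather than any specific abstraction technology:

```python
class CustomerView:
    """Logical 'customer' view; the physical source behind it can change."""

    def __init__(self, source):
        self._source = source

    def customers(self):
        # Map whatever the source returns onto one stable, agreed shape.
        return [{"id": r["id"], "name": r["name"]} for r in self._source()]

# Today the data comes from a legacy extract...
def legacy_source():
    return [{"id": 1, "name": "Acme", "legacy_code": "X1"}]

# ...tomorrow from a new system with different extra fields.
def new_source():
    return [{"id": 1, "name": "Acme", "tier": "gold"}]

# The consumer's view is identical either way; no integration rebuild.
old_view = CustomerView(legacy_source).customers()
new_view = CustomerView(new_source).customers()
```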
I also noted a common thread across all of the above suggestions: there needs to be a single, integrated platform that can deliver all the capabilities outlined. This makes sense, as all these recommendations were made with time and cost savings in mind. If a separate technology were employed for each of these capabilities, the very basis for employing a service-oriented approach, which is to enable agility through simplicity and flexibility, would be compromised.
Oh, I almost forgot. There is one last prescription that the architects shared – probably one of the most important from an architect’s point of view – that the platform must support and drive the reuse of data integration logic. More on this highly important topic in my next blog post.