SOA's Last Mile Part II: SOA's Hidden Data-Centric Pitfalls
Posted in Architecture, Benefits, Data Integration, Data Services, SOA by David Lyle
This blog post is part two of an ongoing series highlighting the importance of data in a Service-Oriented Architecture (SOA). I look forward to hearing your thoughts and input on the subject.
Last posting, I ranted about the fact that ‘data’ is finally a topic of discussion with respect to SOA initiatives. SOA provides business services that at their deepest level interact with data. What are the data-centric pitfalls that SOA can run into?
First off, data has meaning. While an enterprise ‘meaning’ can be presented by the services to outside consumers of those services, someone has to deal with the fact that the foundational business systems may have different meanings for the underlying data. The ‘transformation’ is frequently very important and complex.
Secondly, the meaning of data can change over time as the business changes. These changes will impact the services and the ‘transformations’ mentioned above. And sometimes these changes will affect the users of the services.
Thirdly, the quality of data is not perfect. How do you deal with these imperfections?
Fourthly, the systems of record for data are not usually neatly compartmentalized. At most complex enterprises, there isn’t just one Order Management system, or one HR system. The concepts of Customer, Policy, Employee, etc., can be spread across many heterogeneous systems, with overlapping responsibilities.
I’m sure there’s a fifth, a sixth, etc. But let’s just elaborate on these four.
Let’s start with the meaning of data. For example, a business term like Customer Level defines the historical importance this customer has to a company. The different values for Customer Level may be ‘Gold’, ‘Silver’, and ‘Bronze’. Each of these values has its own business definition. For instance, a ‘Gold’ customer might be any customer that has an average balance of over $5000 with us and has been a customer for over 8 years. This definition of ‘Gold’ is part of the meaning of the data. And ‘Gold’ may not physically be stored in any database, legacy or ERP system. ‘Gold’ may be arrived at whenever someone looks up a customer. But whether stored or calculated, the instructions for how ‘Gold’ is found might be very complex. My example is only a simple one.
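To make the "stored or calculated" point concrete, here is a minimal Python sketch of Customer Level as a derived value. Only the ‘Gold’ rule (average balance over $5000, customer for over 8 years) comes from the example above. The `Customer` fields, the `customer_level` function, and the ‘Silver’ thresholds are assumptions invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields; a real system would spread these over several sources.
    name: str
    average_balance: float    # in dollars
    years_as_customer: float


def customer_level(customer: Customer) -> str:
    """Derive the Customer Level at lookup time instead of storing it."""
    # 'Gold' rule taken from the example in the text.
    if customer.average_balance > 5000 and customer.years_as_customer > 8:
        return "Gold"
    # 'Silver' thresholds are made up for this sketch.
    if customer.average_balance > 1000 and customer.years_as_customer > 3:
        return "Silver"
    return "Bronze"


print(customer_level(Customer("Acme Corp.", 6000.0, 10)))
```

Even in this toy form, the business definition lives in code. Wherever that code sits, someone has to own it and keep it in step with the business.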
Let’s continue by assuming that this meaning changes over time. Let’s say the definition of a ‘Gold’ customer now involves looking at their credit status. Consumers of this information may not be affected, but the logic for delivering or transforming this value within the service would change. Further, let’s say that the business decides to add a new level called ‘Platinum’. This could have a ripple effect not just on the logic for how Customer Level is calculated or delivered as a whole, but more importantly on the consumers of services that use this Customer Level information. Once again, I’m using a simple, crude example that undersells the complexity of this problem. The reality of data, its meaning, and how often it changes is a significant challenge.
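A rough Python sketch of the two kinds of change, under stated assumptions: the `good_credit` flag, the two rule functions, and the consumer's discount table are all hypothetical names made up for this illustration. The first change (adding credit status) stays inside the service. The second (a new ‘Platinum’ value) leaks out to any consumer that assumed a fixed set of levels.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    average_balance: float
    years_as_customer: float
    good_credit: bool  # hypothetical field for the new credit-status rule


def is_gold_v1(c: Customer) -> bool:
    # Original definition from the example.
    return c.average_balance > 5000 and c.years_as_customer > 8


def is_gold_v2(c: Customer) -> bool:
    # Changed definition: same answer type, so consumers see no new shape.
    return is_gold_v1(c) and c.good_credit


# A consumer that hard-codes the levels it knows about (hypothetical).
DISCOUNT_BY_LEVEL = {"Gold": 0.10, "Silver": 0.05, "Bronze": 0.0}


def discount_strict(level: str) -> float:
    # Raises KeyError the day the service starts returning 'Platinum'.
    return DISCOUNT_BY_LEVEL[level]


def discount_tolerant(level: str, default: float = 0.0) -> float:
    # Survives unknown levels, but silently gives 'Platinum' no discount.
    return DISCOUNT_BY_LEVEL.get(level, default)
```

Neither consumer is right. One fails loudly and the other fails quietly, which is why a new value in the data is a change to the service contract and not only to the service.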
And what if the quality of the data was bad and the Customer Level could not be determined? What if the stored value of Customer Level for Acme Corp. was ‘Wood’ or non-existent? This is reality. Frequently, SOA implementations and vendor demos seem to ignore this point. Unit test and demo data is always perfect and clean. We all know that the reality of business data is not so pretty. How do you handle bad data and how do you work with the business to continually improve data quality?
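One possible way to handle the ‘Wood’ case, sketched in Python: validate the stored value and return an explicit "Unknown" along with a reason that can be reported back to the business. The function name, the "Unknown" sentinel, and the normalization rule are assumptions for illustration, not a recommended standard.

```python
from typing import Optional, Tuple

VALID_LEVELS = {"Gold", "Silver", "Bronze"}


def read_customer_level(stored: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (level, problem). problem is None when the value is usable."""
    if stored is None or not stored.strip():
        return ("Unknown", "missing value")
    # Tolerate case and whitespace noise, e.g. ' GOLD ' becomes 'Gold'.
    normalized = stored.strip().capitalize()
    if normalized not in VALID_LEVELS:
        return ("Unknown", f"unrecognized value {stored!r}")
    return (normalized, None)


for raw in ["Gold", " GOLD ", "Wood", None]:
    print(raw, "->", read_customer_level(raw))
```

Returning the problem alongside the value gives the service something to log or route to a data steward, so bad records become visible and do not just vanish into a default.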
Finally, rarely is enterprise data neatly compartmentalized into single systems with non-overlapping business coverage. SOA can hide the location of Customer, Policy, Product, Orders, etc. from service consumers, but someone has to figure out how to make multiple, heterogeneous Order Management systems appear as one system. Someone has to create a mighty complex ‘DNS for data’ system to hide this complexity.
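As a toy illustration of the ‘DNS for data’ idea, the Python sketch below routes an order lookup to one of two stand-in Order Management back ends and maps each record into a single shape. Everything here is hypothetical: the ID prefixes, the field names, and the in-memory data. A real resolver would also have to cope with overlapping ownership, which this sketch ignores.

```python
from typing import Callable, Dict, Optional

Order = Dict[str, object]


def lookup_legacy(order_id: str) -> Optional[Order]:
    # Stand-in for a legacy system with terse column names and string amounts.
    data = {"L-100": {"ORD_NO": "L-100", "AMT": "250.00"}}
    rec = data.get(order_id)
    if rec is None:
        return None
    return {"order_id": rec["ORD_NO"], "total": float(rec["AMT"])}


def lookup_erp(order_id: str) -> Optional[Order]:
    # Stand-in for an ERP system that stores totals in cents.
    data = {"E-7": {"id": "E-7", "total_cents": 1999}}
    rec = data.get(order_id)
    if rec is None:
        return None
    return {"order_id": rec["id"], "total": rec["total_cents"] / 100}


# The "DNS" part: which system answers for which orders.
RESOLVERS: Dict[str, Callable[[str], Optional[Order]]] = {
    "L-": lookup_legacy,
    "E-": lookup_erp,
}


def get_order(order_id: str) -> Optional[Order]:
    """Single entry point that hides which back end owns the order."""
    for prefix, resolver in RESOLVERS.items():
        if order_id.startswith(prefix):
            return resolver(order_id)
    return None
```

The consumer calls `get_order` and never learns where the order lives. The routing table and the per-system mappings are exactly the hidden work described above.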
Many factors are making this problem worse unless data is given direct attention:
• Rising complexity of data - IT organizations are now handling more data, in more formats, from more partners, and more systems than ever before.
• Increasing business demands - Timely information fuels all the business initiatives that SOA was designed to support. SOA must be able to make data available when and how the business needs it—in batch, near real-time and real-time modes.
• Shrinking IT budgets - Every business initiative spawns a new IT project. And each IT project requires data integration. An SOA does not help IT organizations reuse data integration logic and skills across these projects, and so keep IT costs in check, if that logic is buried in Java within service code.
• Proliferating data quality issues - As complexity and agility increase, data quality ‘entropy’ increases. The more data quality is ignored, the worse the problem gets.
In this posting, I’ve talked about the data-centric pitfalls in SOA without talking about the solutions. How can these data-centric pitfalls in SOA be seamlessly handled? Read all about it in the next post.
Next up: “SOA’s Last Mile Part III: How to Address SOA’s Hidden Data-Centric Pitfalls Effectively”





