I count many SOA projects amongst my journeys. Within those projects are common patterns of SOA success, and common patterns of SOA failure.
The common patterns of failure center on projects that ignore the data and typically work from the services or application behavior back to the data. This means defining and designing services without good metadata context, which leads to constant redevelopment of those services as the project progresses and the data is better understood.
The core problem is that SOA architects consider themselves “service architects” and thus don’t focus on the underlying data that’s a huge part of the problem domain. They don’t consider the data integration scenarios, including data latency, data quality, and the types of data integration (batch and real time), and they don’t support flexibility in the data-to-service bindings. The result is a SOA that supports only a portion of the requirements, and perhaps not the underlying business processes at all. Thus, the SOA is a failure, and perhaps dead in the eyes of those sponsoring the project.
In contrast, common patterns of success are SOA projects that start with the data. The SOA architects understand the metadata that’s common to all participating systems; perhaps they adjust the data physically and/or through abstraction layers, and build services with a complete understanding of the data, including how data integration will take place.
The idea is that you build your SOA from the basic building blocks of data: get the data understood and architected correctly, and then, and only then, build services on top of that data and extend those services into configurable processes. Working from a good foundation of data means that you’re not looping back to the data, making corrections and adjustments as you build the services. Moreover, the databases are designed from the ground up to be optimized for services, and thus for a SOA, so performance is typically not an issue, nor are data quality and integrity.
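To make the abstraction-layer idea a little more concrete, here is a minimal Python sketch. Every name in it (CustomerStore, InMemoryStore, LegacyStore, ContactService) is hypothetical and invented for illustration; it is one possible way to keep the data-to-service binding flexible, not a prescribed design.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class CustomerStore(Protocol):
    """Hypothetical abstraction layer over whatever physically holds customer data."""

    def find_email(self, customer_id: str) -> Optional[str]: ...


class InMemoryStore:
    """One binding: customer data kept in a simple id -> email mapping."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows

    def find_email(self, customer_id: str) -> Optional[str]:
        return self._rows.get(customer_id)


class LegacyStore:
    """Another binding: a stand-in for a legacy system with a different layout."""

    def __init__(self, records: List[Tuple[str, str]]) -> None:
        self._records = records  # list of (customer_id, email) tuples

    def find_email(self, customer_id: str) -> Optional[str]:
        for cid, email in self._records:
            if cid == customer_id:
                return email
        return None


class ContactService:
    """The service is written against the abstraction, not a physical database."""

    def __init__(self, store: CustomerStore) -> None:
        self._store = store

    def contact_address(self, customer_id: str) -> str:
        email = self._store.find_email(customer_id)
        return email if email is not None else "unknown"


# The same service works unchanged over either binding.
service_a = ContactService(InMemoryStore({"1042": "ada@example.com"}))
service_b = ContactService(LegacyStore([("1042", "ada@example.com")]))
```

Because the service only sees the abstraction, the physical data can be corrected or relocated later without rewriting the service itself.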
So, which pattern do you want to emulate as you work your SOA project?
What’s most surprising is how logical this seems, yet the number of failed SOA projects that can be traced back to issues with data is growing quickly. While thought leaders are sounding the alarm, the rank-and-file SOA project leader is not getting the data-oriented message.
It’s really just a matter of following a few logical steps.
First, have a complete semantic understanding of your problem domain. It could be a single system, or a collection of systems, but you need to understand the metadata in detail, and how all of the data is related at a semantic level. For example, in how many places is “customer” data stored, and how do they all link to a single definition of “customer”?
Second, understand the physical structure of the databases, including formats and issues around database design. Now is a good time to fix them, either through altering the physical database or through abstraction.
Finally, understand the efficiencies in your data, including how the databases will effectively produce result sets from queries. Make any changes needed to improve the data integration mechanisms and approaches. Make sure to understand, in detail, the type of data integration that will be part of your SOA.
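As a rough illustration of the first step, linking several stores of “customer” data to a single definition, here is a small Python sketch. The system names ("crm", "billing"), the field names, and the sample records are all made up for the example; a real project would derive them from its own metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical canonical definition of "customer"."""
    customer_id: str
    name: str
    email: str


# Hypothetical per-system mappings: canonical field -> source-system field.
FIELD_MAPS = {
    "crm": {"customer_id": "cust_no", "name": "full_name", "email": "email_addr"},
    "billing": {"customer_id": "account_id", "name": "acct_name", "email": "contact_email"},
}


def to_canonical(system: str, record: dict) -> Customer:
    """Translate one source-system record into the canonical Customer."""
    mapping = FIELD_MAPS[system]
    return Customer(
        customer_id=str(record[mapping["customer_id"]]).strip(),
        name=record[mapping["name"]].strip(),
        email=record[mapping["email"]].strip().lower(),
    )


# Two physically different records that describe the same customer.
crm_row = {"cust_no": 1042, "full_name": "Ada Lovelace ", "email_addr": "Ada@Example.com"}
billing_row = {"account_id": "1042", "acct_name": "Ada Lovelace", "contact_email": "ada@example.com"}

same_customer = to_canonical("crm", crm_row) == to_canonical("billing", billing_row)
```

Services built against the canonical Customer never need to know which system a record came from, or how that system happens to spell its field names.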
Once you get the data down, the services and processes are easy…trust me.

One Comment
I would like to suggest that even once designers get the data semantics down, data services are frequently designed around a fixed business process, which ultimately limits data service reuse. For example, a designer may provide a data service to get purchase orders by date. But this style of data service is brittle and does not enable a dynamic infrastructure in which business processes, and how data is accessed, can change more frequently. Instead, it seems that data services, like SQL, should be generic, so that how data is accessed can be determined at run time (granted, you may want to cache certain denormalized views, and this takes some setup time). However, IMHO, once this generic data service stack is in place, the difference between business process automation and data warehousing becomes only a matter of orchestrating a small set of generic pub/sub and req/reply services that are implemented over legacy applications and databases.
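The contrast drawn in this comment can be sketched in a few lines of Python. The in-memory PURCHASE_ORDERS list, the criteria format, and both function names are assumptions made for illustration; the sketch says nothing about how a real generic data service stack would be implemented.

```python
import operator
from datetime import date

# Hypothetical in-memory stand-in for a purchase-order store.
PURCHASE_ORDERS = [
    {"po_id": 1, "customer": "acme", "order_date": date(2024, 1, 5), "total": 250.0},
    {"po_id": 2, "customer": "globex", "order_date": date(2024, 2, 10), "total": 900.0},
    {"po_id": 3, "customer": "acme", "order_date": date(2024, 2, 20), "total": 75.0},
]

# Comparison operators a caller may name at run time.
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}


def get_purchase_orders_by_date(day):
    """Fixed-purpose service: the access path is decided at design time."""
    return [po for po in PURCHASE_ORDERS if po["order_date"] == day]


def query(rows, criteria):
    """Generic service: criteria are (field, operator name, value) triples
    supplied by the caller at run time; a row must satisfy all of them."""
    def matches(row):
        return all(OPERATORS[op](row[field], value) for field, op, value in criteria)
    return [row for row in rows if matches(row)]


# New access patterns need no new service, only new criteria.
february_acme = query(
    PURCHASE_ORDERS,
    [("customer", "eq", "acme"), ("order_date", "ge", date(2024, 2, 1))],
)
large_orders = query(PURCHASE_ORDERS, [("total", "gt", 200.0)])
```

The fixed function answers exactly one question, while the generic one lets each business process decide what to ask, which is the reuse argument the comment is making.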