Category Archives: SOA
We discussed Big Data and Big Data integration last month, but the rise of Big Data and the systemic use of data integration approaches and technology continue to be a source of confusion. As with any evolution of technology, assumptions are being made that could get many enterprises into a great deal of trouble as they move to Big Data.
Case in point: The rise of Big Data gave many people the impression that data integration is not needed when implementing Big Data technology. The notion is that if we consolidate all of the data into a single cluster of servers, then the integration is systemic to the solution. Not the case.
As you may recall, we made many of the same mistakes around the rise of service oriented architecture (SOA). Don’t let history repeat itself with the rise of cloud computing. Data integration, if anything, becomes more important as new technology is layered within the enterprise.
Hadoop’s storage approach leverages a distributed file system that maps data wherever it sits in a cluster. This means that massive amounts of data reside in these clusters, and you can map and remap the data to any number of structures. Moreover, you’re able to work with both structured and unstructured data.
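To make that concrete, here is a minimal sketch of loading a file into a Hadoop cluster and browsing the same logical namespace, using the Python hdfs (WebHDFS) client. The NameNode address, user, and paths are hypothetical placeholders, not a prescription for your environment.

```python
from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)

# Hypothetical NameNode address and user -- replace with your cluster's values.
client = InsecureClient('http://namenode:9870', user='hadoop')

# Push a local file into the distributed file system; HDFS decides which
# DataNodes actually hold the blocks and replicas -- the path is purely logical.
client.upload('/data/raw/events_2013.json', 'events_2013.json')

# The same logical namespace covers structured extracts and raw, unstructured logs.
for name in client.list('/data/raw'):
    print(name)
```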
As covered in a recent Read Write article, the movement to Big Data does indeed come with built-in business value. “Hadoop, then, allows companies to store data much more cheaply. How much more cheaply? In 2012, Rainstor estimated that running a 75-node, 300TB Hadoop cluster would cost $1.05 million over three years. In 2008, Oracle sold a database with a little over half the storage (168TB) for $2.33 million – and that’s not including operating costs. Throw in the salary of an Oracle admin at around $95,000 per year, and you’re talking an operational cost of $2.62 million over three years – 2.5 times the cost, for just over half of the storage capacity.”
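For the record, the arithmetic behind that "2.5 times" claim works out as follows, plugging in the article's own numbers and taking the admin salary over the same three-year window:

```python
# Figures quoted from the Read Write article (2012 Rainstor / 2008 Oracle estimates).
hadoop_three_year_cost = 1_050_000   # 75-node, 300 TB Hadoop cluster, over 3 years
oracle_purchase_price  = 2_330_000   # 168 TB Oracle database (excluding operating costs)
oracle_admin_salary    = 95_000      # per year

oracle_three_year_cost = oracle_purchase_price + 3 * oracle_admin_salary
print(f"Oracle over three years: ${oracle_three_year_cost:,}")                # $2,615,000 (~$2.62M as quoted)
print(f"Cost ratio: {oracle_three_year_cost / hadoop_three_year_cost:.1f}x")  # ~2.5x
```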
Thus, if these data points are indeed correct, Hadoop clearly enables companies to hold all of their data on a single cluster of servers. Moreover, this data really has no fixed structure. “Fixed assumptions don’t need to be made in advance. All data becomes equal and equally available, so business scenarios can be run with raw data at any time as needed, without limitation or assumption.”
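That quote is describing what is now usually called schema-on-read: structure is applied when the data is queried, not when it is loaded. Here is a minimal PySpark sketch of the idea, assuming a hypothetical directory of raw JSON events already sitting in the cluster (the path and the "region" field are invented for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# The raw events were loaded as-is; no schema was fixed in advance.
# Spark infers one structure now, at read time -- a different job could
# impose a completely different structure on the same files tomorrow.
events = spark.read.json("hdfs://namenode:8020/data/raw/events_2013/")

# Run an ad hoc business scenario directly against the raw data.
events.groupBy("region").count().show()
```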
While this process may look like data integration to some, the heavy lifting around supplying these clusters with data is always a data integration solution, leveraging the right enabling technology. Indeed, consider what's required to move data into and out of Big Data systems and you'll realize why additional strain is placed upon the data integration solution. A Big Data strategy that leverages Big Data technology increases, not decreases, the need for a solid data integration strategy and a sound data integration technology solution.
Big Data is a killer application that most enterprises should at least consider. The strategic business benefits are crystal clear, and the move toward finally being able to see and analyze all of your business data in real time is already underway across most of the Global 2000 and government. However, you won't achieve these objectives without a sound approach to data integration and a solid plan to leverage the right data integration technology.
Simplistic Approaches to Data Federation Solve (Only) Part of the Puzzle – We Need Data Virtualization (Part 2 of 3)
In my last post, I introduced the concept of data federation, which for now I would like to differentiate from data virtualization – a term that I’ll bring into focus in a bit. But first, we explored two issues: data accessibility and data latency. In recent years, data accessibility services have matured greatly, to the point where one can somewhat abstract those accessibility services away from the downstream consumer (or “reuser”) of data. (more…)
Apparently, everyone’s favorite words these days are “big data.” But just because some new tools and techniques promise the potential of absorbing and analyzing huge amounts of data from a variety of sources, it does not mean that installing Hadoop in your enterprise will automatically help you get new insights from your existing data and “big data” any faster. (more…)
Sometimes when you want to sell SOA, you need to sell the concept and not the buzzword. Case in point, when I speak at a conference. If I talk about SOA patterns as a way to drive to a better architecture, I often see eyes begin to roll. However, if I say we’re looking to externalize services that will be meshed and re-meshed together to form business solutions, thus providing agility…the eyes light up. Funny thing is, I’m talking about the same thing. (more…)
Today, agility and timely visibility are critical to the business. No wonder CIO.com states that business intelligence (BI) will be the top technology priority for CIOs in 2012. However, is your data architecture agile enough to handle these exacting demands?
In his blog Top 10 Business Intelligence Predictions For 2012, Boris Evelson of Forrester Research, Inc., states that traditional BI approaches often fall short for the two following reasons (among many others):
- BI hasn’t fully empowered information workers, who still largely depend on IT
- BI platforms, tools and applications aren’t agile enough (more…)
If you haven’t already, I think you should read The Forrester Wave™: Data Virtualization, Q1 2012. For several reasons – one, to truly understand the space, and two, to understand the critical capabilities required to be a solution that solves real data integration problems.
At the very outset, let’s clearly define Data Virtualization. Simply put, Data Virtualization is foundational to Data Integration. It enables fast and direct access to the critical data and reports that the business needs and trusts. It is not to be confused with simple, traditional Data Federation. Instead, think of it as a superset which must complement existing data architectures to support BI agility, MDM and SOA. (more…)
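To illustrate the distinction in miniature: a virtualization layer leaves the physical sources where they are and presents one logical, business-friendly view on top of them. The sketch below fakes two "physical" sources (a SQLite table and a CSV extract) and exposes a single virtual customer view; the table, file, and field names are invented for illustration and are not tied to any particular product.

```python
import csv
import sqlite3

# --- Hypothetical physical sources ----------------------------------------
# Source 1: an operational database table with "physical" column names.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cust_mstr (cust_id INTEGER, cust_nm TEXT)")
db.executemany("INSERT INTO cust_mstr VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")])

# Source 2: a flat-file extract from another system (orders per customer).
with open("orders.csv", "w", newline="") as f:
    csv.writer(f).writerows([["cust_id", "order_total"], [1, 5000], [2, 12000]])

# --- Virtual (logical) view ------------------------------------------------
def customer_view():
    """Join the two sources on the fly and expose business-friendly names.
    Consumers never see cust_mstr, cust_nm, or the CSV layout."""
    with open("orders.csv", newline="") as f:
        orders = {int(r["cust_id"]): float(r["order_total"]) for r in csv.DictReader(f)}
    for cust_id, name in db.execute("SELECT cust_id, cust_nm FROM cust_mstr"):
        yield {"customer": name, "lifetime_value": orders.get(cust_id, 0.0)}

for row in customer_view():
    print(row)
```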
For too long, many enterprises have been attempting to sort through increasingly complex spaghetti architectures with point-to-point data integration. “They get to the point where when they want to introduce a new product or make a change, they have to touch 30 different systems,” says John Akred, data and platforms lead at Accenture Technology Labs. “That has real consequences in the marketplace for enterprises.”
John continued that Hadoop – an open-source software framework that enables applications to run across large arrays of nodes, accessing petabytes’ worth of data – will help organizations manage and scale up to the huge volumes of unstructured and semi-structured data now surging into them. I recently had the opportunity to join John, along with Julianna DeLua, Enterprise Solution Evangelist for Big Data at Informatica, for a discussion of Hadoop’s role in the emerging data-as-a-platform paradigm. It was the second session of the Hadoop Tuesdays webinar series, sponsored by Informatica and Cloudera. (more…)
Data services, data services, data services. Do I sound like a broken record? Forgive me if I seem obsessed with the topic, but I truly believe that technology can change your enterprise, and allow IT to finally get a handle on data in the shortest amount of time.
The real value lies in SOA data services. These services allow enterprises to place an easy-to-configure layer between the source physical databases and those that wish to consume the data, whether applications or humans. If this seems simple, you’re right! It is. Why is it so simple? Because the complexity is hidden from you, including the access mechanisms to the physical data, the transformation of schemas from physical to abstract, and even the management of data quality and integrity.
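As a rough sketch of what that layer can look like, the example below exposes an abstract "customer" schema over a hypothetical physical table through a small HTTP data service. The table name, columns, and endpoint are made up for illustration; a real SOA data service would add security, governance, and data-quality rules.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "physical" source the consumer never touches directly.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE cust_mstr (cust_id INTEGER, cust_nm TEXT, cust_phn TEXT)")
db.execute("INSERT INTO cust_mstr VALUES (1, 'Acme Corp', '555-0100')")

class CustomerService(BaseHTTPRequestHandler):
    """Data service: exposes an abstract 'customer' schema over the physical table."""

    def do_GET(self):
        rows = db.execute("SELECT cust_id, cust_nm, cust_phn FROM cust_mstr").fetchall()
        # Physical-to-abstract schema transformation happens here, out of sight.
        payload = [{"id": i, "name": n, "phone": p} for i, n, p in rows]
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Consumers (applications or humans) simply call GET http://localhost:8000/
    HTTPServer(("localhost", 8000), CustomerService).serve_forever()
```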
So where is the value? There are three core points to consider here: (more…)
I was on the big data bandwagon before everyone began to jump on. The value is very clear. Simply put, it’s the ability to manage terabytes and terabytes of data as if it were just a small data set.
Big data is possible because of a divide-and-conquer approach to processing queries and other data operations. The operations on the data are divided up across many different servers, perhaps thousands of times, and the results are recombined once all of the operations are complete. This is the approach behind the most popular big data technology, Hadoop, which leverages MapReduce to manage data and the operations on that data. The larger database guys are picking up on this trend and moving in this direction as well. (more…)
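Here is a toy sketch of that divide-and-conquer pattern, using Python's multiprocessing pool as a stand-in for the cluster: the map step counts words in each chunk on a separate worker, and the reduce step recombines the partial results. Hadoop does the same thing across machines and HDFS blocks rather than local processes.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_chunk(lines):
    """Map step: each worker counts words in its own slice of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(a, b):
    """Reduce step: recombine partial results once the mappers finish."""
    a.update(b)
    return a

if __name__ == "__main__":
    data = ["big data big value", "data integration still matters", "big clusters"] * 1000
    # Divide: split the data set into chunks, one per worker.
    chunks = [data[i::4] for i in range(4)]
    with Pool(4) as pool:
        partials = pool.map(map_chunk, chunks)
    # Conquer: merge the partial counts into the final answer.
    totals = reduce(reduce_counts, partials, Counter())
    print(totals.most_common(3))
```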