Talking to architects about analytics at a recent event, I kept hearing a familiar theme: data scientists are spending 80% of their time on “data wrangling,” leaving only 20% for delivering the business insights that will drive the company’s innovation. It was clear to everybody I spoke to that the situation will only worsen. The growth everybody expects in data volume and complexity will only lengthen the time to value.
Gartner recently predicted that:
“by 2015, 50% of organizations will give up on managing growth and will redirect funds to improve classification and analytics.”
Some of the details of this study are interesting. In the end, many organizations are coming to two conclusions:
- It’s risky to delete data, so they keep it around as insurance.
- All data has potential business value, so more organizations are keeping it around for potential analytical purposes.
The other mega-trend here is that more and more organizations are looking to compete on analytics – and they need data to do it, both internal data and external data.
From an architect’s perspective, here are several observations:
- The floodgates are open and analytics is a top priority. Given that, the emphasis should be on architecting to manage the dramatic increases in both data quantity and data complexity rather than on trying to stop it.
- The immediate architectural priority has to be on simplifying and streamlining your current enterprise data architecture. Break down those data silos and standardize your enterprise data management tools and processes as much as possible. As discussed in other blogs, data integration is becoming the biggest bottleneck to business value delivery in your environment. Gartner has projected that “by 2018, more than half the cost of implementing new large systems will be spent on integration.” The more standardized your enterprise data management architecture is, the more efficient it will be.
- With each new data type, new data tool (Hive, Pig, etc.), and new data storage technology (Hadoop, NoSQL, etc.), ask first whether your existing enterprise data management tools can handle the task before people go out and create a new “data silo” based on the cool, new technologies. Sometimes a new silo will be necessary, but not always.
- The focus needs to be on speeding value delivery for the business. And the key bottleneck is highly likely to be your enterprise data architecture.
Rather than trying to limit data growth, the priority should be on managing that growth in the most standardized and efficient way possible. It is time to think about enterprise data management as a function with standard processes, skills and tools (just like Finance, Marketing or Procurement).
Several of our leading customers have built or are building a central “Data as a Service” platform within their organizations. This is a single, central place where all developers and analysts can go to get trustworthy data that is managed by IT through a standard architecture and served up for use by all.
For more information, see “The Big Big Data Workbook”
*Gartner Predicts 2015: Managing ‘Data Lakes’ of Unprecedented Enormity, December 2014 http://www.gartner.com/document/2934417#
The start of the year is a great time to refresh and take a new look at your capabilities, goals, and plans for your future-state architecture. That being said, you have to take into consideration that the scarcest resource in your architecture is probably your own time.
Looking forward, here are three things that I would recommend that every architect do. I realize that all three of these relate to data, but as I have said in the eBook, Think “Data First” to Drive Business Value, we believe that data is the key bottleneck in your enterprise architecture in terms of slowing the delivery of business initiatives in support of your organization’s business strategy.
So, here are the recommendations. None of these will cost you anything if you are a current Informatica PowerCenter customer. And #2 and #3 are free regardless. It is only a matter of your time:
1. Take a look at the current Informatica Cloud offering and in particular the templating capabilities.
Informatica Cloud is probably much more capable than you think. The standard templating functionality supports very complex use cases and does it all from a very easy-to-use, no-coding user interface. It comes with a strong library of integration stubs that can be dragged and dropped into Microsoft Visio to create complex integrations. Once the flow is designed in Visio, it can be easily imported into Informatica Cloud, and from there users have a wizard-driven UI to do the final customization for sources, targets, mappings, transformations, filters, etc. It is all very powerful and easy to use.
- YouTube: Building Custom templates https://www.youtube.com/watch?v=yHmFkxov6bs
- 30 day free Informatica Cloud trial. http://more.informatica.com/en/cloud_trial/org?offer=30day-ICwebPage
Why This Matters to Architects
- You will see how easy it is for new groups to get going with fairly complex integrations.
- This is a great tool for departmental or new user use, and it will be completely compatible with the rest of your Informatica architecture – not another technology silo for you to manage.
- Any mapping created for Informatica on-premise can also run on the cloud version.
2. Download Informatica Rev and understand what it can do for your analysts and “data wranglers.”
Your data analysts are spending 80% of their time managing their data and only 20% on the actual analysis they are trying to provide. Informatica Rev is a great way to prepare your data before use in analytics tools such as Qlik, Tableau, and others.
With Informatica Rev, people who are not data experts can access, mash up, prototype and cleanse their data, all in a user interface that looks like a spreadsheet and requires no previous experience with data tools.
- For a free Informatica Rev download https://rev.informatica.com/
- Informatica Rev (Project Springbok) demo https://www.youtube.com/watch?v=0F_58bHKDDs
Why This Matters for Architects
- Your data analysts are going to use analytics tools with or without the help of IT. This enables you to help them while ensuring that they are managing their data well and optimizing their productivity.
- This tool will also enable them to share their “data recipes” and for IT to be involved in how they access and use the organization’s data.
3. Look at the new features in PowerCenter 9.6. First, upgrade to 9.6 if you haven’t already, and particularly take a good look at these new capabilities that are bundled in every version. Many people we talk to have 9.6 but don’t realize the power of what they already own.
- Profiling: Discover and analyze your data quickly. Find relationships and data issues.
- Data Services: This presents any JDBC or ODBC repository as a logical data object. From there you can rapidly prototype new applications using these logical objects without worrying about the complexities of the underlying repositories. It can also do data cleansing on the fly.
- Webinar: Great Data by Design. https://www.brighttalk.com/webcast/10477/104939
- PowerCenter 9.6 deep dive demo https://www.brighttalk.com/webcast/10477/110535
Why This Matters for Architects
- The key challenge for IT and for Architects is to be able to deliver at the “speed of business.” These tools can dramatically improve the productivity of your team and speed the delivery of projects for your business “customers.”
Taking the time to understand what these tools can do in terms of increasing the productivity of your IT team and enabling your end users to self-service will make you a better business partner overall and increase your influence across the organization. Have a great year!
The current trend is that new types of data and new types of physical storage are changing all of that.
When I got back from my trip I found a TDWI white paper by Philip Russom that describes the situation very well and details his research on this subject: Evolving Data Warehouse Architectures in the Age of Big Data.
From an enterprise data architecture and management point of view, this is a very interesting paper.
- First, DW architectures are getting complex because of all the new physical storage options available:
  - Hadoop – very large scale and inexpensive
  - NoSQL DBMS – beyond tabular data
  - Columnar DBMS – very fast seek time
  - DW Appliances – very fast / very expensive
- What is driving these changes is the rapidly-increasing complexity of data. Data volume has captured the imagination of the press, but it is really the rising complexity of the data types that is going to challenge architects.
- But here is what really jumped out at me. When they asked the people in their survey what the important components of their data warehouse architecture are, the answer came back: standards and rules. Specifically, they meant how data is modeled, how data quality metrics are created, metadata requirements, interfaces for data integration, etc.
The conclusion for me, from this part of the survey, was that business strategy is requiring more complex data for better analyses (example: real-time response or proactive recommendations) and business processes (example: advanced customer service). This, in turn, is driving IT to look into more advanced technology to deal with different data types and different use cases for the data. And finally, the way they are dealing with the exploding complexity is through standards, particularly data standards. If you are dealing with increasing complexity and have to do it better, faster and cheaper, the only way you are going to survive is by standardizing as much as reasonably makes sense. But not a bit more.
If you think about it, it is good advice. Get your data standards in place first. It is the best way to manage the data and technology complexity. …And a chance to be the driver rather than the driven.
I highly recommend reading this white paper. There is far more in it than I can cover here. There is also a Philip Russom webinar on DW Architecture that I recommend.
- Home Hubs from Google, Samsung, and Apple (who did not attend the show but still had a significant impact).
- Home Hub Ecosystems providing interoperability with cars, door locks, and household appliances.
- Autonomous cars, and intelligent cars
- Wearable devices such as smart watches and jewelry.
- Drones that take pictures and intelligently avoid obstacles. …Including people trying to block them. There is a bit of a creepy factor here!
- The next generation of 3D printers.
- And the intelligent baby pacifier. The idea is that it takes the baby’s temperature, but I think the sleeper hit feature on this product is the ability to locate it using GPS and a smart phone. How much money would you pay to get your kid to go to sleep when it is time to do so?
Digital Strategies Are Gaining Momentum
There is no escaping the fact that the vast majority of companies out there have active digital strategies, and not just in the consumer space. The question is: Are you going to be the disruptor or the disruptee? Gartner offered an interesting prediction here:
“By 2017, 60% of global enterprise organizations will execute on at least one revolutionary and currently unimaginable business transformation effort.”
It is clear from looking at CES that a lot of these products are “experiments” that will ultimately fail. But to focus too much on that fact is to risk overlooking the profound changes taking place that will shake out industries and allow competitors to jump previously impassable barriers to entry.
IDC predicted that the Internet of Things market would be over $7 trillion by the year 2020. We can all argue about the exact number, but something major is clearly happening here. …And it’s big.
Is Your Organization Ready?
A study by Gartner found that 52% of CEOs and executives say they have a digital strategy. The problem is that 80% of them say that they will “need adaptation and learning to be effective in the new world.” Supporting a new “Internet of Things” or connected device product may require new business models, new business processes, new business partners, new software applications, and the collection and management of entirely new types of data. Simply standing up a new ERP system or moving to a cloud application will not help your organization deal with the new business models and data complexity.
Architect’s Call to Action
Now is the time (good New Year’s resolution!) to get proactive on your digital strategy. Your CIO is most likely deeply engaged with her business counterparts to define a digital strategy for the organization. Now is the time to be proactive in terms of recommending the IT architecture that will enable them to deliver on that strategy – and a roadmap to get to the future state architecture.
Key Requirements for a Digital-ready Architecture
Digital strategy and products are all about data, so I am going to be very data-focused here. Here are some of the key requirements:
- First, it must be designed for speed. How fast? Your architecture has to enable IT to move at the speed of business, whatever that requires. Consider the speed at which companies like Google, Amazon and Facebook are making IT changes.
- It has to explicitly and directly link the business strategy to the underlying business models, processes, systems and technology.
- Data from any new source, inside or outside your organization, has to be on-boarded quickly and in a way that it is immediately discoverable and available to all IT and business users.
- Ongoing data quality management and Data Governance must be built into the architecture. Point product solutions cannot solve these problems. It has to be pervasive.
- Data security also has to be pervasive for the same reasons.
- It must include business self-service. That is the only way that IT is going to be able to meet the needs of business users and scale to the demands of the changes required by digital strategy.
For a webinar on connecting business strategy to the architecture of business transformation, see Next-Gen Architecture: A “Business First” Approach for Agile Architecture, with John Schmidt of Informatica and Art Caston, founder of Proact.
For next-generation thinking on enterprise data architectures, see Think “Data First” to Drive Business Value.
For more on business self-service for data preparation and a free software download.
A couple of comments on the importance of integration platforms like Informatica in an EDW/Hadoop environment:
- Hadoop does mean you can do some quick and inexpensive exploratory analysis with little or no ETL. The issue is that it will not perform at the level you need to take it to production. As the webinar points out, applying some structure to the data with columnar files (not RDBMS) will dramatically speed up query performance.
- The other thing that makes an integration platform more important than ever is the explosion of data complexity. As Dr. Kimball put it:
“Integration is even more important these days because you are looking at all sorts of data sources coming in from all sorts of directions.”
To perform interesting analyses, you are going to have to be able to join data with different formats and different semantic meaning. And that is going to require integration tools.
- Thirdly, if you are going to put this data into production, you will want to incorporate data cleansing, metadata management, and possibly formal data governance to ensure that your data is trustworthy, auditable, and has business context. There is no point in serving up bad data quickly and inexpensively. The result will be poor business decisions and flawed analyses.
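To make the integration point concrete, here is a minimal sketch in plain Python of what "joining data with different formats and different semantic meaning" involves. Everything in it is a hypothetical illustration: the two sources, the field names, the ID and date formats, and the helper functions are assumptions and do not come from the webinar or from any Informatica product. Each source is first normalized to a shared key, date type, and currency unit, and only then joined.

```python
from datetime import datetime

# Hypothetical source 1: a CRM extract with string IDs, ISO dates, and dollars.
crm_rows = [
    {"customer_id": "C-0042", "signup": "2014-11-03", "lifetime_value_usd": 1250.0},
    {"customer_id": "C-0107", "signup": "2015-01-12", "lifetime_value_usd": 310.5},
]

# Hypothetical source 2: web events with numeric IDs, US-style timestamps, and cents.
web_events = [
    {"cust": 42, "ts": "03/02/2015 14:05", "cart_cents": 4599},
    {"cust": 42, "ts": "03/05/2015 09:30", "cart_cents": 1200},
    {"cust": 999, "ts": "03/06/2015 18:45", "cart_cents": 800},
]


def normalize_crm(row):
    """Map a CRM row onto the shared schema (integer key, date object)."""
    return {
        "customer_key": int(row["customer_id"].split("-")[1]),
        "signup_date": datetime.strptime(row["signup"], "%Y-%m-%d").date(),
        "lifetime_value_usd": row["lifetime_value_usd"],
    }


def normalize_event(event):
    """Map a web event onto the shared schema (same key, dollars instead of cents)."""
    return {
        "customer_key": event["cust"],
        "event_date": datetime.strptime(event["ts"], "%m/%d/%Y %H:%M").date(),
        "cart_usd": event["cart_cents"] / 100,
    }


def join_on_customer(crm, events):
    """Inner join: keep only events whose customer exists in the CRM data."""
    by_key = {c["customer_key"]: c for c in crm}
    joined = []
    for event in events:
        match = by_key.get(event["customer_key"])
        if match is not None:
            joined.append({**match, **event})
    return joined


joined = join_on_customer(
    [normalize_crm(r) for r in crm_rows],
    [normalize_event(e) for e in web_events],
)
```

In this sketch the event for customer 999 is silently dropped by the join because it has no CRM record. In production, that is the kind of case that data cleansing and governance rules would need to surface rather than hide.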
For Data Warehouse Architects
The challenge is to deliver actionable content from the exploding amount of data available. You will need to be constantly scanning for new sources of data and looking for ways to quickly and efficiently deliver that to the point of analysis.
For Enterprise Architects
The challenge with adding big data to your EDW architecture is to define and drive a coherent enterprise data architecture across your organization that standardizes people, processes, and tools to deliver clean and secure data in the most efficient way possible. It will also be important to automate as much as possible to offload routine tasks from the IT staff. The key to that automation will be the effective use of metadata across the entire environment to understand not only the data itself, but also how it is used, by whom, and for what business purpose. Once you have done that, it will become possible to build intelligence into the environment.
For more on Informatica’s vision for an Intelligent Data Platform and how this fits into your enterprise data architecture see Think “Data First” to Drive Business Value
The white paper, “The Great Rethink: Building a Highly Responsive and Evolving Data Integration Architecture” by Claudia Imhoff and Joe McKendrick provides an interesting view of what such an architecture might look like. The paper describes how to move from ad hoc Data Integration to an Enterprise Data Architecture. The paper also describes an approach towards building architectural maturity and a next-generation enterprise data architecture that helps organizations to be more competitive.
Organizations that look to compete based on their data are searching for ways to design an architecture that:
- On-boards new data quickly
- Delivers clean and trustworthy data
- Delivers data at the speed required of the business
- Ensures that data is handled in a secure way
- Is flexible enough to incorporate new data types and new technology
- Enables end user self-service
- Speeds up business value delivery for an organization
In my previous blog, Digital Strategy and Architecture, we discussed the demands that digital strategies are putting on enterprise data architecture in particular. Add to that the additional stress from business initiatives such as:
- Supporting new mobile applications
- Moving IT applications to the cloud – which significantly increases data management complexity
- Dealing with external data. One recent study estimates that a full 25% of the data being managed by the average organization is external data.
- Next-generation analytics and predictive analytics with Hadoop and NoSQL
- Integrating analytics with applications
- Event-driven architectures and projects
- The list goes on…
The point here is that most people are unlikely to be funded to build an enterprise data architecture from scratch that can meet all these needs. A pragmatic approach would be to build out your future state architecture in each new strategic business initiative that is implemented. The real challenge of being an enterprise architect is ensuring that all of the new work does indeed add up to a coherent architecture as it gets implemented.
The “Great Rethink” white paper describes a practical approach to achieving an agile and responsive future state enterprise data architecture that will support your strategic business initiatives. It also describes a high level data integration architecture and the building blocks to achieving that architecture. This is highly recommended reading.
Also, you might recall that Informatica sponsored the Informatica Architect’s Challenge this year to design an enterprise-wide data architecture of the future. The contest has closed and we have a winner. See the site for details: Informatica Architect Challenge.
What is digitization?
It can take many forms. Here are a few types of digitization of business and examples:
|Type of digitization|Examples|
|---|---|
|Products that add digital components|Sports equipment with sensors for immediate feedback|
|Products sold through digital channels|Conde Nast magazines|
|“Solutions” that are assembled and delivered in digital channels|USAA Insurance|
|Products that are entirely digital|Apple iTunes, eSurance, PayPal, Google|
|Companies monetizing their data|Healthcare clinical data|
The really interesting thing about digitization that you can see from some of the examples above is that it enables new competition to enter your space and competitors to leap industry boundaries. The concept of “barriers to entry” itself is eroding.
The Impact of Digitization on IT
Some interesting facts from MIT CISR’s research with Boards of Directors on digitization jumped out at me:
- Board members estimate that 32% of their company’s revenues are under threat from digital disruption. This is a really stunning number when you think about it.
- Half of Board members believe that their board’s ability to oversee the strategic use of IT is “less than effective.”
- 26% of Boards hired consultants to evaluate major projects or the IT unit.
- 60% of Boards want to spend more time on digital issues next year.
The Impact of Digitization for Architects?
It boils down to two things:
- Architects need to deliver a digital platform to enable business agility in a time of increasing competition and disruption. This includes standardization around business processes, data, and the platform.
- Architects need to get more proactive in the strategy process for their organizations both in terms of the platforms and architecture and in terms of a general understanding of the challenges and opportunities that arise from digital disruption.
For more on enterprise data architecture, best practices and reference architectures see the eBook: Think “Data First” to Drive Business Value
We are way past the point where the architecture needs to be aligned with business goals and value delivery. That is necessary but no longer sufficient. We are now at the point where architecture needs to be central to the creation of an organization’s strategy process. Not to get hyperbolic, but anything less is risky for your career.
The Challenge: Digitization
I just came back from the MIT Center for Information Systems Research (CISR) research forum. One of the leading topics was digitization and how every business is becoming digitized. To those in the High Tech industry, this may be an “of course” topic, but to most other industries it is a wrenching change. Even those who are comfortable with the idea of digitization risk taking this too lightly.
Most products and services will have a digital component in the near future, and an increasing number of products and services will be entirely digital. Digitization and the technologies that enable it are going to bring about a period of increased disruption. This will mean:
- New competitors. Examples: autonomous cars, sports equipment with embedded sensors that provide feedback, personal assistants fully capable of making decisions and taking action. Gartner is predicting that almost everything over $100 will have a sensor by the turn of the decade.
- New competitors jumping across industry boundaries. Examples: Apple iTunes and Google cars to name a few.
Why Architects Are Important
Architects are in a unique position not only to understand the technology trends driving this disruption, but also to know how to leverage these trends to drive business value within their organizations. The very best architects are going to be those who are deeply involved in defining the organization’s strategy, not just figuring out how to implement it.
Evidence of Change
Many architects and CIOs currently report very little interest from upper management in IT. That is about to change, and quickly. At the MIT CISR forum I attended last week, they presented research around this area that is very telling:
- Half of Board of Directors members believe that their board’s ability to oversee the strategic use of IT is “less than effective.”
- 26% of Boards hired consultants to evaluate major projects or the IT unit.
- 60% of Boards want to spend more time on digital issues next year.
- Board members estimate that 32% of their company’s revenues are under threat from digital disruption.
That last bullet is the really interesting piece of research. 32% is a huge impact.
The Role of Data in Digitization
Digitization by its very nature is all about data. The winners in this space will be those that can manage and deliver relevant data the quickest. The question for architects is this: Do you have the architecture and agility to take advantage of the coming disruptions and opportunities? Are you actively advising your organization on how to leverage them? As we have documented in many previous blogs, many organizations are poorly positioned to manage their data as a discoverable and easily sharable asset. This will be essential for:
- Delivering business initiatives and showing value faster (agility).
- Enabling business self-service so that IT is not the bottleneck in new analyses and decisions.
All of this requires new thinking around enterprise data architecture. For fresh thinking on this subject, see Think “Data First” to Drive Business Value.
This got me thinking: What is the biggest bottleneck in the delivery of business value today? I know I look at things from a data perspective, but data is the biggest bottleneck. Consider this prediction from Gartner:
“Gartner predicts organizations will spend one-third more on app integration in 2016 than they did in 2013. What’s more, by 2018, more than half the cost of implementing new large systems will be spent on integration.”
When we talk about application integration, we’re talking about moving data, synchronizing data, cleansing data, transforming data, and testing data. The question for architects and senior management is this: Do you have the Data Foundation for Execution you need to drive the business results you require to compete? The answer, unfortunately, for most companies is no.
All too often data management is an add-on to larger application-based projects. The result is unconnected and non-interoperable islands of data across the organization. That simply is not going to work in the coming competitive environment. Here are a couple of quick examples:
- Many companies are looking to compete on their use of analytics. That requires collecting, managing, and analyzing data from multiple internal and external sources.
- Many companies are focusing on a better customer experience to drive their business. This again requires data from many internal sources, plus social, mobile and location-based data to be effective.
When I talk to architects about the business risks of not having a shared data architecture, and common tools and practices for enterprise data management, they “get” the problem. So why aren’t they addressing it? The issue is that they find that they are only funded to do the project they are working on and are dealing with very demanding timeframe requirements. They have no funding or mandate to solve the larger enterprise data management problem, which is getting more complex and brittle with each new un-connected project or initiative that is added to the pile.
Studies such as “The Data Directive” by The Economist show that organizations that actively manage their data are more successful. But, if that is the desired future state, how do you get there?
Changing an organization to look at data as the fuel that drives strategy takes hard work and leadership. It also takes a strong enterprise data architecture vision and strategy. For fresh thinking on the subject of building a data foundation for execution, see “Think Data-First to Drive Business Value” from Informatica.
* By the way, Informatica is proud to announce that we are now a sponsor of the MIT Center for Information Systems Research.
Adrian gathered experts and built workgroups to dig into the issue and do root cause analysis. The workgroups came back with some pretty surprising results.
- Most people expected that “incorrect data” (missing, out of date, incomplete, or wrong data) would be the main problem. What they found was that this was only #5 on the list of issues.
- The #1 issue was “Too much data.” People working with the data could not find the data they needed because there was too much data available, and it was hard to figure out which was the data they needed.
- The #2 issue was that people did not know the meaning of data. And because people had different interpretations of the data, they often produced analyses with conflicting results. For example, “claims paid date” might mean the date the claim was approved, the date the check was cut or the date the check cleared. These different interpretations resulted in significantly different numbers.
- In third place was the difficulty in accessing the data. Their environment was a forest of interfaces, access methods and security policies. Some were documented and some not.
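The “claims paid date” problem is easy to reproduce. The sketch below uses invented claim records (the IDs, amounts, dates, and field names are all hypothetical and are not from Adrian’s project) to show how the same question, “how much did we pay in January?”, gives three different answers depending on which date an analyst treats as the paid date.

```python
from datetime import date

# Hypothetical claims: each one carries three dates that different teams
# might each call the "claims paid date".
claims = [
    {"id": "C1", "amount": 1000,
     "approved": date(2015, 1, 28), "check_cut": date(2015, 1, 30), "check_cleared": date(2015, 2, 4)},
    {"id": "C2", "amount": 2500,
     "approved": date(2015, 1, 30), "check_cut": date(2015, 2, 2), "check_cleared": date(2015, 2, 6)},
    {"id": "C3", "amount": 400,
     "approved": date(2015, 1, 15), "check_cut": date(2015, 1, 16), "check_cleared": date(2015, 1, 21)},
]


def paid_in_month(records, date_field, year, month):
    """Total amount 'paid' in a month, using date_field as the paid date."""
    return sum(
        r["amount"]
        for r in records
        if r[date_field].year == year and r[date_field].month == month
    )


# The same business question, answered under each interpretation.
january_totals = {
    field: paid_in_month(claims, field, 2015, 1)
    for field in ("approved", "check_cut", "check_cleared")
}

for field, total in january_totals.items():
    print(f"January total using {field!r} as the paid date: {total}")
```

With these made-up records, the January total is 3900 by approval date, 1400 by check-cut date, and 400 by check-cleared date. None of the three analysts would be wrong, which is why an agreed business definition in shared metadata matters more than any single fix to the data itself.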
In one of the workgroups, a senior manager put the problem in a larger business context:
“Not being able to leverage the data correctly allows competitors to break ground in new areas before we do. Our data in my opinion is the ‘MOST’ important element for our organization.”
What started as a relatively straightforward data quality project became a more comprehensive enterprise data management initiative that could literally change the entire organization. By the project’s end, Adrian found himself leading the data strategy of the organization.
This kind of story is happening with increasing frequency across all industries as all businesses become more digital, the quantity and complexity of data grow, and the opportunities to offer differentiated services based on data grow. We are entering an era of data-fueled organizations where the competitive advantage will go to those who use their data ecosystem better than their competitors.
Gartner is predicting that we are entering an era of increased technology disruption. Organizations that focus on data as their competitive edge will have the advantage. It has become clear that a strong enterprise data architecture is central to the strategy of any industry-leading organization.
For more future-thinking on the subject of enterprise data management and data architecture, see Think “Data First” to Drive Business Value.