Category Archives: Data Integration Platform
Maybe the word “death” is a bit strong, so let’s say “demise” instead. Recently I read an article in the Harvard Business Review about how Big Data and data scientists will rule the 21st-century corporation and how they have to operate to deliver maximum value. What I found rather disturbing was the idea that it takes a PhD (probably several of them) in a variety of mathematical fields to give executives the insight they need to make better decisions, from which product to develop next to whom to sell it to and where.
Don’t get me wrong: this is mixed news for any enterprise software firm that helps businesses locate, acquire, contextually link, understand and distribute high-quality data. The existence of such a high-value role validates product development, but it also limits adoption. It is also great news that data has finally gathered the attention it deserves. But I am starting to ask myself why it always takes individuals with a “one-in-a-million” skill set to add value. What happened to the democratization of software? Why isn’t the design starting point for enterprise software closer to that of B2C applications, like an iPhone app, where simpler is better? Why is it always a gradual “Cold War” evolution instead of a near-instant French Revolution?
Why do development environments for Big Data always accommodate the most complex scenarios rather than limited or existing skills? The answer could be that the first customers will be very large, very complex organizations with super-complex problems that they have so far been unable to solve. But if analytical apps have become a self-service proposition for business users, data integration should be one as well. So why does access to lots of fast-moving, diverse data require scarce Pig or Cassandra developers to get the data into an analyzable shape, and a PhD to query it and interpret the patterns?
I realize new technologies start with a foundation and that, as they spread, supply will try to catch up with demand to create an equilibrium. However, this is a problem that has existed for decades in many industries, such as oil and gas, telecommunications, the public sector and retail. Whenever I talk to architects and business leaders in these industries, they chuckle at “Big Data” and tell me, “Yes, we’ve got that, and by the way, we have been dealing with this reality for a long time.” By now I would have expected the skill (and therefore cost) of turning data into meaningful insight to have come down much more significantly.
Informatica has made a tremendous push in this regard with its “Map Once, Deploy Anywhere” paradigm. I cannot wait to see what’s next, and I recently saw something that got me very excited. Why, you ask? Because at some point I would like at least a business super-user to be able to pummel terabytes of transaction and interaction data into an environment (a Hadoop cluster, an in-memory DB…) and massage it so that a self-created dashboard gets him or her where they need to go. This should cover questions like “Where is the data I need for this insight?”, “What is missing, and what is the best way to get it?” and “How do I want it to look when I share it?” A semi-experienced knowledge of Excel and PowerPoint should be all it takes to get your hands on advanced Big Data analytics. Don’t you think? Do you believe that this role will disappear as quickly as it surfaced?
In a previous blog post, I wrote that by the time business “history” is reported via Business Intelligence (BI) systems, it’s usually too late to make a real difference. In this post, I’m going to talk about how business history becomes much more useful when it is combined with operations in real time.
The historian E. P. Thompson pointed out that all history is the history of unintended consequences. His view was that history is not always recorded in documents; instead, it is ultimately derived from examining cultural meanings and the structures of society through hermeneutics (the interpretation of texts), semiotics, and the many forms and signs of the times. He concluded that history is created by people’s subjectivity and is therefore ultimately represented by how they REALLY live.
The same can be extrapolated to businesses. However, today’s BI systems capture only a minuscule piece of the larger pie of knowledge that could be gained from things like meetings, videos, sales calls, anecdotal win/loss reports, shadow IT projects, 10-Ks and 10-Qs, and even company blog posts. The point is: how can you better capture the essence, meaning and perhaps importance of the everyday non-database events taking place in your company and its activities? In other words, how can you capture how it REALLY operates?
One of the keys to figuring out how businesses really operate is identifying and using the undocumented RULES that underlie almost every business. Select employees, often veterans, know these rules intuitively. Every company has them, and if you watch them, you’ll see they have a knack for pushing projects through the system, making customers happy, or diagnosing a problem quickly and with little fanfare. They just know how things work and what needs to be done.
These rules have been, and still are, difficult to quantify, apply or “datify,” if you will. Certain companies (and hopefully Informatica) will end up being major players in the race to datify these non-traditional rules and events, in addition to helping companies make sense of big data in a whole new way. When you daydream about it, it’s not hard to imagine business systems that will eventually be able to understand the optimization rules of a business, account for possible unintended scenarios or consequences, and then apply those rules at the moment they are most needed. Anyhow, that’s the goal of a new generation of Operational Intelligence systems.
In my final post on the subject, I’ll explain (in a nutshell) how it works and which business problems it solves. And if I’ve managed to pique your curiosity and you want to hear about Operational Intelligence sooner, tune in to a webinar we’re having TODAY at 10 AM PST.
As covered by Loraine Lawson, “When it comes to data, the U.S. federal government is a bit of a glutton. Federal agencies manage on average 209 million records, or approximately 8.4 billion records for the entire federal government, according to Steve O’Keeffe, founder of the government IT network site, MeriTalk.”
Check out these stats from a December 2013 MeriTalk survey of 100 federal records and information management professionals. Among the findings:
- Only 18 percent said their agency had made significant progress toward managing records and email in electronic format and was ready to report.
- One in five federal records management professionals say they are “completely prepared” to handle the growing volume of government records.
- 92 percent say their agency “has a lot of work to do to meet the direction.”
- 46 percent say they do not believe or are unsure about whether the deadlines are realistic and obtainable.
- Three out of four say the Presidential Directive on Managing Government Records will enable “modern, high-quality records and information management.”
I’ve been working with the US government for years, and I can tell you that these findings are pretty accurate. Indeed, the paper glut is killing productivity, and even the way agencies manage digital data needs a great deal of improvement.
The problem is that the issues are so massive that it’s difficult to get your arms around them. The DOD alone has hundreds of thousands of databases online, and most of them need to exchange data with other systems. Typically this is done using old-fashioned approaches, including “sneaker-net,” Federal Express, FTP, and creaky batch extracts and updates.
The “digital data diet,” as Loraine calls it, really needs to start with a core understanding of most of the data under management. That task alone will take years, but at the same time you need to form an effective data integration strategy, one that learns from the dozens of data integration strategies you likely formed in the past that did not work.
The path to better data management in the government starts with mapping out a clear route from here to there. Moreover, you need to make sure you define some successes along the way. For example, simply reducing manual and paper processes by 5 or 10 percent would be a great start, and something that would save taxpayers billions in a short period of time.
Too often the government gets overly ambitious around data integration and attempts to do too much in too short a time. Repeat this pattern and you’ll find yourself running in quicksand, setting yourself up for failure.
Data integration is game-changing technology. Indeed, the larger you are, the more game-changing it is. You can’t get much larger than the US government. Time to get to work.
The beauty of the Vibe framework is that it can reduce IT costs as well as friction between IT and the business. Let us take a typical data integration project, for example creating a simple report that takes data from multiple source systems, including mainframes and AIX, and loads it into a forecasting tool (Oracle) for analyzing future demand. The project involves:
- A business analyst, who gathers requirements from business users and writes a functional requirements specification document.
- A solution architect, who works from the functional requirements specification to produce a high-level design (HLD) document. The HLD is fed back to business users for approval and then passed to the designer.
- A designer, who creates a low-level design (LLD) document from the HLD provided by the solution architect (the LLD serves as pseudocode).
- A developer, who does the coding and the unit and system integration testing, and supports UAT and deployment.
In a typical outsourcing model, the business analyst and solution architect roles are filled onsite, while the designers and developers are offshore.
Let us say the total effort for this project is 100 person days (that’s three months to get only a simple report). Of those 100 person days, 10 go to requirements gathering, 20 to high-level design, 50 to low-level design plus development and unit testing, 5 to system integration testing, 10 to UAT and 5 to deployment. An additional 30 person days will have to be devoted to change requests, since business requirements will change during this time frame (for example, a need for real-time reporting or for moving data into the cloud). So, in three months, we will get something, with lots of people involved. Let’s hope it’s what we need.
Now let us examine the effort with the Vibe framework. With Vibe, the business analyst can create a mapping (80% reusable code) in the Informatica Analyst tool that can be imported directly into multiple platforms, such as Informatica PowerCenter, Hadoop or the cloud. The effort for HLD and LLD is therefore almost nil, and development effort averages 20 person days, covering importing the mapping, changing the sources and targets, and unit testing. With all other phases unchanged, Vibe yields an effort saving of 45-50%. The effort for change requests will also be lower, because the process is more agile and the code can be imported to multiple platforms.
Expected Effort savings with Vibe framework
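The effort arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the phase names are shorthand chosen here, the person-day figures are the ones quoted in the post, and HLD and LLD are treated as exactly zero under Vibe even though the post says “almost nil.”

```python
# Person-days per phase for the traditional approach, as quoted in the post.
# Phase names are shorthand chosen for this sketch.
traditional = {
    "requirements": 10,
    "high_level_design": 20,
    "lld_dev_unit_test": 50,
    "system_integration_test": 5,
    "uat": 10,
    "deployment": 5,
}

# With Vibe: HLD is treated as zero (the post says "almost nil"), and the
# LLD + development + unit testing bucket shrinks to the quoted 20-day average.
# All other phases stay the same.
vibe = dict(traditional)
vibe["high_level_design"] = 0
vibe["lld_dev_unit_test"] = 20


def total(plan):
    """Sum person-days across all phases of a plan."""
    return sum(plan.values())


def savings_pct(before, after):
    """Percentage of effort saved when moving from `before` to `after`."""
    return 100 * (total(before) - total(after)) / total(before)


print(total(traditional), total(vibe), savings_pct(traditional, vibe))
```

With design effort at exactly zero, the saving lands at the top of the quoted 45-50% range (50 of 100 person days); any residual design work pulls it toward 45%. The 30 change-request days are left out because the post does not quantify the smaller Vibe equivalent.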
Now let us look at the cost perspective. In the traditional scenario, the business analyst accounts for 20% of the total cost, the solution architect for 20%, the designer for 20%, and the developer for the remaining 40%. With Vibe, the business analyst takes on the solution architect and designer roles. Developer cost is also lower, because the mapping is already partially built. So there will be 40-50% cost savings as well, in both onsite cost and total cost.
Expected Cost savings with Vibe framework
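The cost claim can be sketched the same way. The role shares (20/20/20/40) come from the post; the two assumptions in the function are hypothetical and spelled out in its docstring, because the post does not say whether the business analyst’s own cost rises or exactly how far developer cost falls.

```python
# Role shares of total project cost (percent), as quoted in the post.
TRADITIONAL_SHARES = {
    "business_analyst": 20.0,
    "solution_architect": 20.0,
    "designer": 20.0,
    "developer": 40.0,
}


def vibe_cost_savings(dev_reduction):
    """Percent of total cost saved under two hypothetical assumptions.

    Assumption 1: the business analyst absorbs the solution architect and
    designer roles at no extra cost, so both of those shares disappear.
    Assumption 2: developer cost falls by the fraction `dev_reduction`
    (between 0.0 and 1.0) because the mapping arrives partially built.
    """
    if not 0.0 <= dev_reduction <= 1.0:
        raise ValueError("dev_reduction must be between 0 and 1")
    shares = TRADITIONAL_SHARES
    eliminated = shares["solution_architect"] + shares["designer"]
    return eliminated + shares["developer"] * dev_reduction


for r in (0.0, 0.125, 0.25):
    print(f"developer cost cut by {r:.1%}: {vibe_cost_savings(r):.1f}% saved")
```

Under these assumptions, the quoted 40-50% range corresponds to developer cost falling by 0-25%. Onsite cost (business analyst plus solution architect, 40% of the total) halves under the first assumption, since only the analyst’s 20% remains onsite. If the analyst’s cost grows with the extra roles, the savings come out lower.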
In a traditional scenario, most delays and cost overruns stem from the gap between what is needed and what is delivered. Business users complain that IT doesn’t understand their requirements, and when you ask the IT folks, they say the requirements were not properly documented and the business “assumed” the functionality would be delivered. This miscommunication makes the business analyst’s job very challenging and is also a headache for C-level executives, because they can’t get the information they need when they need it. With Vibe, the business analyst’s requirements-documentation challenges are greatly reduced, because he or she can show the business the expected output directly (in our case, sample output data for the forecast report). From a CEO’s perspective, it is a “win-win”: the friction between business and IT almost goes away, and IT expenditure drops significantly. He or she can happily concentrate on growing the business. Thus Vibe keeps everyone happy.
Sounds interesting? Get in touch with us to learn more about Vibe.
There are three reasons why we haven’t achieved 1-click data management in a corporate data marketplace. First, it wasn’t a problem until recently. The signs that we really needed to manage data as an asset across the enterprise only appeared about 20 years ago. Before that, data management occurred at the application-system level, and we didn’t need a separate focus on Information Asset Management (IAM) at the enterprise level. The past five years, however, have seen a strongly growing awareness of the challenges of and need for IAM, driven to a large degree by big data opportunities and by data privacy and confidentiality concerns.
Shhhh… RulePoint Programmer Hard at Work
End of year. Out with the old, in with the new. A time when everyone gets their ducks in a row, clears the pipe and gets ready for the New Year. For R&D, one of the gating events for the New Year is the annual sales kickoff, where we present new features to Sales so they can better communicate a product’s road map and value to potential buyers. All well and good. But part of the process is filling out a Q&A that explains the product’s “value prop,” and they only gave us 4 lines. I think the answer also helps determine speaking slots and priority.
So here’s the question I had to answer:
FOR SALES TO UNDERSTAND THE PRODUCT BETTER, WE ASK THAT YOU ANSWER THE FOLLOWING QUESTION:
WHAT IS THE PRODUCT VALUE PROPOSITION AND ARE THERE ANY SIGNIFICANT DEPLOYMENTS OR OTHER CUSTOMER EXPERIENCES YOU HAVE HAD THAT HAVE HELPED TO DEFINE THE PRODUCT OFFERING?
Here’s what I wrote:
Informatica RULEPOINT is a real-time integration and event processing software product that is deployed very innovatively by many businesses and vertical industries. Its value proposition is that it helps large enterprises discover important situations from their droves of data and events and then enables users to take timely action on discovered business opportunities as well as stop problems while or before they happen.
Here’s what I wanted to write:
RulePoint is scalable, low latency, flexible and extensible and was born in the pure and exotic wilds of the Amazon from the minds of natives that have never once spoken out loud – only programmed. RulePoint captures the essence of true wisdom of the greatest sages of yesteryear. It is the programming equivalent and captures what Esperanto linguistically tried to do but failed to accomplish.
As to high availability (HA), there has never been anything in the history of software as available as RulePoint. Madonna’s availability only pales in comparison to RulePoint’s availability. We are talking 8 Nines cubed and then squared. Oracle = Unavailable. IBM = Unavailable. Informatica RulePoint = Available.
RulePoint works hard, but plays hard too. When not solving those mission critical business problems, RulePoint creates Arias worthy of Grammy nominations. In the wee hours of the AM, RulePoint single-handedly prevented the outbreak and heartbreak of psoriasis in East Angola.
One of the little-known benefits of RulePoint is its ability to train the trainer, coach the coach and play the player. Via chalk talks? No, RulePoint uses mind melds instead. Much more effective. RulePoint knows Chuck Norris. How do you think Chuck Norris became so famous in the first place? Yes, RulePoint. Greenpeace used RulePoint to save dozens of whales, 2 narwhals, a polar bear and a few collateral penguins (the bear was about to eat the penguins). RulePoint has been banned in 16 countries because it was TOO effective. “Veni, Vidi, RulePoint Vici” was Julius Caesar’s actual quote.
The inspiration for Gandalf in the Lord of the Rings? RulePoint. IT heads worldwide shudder with pride when they hear the name RulePoint mentioned and know that they acquired it. RulePoint is stirred but never shaken. RulePoint is used to train the Sherpas that help climbers reach the highest of heights. RulePoint cooks Minute Rice in 20 seconds.
The running of the bulls in Pamplona every year: what do you think they are running from? Yes, RulePoint. RulePoint put the Vinyasa back into Yoga. In fact, RulePoint will eventually create a new derivative called Full Contact Vinyasa Yoga, and it will eventually supplant gymnastics in the 2028 Summer Olympic games.
The laws of physics were disproved last year by RulePoint. RulePoint was drafted in the 9th round by the LA Lakers in the 90s, but opted instead to teach math to inner city youngsters. 5 years ago, RulePoint came up with an antivenin to the Black Mamba and has yet to ask for any form of recompense. RulePoint’s rules bend but never break. The stand-in for the “Mind” in the movie “A Beautiful Mind” was RulePoint.
RulePoint will define a new category for the Turing award and will name it the 2Turing Award. As a bonus, the 2Turing Award will then be modestly won by RulePoint and the whole category will be retired shortly thereafter. RulePoint is… tada… the most interesting software in the world.
But I didn’t get to write any of these true facts and product differentiators on the form. No room.
Hopefully I can still get a primo slot to talk about RulePoint.
And so from all of the RulePoint and Emerging Technologies team, including sales and marketing, here’s hoping you have a great holiday season and a Happy New Year!
For some of you “old timers” in the IT industry, you will remember the days when we used to hand-code our own Database Management Systems. Of course today we just go out and buy a general purpose DBMS like MySQL, Oracle, dBASE, or IBM DB2 to name a few. Or, if we wind the clock back further, there was a time when we used to write our own operating systems. Today it comes with the hardware or we can buy an OS like UNIX, iOS, Linux, OS X, Windows, and IBM z/OS. And I can still remember hand-coding network protocols in the days before TCP/IP became ubiquitous. Today we select from UDP, HTTP, POP3, FTP, IMAP, RMI, SOAP and others.
Unlike some of my friends, I truly enjoyed History as a subject in high school and college. I particularly appreciated biographies of favorite historical figures because they put a human face on the past and gave it meaning and color. I also vowed at that time to navigate my life and future by the principle attributed to Harvard professor Jorge Agustín Nicolás Ruiz de Santayana y Borrás (better known as George Santayana): “Those who cannot remember the past are condemned to repeat it.”
So that’s a little ditty regarding my history regarding history.
Fast-forward to the present, where I have carved out my career in technology, and in particular enterprise software. It affords me a great platform for talking with lots of IT and business leaders. When I do, I usually ask them, “How are you implementing advanced projects that help the business become more agile, effective or opportunistically proactive?” They usually answer something along the lines of “this is the age and renaissance of data science and analytics” and then end up talking exclusively about their meat-and-potatoes business intelligence software projects and how 300 reports now run their business.
Then, when I probe and hear their answers in more depth, I am once again reminded of THE history quote and think to myself that there’s an amusing irony at play here. Most Business Intelligence systems of today are designed to “remember” and report on the historical past through large data warehouses of a gazillion transactions, along with basic but numerous shipping and billing histories and maybe assorted support records.
But when it comes right down to it, business intelligence “history” is still just that: history. Nothing is really learned and applied right when and where it counts, that is, when it would have made all the difference had the company been able to react in time.
So, in essence, companies using standalone BI systems as they are designed today are indeed condemned to repeat the past: they learn their lessons too late, so the same mistakes are repeated again and again.
This means the challenge for BI is to reduce latency, measure the pertinent data / sensors / events, and get scalable – extremely scalable and flexible enough to handle the volume and variety of the forthcoming data onslaught.
There’s a part 2 to this story, so keep an eye out for my next blog post, “History Repeats Itself (Part 2).”
Last year around this time, I wrote a blog post about how reports of the death of ETL were exaggerated. Time to revisit the topic briefly, given a few interesting events that happened in the past few weeks.
First, a company whose senior executive had claimed that ETL and the data integration layer were dead came by to visit. It turns out that the bold executive who claimed that everything they were doing had been migrated to Hadoop is no longer with the company. In addition, what they wanted to talk to us about was how they could more effectively build out the data warehouse and pull in mainframe data (that’s right, mainframe data). It seems that old data sources never die, and they don’t even just fade away either. In fact, very little of what this company was doing was actually happening on Hadoop. As I noted in my last blog, Hadoop is a lot like teenage sex.
Second, I gave a talk at a trade show on how companies like Informatica were going to fill the ease-of-use gap on top of Hadoop by providing tooling so that less skilled developers could also take advantage of Hadoop (for more on this topic, please check back on my blog titled “Dinner with my French Neighbor”). After my talk, a gentleman in his late 20s came up to me and told me that he used to work for Aster Data, which was subsequently bought by Teradata, and had recently left to join a new startup. He used to think that the data integration layer would die away because you could easily use something like Aster to handle both the analytics queries and the data integration. Then, after Aster was acquired by Teradata, he got to see an Informatica PowerCenter mapping that brought in a number of data sources and cleaned and integrated the data before moving it into Teradata. He told me that he hadn’t realized how complex real customer environments were and that there was no way they could have done all of that integration in Aster. This is pretty typical of people who are new to the data space or who are building out Hadoop-based startups. They don’t have to deal with legacy environments, so they have no idea how messy those environments are until they finally see them firsthand.
Third and last, someone from a startup I had talked to last year, which has a visual data preparation and analytics environment on top of Hadoop, sent me an email after Strata. I wasn’t at Strata, but he got my email address from one of my employees. He wanted to talk about partnering with us because their customers need to handle more sophisticated data integration jobs (connecting, cleansing, integrating, transforming, parsing, etc.) before their users can make use of the data. Only last year, this same company said it was competing with Informatica because it had basic data integration transformation tools underneath its visualization layer. As it turns out, basic wasn’t anywhere near enough, so they are back talking to us about a partnership.
The point is that just because we can now dump all of our data into Hadoop doesn’t mean it is integrated. If you take 10 legacy data sources plus internet data, sensor data and so on, and just dump them into Hadoop, that doesn’t make them integrated. It just makes them collocated. So while “ETL” in the classic sense will definitely change, the idea that there won’t be a data integration layer to simplify and manage the integration of all the old and new data sources is just silly. That layer will continue to exist; it just might use a variety of technologies underneath, including Hadoop, as a storage and processing engine.
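To make the collocated-versus-integrated distinction concrete, here is a toy Python sketch (3.9+ for `str.removeprefix`). The record layout, key formats and normalization rules are hypothetical, invented for illustration; real integration work also involves the cleansing, parsing and transformation described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str
    customer_key: str
    name: str


# Hypothetical records for the same customer from two sources,
# each with its own key format and naming convention.
mainframe = [Record("mainframe", "CUST-0042", "SMITH, JOHN")]
web_logs = [Record("web", "42", "John Smith")]

# "Collocated": both sources simply land in the same place, untouched.
collocated = mainframe + web_logs


def normalize_key(key: str) -> int:
    """Map source-specific customer keys onto one shared integer key."""
    return int(key.removeprefix("CUST-"))


def normalize_name(name: str) -> str:
    """Turn 'LAST, FIRST' or 'First Last' into title-cased 'First Last'."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return name.title()


def integrate(records):
    """Group records by normalized key so each customer appears once."""
    customers = {}
    for rec in records:
        key = normalize_key(rec.customer_key)
        entry = customers.setdefault(
            key, {"name": normalize_name(rec.name), "sources": set()}
        )
        entry["sources"].add(rec.source)
    return customers


# Collocated data still shows two distinct keys; integrated data shows one customer.
print(len({r.customer_key for r in collocated}), len(integrate(collocated)))
```

The collocated list still shows two distinct keys for what is really one customer; only after the keys and names are reconciled does a single customer emerge, seen in both sources.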
Regardless, I am happy to see that more and more companies are realizing that today’s data world is actually getting more complicated, not less. As a result, data fragmentation is only getting worse, so the future for data integration is only looking brighter.