Category Archives: Data Warehousing
I told the head of the Enterprise Data Warehouse at a large bank, “You don’t have a data warehouse, you have 50,000 tables.” The bank had built its EDW without the necessary fundamentals in place. It wasn’t for lack of money; in fact, the EDW was one of the biggest “money sinks” in the bank. The problem was that it sat on a sinking foundation.
One version of the truth isn’t achieved by putting all your data in one big system or one big database – that’s impossible. An enterprise data warehouse is indeed part of the solution, but it needs to be built on a solid foundation. What does a solid foundation look like? Here are five pillars for one version of the truth. (more…)
Treating Big Data Performance Woes with the Data Replication Cure Blog Series – Part 1
“Big Data” is all the rage – it is virtually impossible to check out any information management media channel, online resource, or community of interest without having your eyeballs bathed in articles touting the benefits and inevitability of what has come to be known as big data. I have watched this transformation over the past few years as data warehousing and business analytics appliances have entered the mainstream. Pure and simple: what was the bleeding edge of technology twenty years ago in performance computing is now commonplace, with Hadoop being the primary platform (or more accurately, programming environment) for developing big data analytics applications. (more…)
I just came back from MicroStrategy World. There were many conversations about social, mobile, cloud and big data. There was strong interest in cloud, clear adoption of mobile, and some big data adoption. eHarmony had a great presentation about how they handle big data with Informatica, and how they’re starting to use Informatica HParser running on Hadoop to process JSON.
But that wasn’t the number one conversation. The one topic that everyone was interested in – and I talked to nearly 100 customers and partners over four days – was creating new reports faster, or Agile BI. (more…)
Today, agility and timely visibility are critical to the business. No wonder CIO.com states that business intelligence (BI) will be the top technology priority for CIOs in 2012. However, is your data architecture agile enough to handle these exacting demands?
In his blog Top 10 Business Intelligence Predictions For 2012, Boris Evelson of Forrester Research, Inc., states that traditional BI approaches often fall short for the two following reasons (among many others):
- BI hasn’t fully empowered information workers, who still largely depend on IT
- BI platforms, tools and applications aren’t agile enough (more…)
Gartner hosted a webinar on January 10, 2012: Gartner Worldwide IT Spending Forecast. One of the topics covered was industry IT spend for 2012.
In covering that topic, they made a point of saying that due to severe flooding in Thailand, they expect storage to be in short supply (as much as a 29% global shortfall) through the end of 2012, with the price of storage per GB increasing as a result. They recommended finding alternatives to purchasing storage to keep costs down. (more…)
If you haven’t already, I think you should read The Forrester Wave™: Data Virtualization, Q1 2012. For two reasons – one, to truly understand the space, and two, to understand the critical capabilities required to be a solution that solves real data integration problems.
At the very outset, let’s clearly define Data Virtualization. Simply put, Data Virtualization is foundational to Data Integration. It enables fast and direct access to the critical data and reports that the business needs and trusts. It is not to be confused with simple, traditional Data Federation. Instead, think of it as a superset which must complement existing data architectures to support BI agility, MDM and SOA. (more…)
Data warehouses are applications – so why not manage them like one? In fact, data grows at a much faster rate in data warehouses, since they integrate data from multiple applications and cater to many different groups of users who need different types of analysis. Data warehouses also keep historical data for a long time, so data grows exponentially in these systems. The infrastructure costs in data warehouses also escalate quickly, since analytical processing on large amounts of data requires big, beefy boxes – not to mention the software license and maintenance costs of such a large amount of data. Imagine how much backup media is required to back up tens to hundreds of terabytes of data warehouse on a regular basis. But do you really need to keep all that historical data in production?
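To make the backup-media point concrete, here is a back-of-the-envelope sketch. The warehouse size, backup schedule, and tape format are illustrative assumptions, not figures from the post; the only hard number is LTO-5’s 1.5 TB native (uncompressed) capacity.

```python
import math

TAPE_NATIVE_CAPACITY_TB = 1.5   # LTO-5 native capacity, no compression
WAREHOUSE_SIZE_TB = 100         # assumed warehouse size (hypothetical)
FULL_BACKUPS_PER_YEAR = 52      # assumed: one full backup per week

# Tapes consumed by a single full backup, rounded up to whole tapes
tapes_per_backup = math.ceil(WAREHOUSE_SIZE_TB / TAPE_NATIVE_CAPACITY_TB)

# Tapes consumed per year if every weekly full is kept
tapes_per_year = tapes_per_backup * FULL_BACKUPS_PER_YEAR

print(tapes_per_backup, tapes_per_year)  # 67 tapes per full, 3484 per year
```

Even under these modest assumptions, a single 100 TB warehouse burns thousands of tapes a year – which is exactly why archiving unused historical data out of production pays off.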
One of the challenges of managing data growth in data warehouses is that it’s hard to determine which data is actually used, which data is no longer being used, or whether some data was ever used at all. Unlike transactional systems, where the application logic determines when records are no longer being transacted upon, the usage of analytical data in data warehouses follows no definite business rules. Age or seasonality may suggest which data is in use, but business users are usually loath to give up having all that data at their fingertips. The only clear-cut way to prove that some data is no longer being used in a data warehouse is to monitor its usage.
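A minimal sketch of what usage monitoring might look like, assuming a hypothetical audit log of `(table_name, last_accessed)` events such as most warehouse platforms can emit from their query logs. The table names, dates, and one-year idle threshold are all illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical audit-log entries: (table name, timestamp of a query touching it)
access_log = [
    ("sales_2005", datetime(2010, 3, 1)),
    ("sales_2011", datetime(2011, 12, 20)),
    ("sales_2011", datetime(2012, 1, 5)),
]

def stale_tables(log, now, max_idle=timedelta(days=365)):
    """Return tables whose most recent access is older than max_idle."""
    last_seen = {}
    for table, ts in log:
        # Keep only the latest access timestamp per table
        last_seen[table] = max(ts, last_seen.get(table, ts))
    return sorted(t for t, ts in last_seen.items() if now - ts > max_idle)

print(stale_tables(access_log, datetime(2012, 2, 1)))  # ['sales_2005']
```

Tables that surface on this list are candidates for archiving out of the production warehouse – evidence that is far more persuasive to business users than age alone.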
The “Dodd-Frank Wall Street Reform and Consumer Protection Act” was recently passed by the US federal government to regulate financial institutions. Under this legislation, more “watchdog” agencies will audit banks, lending institutions and investment institutions to ensure compliance. As an example, an Office of Financial Research within the US Treasury will be responsible for collecting and analyzing data. This legislation brings with it a higher risk of fines for non-compliance. (more…)