Tag Archives: backup
While CIOs are urged to rethink their backup strategies following warnings from leading analysts that companies are wasting billions on unnecessary storage, consultants and IT solution vendors are selling "Big Data" narratives to those same CIOs as a storage optimization strategy.
What a CIO must do is ask:
Do you think a Backup Strategy is the same as a Big Data strategy?
Is your MO – “I must invest in Big Data because my competitor is”?
Do you think Big Data and “data analysis” are synonyms?
Most companies invest very little in their storage technologies, while spending on server and network technologies primarily for backup. Further, the most common mistake businesses make is failing to update their backup policies. It is not unusual for companies to be using backup policies that are years or even decades old – policies that do not discriminate between business-critical files and the personal music files of employees.
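The policy failure described above can be made concrete with a sketch of a selective backup filter. The extension lists here are illustrative stand-ins for a real, business-defined policy, not a recommendation of any specific product's rules:

```python
import os

# Hypothetical policy: which file extensions count as personal media.
# A real policy would be defined by the business, not hard-coded.
PERSONAL_MEDIA = {".mp3", ".mp4", ".avi", ".flac"}

def should_back_up(path):
    """Return True if a file belongs in the backup set under this policy."""
    ext = os.path.splitext(path)[1].lower()
    if ext in PERSONAL_MEDIA:
        return False  # skip employees' personal music/video files
    return True       # default: back everything else up

files = ["q3-forecast.xlsx", "holiday-mix.mp3", "schema.sql"]
backup_set = [f for f in files if should_back_up(f)]
# backup_set -> ["q3-forecast.xlsx", "schema.sql"]
```

Even a crude exclusion rule like this keeps personal media out of the backup set; the point is that the rule must exist and be kept current.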
Even Web giants like Facebook and Yahoo aren't always dealing with Big Data. They run their own giant, in-house "clusters" – collections of powerful servers – for crunching data. But it appears that those clusters are unnecessary for many of the tasks they're handed. In the case of Facebook, most of the jobs engineers ask their clusters to perform are in the "megabyte to gigabyte" range, which means they could easily be handled on a single computer – even a laptop.
The necessity of breaking problems into many small parts, and processing each on a large array of computers, characterizes classic Big Data problems like Google’s need to compute the rank of every single web page on the planet.
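To make the "megabyte-to-gigabyte jobs fit on a laptop" point concrete, here is a minimal sketch of a log-aggregation job streaming through a single process, with no cluster involved. The log format (status code as the last field) is an illustrative assumption:

```python
from collections import Counter

# A megabyte-to-gigabyte log can be aggregated line by line on one machine;
# memory use stays constant because only the counts are retained.
def count_status_codes(lines):
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:
            counts[parts[-1]] += 1  # assume the status code is the last field
    return counts

sample = ["GET /index 200", "GET /missing 404", "POST /login 200"]
print(count_status_codes(sample))  # Counter({'200': 2, '404': 1})
```

In practice the `lines` iterable would be an open file handle, and the same single-pass pattern holds at gigabyte scale.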
In "Nobody Ever Got Fired for Buying a Cluster," Microsoft Research points out that many of the problems solved by engineers at even the most data-hungry firms don't need to be run on clusters. Why is that a problem? Because there are vast classes of problems for which clusters are an inefficient – or entirely inappropriate – solution.
Here is an example of a post exhorting readers to "Incorporate Big Data Into Your Small Business" that is about a quantity of data that probably wouldn't strain Google Docs, much less Excel on a single laptop. In other words, most businesses are dealing with small data. It's very important stuff, but it has little connection to the big kind.
Let us lose the habit of putting "big" in front of "data" to make it sound important. After all, supersizing your data just because you can is going to cost you a lot more and may yield a lot less.
So what is it? Big Data, small Data, or Smart Data?
Gregor Mendel uncovered the secrets of genetic inheritance with just enough data to fill a notebook. The important thing is gathering the right data, not gathering some arbitrary quantity of it.
Businesses retain information in an enterprise data archive either for compliance – to adhere to data retention regulations – or because business users are afraid to let go of data they are used to having access to. Many IT organizations have told us they retain data in archives because they are looking to cut infrastructure costs and do not have retention requirements clearly articulated by the business. As a result, enterprise data archiving has morphed into serving multiple purposes for IT: eliminating the costs of maintaining aging data in production applications and allowing business users to access the information on demand, all while adhering to some – if any known or defined – retention policies.
Data volumes are exploding. We see it all around us. The problem is that too much data can have a very negative impact on user productivity. Think about how long it takes to sift through email after returning from vacation, or to complete a purchase on an e-commerce site on Black Friday. The more data there is, the longer these processes take, and the more time is spent combing through it. Through our partnership with Symantec, Informatica has been working successfully with customers to help them control the impact of "too much data." We are helping them define projects that improve their ability to meet SLAs and application performance targets, reduce costs, and mitigate compliance risks – all while IT budgets remain relatively flat.
In a recent InformationWeek blog, “Big Data A Big Backup Challenge”, George Crump aptly pointed out the problems of backing up big data and outlined some best practices that should be applied to address them, including:
- Identifying which data can be re-derived and therefore doesn’t need to be backed up
- Eliminating redundancy through file de-duplication and applying data compression
- Using storage tiering – a combination of online disk and tape – to reduce storage cost and optimize performance
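The de-duplication practice above can be illustrated with content hashing, a common approach: files with identical bytes hash to the same digest, so only one copy needs to enter the backup set. The file names and contents here are purely illustrative:

```python
import hashlib

# Sketch of content-based de-duplication: keep the first file seen for
# each unique SHA-256 digest; later byte-identical copies are dropped.
def dedupe(blobs):
    seen = {}
    for name, data in blobs:
        digest = hashlib.sha256(data).hexdigest()
        seen.setdefault(digest, name)
    return sorted(seen.values())

blobs = [
    ("report.pdf", b"quarterly results"),
    ("report-copy.pdf", b"quarterly results"),  # duplicate content
    ("notes.txt", b"meeting notes"),
]
unique = dedupe(blobs)  # ['notes.txt', 'report.pdf']
```

Production backup tools refine this idea (block-level rather than whole-file hashing, for instance), but the principle is the same.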
Lean Data Management is a new approach to managing your data growth. It uses the "Lean" concept that originated with Toyota car manufacturing in the 1990s. The "Lean" concept is based on maximizing efficiency, eliminating waste, and providing more value to the customer. (See Informatica's lean integration solutions as well as John Schmidt's 10 Weeks to Lean Integration blog series.)
As technology has evolved, industries have consolidated, and corporations have grown, organizations are faced with explosive data volumes called "Big Data" – all the different types of data that IT organizations must support. Applying the "Lean" concept to managing application data will help you reduce the size of your Big Data by archiving live production databases, subsetting non-production databases, and archiving or retiring legacy and redundant applications. Informatica's Lean Data Management approach is an effective, comprehensive way to address the challenges Big Data creates. It's time to Make Big Data Small with Lean Data Management.
Here are the top 5 benefits of Making Big Data Small:
Earlier this month, during the 2011 Symantec Vision Conference (Las Vegas, NV), Symantec and Informatica announced an exciting new partnership to help customers rein in data growth and ensure compliance across all enterprise data. Symantec will resell Informatica Data Archive, Informatica's database archiving solution, to Enterprise Vault and NetBackup customers. By combining database archiving with Symantec's market-leading data protection, email, and file archiving solutions, customers can now implement a universal approach to data archiving that spans structured and unstructured data.
In one of my earlier blogs, I wrote about why you still need database archiving when you already partition your database. In a similar vein, many people also ask why you still need to archive when you already use database compression to reduce storage capacity and cost. The benefits of archiving, which you can't achieve with compression and/or partitioning alone, are the same:
- Archiving allows you to completely move data volumes out of the production system to improve response time and reduce infrastructure costs. Why keep unused data, even if compressed, on high cost server infrastructure when you don’t need to? Why add overhead to query processing when you can remove the data from being processed at all?
- Avoid server and software license upgrades. By removing inactive data from the database, you no longer require as much processing power, and you can keep your existing server without adding CPU cores and additional licenses for your database and application. This further reduces costs.
- Reduce overall administration and maintenance costs. If you still keep unused data around in your production system, you still need to back it up, replicate it for high availability, clone it for non-production copies, recover it in the event of a disaster, upgrade it, organize and partition it, and consider it as part of your performance tuning strategy. Yes, it will take less time to back up, copy, and restore, since the data is compressed and smaller, but why include that data in production maintenance activities at all if it's infrequently used?
- Remove the multiplier effect. The cost of additional data volume in production systems is multiplied when you consider how many copies you have of that production data in mirrors, backups, clones, non-production systems, and reporting warehouses. The size multiplier is less since the data is compressed, but it’s still more wasted capacity in multiple locations. Not to mention the additional server, software license, and maintenance costs associated with the additional volumes in those multiple copies. So it’s best to just remove that data size at the source.
- Ensure compliance by enforcing retention and disposition policies. As I discussed in my previous blog on the difference between archiving and backup, archiving is the solution for long term data retention. Archiving solutions, such as Informatica Data Archive, have integration points with records management software or provide built-in retention management to enforce the retention of data for a specified period based on policies. During that period, the immutability and authenticity of the archived data is ensured, and when the retention period expires, records are automatically purged after the appropriate review and approval process. Regulated data needs to be retained long enough to comply with regulations, but keeping data for too long can also become a legal liability. So it’s important that expired records are purged in a timely manner. Just keeping data in production databases indefinitely doesn’t help you to reduce your compliance and legal risks.
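The retention-and-disposition idea above can be sketched in a few lines. The seven-year period and the record fields are illustrative assumptions, not a statement of any specific regulation or of how Informatica Data Archive works:

```python
from datetime import date, timedelta

# Hypothetical retention policy: records may be purged once they have been
# archived for longer than the retention period (7 years is an assumption).
RETENTION = timedelta(days=7 * 365)

def expired(records, today):
    """Return the ids of archived records whose retention period has lapsed."""
    return [r["id"] for r in records if today - r["archived"] > RETENTION]

records = [
    {"id": 1, "archived": date(2003, 1, 15)},   # ~8 years old: expired
    {"id": 2, "archived": date(2010, 6, 1)},    # ~1 year old: retained
]
to_purge = expired(records, today=date(2011, 6, 1))  # [1]
```

In a real system the purge itself would still go through the review-and-approval step described above; the point is that expiry is computed from policy, not left to ad hoc judgment.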
Implementing enterprise application and database archiving is just plain best practices. The best way to improve performance and reduce infrastructure and maintenance costs is to reduce the data volume in your production systems. Why increase overhead when you don’t have to? Today’s archiving solutions allow you to maintain easy access to the data after archival, so there is no reason to keep data around just for the sake of accessibility. By moving inactive but regulated data to a central archival store, you can uniformly enforce retention policies. At the same time, you can reduce the time and cost of eDiscovery by making all types of data centrally and easily searchable.
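The archive-then-purge pattern described above can be sketched with SQLite for illustration. The `orders` schema, table names, and cutoff date are assumptions for the example, not the actual mechanism of any archiving product:

```python
import sqlite3

# Minimal sketch of database archiving: rows older than a cutoff are copied
# to an archive table, then deleted from the production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2005-03-01"), (2, "2011-05-20")])

cutoff = "2008-01-01"  # illustrative retention boundary (ISO dates sort as text)
conn.execute("INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
             (cutoff,))
conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
conn.commit()

live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]              # 1
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]  # 1
```

The production table shrinks (faster queries, smaller backups) while the archived rows remain queryable on demand, which is the accessibility point made above.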
The roles of backup and archiving software for databases are often confused: organizations use backup for the purposes of archiving and vice versa. A recent survey conducted by Symantec indicates that 70% of enterprises are misusing backup, recovery, and archiving practices. The survey shows that 70% of enterprises use their backup software to implement legal holds, and 25% preserve the entire backup set indefinitely. Respondents also said 45% of their backup storage is due to legal holds, and nearly half of the enterprises surveyed are improperly using their backup and recovery software for archiving.
So what are the differences between the two types of solutions? What should each be used for and how are they complementary?