Category Archives: Data Archiving

How Informatica Tackles the Top 5 Big Data Challenges

Tackle the Top 5 Big Data Challenges

In my last blog post I discussed the top 5 Big Data challenges listed below:

  • It’s difficult to find and retain resource skills to staff big data projects
  • It takes too long to deploy Big Data projects from ‘proof-of-concept’ to production
  • Big data technologies are evolving too quickly to adapt
  • Big Data projects fail to deliver the expected value
  • It’s difficult to make Big Data fit-for-purpose, assess trust, and ensure security

Informatica has extended its leadership in data integration and data quality to Hadoop with our Big Data Edition to address all of these Big Data challenges.

The biggest challenge companies face is finding and retaining Big Data resource skills to staff their Big Data projects. One large global bank started their first Big Data project with 5 Java developers, but as their Big Data initiative gained momentum they needed to hire 25 more Java developers that year. They quickly realized that while they had scaled their infrastructure to store and process massive volumes of data, they could not scale the necessary resource skills to implement their Big Data projects. The research mentioned earlier indicates that 80% of the work in a Big Data project relates to data integration and data quality. With Informatica you can staff Big Data projects with readily available Informatica developers instead of an army of developers hand-coding in Java and other Hadoop programming languages. In addition, we’ve proven to our customers that Informatica developers are up to 5 times more productive on Hadoop than hand-coders, and they don’t need to know how to program on Hadoop. A large Fortune 100 global manufacturer needed to hire 40 data scientists for their Big Data initiative. Do you really want these hard-to-find and expensive resources spending 80% of their time integrating and preparing data?

Another key challenge is that it takes too long to deploy Big Data projects to production. One of our Big Data Media and Entertainment customers told me, prior to purchasing the Informatica Big Data Edition, that most of his Big Data projects had failed. Naturally, I asked him why. His response was, “We have these hot-shot Java developers with a good idea which they prove out in our sandbox environment. But when it comes time to deploy it to production, they have to re-work a lot of code to make it perform and scale, make it highly available 24×7, give it robust error-handling, and integrate it with the rest of our production infrastructure. In addition, it is very difficult to maintain as things change. This results in project delays and cost overruns.” With Informatica, you can automate the entire data integration and data quality pipeline; everything you build in the development sandbox environment can be immediately and automatically deployed and scheduled for production as enterprise-ready. Performance, scalability, and reliability are handled through configuration parameters, without the re-building or re-working that is typical with hand-coding. And Informatica makes it easier to reuse existing work and maintain Big Data projects as things change. The Big Data Edition is built on Vibe, our virtual data machine, and provides near-universal connectivity so that you can quickly onboard new types of data at any volume and speed.

Big Data technologies are emerging and evolving extremely fast. This in turn becomes a barrier to innovation, since these technologies evolve much too quickly for most organizations to adopt before the next big thing comes along. What if you place the wrong technology bet and find that it is obsolete before you have barely gotten started? Hadoop is gaining tremendous adoption, but it is evolving alongside other big data technologies in a landscape of literally hundreds of open-source projects and commercial vendors. Informatica is built on the Vibe virtual data machine, which means that everything you built yesterday and build today can be deployed on the major big data technologies of tomorrow. Today it is five flavors of Hadoop, but tomorrow it could be Hadoop and other technology platforms. One of our Big Data Edition customers stated after purchasing the product that “Informatica Big Data Edition with Vibe is our insurance policy to insulate our Big Data projects from changing technologies.” In fact, existing Informatica customers can take PowerCenter mappings they built years ago, import them into the Big Data Edition, and in many cases run them on Hadoop with minimal changes and effort.

Another common complaint from the business is that Big Data projects fail to deliver the expected value. In a recent survey (1), 86% of marketers said they could generate more revenue if they had a more complete picture of customers. We all know that the cost of selling a product to an existing customer is only about 10 percent of the cost of selling the same product to a new customer. But it’s not easy to cross-sell and up-sell to existing customers. Customer Relationship Management (CRM) initiatives help to address these challenges, but they too often fail to deliver the expected business value. The impact is low marketing ROI, poor customer experience, customer churn, and missed sales opportunities. By using Informatica’s Big Data Edition with Master Data Management (MDM) to enrich customer master data with Big Data insights, you can create a single, complete view of customers that yields tremendous results. We call this real-time customer analytics, and Informatica’s solution improves the total customer experience by turning Big Data into actionable information so you can proactively engage with customers in real time. For example, this solution enables customer service to know which customers are likely to churn in the next two weeks so they can take the next best action, or, in the case of sales and marketing, to determine next-best offers based on customer online behavior to increase cross-sell and up-sell conversions.

Chief Data Officers and their analytics teams find it difficult to make Big Data fit-for-purpose, assess trust, and ensure security. According to the business consulting firm Booz Allen Hamilton, “At some organizations, analysts may spend as much as 80 percent of their time preparing the data, leaving just 20 percent for conducting actual analysis” (2). This is not an efficient or effective use of highly skilled and expensive data science and data management resources; they should be spending most of their time analyzing data and discovering valuable insights. The result is project delays, cost overruns, and missed opportunities. The Informatica Intelligent Data Platform supports a managed data lake as a single place to manage the supply and demand of data, converting raw big data into fit-for-purpose, trusted, and secure information. Think of this as a Big Data supply chain to collect, refine, govern, deliver, and manage your data assets, so your analytics team can easily find, access, integrate and trust your data in a secure and automated fashion.

If you are embarking on a Big Data journey I encourage you to contact Informatica for a Big Data readiness assessment to ensure your success and avoid the pitfalls of the top 5 Big Data challenges.

  1. Gleanster survey of 100 senior-level marketers, “Lifecycle Engagement: Imperatives for Midsize and Large Companies,” sponsored by YesMail.
  2. “The Data Lake: Take Big Data Beyond the Cloud,” Booz Allen Hamilton, 2013.

Informatica Named a Leader in Gartner MQ for Data Archiving

Gartner MQ for Structured Data Archiving and Application Retirement

For the first time, Gartner has released a Magic Quadrant for Structured Data Archiving and Application Retirement. This MQ describes an approach for how customers can proactively manage data growth. We are happy to report that Gartner has positioned Informatica as a leader in the space, based on our ‘completeness of vision’ and our ‘ability to execute.’

This Magic Quadrant focuses on what Gartner calls structured data archiving: technology used to index, migrate, preserve and protect application data in secondary databases or flat files, typically located on lower-cost storage, for policy-based retention. Data archiving makes data available in the context of the originating business process or application, which is especially useful in the event of litigation or an audit.

The Magic Quadrant calls out two use cases. These use cases are “live archiving of production applications” and “application retirement of legacy systems.”  Informatica refers to both use cases, together, as “Enterprise Data Archiving.” We consider this to be a foundational component of a comprehensive Information Lifecycle Management strategy.

The application landscape is constantly evolving. For this reason, data archiving is a strategic component of a data growth management strategy. Application owners need a plan to manage data as applications are upgraded, replaced, consolidated, moved to the cloud and/or retired. 

When you don’t have a plan for production data, it accumulates in the business application: degrading performance frustrates the business, and data bloat burdens IT operations. When you don’t have a plan for legacy systems, applications accumulate in the data center, and the ever-increasing budgets concern the CFO.

A data growth management plan must include the following:

  • How to cycle through applications and retire them
  • How to smartly store the application data
  • How to ultimately dispose of data while staying compliant

Structured data archiving and application retirement technologies help automate and streamline these tasks.

Informatica Data Archive delivers unparalleled connectivity, scalability, a broad range of innovative options (e.g., Smart Partitioning, Live Archiving, and retiring aging and legacy data to the Informatica Data Vault), and comprehensive retention management, data reporting and visualization. We believe our strengths in this space are the key ingredients for deploying a successful enterprise data archive.

For more information, read the Gartner Magic Quadrant for Structured Data Archiving and Application Retirement.


Improve Oracle Performance While Streamlining IT Operations

Oracle DBAs are challenged with keeping mission-critical databases up and running with predictable performance as data volumes grow. Our customers are changing their approach, proactively managing Oracle performance while simplifying IT by leveraging our innovative Data Archive Smart Partitioning features. Smart Partitioning builds on Oracle Database Partitioning, simplifying the deployment and management of partitioning strategies. DBAs have been able to respond to requests to improve business process performance without having to write any custom code or SQL scripts.

With Smart Partitioning, DBAs have a new dialogue with business analysts. Rather than wading into the technology weeds, they ask how many months, quarters or years of data are required to get the job done, and then show, within a few clicks, how users can self-select how much data gets processed when they run queries, reports or programs. In effect, users control their own performance by controlling the volume of data they pull from the database.

Smart Partitioning is configured using easily understood business dimensions such as time, company, business unit, etc. These dimensions make it easy to ‘slice’ data to fit the job at hand. Performance becomes manageable and under business control. Another benefit shows up in your non-production environments: creating smaller, fully functional subset databases now fits easily into your cloning operations.
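Smart Partitioning automates all of this on top of standard Oracle Database Partitioning. Purely to illustrate the underlying Oracle mechanism that a time-based ‘slice’ strategy relies on (not how the product itself configures it), here is a minimal sketch; the table, column and connection details are hypothetical, and the statements are submitted through the python-oracledb driver.

```python
# Minimal sketch of plain Oracle range partitioning keyed on a time dimension.
# Table, column and connection details are hypothetical; Informatica Smart
# Partitioning generates and manages the real partitioning scheme for you.
import oracledb

DDL = """
CREATE TABLE orders_hist (
    order_id      NUMBER,
    business_unit VARCHAR2(30),
    order_date    DATE
)
PARTITION BY RANGE (order_date) (
    PARTITION p_2012    VALUES LESS THAN (DATE '2013-01-01'),
    PARTITION p_2013_q1 VALUES LESS THAN (DATE '2013-04-01'),
    PARTITION p_current VALUES LESS THAN (MAXVALUE)
)
"""

# A report restricted to the current quarter touches a single partition, so
# users effectively control how much data gets scanned by how they "slice".
QUERY = """
SELECT business_unit, COUNT(*)
FROM   orders_hist
WHERE  order_date >= DATE '2013-01-01'
GROUP BY business_unit
"""

def main() -> None:
    conn = oracledb.connect(user="demo", password="demo", dsn="localhost/XEPDB1")
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(QUERY)
        for business_unit, row_count in cur:
            print(business_unit, row_count)

if __name__ == "__main__":
    main()
```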

Finally, Informatica has been working closely with the Oracle Enterprise Solutions Group to align Informatica Data Archive Smart Partitioning with the Oracle ZS3 Appliance to maximize performance and savings while minimizing the complexity of implementing an Information Lifecycle Management strategy.

To learn more about our joint solutions, visit our website, read the recent press release here, download a data sheet, or watch an Informatica and Oracle joint webinar.


Application Retirement: Old Applications, and Their Place In The Sun

What springs to mind when you think about old applications? What happens to them once they have outlived their usefulness? Do they finally get to retire and have their day in the sun, or do they tenaciously hang on to life?

Think for a moment about your own situation and that of those around you. From the time you started work, you have been encouraged, and sometimes forced, to think about, plan for and fund your own retirement. Now consider the portfolio your organization has built up over the years: hundreds or maybe thousands of apps, spread across numerous platforms and locations, a mix of applications home-grown with best-of-breed tools or acquired from the leading application vendors.

Evaluating Your Current Situation

  • Do you know how many of those “legacy” systems are still running?
  • Do you know how much these apps are costing?
  • Is there a plan to retire them?
  • How is the execution tracking to plan?

Truth is, even if you have a plan, it probably isn’t going well.

Providing better citizen service at a lower cost

This is something every state and local organization aspires to do. Yet many organizations are spending 75% or more of their budgets just keeping the lights on – maintaining existing applications and infrastructure. Being able to fully retire some, or many, of these applications saves significant money. Do you know how much these applications are costing your organization? Don’t forget to include the whole range of costs that applications incur, including physical infrastructure costs such as mainframes, networks and storage, as well as the required software licenses and, of course, the time of the people who actually keep them running. What happens when those with Cobol and CICS experience retire? Usually the answer is not good news. There is a lot to consider and many benefits to be gained through an effective application retirement strategy.

An August 2011 report by ESG Global shows that some 68% of organizations had six or more legacy applications running, and that 50% planned to retire at least one of those over the following 12-18 months. It would be interesting to see today’s situation and evaluate how successful these application retirement plans have been.

A common problem is knowing where to start. You know there are applications that you should be able to retire, but planning, building and executing an effective and successful plan can be tough. To help this process we have developed a strategy, framework and solution for effective and efficient application retirement. This is a good starting point on your application retirement journey.

To get a speedy overview, take six minutes to watch this video on application retirement.

We have created a community specifically for application managers in our ‘Potential At Work’ site. If you haven’t already signed up, take a moment and join this group of like-minded individuals from across the globe.


Data Archiving and Data Security Are Table Stakes

I recently met with a longtime colleague from the Oracle E-Business Suite implementation eco-system, now VP of IT for a global technology provider. This individual has successfully implemented data archiving and data masking technologies to eliminate duplicate applications and control the costs of data growth – saving tens of millions of dollars. He has freed up resources that were re-deployed within new innovative projects such as Big Data – giving him the reputation as a thought leader. In addition, he has avoided exposing sensitive data in application development activities by securing it with data masking technology – thus securing his reputation.

When I asked him about those projects and the impact on his career, he responded, ‘Data archiving and data security are table stakes in the Oracle Applications IT game. However, if I want to be a part of anything important, it has to involve Cloud and Big Data.’ He further explained how the savings achieved from Informatica Data Archive enabled him to increase employee retention rates, because he was able to fund an exciting Hadoop project that key resources wanted to work on. Not to mention that, by retiring legacy applications and transitioning from physical infrastructure to virtual servers, he had accomplished the first step of his ‘journey to the cloud’. This would not have been possible if his data had required technology that was not supported in the cloud. And if he hadn’t secured sensitive data and had experienced a breach, he would be looking for a new job in a new industry.

Not long after, I attended a CIO summit where the theme of the conference was ‘Breakthrough Innovation’. Of course, Cloud and Big Data were main stage topics – not just about the technology, but about how it was used to solve business challenges and provide services to the new generation of ‘entitled’ consumers. This is the description of those who expect to have everything at their fingertips. They want to be empowered to share or not share their information. They expect that if you are going to save their personal information, it will not be abused. Lastly, they may even expect to try a product or service for free before committing to buy.

In order to live up to these expectations, Application Owners, like my long-time colleague, need to incorporate Data Archive and Data Masking in their standard SDLC processes. Without Data Archive, IT budgets may be consumed by supporting old applications and mountains of data, leaving little available for new, innovative projects. Without Data Masking, a public breach will drive many consumers elsewhere.

To learn more about what Informatica has to offer and why data archiving and data masking are table stakes, join us at the ILM Day at Informatica World 2014 on May 12th in Las Vegas, NV.


What types of data need protecting?

In my first article on the topic of citizens’ digital health and safety we looked at the states’ desire to keep their citizens healthy and safe and also at the various laws and regulations they have in place around data breaches and losses. The size and scale of the problem together with some ideas for effective risk mitigation are in this whitepaper.

The cost and frequency of data breaches continue to rise

Let’s now start delving a little deeper into the situation states are faced with. It’s pretty obvious that citizen data that enables an individual to be identified (PII) needs to be protected. We immediately think of the production data: data that is used in integrated eligibility systems; in health insurance exchanges; in data warehouses and so on. In some ways the production data is the least of our problems; our research shows that the average state has around 10 to 12 full copies of data for non-production (development, test, user acceptance and so on) purposes. This data tends to be much more vulnerable because it is widespread and used by a wide variety of people – often subcontractors or outsourcers, and often the content of the data is not well understood.

Obviously production systems need access to real production data (I’ll cover how best to protect that in the next issue); non-production systems of every sort, on the other hand, do not. Non-production systems most often need realistic, but not real, data and realistic, but not real, data volumes (except perhaps for the performance/stress/throughput testing system). What needs to be done? A three-point risk remediation plan is a good place to start.

1. Understand the non-production data: sophisticated data and schema profiling, combined with NLP (Natural Language Processing) techniques, helps to identify previously unrecognized PII that needs protecting.
2. Permanently mask the PII so that it is no longer the real data but is still realistic enough for non-production uses, and make sure that the same masking is applied to the attribute values wherever they appear across multiple tables/files.
3. Subset the data to reduce data volumes; this limits the size of the risk and also has positive effects on performance, run times, backups, etc. (A simple sketch of steps 2 and 3 follows this list.)
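As a rough illustration of points 2 and 3 (and not of any vendor’s actual masking engine), the following sketch applies the same deterministic masking to an attribute wherever it appears and then subsets by date. The file names, column names, date format (ISO YYYY-MM-DD) and salt are all hypothetical.

```python
# Minimal sketch of consistent (deterministic) masking plus subsetting for a
# flat-file extract. File names, column names, the date format and the salt
# are hypothetical; a real implementation would use a dedicated masking
# product rather than this hand-rolled approach.
import csv
import hashlib
from datetime import date

SALT = "replace-with-a-secret-salt"

def mask_value(value: str) -> str:
    """Return a repeatable pseudonym so the same input masks the same way
    wherever it appears, keeping joins across tables/files intact."""
    digest = hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()
    return "MASKED_" + digest[:12]

def mask_and_subset(in_path, out_path, pii_columns, date_column, cutoff):
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Subset: keep only recent rows to shrink non-production volumes.
            if date.fromisoformat(row[date_column]) < cutoff:
                continue
            # Mask: overwrite each PII attribute with its pseudonym.
            for col in pii_columns:
                row[col] = mask_value(row[col])
            writer.writerow(row)

if __name__ == "__main__":
    mask_and_subset(
        in_path="citizens.csv",            # hypothetical extract
        out_path="citizens_masked.csv",
        pii_columns=["ssn", "full_name"],  # hypothetical PII columns
        date_column="case_opened",
        cutoff=date(2013, 1, 1),
    )
```

Because the pseudonym is derived from the value itself, the same person masks to the same token in every table and file, so referential integrity for testing is preserved.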

Gartner has just published its 2013 Magic Quadrant for data masking, which covers both what it calls static masking (i.e., permanent or persistent masking) and dynamic masking (more on this in the next issue). As usual, the MQ gives a good overview of the issues behind the technology as well as a review of the position, strengths and weaknesses of the leading vendors.

It is (or at least should be) an imperative that, from the top down, state governments realize the importance and vulnerability of their citizens’ data and put in place a non-partisan plan to prevent future breaches. As the reader might imagine, for any such plan to succeed it needs a combination of cultural and organizational change (getting people to care) and the right technology in place; together these will greatly reduce the risk. In the next and final issue on this topic we will look at the vulnerabilities of production data, and what can be done to dramatically increase its privacy and security.


Informatica ILM Nearline Achieves SAP-Certified Integration

Informatica announced yesterday that the Informatica ILM Nearline product is SAP-certified. ILM Nearline helps IT organizations reduce the costs of managing data growth in existing implementations of SAP NetWeaver Business Warehouse (SAP NetWeaver BW) and SAP HANA. By doing so, customers can redirect freed budgets and resources to application landscape and data center modernization initiatives. Informatica ILM Nearline v6.1A for use with SAP NetWeaver BW and SAP HANA, available today, is purpose-built for SAP environments, leveraging native SAP interfaces.

Data volumes are growing fastest in data warehouse and reporting applications[1], yet a significant amount of that data is rarely used or infrequently accessed. In deployments of SAP NetWeaver BW, standard SAP archiving can reduce the size of a production data warehouse database to help preserve its performance, but if users ever want to query or manipulate the archived data, it needs to be loaded back into the production system, disrupting data analytics processes and extending time to insight. The same holds true for SAP HANA.

To address this, ILM Nearline enables IT to migrate large volumes of largely inactive SAP NetWeaver BW or SAP HANA data from the production database or in memory store to online, secure, highly compressed, immutable files in a near-line system while maintaining end-user access.  The result is a controlled environment running SAP NetWeaver BW or SAP HANA with predictable, ongoing hardware, software and maintenance costs.  This helps ensure service-level agreements (SLAs) can be met while freeing up ongoing budget and resources so IT can focus on innovation.

Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA has been certified with the following interfaces:

“Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is all about reducing the costs of data while keeping the data easily accessible and thus valuable,” said Adam Wilson, general manager, ILM, Informatica. “As data volumes continue to soar, the solution is especially game-changing for organizations implementing SAP HANA as they can use the Informatica-enabled savings to help offset and control the costs of their SAP HANA licenses without disrupting the current SAP NetWeaver BW users’ access to the data.”

Specific advantages of Informatica ILM Nearline include:

  • Industry-leading compression rates – Informatica ILM Nearline’s compression rates exceed standard database compression rates by a sizable margin. Customers typically achieve rates in excess of 90 percent, and some have reported rates as high as 98 percent. (A quick worked example of what these rates mean follows this list.)
  • Easy administration and data access – No database administration is required for data archived by Informatica ILM Nearline. Data is accessible from the user’s standard SAP application screen without any IT intervention and is efficiently stored to simplify backup, restore and data replication processes.
  • Limitless capacity – Highly scalable, the solution is designed to store limitless amounts of data without affecting data access performance.
  • Easy storage tiering – As data is stored in a highly compressed format, the nearline archive can be easily migrated from one storage location to another in support of a tiered storage strategy.
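To make those compression rates concrete, here is a quick worked example; the 10 TB starting size is hypothetical, while the 90 and 98 percent figures are the ones quoted above.

```python
# Worked example of what the quoted compression rates mean for footprint.
# The 10 TB starting size is hypothetical; the rates come from the text above.
source_tb = 10.0
for rate in (0.90, 0.98):
    archived_tb = source_tb * (1 - rate)
    print(f"{rate:.0%} compression: {source_tb:.0f} TB shrinks to {archived_tb:.1f} TB")
# 90% compression: 10 TB shrinks to 1.0 TB
# 98% compression: 10 TB shrinks to 0.2 TB
```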

Available now, Informatica ILM Nearline for use with SAP NetWeaver BW and SAP HANA is based on intellectual property acquired from Sand Technology in Q4 2011 and enhanced by Informatica.


[1] Informatica Survey Results, January 23, 2013 (citation from Enterprise Data Archive for Hybrid IT Webinar)


Oracle OpenWorld 2013 – Day 1 – Archive And Purge SIG

The Oracle Applications Users Group (OAUG) Archive and Purge Special Interest Group (SIG) held its semi-annual session first thing in the morning on Sunday, September 22, 2013 – 8:00 am. The chairman of the SIG, Brian Bent, must have lost the drawing-straws contest for session times. Regardless, attendance was incredibly strong and the topic, ‘Cleaning up your Oracle E-Business Suite Mess’, was well received.

From the initial audience survey, most attendees have made the jump to OEBS R12 and very few have implemented an Information Lifecycle Management (ILM) strategy. As organizations migrate to the latest version, the rate of data growth increases significantly, such that performance takes a plunge, costs for infrastructure and storage spike, and DBAs are squeezed trying to make do.

The bulk of the discussion was on the Concurrent Programs Oracle offers for purging. The focus was on system tables, not functional archive and purge routines for modules like General Ledger or Accounts Receivable; that will be the topic of another SIG day.

For starters, Oracle provides Concurrent Programs to purge administrative data. Look for ‘Big Tables’ owned by APPLSYS for more candidates and search for the biggest tables and indexes. Search for ‘PURGE’ on MyOracleSupport (MOS) and do your homework to decide whether the purge programs apply to you. If you are concerned about deleting data, you can create an archive table, add an ‘on delete’ trigger to the original table, run the purge, and automatically save the deleted rows in the archive table (guess what? this is a CUSTOMIZATION).
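For readers who want to picture that customization, here is a rough sketch of the archive-table-plus-trigger pattern. The table, column and connection details are entirely hypothetical, the statements are issued through the python-oracledb driver purely for illustration, and any such customization should be validated against MyOracleSupport guidance before use.

```python
# Rough sketch of the "archive table + on-delete trigger" customization
# described above. Table, column and connection details are hypothetical;
# validate any such customization against MyOracleSupport notes before use.
import oracledb

STATEMENTS = [
    # 1. Shadow table with the same shape as the (hypothetical) table being purged.
    """CREATE TABLE xx_big_table_arch AS
       SELECT * FROM xx_big_table WHERE 1 = 0""",

    # 2. Trigger that copies each row into the archive before it is deleted,
    #    so the standard purge program preserves data as a side effect.
    """CREATE OR REPLACE TRIGGER xx_big_table_arch_trg
       BEFORE DELETE ON xx_big_table
       FOR EACH ROW
       BEGIN
         INSERT INTO xx_big_table_arch (id, created_on, payload)
         VALUES (:OLD.id, :OLD.created_on, :OLD.payload);
       END;""",
]

def install(conn: oracledb.Connection) -> None:
    with conn.cursor() as cur:
        for stmt in STATEMENTS:
            cur.execute(stmt)

if __name__ == "__main__":
    connection = oracledb.connect(user="apps", password="apps", dsn="ebsdb")
    install(connection)
    # The standard Oracle purge concurrent program is then run as usual;
    # every row it deletes from xx_big_table lands in xx_big_table_arch.
```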

Some areas to look at include FND_CONCURRENT_REQUESTS and FND_LOBS.

FND_CONCURRENT_REQUESTS

  • Most customers purge data older than 7-30 days.
  • Oracle recommends keeping this table under 25,000 rows.
  • Consider additional purges that delete data about concurrent requests that run frequently.

FND_LOBS

  • DBAs do not delete from FND_LOBS; the only way to get rid of these rows is for Oracle to provide a concurrent program for the module that users used to load them.
  • FND_LOBS can take an enormous amount of space and make exporting and importing your database take a long time.
  • You can also look at storing FND_LOBS as SecureFiles, but that requires an Advanced Compression license.
  • Log enhancement requests for more concurrent programs to clean up FND_LOBS.
  • Look to third-party solutions, such as Informatica.

Other suggestions included the Oracle Workflow tables, but these require more research.

For more information, join the Oracle Application User Group and sign up for the Archive and Purge Special Interest Group.


Informatica ILM Nearline – Best Practices

In previous posts, we introduced the concept of Informatica ILM Nearline and discussed how it could help your business. To recapitulate: the major advantage of Informatica ILM Nearline is its superior data access performance, which enables a more aggressive approach to migrating huge volumes of data out of the online repository to an accessible, highly compressed archive (on inexpensive second- and third-tier storage infrastructure).

Today, I will be considering the question of when an enterprise should consider implementing Informatica ILM Nearline. Broadly speaking, such implementations fall into two categories: they either offer a “cure” for an existing data management problem or represent a proactive implementation of data best practices within the organization.

Cure or Prevention?

The “cure” type of implementation is typically associated with a data warehouse or business application “rescue” project. This is undertaken when the production system grows to a point where database size causes major performance problems and affects the ability to meet Service Level Agreements (SLAs) and manage business processes in a timely manner. In these kinds of situations, it is mainly the operations division of the organization that is affected and that demands an immediate fix, which can take the form of an Informatica ILM Nearline implementation. The question here is: how quickly can the “cure” implementation stabilize performance and ensure satisfaction of SLAs?

On the other hand, the best practice approach, much like current practices related to healthy living, focuses on prevention rather than cure. In this respect, best practices dictate that the Informatica ILM Nearline implementation should start as soon as some of the data in the production system becomes “infrequently accessed”, or “cold”. In data warehouses and data marts where the most recent month or two is analyzed most often, this means data older than 90 days. For transactional systems the archiving cutoff may be a year or two, depending on the typical length of your business processes. The main idea is to keep production databases from inflating for no good business reason by ‘nearlining’ data as soon as possible, without interrupting business operations or hurting the value of your data. Ultimately this should protect the enterprise from an operational crisis arising from deteriorating performance and unmet SLAs.

In order to better judge the impact of using either of these two approaches, it is important to understand the various steps involved in the “Nearlining” process. What do we find when we “dissect” the process of leveraging the Informatica ILM Nearline?

Dissecting the “Informatica ILM Nearline” Process

Informatica ILM Nearline involves multiple processes whose performance characteristics can significantly influence the speed at which data is migrated out of the online database. The various processes are managed by Informatica’s overall integrated nearline solution coupled with an SAP Business Warehouse system:

  • The first step is to lock the data that is targeted by the archiving process, in order to ensure that the data is not modified while the process is going on; SAP Business Warehouse does this automatically when you execute Data Archive Processes (DAP) for the cold data.
  • Next comes the extraction of the data to be migrated. This is usually achieved via an SQL statement based on business rules for data migration. Often, the extraction can be performed using multiple extraction/consumer processes working in parallel.
  • The next step is to secure the newly extracted data, so that it is recoverable.
  • Then, the integrity of the extracted data must be validated (normally by comparing it to its online counterpart).

Database Housekeeping

  • Next, delete the online data that has been moved to nearline.
  • Then, reorganize the tablespace of the deleted data.
  • Finally, rebuild/reorganize the index associated with the online table from which data has been nearlined. (A sketch pulling the full sequence together follows below.)
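Pulled together, the sequence amounts to a pipeline like the sketch below. Every function body is a placeholder, since the real work is done by the SAP BW and Informatica ILM Nearline tooling; the sketch only makes the ordering, the validation gate and the decoupled housekeeping step explicit. The DAP name and staging path are hypothetical.

```python
# Skeleton of the nearlining sequence described above. Every step is a
# placeholder: the real work is done by the SAP BW and Informatica ILM
# Nearline tooling, not by hand-written code like this.

def lock_source_data(dap_name: str) -> None:
    """Step 1: SAP BW locks the slice targeted by the Data Archive Process."""

def extract_cold_data(dap_name: str, workers: int = 4) -> str:
    """Step 2: extract the slice, often with several parallel consumers."""
    return f"/nearline/staging/{dap_name}"   # hypothetical staging location

def secure_extract(staging_path: str) -> None:
    """Step 3: make the extracted files recoverable before anything is deleted."""

def validate_extract(staging_path: str) -> bool:
    """Step 4: compare the extract against its online counterpart."""
    return True

def housekeeping(dap_name: str) -> None:
    """Steps 5-7: delete the online rows, reorganize the tablespace and
    rebuild/reorganize the indexes. Often decoupled and run at the weekend."""

def nearline(dap_name: str, run_housekeeping_now: bool = False) -> None:
    lock_source_data(dap_name)
    staging = extract_cold_data(dap_name)
    secure_extract(staging)
    if not validate_extract(staging):
        raise RuntimeError(f"Validation failed for {dap_name}; online data left untouched")
    if run_housekeeping_now:
        housekeeping(dap_name)   # otherwise deferred to a maintenance window

if __name__ == "__main__":
    nearline("ZSALES_2012_ARCHIVE")   # hypothetical DAP name
```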

The Database Housekeeping process is often the slowest part of a Data Nearlining process, and thus can dictate the pace and scheduling of the implementation. In a production environment, the database housekeeping process is frequently decoupled from ongoing operations and performed over a weekend. It may be surprising to learn that deleting data can be a more expensive process than inserting it, but just ask an enterprise DBA about what is involved in deleting 1 TB from an Enterprise Data Warehouse and see what answer you get: for many, the task of fitting such a process into standard Batch Windows would be a nightmare.

So it is easy to see that starting early and implementing Informatica ILM Nearline as a best practice can massively reduce not only the cost of the implementation, but also the time required to perform it. Therefore, the main recommendation to take away from this discussion is: don’t wait too long to consider embarking on your Informatica ILM Nearline strategy!

That’s it for today. In my next post, I will take up the topic of which data should be initially considered as a candidate for migration.


Exploring the Informatica ILM Nearline 6.1A

In today’s post, I want to write about Informatica ILM Nearline 6.1A. Although the Nearline concept is not new, it is still not very well known. It represents the logical evolution of business applications, data warehouses and information lifecycle approaches that have struggled to maintain acceptable performance levels in the face of the increasingly intense “data tsunami” that looms over today’s business world. Whereas older archiving solutions based their viability on the declining prices of hardware and storage, ILM Nearline 6.1A embraces the dynamism of a software and services approach to fully leverage the potential of large enterprise data architectures.

Looking back, we can now see that the older data management solutions presented a paradox: in order to mitigate performance issues and meet Service Level Agreements (SLA) with users, they actually prevented or limited ad-hoc access to data. On the basis of system monitoring and usage statistics, this inaccessible data was then declared to be unused, and this was cited as an excuse for locking it away entirely. In effect, users were told: “Since you can’t get at it, you can’t use it, and therefore we’re not going to give it to you”!

ILM Nearline 6.1A, by contrast, allows historical data to be accessed with near-online speeds, empowering business analysts to measure and perfect key business initiatives through analysis of actual historical details. In other words, ILM Nearline 6.1A gives you all the data you want, when and how you want it (without impacting the performance of existing warehouse reporting systems!).

Aside from the obvious economic and environmental benefits of this software-centric approach and the associated best practices, the value of ILM Nearline 6.1A can be assessed in terms of the core proposition cited by Tim O’Reilly when he coined the term “Web 2.0”:

“The value of the software is proportional to the scale and dynamism of the data it helps to manage.”

In this regard, ILM Nearline 6.1A provides a number of important advantages over prior methodologies:

Keeps data accessible: ILM Nearline 6.1A enables optimal performance from the online database while keeping all data easily accessible. This massively reduces the work required to identify, access and restore archived data, while minimizing the performance hit involved in doing so in a production environment.

Keeps the online database “lean”: Because data archived to the ILM Nearline 6.1A can still be easily accessed by users at near-online speeds, it allows for much more recent data to be moved out of the online system than would be possible with archiving. This results in far better online system performance and greater flexibility to further support user requirements without performance trade-offs. It is also a big win for customers moving their systems to HANA.

Relieves data management stress: Data can be moved to ILM Nearline 6.1A without the substantial ongoing analysis of user access patterns that is usually required by archiving products. The process is typically based on a rule as simple as “move all data older than x months from the ten largest InfoProviders”.
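A rule that simple is easy to express. The toy sketch below, with entirely hypothetical InfoProvider names and sizes, merely illustrates how such a candidate list might be derived; the real selection is driven by the product and SAP BW statistics.

```python
# Toy illustration of the rule "move all data older than x months from the
# ten largest InfoProviders". Provider names and sizes are made up; the real
# selection is driven by the ILM Nearline tooling and SAP BW statistics.
from datetime import date

PROVIDER_SIZES_GB = [
    ("ZSALES_CUBE", 1200), ("ZBILLING_DSO", 950), ("ZCRM_CUBE", 640),
    ("ZFINANCE_DSO", 410), ("ZHR_CUBE", 120),
]

def nearline_rule(sizes, age_months=24, top_n=10, today=None):
    """Return (provider, cutoff_date) pairs for the N largest providers."""
    today = today or date.today()
    months = today.year * 12 + (today.month - 1) - age_months
    cutoff = date(months // 12, months % 12 + 1, 1)   # first day of that month
    largest = sorted(sizes, key=lambda item: item[1], reverse=True)[:top_n]
    return [(name, cutoff) for name, _ in largest]

if __name__ == "__main__":
    for provider, cutoff in nearline_rule(PROVIDER_SIZES_GB):
        print(f"{provider}: nearline everything older than {cutoff}")
```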

Mitigates administrative risk: Unlike archived data, ILM Nearline 6.1A data requires little or no additional ongoing administration, and no additional administrative intervention is required to access it.

Lets analysts be analysts: With ILM Nearline 6.1A, far less time is taken up in gaining access to key data and “cleansing it”, so much more time can be spent performing “what if” scenarios before recommending a course of action for the company. This improves not only the productivity but also the quality of work of key business analysts and statistical gurus.

Copes with data structure changes: ILM Nearline 6.1A can easily deal with data model changes, making it possible to query data structured according to an older model alongside current data. With archive data, this would require considerable administrative work.

Leverages existing storage environments: Compared to older archiving products/strategies, the high degree of compression offered by ILM Nearline 6.1A greatly increases the amount of information that can be stored as well as the speed at which it can be accessed.

Keeps data private and secure: ILM Nearline 6.1A has privacy and security features that protect key information from being seen by ad-hoc business analysts (for example: names, social security numbers, credit card information).

In short, ILM Nearline 6.1A offers a significant advantage over other nearline and archiving technologies. When data needs to be removed from the online database in order to improve performance, but still needs to be readily accessible by users to conduct long-term analysis, historical reporting, or to rebuild aggregates/KPIs/InfoCubes for period-over-period analysis, ILM Nearline 6.1A is currently the only workable solution available.

In my next post, I’ll discuss more specifically how implementing the ILM Nearline 6.1A solution can benefit your business apps, data warehouses and your business processes.

