Tag Archives: Metadata Management
Are you interested in big data? There are many reasons to come to Informatica World 2015 but one very good reason is to learn how you can become Big Data Ready! Becoming Big Data Ready is about succeeding with analytics in a Big Data world. It’s about unleashing productivity, repeatability, and confidence. Join us at Informatica World on May 12th for the Big Data Ready Summit, the industry’s first event dedicated to making Big Data ready for analytics.
During the “Big Data Ready” Summit, you will learn how to successfully accelerate the transition from the traditional world of data management to the new world of Big Data analytics. Join us for an engaging discussion with customers, partners and industry leaders as we share best practices for data management, data governance, and data security in a Big Data world. Space is limited – add the Big Data Ready Summit to your schedule now before space runs out! If you haven’t already registered for Informatica World, register today.
After the Big Data Ready Summit, learn even more about Big Data with numerous breakout sessions throughout the week. These sessions cover everything from Hadoop-based data lakes and data hubs to Big Data integration pipelines, reference architectures, and best practices.
I encourage you to sign up for one of six Meet the Expert sessions, which are small-group meetings for business-level discussions around Big Data and next-gen analytics. If you want to see firsthand how all this Big Data technology works, I encourage you to sign up for the Big Data Edition and Intelligent Data Lake Hands-On Labs.
To help start your planning, here is a listing of the Big Data Summit agenda, breakout sessions and other related activities this year. We look forward to seeing you at Informatica World 2015!
BIG DATA QUICK GUIDE FOR INFORMATICA WORLD 2015
Big Data Ready Summit, Tuesday, May 12
|1:30 pm – 1:40 pm||Welcome|
|Ash Parikh, VP Data Integration & Data Security Product Marketing, Informatica|
|1:40 pm – 2:25 pm||Keynote: The New World of Analytics 3.0|
|Tom Davenport, Distinguished Professor in Management and Information Technology at Babson College and best-selling author|
|2:30 pm – 3:15 pm||Partner Panel: Succeeding with Big Data and Avoiding the Pitfalls|
|Steve Jones, Global Vice President – Big Data, Capgemini|
|Clarke Patterson, Senior Director of Product Marketing, Cloudera|
|Poornima Ramaswamy, AVP EIM Banking Financial Service, Cognizant|
|John Kreisa, VP Strategic Marketing, Hortonworks|
|3:30 pm – 4:15 pm||Customer Panel: The Big Data Journey – Traditional BI to Next Gen Analytics|
|Deepika Sinha, IT Manager Enterprise Business Solutions, Johnson & Johnson|
|Dave Beaudoin, VP Data Architecture, Transamerica|
|Christine Miesner, Manager E&P Data Management, Devon Energy|
|Thomas Reichel, Lead Architect, KPN|
|Pravin Darbare, Sr. Manager Data Integration & Visualization, Western Union|
|4:15 pm – 4:30 pm||Accelerate Big Data Projects with Informatica|
|Jeff Rydz, Senior Director Big Data Solutions, Informatica|
|4:35 pm – 5:20 pm||Topic TBD on the Big Data Ecosystem & Trends|
|Michael J. Franklin, Professor of Computer Science, U.C. Berkeley|
|5:20 pm – 5:30 pm||Closing Remarks|
|Amit Walia, SVP & GM Data Integration and Data Security, Informatica|
Breakout Sessions, Tuesday, May 12
|DI102||Intuit and the Hadoop-Based Enterprise Data Hub||11:30 am – 12:15 pm||Castellana 1|
|DI113||Managing Big Data with the Business Data Lake||11:30 am – 12:15 pm||Gracia 2|
|DI138||Putting Big Data to Work to Make Cancer History at MD Anderson Cancer Center||11:30 am – 12:15 pm||Gracia 4|
|MDM101||Spreading Like Wildfire: Fast, Accurate Big Data for Smart, Quick Decisions||11:30 am – 12:15 pm||Gracia 1|
|DI142||Data Integration in a Digital World||4:30 pm – 5:30 pm||Gracia 2|
Breakout Sessions, Wednesday, May 13
|IP&A111||Data Lake Architecture Panel Discussion||10:45 am – 11:45 am||Gracia 6|
|DI114||Integration Excellence at J&J with Big Data Services and Best Practices||2:00 pm – 2:45 pm||Gracia 4|
|DI117||Big Data Integration Pipelines at Cox Automotive||2:55 pm – 3:55 pm||Gracia 4|
|DI119||Moving From ‘Datasets’ to ‘Data Assets’ for Insights and Innovation||2:55 pm – 3:55 pm||Castellana 1|
|DI120||CTO Led Workshop – Best Fit Engineering in Analytic Architecture||4:05 pm – 4:50 pm||Gracia 4|
|DI123||Integrating with Hadoop: Best Practices for Using Informatica Big Data Edition||5:00 pm – 6:00 pm||Gracia 4|
|DI125||Advanced Scaling and Metadata Management with PowerCenter 9.6||5:00 pm – 6:00 pm||Castellana 1|
Meet the Experts, Wednesday, May 13
|Code||Meet The Experts||Time||Location|
|MTE107||Big Data – Delivering on the Promise of Big Data Analytics||12:00 pm – 12:50 pm||Castellana 2|
|MTE122||Big Data – Delivering on the Promise of Big Data Analytics||1:00 pm – 1:50 pm||Castellana 2|
|MTE137||Big Data – Delivering on the Promise of Big Data Analytics||2:55 pm – 3:55 pm||Castellana 2|
|MTE101||Analytics – Next Generation Information Architecture for a Big Data World||12:00 pm – 12:50 pm||Castellana 2|
|MTE116||Analytics – Next Generation Information Architecture for a Big Data World||1:00 pm – 1:50 pm||Castellana 2|
|MTE131||Analytics – Next Generation Information Architecture for a Big Data World||2:55 pm – 3:55 pm||Castellana 2|
Breakout Sessions, Thursday, May 14
|DI129||Extending and Modernizing Enterprise Data Architectures||10:10 am – 11:10 am||Gracia 4|
|DI130||Best Practices for Saving Millions by Offloading ETL/ELT to Hadoop with Big Data Edition and Vibe Data Stream||10:10 am – 11:10 am||Gracia 2|
|DI131||Big Data Real Time Streaming Analytics||10:10 am – 11:10 am||Gracia 1|
|DI136||How to run PowerCenter & Big Data Edition on AWS & connect Data as a Service||2:30 pm – 3:30 pm||Gracia 1|
|DI139||How to Enable Self-Service Analytics with Tableau and Informatica||2:30 pm – 3:30 pm||Gracia 2|
Over 100 hardcore data quality professionals gathered at IDQ Summit 2014, held in Richmond, Virginia from 10/6 – 10/9.
The agenda was fully packed: over 40 sessions ran during the two-day main conference. In addition, two full days of tutorials were held before and after the conference, covering topics from data quality for beginners to how to establish a lean, agile data governance program. I am happy to report there was no shortage of passionate debate among us data quality maniacs. We “argued” about things like what exactly “good quality” data is, where ethics stand in data governance practice, and whether data privacy should become a fundamental human right. We got an update on data management challenges in the EU, shared data quality practices from various industries, and had a glance at data governance basics and operating models.
As a first-timer at the IDQ Summit, I found it refreshing to meet other data quality professionals and hear their stories. It was also reaffirming that many practitioners in this space share the views we at Informatica have been advocating. The consensus on the importance of quality information will help increase the adoption of data quality tools and drive the development of a truly data-driven culture in many organizations. Data alone has little value; only data that meets the quality requirements becomes a real asset.
My key takeaways are the following:
- It is well understood that data quality is a business process involving multiple stakeholders: the business unit owns the data and defines requirements for the data it needs; analysts prepare and manipulate the data for the business; IT enables an efficient data quality process by recommending and implementing the right tools for analysts and business users.
- Data governance is a journey and takes different forms. Operationally, it can be centralized, decentralized, or hybrid, depending on the culture of a particular organization. Senior management sponsorship is critical to making it successful and sustainable. It is recommended to start the practice within a business unit rather than IT, as the business often owns the data and understands its context.
- Quality of data is not an absolute measure. It is largely agreed that data quality is only considered “good” when the data meets the right requirement at the right time for the right people. When the requirement is no longer valid, the person responsible for the data has moved on, or the timing has changed, then “good” data is no longer good.
- Data quality means different things to different people (boy, have we heard that before!). Therefore, it is suggested that personas be taken into consideration when discussing data quality with people in different roles.
- Should ethics be considered in data governance practice? Some noted that ethics can differ from one country to another; even so, aligning our personal code of conduct with a professional code of conduct is a good thing. When in doubt, ask yourself: “Would I do this to my loved ones? If not, then why should I do this to others?” (loosely quoted from one of the speakers). Well said.
- Collaboration between IT and business is critical to the success of data quality process. Tools that help facilitate this joint effort are needed to enable a true data-driven culture.
- Metadata should be considered a key component when implementing a data governance practice. I liked what was presented by Ron Klein, a seasoned metadata practitioner from KPMG: “Metadata provides a pedigree to the information: what the information is, where it came from and how it got there, what systems use it, its relationships to other information”. Metadata is important, period.
I welcome your views on these topics and would love to hear your stories. Meanwhile, I invite you to visit our LinkedIn group discussions on data quality and related topics here.
Do you remember NASA’s $125 million mistake in 1999? The Mars orbiter was lost as the result of a failed information transfer in which one engineering team used metric units while another used imperial units.
I remember because I could relate. After moving to the U.S. from Canada for graduate school, I had to communicate my height in feet and inches instead of meters and centimeters and give directions in miles instead of kilometers.
On a trip to Vancouver, Canada, Andrew Donaher reminded me about NASA’s costly mistake and how it could have been avoided with a business-friendly data governance program. Following much positive feedback from our last blog, I invited Andy to discuss data governance. You may recall that Andy is the Director of Information Management Strategy at Groundswell Group, a Western Canadian consulting firm that specializes in information management services.
Q. According to www.governyourdata.com, data governance is not about the data. It’s about the business processes, decisions, and stakeholder interactions that you want to enable. What’s your take on the value of data governance?
A: The goal of data governance should be to give people confidence in the data they use to make decisions or take actions. They benefit by not wasting time and energy vetting data or creating new processes. That is a huge value to the organization both in terms of risk mitigation and opportunity. At the absolute highest level, data governance is critical to establish trust and confidence in data.
Q. Explain how IT leaders could approach data governance the wrong way.
A. Typically data governance is approached from a restrictive, security-focused and policing perspective. I have found it much more productive to approach it from an enablement, conversational and guiding perspective. The benefit and value of the rules, policies and procedures associated with governance are that people do not have to re-invent the wheel every time. All those things are set up so people can leverage them to provide value faster.
Think back to when you were learning to ride a bike. Hopefully your parent didn’t stand at a distance barking instructions on what to do and what not to do. He or she started by holding the back of your bike so you felt stable and supported, providing you with guidance on how to do it, words of encouragement about what you’re doing well, and constructive advice on what you could be doing better. Then something would click and you’d get it. When you looked back with a smile on your face, feeling proud of yourself, you’d see your parent was no longer holding your bike. He or she was a few steps behind you smiling back while you rode your bike all by yourself!
Remember that feeling of confidence and elation? That is a form of governance too. It isn’t about shutting things down, it is about enabling and supporting. To do this properly you need to listen and understand what the goals are and what is important. I encourage IT leaders to work closely with line of business leaders to ensure trust and confidence in the data. Everyone should know how to get the proper data they need to help the organization move forward.
Q. Can you share some examples of data governance rules, policies and procedures that are more policing than enabling?
A. An example is when “Hold” or “No” are the default responses to every access request. Typically every database request submitted sits in a queue until an administrator reviews the access request and contacts the person with a series of questions that typically add little value. Sometimes the request is granted or it’s escalated for further investigation. While there is absolutely a level of security and policing that needs to occur on sensitive information, sometimes security and governance can unnecessarily become synonymous.
A potential policy alternative is first distinguishing between sensitivity in data structures and then codifying access policies. For example, imagine someone requests read-only access to a generally available schema in the enterprise data warehouse. This person has a particular job title and works in a particular department. Another person with the same role has similar access. The process requires an “approver” to manually review and approve the request. In this instance, you could set up the access request for automatic approval. The risk will have been mitigated through the applied rules, so you have the necessary governance, but you’ve enabled the business to move faster. That’s a win for everyone involved.
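The routing logic Andy describes could be sketched as a simple rule evaluation. All of the role, schema, and sensitivity names below are hypothetical, and this is only an illustration of the idea, not any particular product's policy engine:

```python
# Illustrative sketch of codified access policies: low-risk, read-only
# requests that match an existing peer's access are auto-approved;
# anything touching sensitive data still goes to a human reviewer.
# Schema names and sensitivity labels are invented for this example.

SENSITIVITY = {
    "edw_general": "low",        # generally available EDW schema
    "hr_payroll": "restricted",  # sensitive data always needs review
}

def route_request(role: str, department: str, schema: str,
                  access: str, peers_with_access: set) -> str:
    """Return how an access request should be routed."""
    if (SENSITIVITY.get(schema) == "low"
            and access == "read-only"
            and (role, department) in peers_with_access):
        return "auto-approved"
    return "manual review"

peers = {("analyst", "finance")}  # a colleague with the same role already has access
print(route_request("analyst", "finance", "edw_general", "read-only", peers))
print(route_request("analyst", "finance", "hr_payroll", "read-only", peers))
```

The point is the split Andy makes: the rules carry the governance, so the common case moves at machine speed while sensitive requests still get human scrutiny.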
Q. Can you give some concrete advice about how to kick off a successful data governance initiative using an enabling approach?
A. I have two recommendations:
- Recruit Business Partners: Make certain you have some highly respected, experienced and motivated business partners to participate in the kick-off.
- Quantify the Value: As a group, quantify the value of risk mitigation and opportunity cost. For example
- To quantify the risk, measure the dollar value of a wrong metric going to the investor community, the impact on the market value and the percentage chance of it happening. Or quantify the executive team making a wrong decision based on incorrect information.
- To quantify the opportunity, calculate the value of speed-to-market, getting a product to customers quicker than a competitor. You should be able to find examples of how much it cost your organization when you launched a product before a competitor and when you launched a product after a competitor. You can leverage that in your calculation to ensure everyone knows exactly how important enablement is.
When you work collaboratively, business and IT will be on the same page. Business leaders should understand the pressures the IT group is under to protect corporate data. The IT team should understand the pressure business leaders are under to get answers to questions quickly to cut costs and find opportunities for growth in revenue and profits.
Q. Any tips on how to enable data governance processes with technology?
A. You may want to consider these two valuable elements to make data governance and analysis even easier:
- Metadata Manager provides a frame of reference, or the context, to give data meaning. It enables IT staff to manage technical metadata and perform an impact analysis of a proposed change before it is implemented, while root cause analysis enables business partners to dig into a term in a report to understand the source of the data and how it was moved and transformed before it was added to the report.
- Business Glossary maintains a standard set of business definitions, accountability for its terms and an audit trail for compliance. It enables business partners and IT to collaboratively manage business metadata. To use a healthcare example, does “Claim Paid Date” mean the date it was approved, the check was cut or the check cleared? Turn to Business Glossary to find out.
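A governed glossary entry like the “Claim Paid Date” example could be modeled roughly as follows. This is a minimal sketch of the concept (a standard definition, an accountable steward, and an audit trail), not Informatica Business Glossary's actual data model; the field and steward names are invented:

```python
# Sketch of a business-glossary entry that keeps an audit trail of
# definition changes for compliance. Field names are illustrative.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str                              # accountable owner of the term
    audit_trail: list = field(default_factory=list)

    def revise(self, new_definition: str, editor: str) -> None:
        """Record who changed the definition and when, keeping the old text."""
        self.audit_trail.append((date.today(), editor, self.definition))
        self.definition = new_definition

term = GlossaryTerm(
    name="Claim Paid Date",
    definition="Date the payment check cleared the bank",
    steward="claims-data-steward",
)
term.revise("Date the payment check was cut", editor="finance-lead")
print(term.definition)        # the current agreed definition
print(len(term.audit_trail))  # one recorded revision
```

The design choice worth noting is that the old definition is never lost: every revision is appended with its editor and date, which is what makes the glossary auditable rather than just a shared vocabulary list.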
Q. Can you rescue a data governance initiative that was built based on a policing approach?
A. Absolutely. It takes effort and thought but it can absolutely be done. The key to doing it is realizing the opportunity cost of having people create their own business rules and metrics. While there is a cost to the wasted labor, the greatest cost is lost opportunities. If people are spending time trying to recreate rules and reconcile numbers, they won’t have time to focus on the game changing insight you get from predictive analytics or optimization, which is where the real competitive advantage lies.
Guest Post by Norman Steele, CDMP, CBIP, Enterprise Metadata Management at Fannie Mae
Of all the IT disciplines I’ve worked in over the past 30 years, nothing seems harder to stand up than a metadata program that delivers business value day to day. When you turn a light on in a dirty closet, not too many folks want to dive in and clean it up. They would rather shut the door and deal with it later, or not at all. Much the same thing happens when a metadata team comes along and tells you that you can find out where your data is by importing your business glossaries and application data models into the repository and connecting them together, only to find that inconsistent practices across the organization reduce the ability to connect them.
Glossaries and data models have not been used this way in the past, and this exposes numerous inconsistencies:
- Terms defined differently in parts of the organization
- Valid values defined inconsistently
- Multiple naming conventions for the same object
- Data types changing for like objects
- Use of logical names with syntactic structure instead of business terms
- Inability to connect precise business terms to high level concepts found in enterprise data models
- Business terms that do not align with the data model
- Logical side of the data model that does not align with the physical side
How to change this? For a metadata team to be self-sustaining, where the business asks to get its metadata into the repository and asks for guidance on how to improve its quality, collaboration with several groups is required:
- Governance’s ability to define standards and monitor metrics for adherence is invaluable. Working together, the factors behind each of the issues described earlier can be more effectively addressed through governance standards and best practices, with metrics collected to measure progress.
- Business units must see the benefit of the cleanup work and provide the resources not only to learn how to look at their processes a bit differently but also to make the investment to improve metadata quality.
- Management support is crucial, especially during the initial phases, when it takes time to get a tool configured to work with design-time artifacts the way the organization sees them, not just the way the tool works out of the box.
- Every company implements its development process in slightly different ways, so when using a tool to help manage your metadata, the tool needs to be configured and enhancements made to fill the gaps, which can only happen when you have a good working relationship with the tool vendor.
Only when the collective efforts of these groups are brought to bear on the quality of metadata and the capabilities of the metadata tool can the organization start to see the benefits of connecting them together, something that the evangelists and visionaries tell us lies ahead.
- For more on some of these lessons learned, attend the Fannie Mae Breakout Session at Informatica World 2014.
- To learn more about the conference and keynotes, click here.
- To register for Informatica World, click here.
Views expressed are those of the author and do not necessarily represent those of Fannie Mae.
I’m glad you enjoyed my last letter explaining what data is and how people in my industry make a living managing it. After that letter, you confidently answered all data-related questions your knitting-circle friends could throw at you. But then Edward Snowden, former NSA contractor and world-renowned whistle-blower, came on the scene. Suddenly mainstream news anchors are talking about metadata.
I got your panicked voicemail and, as promised, I’m going to try to clarify what metadata is and how it relates to data. (more…)
A number of customers have asked me recently about the benefits of using a business glossary product over a spreadsheet or SharePoint. The discussion is worth sharing.
If you have a smaller company and all you need is a list of standard business terms to provide a common business vocabulary across the company, a spreadsheet or SharePoint can work, …up to a point. The problem is that once your organization reaches a certain size, you will have trouble scaling the management of business terms, making them available across a larger organization, and fostering collaboration based on the agreed-upon business terms. (more…)
It’s important to note that I didn’t title this post “Implementing a Data Governance Architecture”. Data governance is not a technology space, a tool, or an architecture. As our data governance framework illustrates, tools and architecture represent but one of the many facets needed to support an enterprise data governance competency. But once you’ve defined your vision and business case, with a clear approach for managing the people, process, and policy facets, technology can play a significant role in determining the ultimate success or failure of your data governance efforts.
Complex and poorly integrated current-state architectures present a significant obstacle to applying common standards for the delivery of trusted and secure data across the enterprise. Data architects play a pivotal role in enabling data governance by designing and evangelizing the data management reference architecture that supports data quality and privacy requirements. In addition, these architects must recommend enabling technologies to support data governance and stewardship workflows that aid the core processes of discovery, definition, application, and measurement and monitoring. (Stay tuned – I’ll be sharing a lot more about these core data governance processes in a future post discussing the “Defined Processes” facet of our framework.)
Whatever you do, don’t fall into the all-too-common IT trap of selecting the tools before the goals, strategy, and processes of data governance are in place. If you skip these steps and just try to build it, they (“the business”) most assuredly will NOT come. (more…)
My personal opinions on the health care mandate aside, I can’t help but be amused by the liberties taken by both major political parties with the definition of a “tax.” When Chief Justice Roberts delivered the majority opinion that the individual health insurance mandate was constitutional under Congress’ power to tax, the political spin doctors went into overdrive. Both sides are simultaneously agreeing it is and is not a tax in order to promote their agendas, and they have managed to confuse the heck out of the American public in the process. (This ABC News story prompted me to write about this.)
I bring this up here because this national debate on the constitutionality of “Obamacare” and the definition of what constitutes a tax is no different from many of the politically-charged debates occurring within your organizations with passions running equally high and confusion reigning supreme. (more…)
Metadata has the same challenges as data. It is created in silos, there is lots of variation and inconsistency, it is growing exponentially and it is of little value if not managed. For example:
- An entity relationship diagram in a data modeling tool is metadata about a database.
- An application portfolio in an EA repository is metadata about systems.
- Server information in a CMDB is metadata about IT assets.
- Mapping information in Power Center is metadata about data lineage.
- Data profiles on a data quality scorecard are metadata about the quality of information.
- An XML schema in a service registry is metadata about canonical message format.
- A business glossary in a metadata repository is metadata defining business data.
- The status of a new BI report on a project dashboard is metadata about how data is changing. (more…)
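The common thread in the examples above is that metadata is itself data, and can be managed (or neglected) like any other data. A minimal illustration, with every field name and value invented for this sketch rather than taken from any vendor's repository format:

```python
# Metadata is just data about data: a small record describing a
# hypothetical "customer" table, echoing the examples above
# (source system, lineage, glossary term, quality score).

table_metadata = {
    "object": "customer",
    "kind": "database table",
    "source_system": "CRM",                              # where the data came from
    "lineage": ["crm_extract", "staging", "warehouse"],  # how it got there
    "business_term": "Customer",                         # link to the glossary
    "quality_score": 0.92,                               # from a quality scorecard
}

# Left in silos, records like this drift and contradict each other just
# as data does; managed in one repository, they can answer questions
# such as "where did this field come from?"
def origin(meta: dict) -> str:
    return f'{meta["object"]} originates in {meta["source_system"]}'

print(origin(table_metadata))
```

The same record kept inconsistently in a modeling tool, a CMDB, and a spreadsheet is exactly the silo problem described above; the point of metadata management is one governed copy.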
Consider this situation: would you try to ride a bicycle blindfolded? You could probably pump the pedals and steer without trouble, but you would lack the visual feedback to confirm that your changes in direction and velocity are keeping you on your intended course and out of harm’s way.
The question undoubtedly sounds crazy, yet people make changes to their data integration environments every day without tools in place to visualize the environment and tell them the impact of proposed changes.
There are good tools available today to help with this problem.