Category Archives: General
I ended my previous blog wondering if awareness of Data Gravity should change our behavior. While Data Gravity adds Value to Big Data, I find that the application of the Value is underexplained.
Exponential growth of data has naturally led us to want to categorize it into facts, relationships, entities, etc. This sounds very elementary. While this happens so quickly in our subconscious minds as humans, it takes significant effort to teach this to a machine.
A friend tweeted this to me last week: “I paddled out today, now I look like a lobster.” Since this tweet, Twitter has inundated my friend and me with promotions from Red Lobster. That is because the machine deconstructed the tweet: paddled <PROPULSION>, today <TIME>, like <PREFERENCE> and lobster <CRUSTACEANS>. While putting these together, the machine decided that the keyword was lobster. You and I both know that my friend was not talking about lobsters.
You may think that this may be just a funny edge case. You can confuse any computer system if you try hard enough, right? Unfortunately, this isn’t an edge case. The 140-character limit has not just changed people’s tweets, it has changed how people talk on the web. More and more information is communicated in smaller and smaller amounts of language, and this trend is only going to continue.
When will the machine understand that “I look like a lobster” means I am sunburned?
I believe the reason that there are not hundreds of companies exploiting machine-learning techniques to generate a truly semantic web is the lack of weighted edges in publicly available ontologies. Keep reading, it will all make sense in about 5 sentences. Lobster and Sunscreen are 7 hops away from each other in DBpedia – way too many to draw any correlation between the two. For that matter, any article in Wikipedia is connected to any other article within about 14 hops, and that’s the extreme. Completely unrelated concepts are often just a few hops from each other.
But by analyzing massive amounts of both written and spoken English text from articles, books, social media, and television, it is possible for a machine to automatically draw a correlation and create a weighted edge between the Lobster and Sunscreen nodes that effectively short-circuits the 7 hops necessary. Many organizations are dumping massive amounts of facts without weights into our repositories of total human knowledge because they are naïvely attempting to categorize everything without realizing that the repositories of human knowledge need to mimic how humans use knowledge.
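To make the hop-count argument concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the 7-hop chain is invented (it is not the real DBpedia path), the three "documents" are made up, and the co-occurrence threshold of 2 is arbitrary. It only shows how a weighted edge learned from text can short-circuit a long unweighted path.

```python
from collections import deque
from itertools import combinations

# Hypothetical unweighted ontology: an invented 7-hop chain, NOT the real DBpedia path.
CHAIN = ["Lobster", "Crustacean", "Seafood", "Ocean",
         "Beach", "Sunbathing", "Sunburn", "Sunscreen"]


def build_graph(chain):
    """Build an undirected graph that links each node to its neighbor in the chain."""
    graph = {node: set() for node in chain}
    for a, b in zip(chain, chain[1:]):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hops(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def cooccurrence_weights(documents, vocabulary):
    """Count how often two concepts are mentioned in the same document."""
    weights = {}
    for doc in documents:
        text = doc.lower()
        present = sorted(term for term in vocabulary if term.lower() in text)
        for a, b in combinations(present, 2):
            weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


# Made-up corpus standing in for "massive amounts of text".
DOCUMENTS = [
    "I look like a lobster, should have worn sunscreen",
    "Forgot the sunscreen and now I am red as a lobster",
    "Lobster rolls for dinner tonight",
]

graph = build_graph(CHAIN)
hops_before = hops(graph, "Lobster", "Sunscreen")

weights = cooccurrence_weights(DOCUMENTS, {"Lobster", "Sunscreen"})
THRESHOLD = 2  # arbitrary cut-off chosen for this illustration
for (a, b), weight in weights.items():
    if weight >= THRESHOLD:
        # Add a direct edge; the weight itself lives in `weights`.
        graph[a].add(b)
        graph[b].add(a)

hops_after = hops(graph, "Lobster", "Sunscreen")
print(hops_before, hops_after, weights)
```

With this toy data the two concepts start 7 hops apart and end up directly connected. A real system would need far more text and a better measure than raw co-occurrence counts.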
For example – if you hear the name Babe Ruth, what is the first thing that pops to mind? Roman Catholics from Maryland born in the 1800s or Famous Baseball Player?
If you look in Wikipedia today, he is listed under 28 categories, each of them with the same level of attachment: 1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore, Maryland | Vaudeville performers.
Now imagine how confused a machine would get when the distance of unweighted edges between nodes is used as a scoring mechanism for relevancy.
If I were to design an algorithm that uses weighted edges (on a scale of 1-5, with 5 being the highest), the same search would yield a much more obvious result.
1895 births | 1948 deaths | American League All-Stars | American League batting champions | American League ERA champions | American League home run champions | American League RBI champions | American people of German descent | American Roman Catholics | Babe Ruth | Baltimore Orioles (IL) players | Baseball players from Maryland | Boston Braves players | Boston Red Sox players | Brooklyn Dodgers coaches | Burials at Gate of Heaven Cemetery | Cancer deaths in New York | Deaths from esophageal cancer | Major League Baseball first base coaches | Major League Baseball left fielders | Major League Baseball pitchers | Major League Baseball players with retired numbers | Major League Baseball right fielders | National Baseball Hall of Fame inductees | New York Yankees players | Providence Grays (minor league) players | Sportspeople from Baltimore, Maryland | Vaudeville performers.
Now the machine starts to think more like a human. The above example forces us to ask ourselves about the relevancy, a.k.a. Value, of the response. This is where I think Data Gravity becomes relevant.
You can contact me on Twitter @bigdatabeat with your comments.
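The weighted-category idea can be sketched in a few lines of Python. The weights below are hypothetical values I made up on the 1-5 scale to illustrate the ranking. They are not taken from Wikipedia or any real ontology, and only a handful of the 28 categories are included.

```python
# Hypothetical weights on a 1-5 scale (5 = strongest association); illustrative only.
CATEGORY_WEIGHTS = {
    "New York Yankees players": 5,
    "National Baseball Hall of Fame inductees": 5,
    "American League home run champions": 4,
    "Boston Red Sox players": 4,
    "Major League Baseball pitchers": 3,
    "Vaudeville performers": 2,
    "American Roman Catholics": 1,
    "1895 births": 1,
}


def top_categories(weights, n=3):
    """Return the n categories with the highest weight (ties broken by name)."""
    return sorted(weights, key=lambda name: (-weights[name], name))[:n]


# Unweighted view: every category has the same attachment, so nothing stands out
# and the "top" results are just whatever sorts first alphabetically.
unweighted = {name: 1 for name in CATEGORY_WEIGHTS}

weighted_top = top_categories(CATEGORY_WEIGHTS)
unweighted_top = top_categories(unweighted)
print(weighted_top)
print(unweighted_top)
```

With equal weights the ranking falls back to an arbitrary tie-break, which is the confusion described above. With weights, the baseball categories rise to the top.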
If you’ve wondered why so many companies are eager to control data storage, the answer can be summed up in a simple term: data gravity. Ultimately, where data is determines where the money is. Services and applications are nothing without it.
Dave McCrory introduced his idea of Data Gravity with a blog post back in 2010. The core idea was – and is – interesting. More recently, Data Gravity featured in this year’s EMC World keynote. But, beyond the observation that large or valuable agglomerations of data exert a pull that tends to see them grow in size or value, what is a recognition of Data Gravity actually good for?
As a concept, Data Gravity seems closely associated with current enthusiasm for Big Data. In addition, like Big Data, the term’s real-world connotations can be unhelpful almost as often as they are helpful. Big Data exhibits at least three characteristics, which are Volume, Velocity, and Variety. Various other V’s, including Value, are mentioned from time to time, but with less consistency. Yet Big Data’s name suggests it is all about size: that the speed with which data must be ingested, processed, or excreted is less important, and that the complexity and diversity of the data don’t matter either.
On its own, the size of a data set is unimportant. Coping with lots of data certainly raises some not-insignificant technical challenges, but the community is actually doing a good job of coming up with technically impressive solutions. The interesting aspect of a huge data set isn’t its size, but the very different modes of working that become possible when you begin to unpick the complex interrelationships between data elements.
Sometimes, Big Data is the vehicle by which enough data is gathered about enough aspects of enough things from enough places for those interrelationships to become observable against the background noise. Other times, Big Data is the background noise, and any hope of insight is drowned beneath the unending stream of petabytes.
To a degree, Data Gravity falls into the same trap. More gravity must be good, right? And more mass leads to more gravity. Mass must be connected to volume, in some vague way that was explained when I was 11, and which involves STP. Therefore, bigger data sets have more gravity. This means that bigger data sets are better data sets. That assertion is clearly nonsense, but luckily, it’s not actually what McCrory is suggesting. His arguments are more nuanced than that, and potentially far more useful.
Instinctively, I like that the equation attempts to move attention away from ‘the application’ toward the pools of data that support many, many applications at once. The data is where the potential lies. Applications are merely the means to unlock that potential in various ways. So maybe notions of Potential Energy from elsewhere in Physics need to figure here.
But I’m wary of the emphasis given to real numbers that are simply the underlying technology’s vital statistics: network latency, bandwidth, request sizes, numbers of requests, and the rest. I realize that these are the measurable things that we have, but feel that more abstract notions of value need to figure just as prominently.
So I’m left reaffirming my original impression that Data Gravity is “interesting”. It’s also intriguing, and I keep feeling that it should be insightful. I’m just not — yet — sure exactly how. Is a resource with a Data Gravity of 6 twice as good as a resource with a Data Gravity of 3? Does a data set with a Data Gravity of 15 require three times as much investment/infrastructure/love as a data set scoring a humble 5? It’s unlikely to be that simple, but I do look forward to seeing what happens as McCrory begins to work with the parts of our industry that can lend empirical credibility to his initial dabbling in mathematics.
If real numbers show the equations to stand up, all we then need to do is work out what the numbers mean. Should an awareness of Data Gravity change our behavior, should it validate what gut feel led us to do already, or is it just another ‘interesting’ and ultimately self-evident number that doesn’t take us anywhere?
I don’t know, but I will continue to explore. You can contact me on Twitter @bigdatabeat.
The Rising CFO is Increasingly Business Oriented
At the CFO Rising West Conference on October 30th and 31st, there were sessions on managing capital expenditures, completing an IPO, and even managing margin and cash flow. However, the keynote presenters did not spend much time on these topics. Instead, they focused on how CFOs need to help their firms execute better. Here is a quick summary of the suggestions made by CFOs in broadcasting, consumer goods, retail, healthcare, and medical devices.
The Modern CFO is Strategic
The Broadcasting CFO started his talk by saying he was not at the conference to share why CFOs need to move from being “bean counters to strategic advisors”. He said, “let’s face it, the modern CFO is a strategic CFO”. Agreeing with this viewpoint, the Consumer Goods CFO said that finance organizations have a major role to play in business transformation. He said that finance, after all, is the place to drive corporate improvement as well as business productivity and business efficiency.
CFOs Talked About Their Business’ Issues
The Retailer CFO talked like he was a marketing person. He said retail today is all about driving a multichannel customer experience. To do this, finance increasingly needs to provide real business value. He said, therefore, that data is critical to the retailer’s ability to serve customers better. He claimed that customers are changing how they buy, what they want to buy, and when they want to buy. We are being disrupted and it is better to understand and respond to these trends. We are trying, therefore, to build a better model of ecommerce.
Meanwhile, the Medical Devices CFO said that as a supplier to medical device vendors “what we do is compete with our customers’ engineering staffs”. And the Consumer Goods CFO stressed the importance of finance driving sustained business transformation.
CFOs Want To Improve Their Business’ Ability To Execute
The Medical Devices CFO said CFOs need to look for “earlier execution points”. They need to look for the drivers of behavior change. As a key element of this, he suggested that CFOs need to develop “early warning indicators”. He said CFOs need to actively look at the ability to achieve objectives. With sales, we need to ask what deals do we have in the pipe? At what size are these deals? And at what success rate will these deals be closed? Only with this information can the CFO derive an expected company growth rate. He then asked CFOs in the room to identify themselves. With their hands in the air, he asked them whether they are helping to create a company that executes or not. He threw down the gauntlet for the CFOs in the room by then asserting that if you are not creating a company that executes, then you are going to be looking at cutting costs sooner rather than later.
The retailer CFO agreed with this CFO. He said today we need to focus on how to win a market. We need to be asking business questions including:
- How should we deploy resources to deliver against our firm’s value proposition?
- How do we know when we win?
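The Medical Devices CFO’s pipeline questions earlier in this section (which deals, at what size, at what success rate) amount to a small expected-value calculation. The Python sketch below is my own illustration, not something presented at the conference. The deals, probabilities, and prior-period revenue are invented, and it assumes that the pipeline is the only source of next period’s revenue.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    name: str
    size: float             # deal value in dollars (invented figures)
    win_probability: float  # estimated chance of closing, from 0.0 to 1.0


def expected_bookings(deals):
    """Probability-weighted value of the pipeline."""
    return sum(deal.size * deal.win_probability for deal in deals)


def expected_growth_rate(deals, prior_revenue):
    """Expected growth versus the prior period, assuming the pipeline is
    the only source of next period's revenue (a simplifying assumption)."""
    return expected_bookings(deals) / prior_revenue - 1.0


pipeline = [
    Deal("Deal A", 400_000, 0.50),
    Deal("Deal B", 600_000, 0.25),
    Deal("Deal C", 1_000_000, 0.75),
]

bookings = expected_bookings(pipeline)
growth = expected_growth_rate(pipeline, prior_revenue=1_000_000)
print(f"Expected bookings: ${bookings:,.0f}, expected growth: {growth:.1%}")
```

The point of an early warning indicator like this is that it moves before the financial results do: if the weighted pipeline shrinks, the expected growth rate falls long before revenue does.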
CFOs Claim Ownership For Enterprise Performance Measurement
The Retail CFO said that finance needs to own “the facts for the organization”: the metrics and KPIs. This is how he claims CFOs will earn their seat at the CEO’s table. He said in the past CFOs have tended to be stoic, but this now needs to change.
The Medical Devices CFO agreed and said enterprises shouldn’t be tracking 150 things—they need to pare it down to 12-15 things. They need to answer with what you measure—who, what, and when. He said in an execution culture people need to know the targets. They need measurable goals. And he asserted that business metrics are needed over financial metrics. The Consumer Goods CFO agreed by saying financial measures alone would find that “a house is on fire after half the house had already burned down”. The Healthcare CFO picked up on this idea and talked about the importance of finance driving value scorecards and monthly benchmarks of performance improvement. The broadcaster CFO went further and suggested the CFO’s role is one of a value optimizer.
CFOs Own The Data and Drive a Fact-based, Strategic Company Culture
The Retail CFO discussed the need to drive a culture of insight. This means that data absolutely matters to the CFO. Now, he honestly admits that finance organizations have not used data well enough, but he claims finance needs to make the time to truly become data centric. He said, “I do not consider myself a data expert,” but finance needs to own “enterprise data and the integrity of this data”. He said as well that finance needs to ensure there are no data silos. He summarized by saying finance needs to use data to make sure that resources are focused on the right things; decisions are based on facts; and metrics are simple and understandable. “In finance, we need to use data to increasingly drive business outcomes”.
CFOs Need to Drive a Culture That Executes for Today and the Future
Honestly, I never thought that I would hear this from a group of CFOs. The Retail CFO said we need to ensure that the big ideas do not get lost. We need to speed up the prosecuting of business activities. We need to drive more exponential things (this means we need to position our assets and resources) and we need, at the same time, to drive the linear things which can drive a 1% improvement in execution or a 1% reduction in cost. Meanwhile, our Medical Devices CFO discussed the present value, for example, of a liability for rework, lawsuits, and warranty costs. He said that finance leaders need to ensure things are done right today so the business doesn’t have problems a year from today. “If you give doing it right the first time a priority, you can reduce warranty reserve and this can directly impact corporate operating income”.
CFOs Need to Lead on Ethics and Compliance
The Medical Devices CFO said that CFOs also need to have high ethics and drive compliance. The Retail CFO discussed how finance needs to make the business transparent. Finance needs to be transparent about what is working and what is not working. The CFO, at the same time, needs to ensure the integrity of the organization. The Broadcaster CFO asserted the same thing by saying that CFOs need to take a stakeholder approach to how they do business.
On the whole, CFOs at CFO Rising are showing the way forward for the modern CFO. This CFO is all about the data to drive present and future performance, ethics and compliance, and business transparency. This is a big change from the historical controller approach and mentality. I once asked a boss about what I needed to be promoted to a Vice President; my boss said that I needed to move from a technical specialist to a business person. Today’s CFOs clearly show that they are business people first.
Are you in Sales Operations or Marketing Operations, or are you a Sales Representative, Sales Manager, or Marketing Professional? It’s no secret that if you are, you benefit greatly from the power of performing your own analysis, at your own rapid pace. When you have a hunch, you can easily test it out by visually analyzing data in Tableau without involving IT. When you are faced with tight timeframes in which to gain business insight from data, being able to do it yourself in the time you have available and without technical roadblocks makes all the difference.
Self-service Business Intelligence is powerful! However, we all know it can be even more powerful. When needing to put together an analysis, we know that you spend about 80% of your time putting together data, and then just 20% of your time analyzing data to test out your hunch or gain your business insight. You don’t need to accept this anymore. We want you to know that there is a better way!
We want to allow you to Flip Your Division of Labor and allow you to spend more than 80% of your time analyzing data to test out your hunch or gain your business insight and less than 20% of your time putting together data for your Tableau analysis! That’s right. You like it. No, you love it. No, you are ready to run laps around your chair in sheer joy!! And you should feel this way. You now can spend more time on the higher value activity of gaining business insight from the data, and even find copious time to spend with your family. How’s that?
Project Springbok is a visionary new product designed by Informatica with the goal of making data access and data quality obstacles a thing of the past. Springbok is meant for the Tableau user, a data person who would rather spend their time visually exploring information and finding insight than struggling with complex calculations or waiting for IT. Project Springbok allows you to put together your data, rapidly, for subsequent analysis in Tableau. Project Springbok tells you things about your data that even you may not have known. It does it through Intelligent Suggestions that it presents to the user.
Let’s take a quick tour:
- Project Springbok tells you that you have a date column and that you likely want to obtain the Year and Quarter for your analysis (Fig 1). And if you so wish, with a single click, voila, you have your corresponding years and even the quarters. And it all happened in mere seconds. A far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPS.
VALUE TO A MARKETING CAMPAIGN PROFESSIONAL: Rapidly validate and accurately complete your segmentation list, before you analyze your segments in Tableau. Base your segments on trusted data that did not take you days to validate and enrich.
- Then Project Springbok will tell you that you have two datasets that could be joined on a common key, email for example, in each dataset, and would you like to move forward and join the datasets (Fig 2)? If you agree with Project Springbok’s suggestion, voila, dataset joined in a mere few seconds. Again, a far cry from the 45 minutes it would have taken a fluent user of Excel to do using VLOOKUPS.
VALUE TO A SALES REPRESENTATIVE OR SALES MANAGER: You can now access your Salesforce.com data (Fig 3) and effortlessly combine it with ERP data to understand your true quota attainment. Never miss quota again due to a revenue split, be it territory or otherwise. Best of all, keep your attainment dataset refreshed and even know exactly what datapoint changed when your true attainment changes.
- Then, if you want, Project Springbok will tell you that you have emails in the dataset, which you may or may not have known, but more importantly it will ask you if you wish to determine which emails can actually be mailed to. If you proceed, not only will Springbok check each email for correct structure (Fig 4), but will very soon determine if the email is indeed active, and one you can expect a response from. How long would that have taken you to do?
VALUE TO A TELESALES REPRESENTATIVE OR MARKETING EMAIL CAMPAIGN SPECIALIST: Ever thought you had a great email list and then found out most emails bounced? Now, confidently determine which emails are truly ones you will be able to email, before you send the message. Email prospects who you know are actually at the company and be confident you have their correct email addresses. You can then easily push the dataset into Tableau to analyze the trends in email list health.
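For readers curious what the three preparation steps in the tour involve when done by hand, here is a rough Python sketch using only the standard library. It is not how Project Springbok works internally. The function names, the sample rows, and the deliberately simple email pattern are all my own assumptions, and the pattern checks structure only, not whether a mailbox is active.

```python
import re
from datetime import date


def year_and_quarter(d):
    """Step 1: derive (year, quarter) from a date."""
    return d.year, (d.month - 1) // 3 + 1


def join_on_email(left, right):
    """Step 2: inner-join two lists of dicts on a case-insensitive 'email' key."""
    index = {row["email"].strip().lower(): row for row in right}
    joined = []
    for row in left:
        key = row["email"].strip().lower()
        if key in index:
            # Merge both rows; store the normalized email as the key value.
            joined.append({**index[key], **row, "email": key})
    return joined


# Step 3: a deliberately loose structural check (something@something.tld).
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value):
    """Return True if the value has the basic shape of an email address."""
    return bool(EMAIL_SHAPE.match(value))


# Invented sample data for illustration.
leads = [{"email": "Ann@Example.com", "lead_score": 7}]
accounts = [
    {"email": "ann@example.com", "region": "West"},
    {"email": "bob@example.com", "region": "East"},
]

print(year_and_quarter(date(2014, 10, 13)))
print(join_on_email(leads, accounts))
print(looks_like_email("ann@example.com"), looks_like_email("ann@@example"))
```

Even in this toy form, each step needs decisions (how to normalize keys, what counts as a valid address) that a suggestion-driven tool makes for you.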
And, in case you were wondering, there is no training or install required for Project Springbok. The 80% of your time you used to spend on data preparation is now shrunk considerably, and this is after using only a few of Springbok’s capabilities. One more thing: You can even directly export from Project Springbok into Tableau via the “Export to Tableau TDE” menu item (Fig 5). Project Springbok creates a Tableau TDE file and you just double click on it to open Tableau to test out your hunch or gain your business insight.
Here are some other things you should know, to convince you that you, too, can spend no more than 20% of your time on putting together data for your subsequent Tableau analysis:
- Springbok Sign-Up is Free
- Springbok automatically finds problems with your data, and lets you fix them with a single click
- Springbok suggests useful ways for you to combine different datasets, and lets you combine them effortlessly
- Springbok suggests useful summarizations of your data, and lets you follow through on the summarizations with a single click
- Springbok allows you to access data from your cloud or on-premise systems with a few clicks, and then automatically keeps it refreshed. It will even tell you what data changed from the last time you saw it
- Springbok allows you to collaborate by sharing your prepared data with others
- Springbok easily exports your prepared data directly into Tableau for immediate analysis. You do not have to tell Tableau how to interpret the prepared data
- Springbok requires no training or installation
Go on. Shift your division of labor in the right direction, fast. Sign-Up for Springbok and stop wasting precious time on data preparation. http://bit.ly/TabBlogs
Are you going to be at Dreamforce this week in San Francisco? Interested in seeing Project Springbok working with Tableau in a live demonstration? Visit the Informatica or Tableau booths and see the power of these two solutions working hand-in-hand. Informatica is at Booth #N1216 and Booth #9 in the Analytics Zone. Tableau is located in Booth N2112.
In my recent blog post, What Millennials Want, I mentioned that we are bringing you a special blog series, “Informatica Interns Ideas: 2014.” Today’s post is from Jacob Lauing, a Product Marketing Management Intern located in our Redwood City HQ office. Jacob is a journalism junior at California Polytechnic State University in San Luis Obispo (Cal Poly). When Jacob is not working at Informatica, you can find him editing news stories and managing a staff full of reporters as Editor in Chief of Mustang News, Cal Poly’s student newspaper.
5 Reasons Every College Student Should Intern at Informatica
As my 12-week internship at Informatica winds down, I’ve had a chance to reflect. My summer on Seaport Boulevard has been a productive one, with quality tech experience under my belt, a handful of industry connections and a ton of new friends.
If you’re reading this, perhaps you’re considering a stint at Informatica. Let me convince you.
It shouldn’t be too difficult.
1) Informatica’s products are essential to modern business
Coming from a non-technical background, I’ll admit that when I applied for this position, I was a little confused as to what Informatica actually did. The term “data integration” didn’t mean a whole lot to me. I figured – given the company’s heavily technical products – they didn’t have much of an impact outside of a niche market.
Boy, was I wrong.
Particularly because I’ve been working with Informatica’s customers, I realized they vary across all industries. From retail to healthcare to athletics, companies of all varieties use data to grow their business, and I’ve come to understand how important Informatica is in that process.
2) No stereotypical “intern” work
No, I never went on coffee runs.
It’s the fear of any intern, that he or she will be subjected to manual labor, acting as every employee’s slave.
That wasn’t the case this summer. From the start, I was put on projects that actually added value to the team. My competitive analysis research was used in an executive level board meeting to help Informatica position its products. My own web page designs will appear on the redesigned Informaticacloud.com. The copy I wrote promoting Informatica’s new release was published on Twitter.
I made a difference, and that’s an invaluable feeling.
3) Informatica is kind of a big deal…
The company just surpassed the $1 billion revenue mark, a major milestone that employees are quick to mention and clearly very proud of.
In the world of SaaS and data integration, Informatica is the top dog. If you’re looking to establish yourself and build a career in tech, it definitely wouldn’t hurt to have Informatica’s name on your resume.
Let’s not forget that most Informatica interns are still in college. We love to socialize and we love to have fun – there was no shortage.
With networking parties, San Francisco Giants games, trips to the Santa Cruz Beach Boardwalk and a boat cruise on the Bay, this summer was a blast. I got to meet a ton of other like-minded interns whom I consider friends.
I know we’ll keep in touch.
5) Informatica wants what’s best for you
I can’t speak to this point enough.
Some interns are destined to be software developers, and Informatica gave them the platform to do so. But other interns – me included – are still trying to figure out the whole career thing. Surprisingly, the folks running Informatica’s intern program were just as attentive to my needs.
Bottom line is they want what’s best for you.
Whether that entails a career at Informatica or a job elsewhere, they want to help you find your way. They want you to learn not only about an exciting industry and skillset, but also about yourself.
And that’s what college is all about, right?
In my recent blog post, What Millennials Want, I mentioned that we are bringing you a special blog series, “Informatica Interns Ideas: 2014.” Today’s post, “What Millennials Want – Putting My Potential to Work, at Work!”, comes from Amitha Narayanan, a Technical Writing Documentation Intern located in our Redwood City HQ office. Amitha is pursuing her Masters in Technical Communication at North Carolina State University. When Amitha is not authoring technical content, you can find her outdoors, exploring the city, reading books on park benches, or doing yoga.
Putting my potential to work, at work!
Have you ever had an Aha! moment that changed you forever? If yes, you know what I mean when I say “that feels great.” If no, I pray you do one day. I had two epiphanies this summer.
My mentor entrusted me with the task of creating video tutorials for our users of Big Data Trial Sandbox for Cloudera. The first time I heard the string of words, it was all Greek to me. I had to do voice narration too, which was my least favorite part. All in all, I was off to a shaky start! However, as the summer progressed, I made small steps into the world of Big Data. The Big Data videos gave me confidence in a BIG way. I now love the sound of my own voice and am positive that I can do this stuff! This experience single-handedly undid years of me restricting myself from exciting possibilities.
Every time I hear people talk about their love for networking, I wish I felt the same way about it as they did. This summer, I got my wish. Informatica gave me the perfect non-threatening environment to interact and grow to be comfortable in my own skin. The networking event for interns turned out to be one of my favorite events of the summer. From the prep tips to the informal setting to the absolute coolest senior management, the event was a wholesome package. For a person like me who tends to shy away from these, it was a revelation that networking does not have to be intimidating!
I now know that all of us have “wow factors” in us; sometimes it takes a bit of work to help us realize what they are.
In my recent blog post, What Do Millennials Want?, I mentioned that we are bringing you a special blog series, “Informatica Interns Ideas: 2014.” Today’s post comes from Sai Avala, a Software Engineering Intern on the Project Springbok Team located in our Redwood City HQ office. Sai is a rising senior at University of Texas at Austin (UT). He’s currently pursuing his degree in Computer Science and Certificates in Scientific Computing and Statistical Modeling. He helped found a mobile app development organization at UT, and teaches his own Android App development class every week. When Sai isn’t at work, he’s most likely hanging out with friends or coding up a new project. Oh yeah, he’s interning with Informatica for the second time! His post today is titled “Making This Your Best Internship Yet.”
Like many of us, I had a number of choices when it came to my summer internship. And, I chose Informatica not once, but twice. That’s right; this is my second summer as an intern at Informatica. Why, you ask? Good question.
When I was trying to make my decision, it was really pretty easy. My criteria were that I needed to feel comfortable approaching management with ideas, I needed to make an impact, and I needed to learn and have fun. And, based on my experience last summer, I knew I could find all of that at Informatica.
I like coming up with new ideas. Here, I feel comfortable approaching management to pitch my ideas and to give my suggestions. And, I know, based on my experience last year, that they take my suggestions seriously. One of the highlights of last summer was seeing the work I did actually integrated into the product.
At Informatica you have the ability to drive your career. You also have the ability to explore and cultivate your interests. Just recently, I was thinking about some new ways to market Project Springbok (one of Informatica’s new products). I saw the Chief Strategy Officer (Ivan Chong) walk by (he heads the team), stopped him, and started to pitch it to him. Next thing you know, we’re at my desk and we’re hammering the details out. It’s definitely rare to see this kind of interaction between an intern and someone on the Executive Leadership Team.
Whether you’re interested in coming up with new ideas and strategies, software development, marketing, or sales, look for ways to grow your skills. I really feel like once a company reaches “mid-size” status or larger, it can be hard to find your place. But, you know what, that’s not a problem here at Informatica. There are so many great people that literally everyone here has the chance to shine.