Tag Archives: Big Data

British Cycling: A Big Data Champion?


I think I may have gone to too many conferences in 2014 in which the potential of big data was discussed.  After a while all the stories blurred into two main themes:

  1. Companies have gone bankrupt at a time when demand for their core products increased.
  2. Data from mobile phones, cars and other machines house a gold mine of value – we should all be using it.

My main takeaway from the 2014 conferences was that no amount of data can compensate for a poor strategy, or for a lack of organisational agility to adapt business processes in times of disruption.  However, I still feel that, as an industry, our stories are stuck in the ‘Big Data Hype’ phase, while most organisations are beyond the hype and need practicalities, guidance and inspiration to turn their big data projects into a success.  This is possibly due to the limited number of big data projects in production, or perhaps it is too early to measure the long-term results of existing projects.  Another possibility is that the projects are delivering significant competitive advantage, so the stories will remain under wraps for the time being.

However, towards the end of 2014 I stumbled across a big data success story in an unexpected place.  It did (literally) provide competitive advantage, and since the programme has been running for a number of years, the results are plain to see.  It started with a book recommendation from a friend.  ‘Faster’ by Michael Hutchinson is written as a self-propelled investigation into the difference between world champion and world class athletes.  It promised to satisfy my slightly geeky tendency to enjoy facts, numerical details and statistics.  It did this – but it also struck me as a ‘how-to’ guide for big data projects.

Mr Hutchinson’s book is an excellent read as an insight into professional cycling by a professional cyclist.  It is stacked with interesting facts and well-written anecdotes, and I highly recommend reading it.  Since the big-data aspect is a sub-plot, I will pull out the highlights without distracting from the main story.

Here are the five steps I extracted for big data project success:

1. Have a clear vision and goal for your project

The Sydney Olympics in 2000 produced only 4 medals across all cycling disciplines for British cyclists.  With a home Olympics set for 2012, British Cycling desperately wanted to improve on this performance.  Specific targets were clearly set across all disciplines, stated as the times an athlete would need to achieve in order to win a race.

2. Determine the data required to support these goals

Unlike many big data projects, which start with a data set and then wonder what to do with it, British Cycling worked the other way around.  They worked out what they needed to measure in order to establish the influencers on their goal (track time) and set about gathering that information.  In their case this involved gathering wind tunnel data to compare and contrast equipment, as well as physiological data from athletes and data from all cycling activities.

3. Experiment in order to establish causality

Most big data projects involve experimentation: changing the environment whilst gathering a sub-set of data points.  The number of variables to adjust in cycling is large, but all were embraced. Data (including video) was gathered on the effects of small changes in each component: bike, clothing and athlete (training and nutrition).
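To make this step concrete, here is a minimal sketch of the kind of controlled comparison it implies, using entirely hypothetical lap times: change one component, hold everything else fixed, and judge the improvement against run-to-run noise.

```python
# Hypothetical controlled experiment: same rider, same bike, same track;
# the only change between the two sets of runs is the skin suit.
from statistics import mean, stdev

baseline_runs = [61.42, 61.55, 61.38, 61.60, 61.47]  # lap times (s), standard suit
variant_runs = [60.98, 61.10, 61.02, 61.15, 61.05]   # lap times (s), new suit

delta = mean(baseline_runs) - mean(variant_runs)
print(f"Mean improvement: {delta:.2f}s "
      f"(baseline sd {stdev(baseline_runs):.2f}s, variant sd {stdev(variant_runs):.2f}s)")
# Adopt the change only if the improvement is large relative to the noise.
```

Repeat this for each component and you get a ranked list of marginal gains, which is in the spirit of what the book describes.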

4. Guide your employees on how to use the results of the data

Like many employees, cyclists and coaches were convinced of the ‘best way’ to achieve results based on their own personal experience.  In some cases, analysis of the data showed that the perceived best way was in fact not the best way.  Coaching staff trusted the data and convinced the athletes to change aspects of both training and nutrition.  This was not necessarily easy to do, as it could mean fundamental changes in an athlete’s lifestyle.

5. Embrace innovation

Cycling is a very conservative sport by nature, with many of its key innovations coming from adjacent sports such as triathlon.  Data, however, is not steeped in tradition and has no pre-conceived ideas as to what equipment should look like, or what constitutes an excellent recovery drink.  What made British Cycling’s big data initiatives successful is that they allowed themselves to be guided by the data and put the recommendations into practice.  Plastic-finished skin suits are probably not the most obvious choice of clothing, but they proved to be the biggest advantage a cyclist could get – far more than tinkering with the bike.  (In fact they produced so much advantage that they were banned shortly after the 2008 Olympics.)

The results:  British Cycling won 4 Olympic medals in 2000, one of which was gold.  In 2012 they grabbed 8 gold, 2 silver and 2 bronze medals.  A quick glance at their website shows that it is not just Olympic medals they are winning – the number of medals won across all world championship events has also increased since 2000.

To me, this is one of the best big data stories, as it directly shows how to succeed with big data strategies in a completely analogue world.  I think it is more insightful than the mere fact that we are producing ever-increasing volumes of data.  The real value of big data lies in understanding what portion of all available data will contribute to achieving your goals, and then embracing the results of analysis to make constructive changes in daily activities.

But then again, I may just like the story because it involves geeky facts, statistics and fast bicycles.


“It’s not you, it’s me!” – says Data Quality to Big Data


I couldn’t help but start this blog with George Costanza’s “You’re giving me the ‘it’s not you, it’s me’ routine? I invented ‘it’s not you, it’s me’ …”

The thing that resonates today, in the odd context of big data, is that we may all need to look in the mirror, hold a thumb drive full of information in our hands, and concede once and for all: it’s not the data… it’s us.

Many organizations have a hard time making something useful from the ever-expanding universe of big-data, but the problem doesn’t lie with the data: It’s a people problem.

The contention is that big-data is falling short of the hype because people are:

  1. too unwilling to create cultures that value standardized, efficient, and repeatable information, and
  2. too complex to be reduced to “thin data” created from digital traces.

Evan Stubbs describes poor data quality as the data analyst’s single greatest problem.


About the only satisfying thing about having bad data is the schadenfreude that goes along with it. There’s cold solace in knowing that regardless of how poor your data is, everyone else’s is equally as bad. The thing is poor quality data doesn’t just appear from the ether. It’s created. Leave the dirty dishes for long enough and you’ll end up with cockroaches and cholera. Ignore data quality and eventually you’ll have black holes of untrustworthy information. Here’s the hard truth: we’re the reason bad data exists.


I will tell you that most data teams make “large efforts” to scrub their data. Those “infrequent” big cleanups, however, only treat the symptom, not the cause – and ultimately lead to inefficiency, cost, and even more frustration.

It’s intuitive and natural to think that data quality is a technological problem. It’s not; it’s a cultural problem. The real answer is that you need to create a culture that values standardized, efficient, and repeatable information.

If you do that, then you’ll be able to create data that is re-usable, efficient, and high quality. Rather than trying to manage a shanty of half-baked source tables, effective teams put the effort into designing, maintaining, and documenting their data. Instead of being a one-off activity, it becomes part of business as usual, something that’s simply part of daily life.

However, even if that data is the best it can possibly be, is it even capable of delivering on the big-data promise of greater insights about things like the habits, needs, and desires of customers?

Despite the enormous growth of data and the success of a few companies like Amazon and Netflix, “the reality is that deeper insights for most organizations remain elusive,” write Mikkel Rasmussen and Christian Madsbjerg in a Bloomberg Businessweek blog post that argues “big-data gets people wrong.”


Big-data delivers thin data. In the social sciences, we distinguish between two types of human behavior data. The first – thin data – is from digital traces: He wears a size 8, has blue eyes, and drinks pinot noir. The second – rich data – delivers an understanding of how people actually experience the world: He could smell the grass after the rain, he looked at her in that special way, and the new running shoes made him look faster. Big-data focuses solely on correlation, paying no attention to causality. What good is thin “information” when there is no insight into what your consumers actually think and feel?


Accenture reported only 20 percent of the companies it profiled had found a proven causal link between “what they measure and the outcomes they are intending to drive.”

Now, I contend that the keys to transforming big data into strategic value are critical thinking skills.

Where do we get such skills? People, it seems, are both the problem and the solution. Are we failing on two fronts: failing to create the right data-driven cultures, and failing to interpret the data we collect?

Twitter @bigdatabeat


Dark Data in Government: Sounds Sinister


Anytime I read about something characterized as “dark”, my mind immediately jumps to a vision of something sneaky or sinister, something better left unsaid or undiscovered. Maybe I watched too many Alfred Hitchcock movies in my youth, who knows. However, when coupled with the word “data”, “dark” is anything BUT sinister. Sure, as you might agree, the word “undiscovered” may still apply, but, only with a more positive connotation.

To level set, let’s make sure you understand my definition of dark data. I prefer using visualizations when I can so, picture this: the end of the first Indiana Jones movie, Raiders of the Lost Ark. In this scene, we see the Ark of the Covenant, stored in a generic container, being moved down the aisle in a massive warehouse full of other generic containers. What’s in all those containers? It’s pretty much anyone’s guess. There may be a record somewhere, but, for all intents and purposes, the materials stored in those boxes are useless.

Applying this to data: once a piece of data gets shoved into some generic container and stored away, just like the Ark, it becomes essentially worthless. This is dark data.

Opening up a government agency to all its dark data can have significant impacts, both positive and negative. Here are a few initial tips to get you thinking in the right direction:

  1. Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
  2. Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
  3. Create an extraction plan – determine how you will get access to the data, how often the data updates, and how you will handle varied formats.
  4. Ingest the data – transform the data if needed, integrate if needed, and capture as much metadata as possible (never assume you won’t need a metadata field; that’s just about the time you will be proven wrong). A minimal sketch of this step follows the list.
  5. Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here, the impact of bad data can never really be quantified.
  6. Store it – it’s interesting how often agencies think this is the first step.
  7. Get the data ready to be useful to people, tools and applications – think about how to minimize the need for users to manipulate data (reformatting, parsing, filtering, etc.) to better enable self-service.
  8. Make it available – at this point, the data should be easily accessible, easily discoverable, and easily usable by people, tools and applications.
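As promised above, here is a minimal sketch of the ingestion step, assuming simple file-based sources; the field names are illustrative, not a prescribed schema. The point is that metadata is captured at landing time, while it is still cheap to collect.

```python
# Sketch of step 4: land a raw file and keep every scrap of metadata with it.
import hashlib, json, os, time

def ingest(path, source_system):
    with open(path, "rb") as f:
        payload = f.read()
    return {
        "payload_sha256": hashlib.sha256(payload).hexdigest(),  # integrity check
        "source_system": source_system,
        "source_path": os.path.abspath(path),
        "size_bytes": len(payload),
        "modified_time": os.path.getmtime(path),
        "ingested_at": time.time(),
    }

# Demo with a throwaway file so the sketch runs end to end.
with open("demo.csv", "w") as f:
    f.write("id,value\n1,42\n")
print(json.dumps(ingest("demo.csv", "field-ops"), indent=2))
```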

Clearly, there’s more to shining the light on dark data than I can offer in this post. If you’d like to take the next step to learning what is possible, I suggest you download the eBook, The Dark Data Imperative.


Rising DW Architecture Complexity


I was talking to an architect customer last week at a company event, and he was describing how his enterprise data warehouse architecture was getting much more complex after many years of relative calm and stability.  In days of yore, you had some data sources, a data warehouse (with a single database), and some related edge systems.

The current trend is that new types of data and new types of physical storage are changing all of that.

When I got back from my trip I found a TDWI white paper by Philip Russom, Evolving Data Warehouse Architectures in the Age of Big Data, which describes the situation very well and details his research on this subject.

From an enterprise data architecture and management point of view, this is a very interesting paper.

  • First, DW architectures are getting complex because of all the new physical storage options available:
    • Hadoop – very large scale and inexpensive
    • NoSQL DBMS – beyond tabular data
    • Columnar DBMS – very fast seek time
    • DW Appliances – very fast / very expensive
  • What is driving these changes is the rapidly increasing complexity of data. Data volume has captured the imagination of the press, but it is really the rising complexity of data types that is going to challenge architects.
  • But here is what really jumped out at me. When they asked the people in their survey what the important components of their data warehouse architecture were, the answer came back: standards and rules.  Specifically, they meant how data is modeled, how data quality metrics are created, metadata requirements, interfaces for data integration, etc.

The conclusion for me, from this part of the survey, was that business strategy is requiring more complex data for better analyses (example: real-time response or proactive recommendations) and business processes (example: advanced customer service).  This, in turn, is driving IT to look into more advanced technology to deal with different data types and different use cases for the data.  And finally, the way they are dealing with the exploding complexity is through standards, particularly data standards.  If you are dealing with increasing complexity and have to do it better, faster and cheaper, the only way you are going to survive is by standardizing as much as reasonably makes sense.  But not a bit more.

If you think about it, it is good advice.  Get your data standards in place first.  It is the best way to manage the data and technology complexity.  …And a chance to be the driver rather than the driven.
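What does a data standard look like in practice? Here is a deliberately small sketch, with hypothetical field names: one agreed record definition, expressed as code, that every feed must satisfy before it enters the warehouse.

```python
# A minimal data standard: one shared definition plus a conformance check.
CUSTOMER_STANDARD = {
    "customer_id": {"type": str, "required": True},
    "country":     {"type": str, "required": True},   # e.g. ISO 3166-1 alpha-2
    "email":       {"type": str, "required": False},
}

def conformance_issues(record, standard=CUSTOMER_STANDARD):
    issues = []
    for field, rule in standard.items():
        if field not in record:
            if rule["required"]:
                issues.append(f"missing required field: {field}")
        elif not isinstance(record[field], rule["type"]):
            issues.append(f"wrong type for field: {field}")
    return issues

print(conformance_issues({"customer_id": "C-100", "country": "GB"}))  # []
print(conformance_issues({"customer_id": 100}))                       # two issues
```

A real implementation would likely use a schema registry or a validation library, but the principle is the same: the standard lives in one place, and every new data type or storage platform is measured against it.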

I highly recommend reading this white paper.  There is far more in it than I can cover here. There is also a Philip Russom webinar on DW Architecture that I recommend.


Is it the CDO or CAO or Someone Else?

A month ago, I shared that Frank Friedman believes CFOs are “the logical choice to own analytics and put them to work to serve the organization’s needs”. Even though many CFOs are increasingly taking on what could be considered an internal CEO or COO role, many readers protested my post, which focused on reviewing Frank Friedman’s argument. At the same time, CIOs have been very clear with me that they do not want to personally become their company’s data steward. So the question becomes: should companies be creating a CDO or CAO role to lead this important function? And if so, how common are these two roles anyway?

Regardless of eventual ownership, extracting value out of data is becoming a critical business capability. It is clear that data scientists should not be shoehorned into the traditional business analyst role. Data scientists have the unique ability to derive mathematical models “for the extraction of knowledge from data” (Data Science for Business, Foster Provost, 2013, pg 2). For this reason, Thomas Davenport claims that data scientists need to be able to network across an entire business and to work at the intersection of business goals, constraints, processes, available data and analytical possibilities. Given this, many organizations today are starting to experiment with the notion of having either a chief data officer (CDO) or a chief analytics officer (CAO). The open question is: should an enterprise have a CDO, a CAO, or both? And, just as important, where should each of these roles report in the organization?

Data policy versus business questions

In my opinion, it is critical to first look into the substance of each role before answering the question above. The CDO should be about ensuring that information is properly secured, stored, transmitted or destroyed.  This includes, according to COBIT 5, ensuring that there are effective security and controls over information systems. To do this, procedures need to be defined and implemented to ensure the integrity and consistency of information stored in databases, data warehouses, and data archives. According to COBIT 5, data governance requires the following four elements:

  • Clear information ownership
  • Timely, correct information
  • Clear enterprise architecture and efficiency
  • Compliance and security

To me, these four elements should be the essence of the CDO role. Having said this, the CAO is related but very different in terms of the nature of the role and the business skills required. The CRISP model points out just how different the two roles are. According to CRISP, the CAO role should be focused upon business understanding, data understanding, data preparation, data modeling, and data evaluation. As such, the CAO is focused upon using data to solve business problems, while the CDO is about protecting data as a business-critical asset. I was living in Silicon Valley during the “Internet Bust”. I remember seeing very few job descriptions, and the few that existed said they wanted a developer who could also act as a product manager and do some marketing as a part-time activity. This, of course, made no sense. I feel the same way about the idea of combining the CDO and CAO. One is about compliance and protecting data; the other is about solving business problems with data. Peanut butter and chocolate may work in a Reese’s cup, but it will not work here—the orientations are too different.

So which business leader should own the CDO and CAO?

Clearly, having two more C’s in the C-suite creates a more crowded list of corporate officers. Some have even said that this will extend what is called senior executive bloat. And, of course, how do these new roles work with and impact the CIO? The answer depends on the organization’s culture. However, where there isn’t an executive staff office, I suggest that these roles go to different places. Many companies already have their CIO function reporting to finance. Where this is the case, it is important to determine whether a COO function is in place. The COO could clearly own the CDO and CAO functions, because they have a significant role in improving processes and capabilities. Where there isn’t a COO function and the CIO reports to the CEO, I think you could have the CDO report to the CIO even though CIOs say they do not want to be data stewards. This could be a third function in parallel with the VP of Ops and VP of Apps. In this case, I would have the CAO report to one of the following: the CFO, Strategy, or IT. Again, this all depends on the current organizational structure and corporate culture. Regardless of where it reports, the important thing is to focus the CAO on an enterprise analytics capability.

Related Blogs

Should we still be calling it Big Data?

Is Big Data Destined To Become Small And Vertical?

Big Data Why?

What is big data and why should your business care?

Author Twitter: @MylesSuer


Major Oil Company Uses Analytics to Gain Business Advantage

According to Michelle Fox of CNBC and Stephen Schork, the oil industry is in ‘dire straits’. U.S. crude posted its ninth straight weekly loss this week, landing under $50 a barrel. The news is bad enough that it is now expected to lead to major job losses. The Dallas Federal Reserve anticipates that Texas could lose about 125,000 jobs by the end of June. Patrick Jankowski, an economist and vice president of research at the Greater Houston Partnership, expects exploration budgets will be cut 30-35 percent, which will result in approximately 9,000 fewer wells being drilled. The problem is that “if oil prices keep falling, at some point it’s not profitable to pull it out of the ground” (“When, and where, oil is too cheap to be profitable”, CNBC, John W. Schoen).

This means that a portion of the world’s oil supply will become unprofitable to produce. According to Wood Mackenzie, “once the oil price reaches these levels, producers have a sometimes complex decision to continue producing, losing money on every barrel produced, or to halt production, which will reduce supply”. The question is: are these the only answers?

Major Oil Company Uses Analytics to Gain Business Advantage

A major oil company that we are working with has determined that data is a success enabler for their business. They are demonstrating what we at Informatica like to call a “data ready business”—a business that is ready for any change in market conditions. This company is using next-generation analytics to ensure their business’s survival and to make sure they do not become what Jim Cramer likes to call a “marginal producer”.  This company has told us that their success is based upon being able to extract oil more efficiently than their competitors.

Historically data analysis was pretty simple

Traditionally, oil producers would get oil by drilling a new hole in the ground, and in about 6 months they would have oil flowing commercially and be in business. This meant it would typically take 6 months or longer before they could get any meaningful results, including data that could be used to make broader production decisions.

Drilling from data

Today, oil is also produced from shale using fracking techniques.  This process can take only 30-60 days before oil producers start seeing results.  It is based not just on innovation in the refining of oil, but also on innovation in the refining of data, from which operational business decisions can be made. The benefits of this approach include the following:

Improved fracking process efficiency

Fracking is a very technical process. Producers can have two wells on the same field that are performing at very different levels of efficiency. To address this issue, the oil company we have been discussing throughout this piece is using real-time data to optimize its oil extraction across an entire oil field or region. The insights derived allow them to compare wells in the same region for efficiency or productivity, and even to switch off certain wells if the oil price drops below profitability thresholds. This ability is especially important as the price of oil continues to drop.  At $70/barrel, many operators go into the red, while more efficient, data-driven operators can remain profitable at $40/barrel.  So efficiency is critical across a system of wells.
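The threshold logic itself is simple; the hard part is having trustworthy per-well cost and production data in real time. Here is an illustrative sketch with entirely made-up figures:

```python
# Made-up figures: flag wells whose per-barrel cost exceeds the oil price.
wells = {
    "well_A": {"cost_per_barrel": 38.0, "barrels_per_day": 450},
    "well_B": {"cost_per_barrel": 62.0, "barrels_per_day": 300},
    "well_C": {"cost_per_barrel": 45.0, "barrels_per_day": 520},
}

def review(wells, oil_price):
    for name, well in sorted(wells.items()):
        margin = oil_price - well["cost_per_barrel"]
        action = "keep producing" if margin > 0 else "candidate to switch off"
        print(f"{name}: margin ${margin:+.2f}/bbl -> {action}")

review(wells, oil_price=50.0)  # at $50/bbl only well_B is unprofitable
```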

Using data to decide where to build wells in the first place

When constructing a fracking or oil sands well, you need more information on trends and formulas to extract oil from the ground.  On a site with 100+ wells, for example, each one is slightly different because of water tables, ground structure, and the details of the geography. You need the right data, the right formula, and the right method to extract the oil at the best price without impacting the environment at the very same time.

The right technology delivers the needed business advantage

Of course, technology has never been simple to implement. The company we are discussing has 1.2 petabytes of data to process, and this volume is only increasing.  They are running fiber optic cables down into wells to gather data in real time. As a result, they are receiving vast amounts of real-time data but cannot store and analyze that volume efficiently in conventional systems. Meanwhile, the time taken to aggregate data and run reports can miss the window of opportunity while increasing cost. Making matters worse, this company had many different varieties of data. It also turns out that quite a bit of the useful information in their data sets was in the comments section of their source application, so traditional data warehousing would not help them extract the information they really needed. They decided to move to a new technology, Hadoop. But even seemingly simple problems, like getting access to data, were an issue within Hadoop.  If you didn’t know the right data analyst, you might not get the data you needed in a timely fashion. Compounding things, a lack of Hadoop skills in Oklahoma proved to be a real problem.
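To see why a comments field defeats a fixed warehouse schema, consider this toy example (the comment format is invented for illustration): the value you want is buried in free text and has to be parsed out before it can be analyzed.

```python
# Toy example: recover a structured value (estimated downtime) from free text.
import re

comments = [
    "Pressure spike at 0630, choke adjusted; est. downtime 2h",
    "Routine check, no issues",
    "Sand ingress suspected, est. downtime 6h",
]

downtime_pattern = re.compile(r"downtime\s+(\d+)h", re.IGNORECASE)
for comment in comments:
    match = downtime_pattern.search(comment)
    hours = int(match.group(1)) if match else 0
    print(f"{hours}h | {comment}")
```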

The right technology delivers the right capability

The company had been using a traditional data warehousing environment for years, but they needed help dealing with their Hadoop environment. This meant handling the volume, variety and quality of their source well data. They needed a safe, efficient way to integrate all types of data on Hadoop, at any scale, without having to learn the internals of Hadoop. Early adopters of Hadoop and other big data technologies have had no choice but to hand-code using Java or scripting languages such as Pig or Hive. Hiring and retaining big data experts proved time-consuming and costly, because data scientists and analysts can spend only 20 percent of their time on data analysis and the rest on the tedious mechanics of data integration, such as accessing, parsing, and managing data. Fortunately for this oil producer, it didn’t have to be this way. They avoided the specialized coding normally required to scale performance on distributed computing platforms like Hadoop. Additionally, they were able to “Map Once, Deploy Anywhere”, knowing that even as technologies change they can run data integration jobs without having to rebuild data processing flows.

Final remarks

It seems clear that we live in an era where data is at the center of just about every business. Data-ready enterprises are able to adapt and win regardless of changing market conditions. These businesses invested in building their enterprise analytics capability before market conditions changed. In this case, such oil producers will be able to produce oil at lower cost than others within their industry. Analytics provides three benefits to oil producers:

  • Better margins and lower costs from operations
  • Lower risk of environmental impact
  • Less time to build a successful well

In essence, those that build analytics as a core enterprise capability will continue to have a right to win within a dynamic oil pricing environment.

Related links

Related Blogs

Analytics Stories: A Banking Case Study
Analytics Stories: A Financial Services Case Study
Analytics Stories: A Healthcare Case Study
Who Owns Enterprise Analytics and Data?
Competing on Analytics: A Follow Up to Thomas H. Davenport’s Post in HBR
Thomas Davenport Book “Competing On Analytics”

Solution Brief: The Intelligent Data Platform
Author Twitter: @MylesSuer


Should We Still be Calling it Big Data?

Several months ago, I was talking to some CIOs about their business problems. During these conversations, I asked them about their interest in Big Data. One sophisticated CIO recoiled almost immediately, saying that he believes most vendors really have a problem discussing “Big Data” with customers like him: it would just be so much easier if we vendors talked to him about helping his company with its structured and unstructured data. At the same time, Gartner has found that 64% of enterprises surveyed say they’re deploying or planning to deploy a Big Data project. The problem is that 56% of those surveyed by Gartner are still struggling to determine how to get value out of big data projects, and 23% are struggling with the definition of what is Big Data and what is not.

Clearly, this says the term does not work with market and industry participants. To me, this raises a question about the continued efficacy of the term. And now Thomas Davenport, the author of “Competing on Analytics”, has suggested that we retire the term altogether. Tom says that in his research “nobody likes the term”. He claims in particular that executives yearn for a better way to communicate what they are doing with data and analytics.

Tom suggests in particular that “Big Data” has five significant flaws:

1) Big is relative. What is big today will not be so large tomorrow. Will we have to call the future version Big Big Data?

2) Big is only one aspect of what is distinctive about the data in big data. As my CIO friend said, it is not as much about the size of the data as it is about the nature of the data. Tom says bigness demands more powerful servers, but a lack of structure demands different approaches to processing the data.

3) Big data is defined as having volume, variety, and velocity. But what do you call data that has variety and velocity but is not “big”?

4) What do you call the opposite of big data? Is it small data? Nobody likes that term either.

5) Too many people use “big data” incorrectly, to mean any use of analytics, reporting, or conventional business intelligence.

Tom goes on to say, “I saw recently, over 80 percent of the executives surveyed thought the term was overstated, confusing, or misleading”. So Tom asks: why don’t we just stop using it? In the end, Tom struggles to give up the term himself, because the world noticed the name Big Data in a way it has not noticed other technology terms. Tom has even written a book on the subject—“Big Data at Work”. The question I have is: do we in the IT industry really want to lose all that attention? It feels great to be in the cool crowd. However, CIOs I have talked to say they are really worried about what will happen if their teams oversell Big Data and do not deliver tangible business outcomes. Tom says it would be more helpful, instead of saying “we are cool and we are working on big data”, to say “we’re extracting customer transaction data from our log files in order to help marketing understand the factors leading to customer attrition”. I tend to agree with this thought, but I would like to hear what you think. Should we as an industry retire the term Big Data?

Related links

Related Blogs

Is Big Data Destined To Become Small And Vertical?
Big Data Why?
What is big data and why should your business care?

Author Twitter: @MylesSuer


There are Three Kinds of Lies: Lies, Damned lies, and Data


The phrase popularized in the 19th century and often attributed to Benjamin Disraeli was: “There are three kinds of lies: lies, damned lies, and statistics.”

Not so long ago, Google created a website to figure out just how many people had influenza. They did this by tracking “flu-related search queries” and the “location of the query”, and applying them to an estimation algorithm. According to the website, at the flu season’s peak in January, nearly 11 percent of the United States population may have had influenza. This means that nearly 44 million of us would have had the flu or flu-like symptoms. In its weekly report, the Centers for Disease Control and Prevention put the figure at 5.6%, which means that fewer than 23 million of us actually went to the doctor’s office to be tested for flu or to get a flu shot.

Now, imagine if I were a drug manufacturer planning production around Google’s estimate. There is a theory about what went wrong: the problem may be due to widespread media coverage of that year’s flu season, amplified by social media, which helped news of the flu spread quicker than the virus itself. In other words, the algorithm was looking only at the numbers, not at the context of the search results.
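A toy model (all numbers invented) shows the failure mode: an estimator that simply scales the share of flu-related queries up to the population will triple its estimate when media coverage triples the searches, even if actual infections have not changed at all.

```python
# Naive query-share estimator: it has no notion of *why* people are searching.
def estimate_cases(flu_queries, total_queries, population, calibration=1.0):
    return calibration * (flu_queries / total_queries) * population

population = 316_000_000  # rough U.S. population, for illustration only
print(f"{estimate_cases(11_000, 1_000_000, population):,.0f}")  # ordinary week
print(f"{estimate_cases(33_000, 1_000_000, population):,.0f}")  # same flu, heavy news coverage
```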

In today’s digitally connected world, data is everywhere: in our phones, search queries, friendships, dating profiles, cars, food, and reading habits. Almost everything we touch is part of a larger data set. The people and companies that interpret the data may fail to apply background and outside conditions to the numbers they capture.

Now, while we build our big data repositories, we have to spend some time explaining how we collected the data and in what context.

Twitter @bigdatabeat


Enterprise Architecture for 8th Graders


Communicating a complex topic like Enterprise Architecture in a simple way that avoids technical terms and buzzwords is not easy. This blog builds on an earlier article by Rob Karel that explains Big Data and Metadata to Mom, and one by John Schmidt that explains Integration to kids, by focusing on Enterprise Architecture for an audience of 8th graders. The first thing I had to do was learn about the vocabulary and communication style of teenagers, so an internet article on “How 14 year old teenagers communicate” was helpful.  My findings, although not related to the topic of Enterprise Architecture, were revealing.

The article provided the following recommendations when talking to teenagers:

1. Talk with them and not at them.
2. Ask questions that go beyond “yes” or “no” answers to prompt more developed conversation.
3. Take advantage of time during car trips to talk with your teen.
4. Make time for sporting and school events, playing games, and talking about current events.

Let’s see how these recommendations could be applied:

1. Talk with them and not at them

Ask your teenager the following question – Have you ever heard about Enterprise Architecture?

“It is a way to help a company understand its customers and how its products are made and sold.  It helps managers improve the way the company works and how technology is used to help people do a better job”.

2. Ask questions that go beyond “yes” or “no” answers to prompt more developed conversation.

Ask your teenager the following question: What do you want to do when you grow up? Depending on the answer, you may need to customize the text below.

“Enterprise Architecture helps you understand the needs of (the industry selected by your teenager).  It will then tell you the typical activities that employees do and the systems and technologies that are used to simplify those activities”.

3. Take advantage of time during car trips to talk with your teen.

Imagine the following scenario with your teenager during a car trip. Let’s assume your teenager wants to be part of an advertising agency for the entertainment industry.

“We should count the number of billboards on the side of the road and note how many are movie advertisements. I am interested in your opinion of which advertising style sparks your interest in a specific movie”.

“If we do that, we can then discuss the different activities that are required in making that advertising material and how to make the images speak to you. Enterprise Architecture also does that: it helps you understand the activities required in any business, step by step, allowing you to create templates or graphics that represent any industry”.

4. Make time for sports and school events, play games, and talk about current events.

Another sample conversation to support this recommendation could be:

“Let’s go to the movies this week. Once you select one you would like to see, see if you can identify why you chose this particular movie over the other ones. If you think of the billboards that we saw, can you remember what motivated or influenced you?  We can work out together what visual experience the designers of those images were creating. Perhaps you have new ideas or suggestions on how they could have done it better? With the help of Enterprise Architecture, companies identify more efficient activities to generate more business”.

After researching the topic I realized that we could apply these recommendations to share Enterprise Architecture with our business partners. Perhaps the readers of this blog can help by using these recommendations with their teenagers at home to explain the basic concepts of Enterprise Architecture and collectively create a simpler way to talk about Enterprise Architecture.


Jumping on the Internet of Things (IoT) Band Wagon?


There is a new “Band Wagon” out there and it’s not Big Data. If you were at this year’s CES show this past week, it would have been impossible, even with a “Las Vegas-size” hangover, not to have heard the hype around the Internet of Things (IoT).  The Internet of Things includes anything and everything that is connected to the Internet and able to communicate and share information with other “smart” devices. This year, as well as last, the hype covered everything from home appliances, fitness and health monitors, home security systems, Bluetooth-enabled toothbrushes, sensors in shoes that monitor weight and mileage, and thermostats that monitor humidity and sound, to kitchen utensils that can track and monitor the type of food you cook and eat.

If you ask me, all these devices and the IoT movement are both cool and creepy. Cool in the sense that networking technology has matured and become affordable enough for devices to transmit data that companies can turn into actionable intelligence. Creepy in the sense of: do I really want someone monitoring what I cook, or how many times I wake up at night?  Like other hype cycles or band wagons, there are different opinions as to the size of the IoT market.  Gartner expects it to include nearly 26 billion devices, with a “global economic value-add” of $1.9 trillion by 2020.  The question is whether the Internet of Things is truly transformational to our daily lives.  The answer really depends on being able to harness all that data into information. Just because my new IoT toothbrush can monitor and send data on how many times I brush my teeth, it doesn’t provide any color on whether that makes me healthier or gives me a prettier smile :).

To help answer these questions, here are examples and potential use cases of leveraging all that Big Data from the small devices of the IoT world:

  • Mimo’s Smart Baby Monitor is aimed at helping to prevent SIDS; it is a new kind of infant monitor that provides parents with real-time information about their baby’s breathing, skin temperature, body position, and activity level on their smartphones.
  • GlowCaps fit prescription bottles and via a wireless chip provide services that help people stick with their prescription regimen; from reminder messages, all the way to refill and doctor coordination.
  • BeClose offers a wearable alarm button and other discreet wireless sensors placed around the home; the BeClose system can track your loved one’s daily routine and give you peace of mind for their safety by alerting you to any serious disruptions detected in their normal schedule.
  • Postscapes showcases technology in which a suite of sensors and web connectivity helps save you time and resources by keeping plants fed based on their actual growing needs and conditions, while automating much of the labor process.
  • The OnFarm solution combines real-time sensor data on soil moisture levels, weather forecasts, and pesticide usage from farming sites into a consolidated web dashboard. Farmers can use this data, along with advanced imaging and mapping information, to spot crop issues and remotely monitor all of the farm’s assets and resource usage levels.
  • Banks and auto lenders are using cellular GPS units that report location and usage of financed cars in addition to locking the ignitions to prevent further movement in the case of default.
  • Sensors on farm equipment now provide real-time intelligence on how many hours tractors are used and on the weather conditions, to predict mechanical problems, and measure the productivity of the farmer to predict trends in the commodity market.

I can see a number of other potential use cases for IoT including:

  • Health devices not only sending data but also receiving data from other IoT devices, to provide real-time recommendations on workout routines based on weather data received from real-time weather sensors, food intake from kitchen devices, and nutritional information based on vitamins and medications consumed by the wearer.
  • Credit card banks leveraging their GPS tracking device data from auto loan customers to combine it with credit card data to deliver real-time offers on merchant promotions while on the road.
  • GPS tracking devices in hotel key cards that track where you go, eat, and entertain, in order to deliver more customized services and offers while one is on a business trip or vacation.
  • Boxing gloves transmitting the impact and force of a punch to monitor for athlete concussions.

What does this all mean?

The Internet of Things has changed the way we live and do business, and will continue to shape the future, hopefully in a positive way. Harnessing all of that Big Data from small devices does not come easily. Every device that generates data sends it to some central system through a WiFi or cellular network.  Once in that central system, the data needs to be accessed, translated, transformed, cleansed, and standardized for business use alongside data from the other systems that run the business.  For example:

  • Access, transform, and validate data from IoT alongside data generated by other business applications. Formats and values will often differ and change over time, and need to be rationalized and standardized for downstream business use. Otherwise, you end up with a bunch of alphas and numerics that make no sense.
  • Data quality and validation: just because a sensor can send data, it does not mean it will send the right data, or data that is right for a business user trying to make sense of it. GPS data requires accurate coordinate data. If any values are transmitted incorrectly, it is important to identify those errors and, more importantly, to correct them so the business can take action.  This is especially important when combining like values (e.g. weather status should be Cold, Wet, or Hot, but the device is sending A, B, C); a small sketch of this mapping follows the list.
  • Shared with other systems: once your data is ready to be consumed by new and existing analytic applications, marketing systems, CRM, or your fraud surveillance systems, it needs to be available in real time if required, in the right format and structure for those applications, and delivered in a way that is seamless, automated, and does not require heavy IT lifting.
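As flagged in the data-quality bullet above, here is a small sketch with hypothetical device codes: raw sensor codes are mapped onto the agreed standard values, and anything that does not map is routed to an error queue for correction rather than silently passed downstream.

```python
# Standardize raw sensor codes; route unmappable values to an error queue.
CODE_TO_STATUS = {"A": "Cold", "B": "Wet", "C": "Hot"}

def standardize(readings):
    clean, errors = [], []
    for device_id, code in readings:
        status = CODE_TO_STATUS.get(code)
        if status:
            clean.append((device_id, status))
        else:
            errors.append((device_id, f"unknown code {code!r}"))
    return clean, errors

clean, errors = standardize([("t-01", "A"), ("t-02", "C"), ("t-03", "X")])
print("clean:", clean)    # [('t-01', 'Cold'), ('t-02', 'Hot')]
print("errors:", errors)  # [('t-03', "unknown code 'X'")]
```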

In closing, IoT’s future is bright, along with the additional insights gained from all that data.  Consider it cool or creepy, one thing is for sure: the IoT band wagon is in full swing!
