Category Archives: Professional Services
Over and over, when talking with people who are starting to learn Data Science, there’s a frustration that comes up: “I don’t know which programming language to start with.”
Moreover, it’s not just programming languages; it’s also software systems like Tableau, SPSS, etc. There is an ever-widening range of tools and programming languages, and it’s difficult to know which one to select.
I get it. When I started focusing heavily on data science a few years ago, I reviewed all of the popular programming languages at the time: Python, R, SAS, D3, not to mention a few that, in hindsight, really aren’t that great for analytics, like Perl, Bash, and Java. I once read a suggestion to use arcane tools like UNIX’s AWK and SED.
There are so many suggestions, so much material, so many options; it becomes difficult to know what to learn first. There’s a mountain of content, and it’s difficult to know where to find the “gold nuggets”: the things to learn that will bring you a high return on your time investment.
That’s the crux of the problem. The fact is – time is limited. Learning a new programming language is a large investment in your time, so you need to be strategic about which one you select. To be clear, some languages will yield a very high return on your investment. Other languages are purely auxiliary tools that you might use only a few times per year.
Let me make this easy for you: learn R first. Here’s why:
R is becoming the “lingua franca” of data science
R is becoming the lingua franca for data science. That’s not to say that it’s the only language, or that it’s the best tool for every job. It is, however, the most widely used and it is rising in popularity.
As I’ve noted before, O’Reilly Media conducted a survey in 2014 to understand the tools that data scientists are currently using. They found that R is the most popular programming language (if you don’t count SQL as a “proper” programming language).
Looking more broadly, there are other rankings that look at programming language popularity in general. For example, Redmonk measures programming language popularity by examining discussion (on Stack Overflow) and usage (on GitHub). In their latest rankings, R placed 13th, the highest of any statistical programming language. Redmonk also noted that R has been rising in popularity over time.
A similar ranking by TIOBE, which ranks programming languages by the number of search engine searches, indicates a strong year-over-year rise for R.
Keep in mind that the Redmonk and TIOBE rankings cover all programming languages, not just those used for analytics. Against that broader field, R now ranks among the most popular and most commonly used languages overall.
It’s often said that 80% of the work in data science is data manipulation. More often than not, you’ll need to spend significant amounts of your time “wrangling” your data: putting it into the shape you want. R has some of the best data management tools you’ll find.
The dplyr package in R makes data manipulation easy. It is the tool I wish I’d had years ago. When you “chain” the basic dplyr verbs together, you can dramatically simplify your data manipulation workflow.
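For readers who haven’t seen a dplyr chain, a typical R pipeline looks like `sales %>% filter(units > 0) %>% mutate(revenue = units * price) %>% group_by(region) %>% summarise(total = sum(revenue), n = n())`, where `sales` and its columns are made-up names. To show why chaining simplifies a workflow without requiring an R install, here is a minimal sketch in Python: a hypothetical toy `Table` class whose method names mirror the dplyr verbs. It is not dplyr and not any real Python library’s API; the data is invented for illustration.

```python
from collections import defaultdict


class Table:
    """Hypothetical toy table (a list of dict rows). Every verb returns a
    new Table, so calls can be chained the way dplyr verbs are piped.
    Illustration only; this is not dplyr or any real library."""

    def __init__(self, rows, keys=()):
        self.rows = [dict(r) for r in rows]
        self.keys = tuple(keys)  # grouping columns set by group_by()

    def filter(self, pred):
        # Keep only rows for which pred(row) is true (cf. dplyr::filter).
        return Table(r for r in self.rows if pred(r))

    def mutate(self, **cols):
        # Add or overwrite columns; each fn sees the original row (cf. dplyr::mutate).
        return Table({**r, **{name: fn(r) for name, fn in cols.items()}}
                     for r in self.rows)

    def group_by(self, *keys):
        # Record grouping columns for the next summarise() (cf. dplyr::group_by).
        return Table(self.rows, keys)

    def summarise(self, **aggs):
        # One output row per group; each agg maps a list of rows to a value
        # (cf. dplyr::summarise).
        groups = defaultdict(list)
        for r in self.rows:
            groups[tuple(r[k] for k in self.keys)].append(r)
        out = []
        for key, members in groups.items():
            row = dict(zip(self.keys, key))
            row.update({name: fn(members) for name, fn in aggs.items()})
            out.append(row)
        return Table(out)


# Made-up example data.
sales = Table([
    {"region": "east", "units": 10, "price": 2.0},
    {"region": "west", "units": 0, "price": 3.0},
    {"region": "east", "units": 5, "price": 4.0},
    {"region": "west", "units": 7, "price": 1.5},
])

# The chain reads top to bottom in the order the steps run.
result = (
    sales
    .filter(lambda r: r["units"] > 0)
    .mutate(revenue=lambda r: r["units"] * r["price"])
    .group_by("region")
    .summarise(total=lambda rows: sum(r["revenue"] for r in rows),
               n=len)
)

print(result.rows)
# [{'region': 'east', 'total': 40.0, 'n': 2}, {'region': 'west', 'total': 10.5, 'n': 1}]
```

Because each verb returns a new table, the steps appear in the order they execute, with no throwaway intermediate variables. Unlike real dplyr, this toy drops grouping after filter or mutate, and its mutate only sees columns from the incoming row.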
ggplot2 is one of the best data visualization tools around, as of 2015. What’s great about ggplot2 is that as you learn the syntax, you also learn how to think about data visualization.
I’ve said numerous times that there is a deep structure to all statistical visualizations: a highly structured framework (the “grammar of graphics”) for thinking about and creating all data visualizations. ggplot2 is based on that framework. By learning ggplot2, you will learn how to think about visualizing data.
Finally, there’s machine learning. While I think most beginning data science students should wait to learn machine learning (it is much more important to learn data exploration first), machine learning is an important skill. When data exploration stops yielding insight, you need stronger tools.
When you’re ready to start using (and learning) machine learning, R has some of the best tools and resources.
One of the best and most widely referenced introductory texts on machine learning, An Introduction to Statistical Learning, teaches machine learning using the R programming language. Additionally, the Stanford Statistical Learning course uses this textbook and teaches machine learning in R.
Summary: Learn R, and focus your efforts
Once you start to learn R, don’t get “shiny new object” syndrome.
You’re likely to see demonstrations of new techniques and tools. Just look at some of the dazzling data visualizations that people are creating.
Seeing other people create great work (and finding out that they’re using a different tool) might lead you to try something else. Trust me on this: you need to focus. You need to be able to devote a few months (or longer) to really diving into one tool.
And as I noted above, you really want to build up your competence across the data science workflow. At a minimum, you need solid skills in data visualization and data manipulation. You need to be able to do some serious data exploration in R before you start moving on.
Spending 100 hours on R will yield vastly better returns than spending 10 hours on 10 different tools. In the end, your time ROI will be higher by concentrating your efforts. Don’t get distracted by the “latest, sexy new thing.”
Whether you are establishing a new outsourced delivery model for your integration services or getting ready for the next round of contract negotiations with your existing supplier, you need a way to hold the supplier accountable – especially when it is an exclusive arrangement. Here are four key metrics that should be included in the multi-year agreement.
If your goal is to implement a world-class Integration Competency Center (ICC) or Center of Excellence (COE), the best people you could find to make up the team already work for you. Even if you don’t currently have technical superstars on your team, you can still have a leading-edge, world-class ICC that will “wow” your internal customers every time. You don’t need a world-class team to have a world-class competency center; you need a world-class management system.
They say people are resistant to change. I disagree. People are resistant to uncertainty. Once people are certain that a change is to their benefit, they will change so fast it will make your head spin. It would be a mistake, however, to underestimate the challenges of changing an organization from one where integration is a collaboration between two project silos to one where integration is a sustainable strategy with a common infrastructure, based on strict standards and shared by everyone.
A CIO told me “After five years with an integration Center of Excellence, I expect them to be excellent. They aren’t.” But so what? The IT organization has lots of things to focus on. Is integration excellence really essential?
The cover of the September 10 issue of ComputerWorld caught my attention; the headline was “Rebirth of Re-Engineering.” I was curious how the analysts and pundits would spin Business Process Reengineering, since I hadn’t seen the BPR acronym much since it fell out of favor around the turn of the century. As it turns out, the NEW BPR is all about Lean and Agile and is being led by IT. Wow!
In my previous post I discussed effective stakeholder management and communications as a key enabler of successful data quality delivery. In this post, I will discuss the importance of demonstrated project management fundamentals.
Large-scale, complex enterprise Data Quality and Data Management efforts are characterized by numerous activities and tasks being performed iteratively by multiple resources, across multiple work streams, with high-volume units of work (i.e., dozens of source systems and data objects, hundreds of tables, thousands of data elements, hundreds of thousands of data defects and millions of records). Without the means to effectively define, plan and manage these efforts, success is nearly impossible.
This week we had the privilege of participating in two significant conferences taking place in San Francisco. I was on a CMO panel at the B2B Digital Edge Live conference (#DELiveSF), while my colleague Daniel West presented at the Forrester Annual Enablement Forum (#tse12). I found it intriguing how both conferences focused on the same end-result … the “Customer”.
In some respects, this is quite surprising, given that one normally associates Enablement with training salespeople to sell, while marketing always talks about promoting thought leadership into the social network or generating leads from prospects. So why the change?
I think the answer relates to how both disciplines are moving forward in this modern era driven by social networking. No longer is it just a one-way dialog between vendor and customer, where the vendor promotes products and services via a website or advertises in a magazine. It is now imperative that there is a two-way dialog. Customers are no longer silent! They talk, and they discuss, both good and bad. Vendors need to focus on ensuring their customers are successful. This means “listening” to their customers and understanding what total customer success means to them, whether online, in user groups, at events or in one-to-one meetings. Interestingly, this is one of the fundamental tenets of cloud computing, in which service is paramount to driving repeatable subscription revenue.
Hence the focus of enablement must shift from simply training sales, and move to enabling sales to foster relationships with customers in order to deliver solutions that really deliver on key business imperatives. The entire value delivery chain (from first contact through to sale, implementation and ongoing success) must be aligned and working for customer success – because vendors are now visibly under the microscope and increasingly being compared and discussed in public. Several comments jumped out at me from the live conference twitter stream (#tse12):
- Certify sales people on talking to buyers, not talking about products.
- 86% of business buyers engage in web research independent of sales cycle.
- 8 months ago, enablement was nice to have, now it is recognized as a must have
- Sales Enablement = Make your customer a hero. That’s why I use “future advocate” and NOT “prospect”.
Strong words indeed, and they align with the role of modern marketing teams: engaging with customers through their chosen social networks to discuss their needs and help position solutions for their success. Marketing becomes increasingly focused on finding early-stage researchers as they engage on social networks and use online assets, and its role has shifted to engaging online, embracing customers and sustaining an ongoing dialog. Again, several topics jumped out from the live conference twitter stream (#DELiveSF):
- Enable B2B salespeople to do what they do best, with digital at the core: data, content, mobile, social, CRM.
- B2B marketing: Start with audience design. Target the influencers of the influencers & create content in places they seek it.
- Marketing direction for digital: brands need to become publishers. Content is king!
- Digital Edge Live: Control social mess before it controls you.
That last point is key: a significant problem is that this modern world of proactive online marketing has become complicated. At the B2B Digital Edge Live conference, the moderator, Kate Maddox, asked us what our greatest challenges in digital marketing were. Three topics interested me:
- Joining the dots between outbound email marketing and social media to nurture customers and prospects efficiently.
- The cultural change associated with evolving from an old-fashioned traditional organization to a leading social enterprise.
- Understanding where user groups now exist: on traditional websites, or beyond them in social networks such as LinkedIn, Facebook and others.
Marketing and Enablement are evolving rapidly into adjacent disciplines linked by a common goal of embracing the customer and ensuring that the entire value delivery chain is focused on their success, because without their success we are simply fooling ourselves into believing we are building a sustainable and successful business model.
What do you think?
This article explores Agile Data Integration and Business Intelligence practices and contrasts leading practices and technologies. First, some definitions.
Agile DI is the application of agile techniques (iterative/incremental development, cross-functional self-organizing teams, rapid/flexible response to change, etc.) to address data integration challenges such as migrating data between systems or consolidating data from multiple systems. Agile BI is the application of agile techniques to address business intelligence challenges such as identifying and analyzing data to support better business decision-making. These two disciplines sometimes overlap or support each other. For example, you might use Agile DI to move data into a data warehouse and Agile BI to get it out of the warehouse in a useful form.