Big Data Analytics Talent is Just Waiting to be Tapped

Last year, data analytics guru Tom Davenport dismissed the notion that there’s a shortage of data scientist skills. Yes, they were, at one point, as “rare as vegetarian dogs.” Lately, however, “there may not be a glut of data scientists, but they are much easier to find and hire than they used to be. If you can offer only a decent salary, you’ll probably be OK. And even the most traditional business can hire them these days.”

He observes that there are now more than 100 degree programs across the United States, and, contrary to perceptions, a master’s degree – not a Ph.D. – will suffice. Often, corporate training can bring people up to speed with essential skills. “In short, there is no excuse for not building a data science capability,” Davenport says.

There is also a need to separate the roles involved in managing and preparing data analytics – which can be divided into two broad categories: data science and data engineering. Aashu Virmani, chief marketing officer at in-database analytics software company Fuzzy Logix, explored these distinctions with Adrian Bridgwater in a recent Forbes article. “In the most simple of terms, data engineers worry about data infrastructure while data scientists are all about analysis,” Virmani states.

Virmani also explored the qualities that make a good data scientist or good data engineer. Data scientists, Virmani says, “may not have a ton of programming experience but their understanding of one or more analytics frameworks is essential.” He also says that a large part of their role is hypothesis testing, but the key is letting the data tell its own story. “Visualizing the data is just as important as being a good statistician, so the effective data scientist will have knowledge of some visualization tools and frameworks,” he explains, adding that “the best data scientists have a restless curiosity which compels them to try and fail in the process of knowledge discovery.”

Data engineers, on the other hand, need to understand database technology and leading database brands, Virmani says. “In addition to knowing the database technology, the data engineer has an idea of the data schema and organization – how their company’s data is structured, so he or she can put together the right data sets from the right sources for the scientist to explore.”

Most importantly, the two sides need to work together – collaboration is essential. “A data scientist is typically someone with a math and probability background, who also knows how to program,” says Jesse Anderson in a separate article. “Data scientists are often familiar with big data technologies, in order to run algorithms at scale. A data science team is multidisciplinary, just like a data engineering team. The role of data engineer typically requires a strong background in programming and distributed systems, whereas the role of a data scientist typically requires a stronger background in math, analysis, and probabilities.”

Data scientists, he continues, “aren’t just there to just make equations and throw them over the fence to the data engineering team—data scientists need to have some level of programming. If the throw-it-over-the-fence scenario becomes the perception or reality, there can be a great deal of animosity between the teams.”

The key point is that today’s organizations hold a kaleidoscope of skills. The challenge is to match and align them in a way that moves the business forward.