DevOps for Data

InformationWeek’s Lisa Morgan just made the call: data specialists need to get some DevOps religion. DevOps, which is attaining almost religious status within many technology-driven organizations, rests on one simple yet effective idea: it’s better to have a consistent cadence of deliverables moving through the pipeline than unpredictable spurts of development. Developers, known to work crazy hours on crazy schedules, are encouraged to work closely with operations teams, known for their discipline and adherence to tight schedules.

Data professionals may share some of that tendency toward crazy schedules. The idea is to channel all this energy and intellect into a regular heartbeat of insights moving through the organization.

“DevOps has promoted more collaboration between developers and IT operations. Data scientists and data science teams face similar challenges, which DevOps concepts can help address,” Morgan writes. “As the pace of business continues to accelerate, software and data science teams find themselves under pressure to deliver more business value in less time. DevOps closes the gap by promoting collaboration between development and IT operations and enabling project visibility across development and IT operations, which accelerates the delivery of better quality software.”

Victor Hugo Certuche, an enterprise agile coach, has also weighed in on the topic with a similar message: let’s dispose of the worn, sluggish approaches to data management and put things on a faster, more productive track. “Data warehouses create the illusion that ‘if we build it they will come,’” he states.

Data professionals often “lack the cross-functional collaboration and support they need to ensure their work is timely and actually provides business value,” says Morgan. “In addition, their algorithms and models don’t always operate as they should in production because conditions or the data have changed.”

She quotes Michael Fauscette of G2 Crowd, who emphasizes the urgency of DevOps practices. “For all the work data scientists put into designing, testing and optimizing their algorithms, the real tests come when they are put into use,” he said. “From Facebook’s newsfeed to stock market ‘flash crashes,’ we see what happens when algorithms go bad. The best algorithms must be continuously tested and improved.”

For his part, Certuche calls data warehousing “one of those remaining silos that still need a major Agile shake up.” He advocates changing the term to “data services system” to underscore the emphasis on continuous delivery where and when it is needed. In the end, the only thing that matters is the customer, he emphasizes, and practices such as Agile and DevOps can speed up delivery to customers while ensuring quality.

Morgan says the benefits of a DevOps structure include greater predictability. “Like application software, models may run well in a lab environment, but perform differently when applied in production,” she states. “One reason a model may fail to generalize is overfitting, which occurs when a model is so complex that it starts finding patterns in noise.”
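To make the overfitting point concrete, here is a minimal sketch under stated assumptions; it is an illustration, not Morgan's example. It assumes synthetic data following y = 2x + 1 plus random noise, and it uses a 1-nearest-neighbour "memorizer" as a stand-in for an overly complex model. The helper names (`make_data`, `fit_linear`, `fit_memorizer`) are hypothetical. Both models are scored on the "lab" training data and on fresh data from the same source, which plays the role of production.

```python
import random

def make_data(n, seed):
    """Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise (assumed, not from the article)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 2.0) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Simple model: ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """Overly complex model: memorizes every training point, noise included (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    def predict(x):
        _, nearest_y = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(50, seed=1)   # "lab" data the models are built on
prod_x, prod_y = make_data(500, seed=2)    # unseen data standing in for production

results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, prod_x, prod_y))
    print(f"{name:9s} lab MSE={results[name][0]:.2f}  production MSE={results[name][1]:.2f}")
```

The memorizer scores a perfect zero error in the lab because it has learned the noise along with the signal, yet it does worse than the plain linear fit on new data. That gap between lab and production behaviour is exactly what continuous testing in a DevOps-style pipeline is meant to catch.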

Bringing DevOps to the data world also helps enable more consistent processes. “DevOps provides developers and IT operations professionals with visibility into what the other is doing to enable lifecycle views and approaches versus the traditional hand-offs that tend to cause dissension, finger-pointing and rework,” says Morgan. “Data scientists are involved in problem-solving lifecycles from the formation of a hypothesis to hypothesis testing, data collection, analysis and insights, but they may lack the collaboration and support they need from other parts of the organization.”