5 Steps to MLOps at Scale

Last Published: Jul 12, 2022

Preetam Kumar

Product Marketing Manager

Why We Need MLOps – AI/ML Trends and Challenges

They said it couldn't be done. A computer program beating the world champion at the game of Go? Not going to happen, right? Wrong. AlphaGo's machine learning (ML) and artificial intelligence (AI) technology beat the world champion.1 Five years later, AlphaFold is using AI/ML to solve some of the core challenges in the field of biology.2 Yet even as efforts like these pave a path to the future, 85% of AI projects still fail because they cannot deliver on their intended business promises.3 For ML models to succeed, data science projects must address three challenges:

1. Trusted and governed data – Companies rely on high-quality, trusted data to create trustworthy insights and make critical business decisions. That trusted data must be accessible across the entire enterprise: to data scientists, data stewards, data analysts and data engineers, as well as developers and business users. A comprehensive, intelligent and complete data management solution lets companies both democratize and govern that access.

2. A multi-cloud ML strategy – Companies are modernizing their legacy systems in the public cloud, including Amazon Web Services (AWS), Microsoft Azure and Google Cloud. This gives them access to pre-trained models, the latest ML libraries and specialized hardware that are not available on-premises, helping them innovate. But it is hard to stick to a single cloud: more than 76% of organizations use multiple clouds, and companies often build different ML applications in different clouds.4 For example, a customer can build an app that uses Google Vision APIs on Google Cloud while the rest of their apps run on AWS or Azure.

3. AI/ML automation – Data automation does the mundane data management work for you, freeing your data engineering and data science teams to focus on high-value initiatives so data scientists can do what they were trained to do. Automating the entire machine learning workflow is critical: data acquisition, exploration and integration; model training and deployment; and model monitoring. As a result, you can reuse, configure and deploy repeatable patterns.

Solution – What Is MLOps and What Are Its Benefits?

Machine learning operations (MLOps) is a set of practices for streamlining the deployment, operationalization and execution of machine learning models. This standard set of practices for machine learning operationalization lets you enable the full power of AI at scale and deliver trusted, machine-led decisions in real time. MLOps emulates the concept of DevOps by merging machine learning with operationalization: it combines model development and operations technologies, both of which are essential to high-performing AI solutions.

Many organizations build, train and test ML models. But how can you provide continuous feedback, especially once the models are in production? Data scientists can't be responsible for managing an end-to-end machine learning pipeline on their own; it is better to have a team with the right mix of technical skillsets to manage the orchestration. MLOps provides the framework to operationalize the model development process, establishing a continuous delivery cycle for the models that form the basis of AI-based systems.

The benefits of implementing machine learning operationalization include the ability to:

  • Deliver business value for data science projects
  • Improve the efficiency of the data science team
  • Allow ML models to run more predictably with better results
  • Help enterprises improve revenue and operational efficiency
  • Speed up the digital transformation journey

Industry Use Cases

MLOps can solve business problems across many industries. Use cases include:

Banking and Finance

  • Fraud detection and prevention
  • Customer onboarding
  • Customer experience
  • Portfolio management
  • Credit risk assessment and management
  • Customer churn predictions
  • Blockchain
  • Algorithmic stock trading
  • Credit scoring
  • Loan processing

Retail

  • Sales
  • Product usage and retention forecasting
  • Customer lifetime value
  • Upsell and cross-sell
  • Audience segmentation
  • Weather forecasting
  • Inventory management
  • Next-best action

Healthcare

  • Preventive patient care
  • Drug discovery
  • ICU monitoring
  • Cancer diagnosis

Manufacturing

  • Predictive maintenance
  • Product design
  • Smart energy consumption
  • Supply chain management
  • Quality control

MLOps in 5 Steps with Informatica

Informatica Intelligent Data Management Cloud (IDMC) can help accelerate your data science initiatives. It delivers next-generation scalable AI/ML and actionable analytics. Informatica IDMC has purpose-built connectors for thousands of endpoints, providing native connectivity to both metadata and data for a variety of use cases and latency/SLA needs. Informatica IDMC also provides a best-in-class ETL and ELT engine, which processes data in the most efficient way for the use case. Here are five steps to operationalize your machine learning models with Informatica:

Step 1: Identify the business problem and acquire data. Identify trusted data from various sources. Data sources can include:

  • Internet of Things (IoT) devices
  • Machine logs
  • Relational databases
  • Mainframe systems
  • On-premises data warehouses
  • Applications

Then load the data into a cloud data lake. Informatica's Enterprise Data Catalog helps you identify trusted data, and Informatica's Cloud Mass Ingestion lets you ingest it into the cloud data lake. Informatica's AI-driven intelligent metadata discovery lets you quickly discover your data assets and apply them to a data pipeline. For example, a data engineer can search the catalog for inventory data.

Next, the data can be added to a mapping. Cloud Mass Ingestion lets you mass ingest continuous data from files, messaging streams, database change data capture (CDC) and applications into cloud targets through a simple, intuitive four-step, wizard-based approach. Cloud Mass Ingestion also provides optional transformation capabilities that are applied during ingestion so you can avoid unnecessary hops.
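
Cloud Mass Ingestion handles this pattern through its wizard, with no code required. Purely as an illustrative sketch of the underlying ingest-to-data-lake pattern, and not Informatica's API, here is what the equivalent might look like in PySpark; the bucket paths, column names and file format are hypothetical.

    # Illustrative only: a generic ingest-to-data-lake pattern in PySpark.
    # All paths and column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("inventory-ingest").getOrCreate()

    # Read raw inventory files landed by an upstream source.
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("s3a://raw-zone/inventory/*.csv"))

    # Optional light transformation during ingestion to avoid an extra hop.
    deduped = (raw.dropDuplicates(["item_id", "updated_at"])
                  .withColumn("ingest_date", F.current_date()))

    # Land the data in the cloud data lake in an analytics-friendly format.
    (deduped.write
            .mode("append")
            .partitionBy("ingest_date")
            .parquet("s3a://lake-zone/inventory/"))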

Step 2: Curate, cleanse and prepare data. Once your data is ingested into a cloud data lake, you can cleanse, match and standardize it with rules to ensure it is clean and ready to consume (a simple sketch of this kind of rule logic appears after the list below). Informatica Cloud Data Quality's easy-to-use, drag-and-drop configuration lets you rapidly build, test and run data quality plans, and it lets you continuously monitor your data quality so the correct data is used for your machine learning model. You can also use pre-built, out-of-the-box Cloud Data Integration transformation templates, which means less time spent implementing manually coded, error-prone logic. Advanced transformations include:

  • Hierarchy transformation
  • Built-in integration for data quality
  • Machine learning transformations for operationalizing ML models
  • Data masking transformation
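
Cloud Data Quality expresses these rules visually rather than in code. As a minimal sketch of the kind of cleansing and standardization logic such a data quality plan encodes, and with column names, rules and paths that are assumptions rather than Informatica's API, the equivalent in PySpark might look like this:

    # Illustrative only: rule-based cleansing and standardization in PySpark.
    # Column names, rules and paths are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("inventory-dq").getOrCreate()
    df = spark.read.parquet("s3a://lake-zone/inventory/")

    cleansed = (df
        # Standardize: trim names and normalize unit codes to upper case.
        .withColumn("item_name", F.trim(F.col("item_name")))
        .withColumn("unit", F.upper(F.col("unit")))
        # Cleanse: drop records that fail basic validity rules.
        .filter(F.col("item_id").isNotNull() & (F.col("quantity") >= 0))
        # Match/de-duplicate on the business key.
        .dropDuplicates(["item_id"]))

    # A simple quality metric that could feed continuous monitoring.
    pass_rate = cleansed.count() / max(df.count(), 1)
    print(f"Data quality pass rate: {pass_rate:.2%}")

    cleansed.write.mode("overwrite").parquet("s3a://curated-zone/inventory/")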

Step 3: Build machine learning models. Data scientists can build and test models using their preferred development tools, such as Jupyter notebooks, and operationalize them on Informatica's Spark-based data integration engine, which provides a pipeline of cleansed training data. For model development, Informatica is the industry's first solution to run on an advanced serverless deployment. Informatica's CLAIRE engine applies industry-leading metadata capabilities to speed up and automate core data management, and it provisions a Spark serverless cluster. To run jobs at scale, you can apply auto-scaling and auto-tuning for better performance and more effective cost management when processing the job. In addition, Informatica has incorporated customized layers of innovation, including runtime optimizations, advanced data management and elastic operations.
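
As a rough sketch of what a data scientist might do in a notebook against the cleansed training data, here is a Spark ML pipeline; the feature columns, target and model choice are assumptions made purely for illustration.

    # Illustrative only: training a Spark ML pipeline on the curated data.
    # Feature columns, the target and the model choice are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import GBTRegressor

    spark = SparkSession.builder.appName("demand-model").getOrCreate()
    data = spark.read.parquet("s3a://curated-zone/inventory/")
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    # Assemble numeric features and train a gradient-boosted tree model.
    assembler = VectorAssembler(
        inputCols=["quantity", "days_since_restock", "unit_price"],
        outputCol="features")
    gbt = GBTRegressor(featuresCol="features", labelCol="weekly_demand")
    pipeline = Pipeline(stages=[assembler, gbt])

    fitted = pipeline.fit(train)
    fitted.transform(test).select("weekly_demand", "prediction").show(5)

    # Persist the fitted pipeline so the deployment step can reuse it.
    fitted.write().overwrite().save("s3a://models/demand-forecast/v1")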

Step 4: Deploy machine learning models. Data engineers get more flexibility during deployment. They can consume and deploy the machine learning model in the production environment, where it runs in serverless mode for predictive analytics and for sending recommendations such as custom SMS alerts and next-best-action suggestions. With Informatica, engineers reuse the training data pipeline to process data for inference. Serverless deployment frees data science and engineering teams from managing infrastructure so they can focus on model efficiency.
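
As a minimal sketch of the batch-inference side, the persisted pipeline from Step 3 could be reused for scoring as shown below; the paths, columns and alerting criterion are hypothetical, and in practice a serverless engine would run a job like this on a schedule or when new data arrives.

    # Illustrative only: batch inference that reuses the trained pipeline.
    # Paths, columns and the alerting criterion are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml import PipelineModel

    spark = SparkSession.builder.appName("demand-inference").getOrCreate()

    # Load the fitted pipeline and score freshly ingested, cleansed data.
    model = PipelineModel.load("s3a://models/demand-forecast/v1")
    new_data = spark.read.parquet("s3a://curated-zone/inventory/")
    scored = model.transform(new_data)

    # Surface recommendations, e.g. items predicted to run short this week;
    # a downstream service could turn these into SMS alerts or next best actions.
    low_stock = scored.filter(scored.prediction > scored.quantity)
    (low_stock.select("item_id", "quantity", "prediction")
              .write.mode("overwrite")
              .parquet("s3a://serving-zone/restock-alerts/"))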

Step 5: Monitor the models. DataOps teams can monitor model performance to ensure continued value delivery. They can also leverage Informatica's built-in monitoring and alerting capabilities, which automate the monitoring and management of your models. From there, you can automate ML model operationalization.
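
Informatica provides this monitoring out of the box. To make the idea concrete, here is a small, generic sketch of one common drift check, the population stability index (PSI), using made-up baseline and current score distributions and a conventional 0.2 alert threshold; none of this is specific to Informatica's implementation.

    # Illustrative only: a generic population stability index (PSI) drift check.
    # The data, threshold and alerting action are hypothetical.
    import numpy as np

    def population_stability_index(expected, actual, bins=10):
        """Compare two score distributions; PSI above ~0.2 often signals drift."""
        cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
        e_pct = np.histogram(expected, cuts)[0] / len(expected)
        a_pct = np.histogram(np.clip(actual, cuts[0], cuts[-1]), cuts)[0] / len(actual)
        e_pct = np.clip(e_pct, 1e-6, None)
        a_pct = np.clip(a_pct, 1e-6, None)
        return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

    # Stand-ins for training-time scores and last week's production scores.
    baseline = np.random.normal(100, 15, 5000)
    current = np.random.normal(110, 20, 5000)

    psi = population_stability_index(baseline, current)
    if psi > 0.2:
        print(f"PSI={psi:.3f}: prediction drift detected, consider retraining")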

Summary

MLOps is vital to operationalizing data science use cases and delivering results from those initiatives. MLOps drives business value and speeds up the digital transformation journey. Informatica's end-to-end machine learning operationalization works on any platform and in any cloud, multi-cloud or hybrid environment.

To learn more, watch this demo video on how to operationalize your ML models at scale or sign up for the 30-day trial today.

 

1. https://www.deepmind.com/research/highlighted-research/alphago

2. https://www.deepmind.com/research/highlighted-research/alphafold

3. https://www.techrepublic.com/article/why-85-of-ai-projects-fail/

4. https://www.zdnet.com/article/cloud-computing-in-the-real-world-the-challenges-and-opportunities-of-multicloud/

First Published: Jun 13, 2021