What You Should Know About Serverless

Serverless is a cloud architecture that frees you from managing servers, virtual machines (VMs), or containers. Serverless does not mean there are no servers involved; servers still run your applications. It simply means that you do not have to interact with or control those servers, so you can focus on the design and objectives of your application.

4 key characteristics of serverless

These four characteristics help determine whether an offering is truly serverless:

  • No servers to manage – you are not managing any servers, VMs, or containers
  • Consumption-based pricing – you pay for the resources that you consume and do not pay when your application is sitting idle
  • Auto-tuning and auto-scaling – the application should be able to scale to handle new jobs automatically
  • High availability and automatic recovery – the application should have built-in high availability, recovery, and fault-tolerance
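To make the consumption-based pricing point concrete, here is a minimal arithmetic sketch. The hourly rate and the number of busy hours are made-up illustration values, not prices from any provider, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # approximate number of hours in a month


def always_on_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of a server that is billed for every hour, busy or idle."""
    return hourly_rate * hours


def consumption_cost(hourly_rate, busy_hours):
    """Cost when billing covers only the hours spent doing work."""
    return hourly_rate * busy_hours


# Hypothetical inputs, chosen only to show the shape of the comparison.
rate = 0.50       # assumed price per compute hour
busy_hours = 40   # assumed hours of actual processing per month

print("always on:  ", always_on_cost(rate))
print("consumption:", consumption_cost(rate, busy_hours))
```

The gap between the two numbers depends entirely on how idle the workload is; a job that is busy around the clock would cost about the same under either model.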

Who are the top serverless providers?

There are many serverless computing providers, but three stand out: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. Offerings from these three vendors share similar advantages, but each has some qualities that make it special.

AWS Lambda: AWS was one of the first vendors to offer serverless computing. AWS Lambda is a service that runs your code on compute infrastructure that AWS manages for you. With Lambda, you’re not hosting the code, and you’re not charged when the code is not used. You pay only for the requests and computation time your code consumes. Your code can sit idle for months, and if you don’t run it, Amazon will not charge you. (See Informatica solutions for AWS.)

Microsoft Azure: Microsoft’s serverless offering is Azure Functions. Using Azure Functions, a user can create and upload code and then define triggers or events that will execute the code. Triggers can come from a wide range of sources, including another user’s application or other cloud services, such as databases, events, and notification hubs. Azure Functions has a usage-based billing policy. (See Informatica solutions for Microsoft Azure.)

Google Cloud Platform: Google Cloud Functions is Google’s serverless offering. Using Cloud Functions, a user writes simple functions that are attached to events emitted by Google Cloud Platform infrastructure and services. A function is triggered when an event it is watching fires, and the code executes in a fully managed environment. Google Cloud Functions is priced by usage, based on how often your functions are invoked and how long they run. (See Informatica solutions for Google Cloud Platform.)

What are the advantages of serverless for data management?

Each of the serverless options from Amazon, Microsoft, and Google lets you create and upload code that is then executed automatically, so you no longer need to worry about managing servers, services, and infrastructure. All of that is handled for you. Running your code still has a cost, because servers are spun up on your behalf, but you save money because you do not keep long-running servers or dedicated operations personnel to operate your pipeline.

So, how do ETL (extract, transform, load) and data management fit into this concept? Most of the providers have various data storage options that can be linked with your code. For instance, AWS Lambda can trigger your code each time a file is uploaded to Amazon S3, an event is streamed to Amazon Kinesis, or a record is written to Amazon DynamoDB. All you have to do is supply the code to process that data. As you can see, serverless is a powerful shift in data management and how ETL is performed.
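As a rough illustration of the Amazon S3 case, the sketch below shows what a Python handler for an upload notification might look like. It assumes the standard S3 event notification layout (a `Records` list with `s3.bucket.name` and `s3.object.key`); the helper name `extract_uploaded_objects` and the print placeholder are assumptions for illustration, and the actual transform step is left out.

```python
from urllib.parse import unquote_plus


def extract_uploaded_objects(event):
    """Return (bucket, key) pairs for each record in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification, so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Entry point the platform calls once per notification."""
    uploaded = extract_uploaded_objects(event)
    for bucket, key in uploaded:
        # Placeholder: a real pipeline would read the object and transform it here.
        print(f"Would process s3://{bucket}/{key}")
    return {"processed": len(uploaded)}
```

Keeping the event parsing in its own function makes it easy to test locally with a hand-built event dictionary before wiring the handler to a real bucket.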

Serverless using Informatica Data Engineering products

Informatica’s data engineering products support serverless deployments using Amazon EMR, Microsoft Azure HDInsight, and Databricks clusters. Once a developer builds mappings using Informatica Data Engineering Integration, customers can run those mappings either in an existing cluster for on-premises deployment or serverless using the cluster auto-deployment option.

The cluster auto-deployment option can:

  • Deploy an Amazon EMR cluster on AWS
  • Deploy an HDInsight cluster on Microsoft Azure
  • Deploy a Databricks cluster
  • Auto-scale

The figure below shows what serverless execution means in our data engineering workflow.

  1. The first task of the workflow is a “create cluster task,” which provides a template where users supply serverless properties such as the AWS or Microsoft Azure account, the minimum and maximum number of worker nodes, whether to use spot instances, and other runtime properties, including the auto-tuning policy.
  2. The next tasks are the Informatica mappings, which are configured to run automatically on the cluster created in the first step.
  3. The “delete cluster task” is an optional task that terminates the cluster after the Informatica mappings have executed successfully.
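The three steps above can be sketched as a small Python model. This is not Informatica's API: `ClusterTemplate`, `EphemeralClusterWorkflow`, and the mapping functions are hypothetical names, and the cluster operations are simulated by appending to a log so that the control flow is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterTemplate:
    """Hypothetical stand-in for the properties supplied to the create cluster task."""
    cloud_account: str
    min_workers: int
    max_workers: int
    use_spot_instances: bool = False


@dataclass
class EphemeralClusterWorkflow:
    template: ClusterTemplate
    mappings: list                  # callables standing in for mappings
    delete_on_success: bool = True  # the delete cluster task is optional
    log: list = field(default_factory=list)

    def run(self):
        # Step 1: create the cluster from the template (simulated).
        t = self.template
        self.log.append(f"create cluster ({t.min_workers}-{t.max_workers} workers)")
        # Step 2: run each mapping on that cluster; an exception stops the workflow.
        for mapping in self.mappings:
            mapping()
            self.log.append(f"ran {mapping.__name__}")
        # Step 3: reached only if every mapping succeeded.
        if self.delete_on_success:
            self.log.append("delete cluster")
        return self.log


def load_customers():
    pass  # placeholder for a mapping


def load_orders():
    pass  # placeholder for a mapping


workflow = EphemeralClusterWorkflow(
    ClusterTemplate(cloud_account="example-account", min_workers=2, max_workers=10),
    [load_customers, load_orders],
)
for step in workflow.run():
    print(step)
```

In this sketch a failed mapping leaves the cluster in place, which mirrors the description of deletion happening after successful execution; whether a real deployment should also clean up on failure is a separate design decision.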

Serverless using Informatica IICS Cloud Data Integration Elastic service

A data management system is serverless if one can ingest, cleanse, and enrich data without ever having to think about servers. A serverless data management system should satisfy the four characteristics described above.

Informatica’s Cloud Data Integration Elastic (CDI Elastic) service satisfies all four characteristics of serverless:

  1. No servers to manage – Informatica’s CDI Elastic uses Spark on Kubernetes to process data integration mappings at scale in a serverless Spark engine.
  2. Consumption-based pricing – Informatica offers consumption-based pricing for CDI Elastic.
  3. Auto-tuning and auto-scaling – the Spark cluster for CDI Elastic is created in the customer’s virtual cloud network and uses cloud infrastructure with auto-tuning and auto-scaling based on the intelligent CLAIRE engine.
  4. Built-in high availability and automatic recovery – while the cloud provider offers high availability and recovery for the underlying infrastructure, the CDI Elastic Spark cluster provides automatic high availability and recovery for the Spark Kubernetes cluster.

As you can see, serverless is a powerful shift in how data engineering jobs are performed. However, there are other concerns, such as exception handling, deserialization, transformations, retrying, and monitoring, which still need to be implemented. Informatica addresses these through the Informatica Intelligent Cloud Services (IICS) Cloud Data Integration Elastic service.

Informatica CDI Elastic virtualizes the runtime environment, enabling developers to focus on mapping development rather than on infrastructure provisioning and management. From the perspective of an Informatica developer, CDI Elastic makes it possible to build and run mappings and task flows without thinking about servers.

CDI Elastic also enables customers to deploy their next-gen analytics solutions, giving them the ability to ingest, cleanse, and process big data in the cloud using serverless technology.

Future-proof with serverless and Informatica

When deploying cloud applications, you should consider serverless deployments first and only consider the alternatives if serverless does not meet your demands. Serverless offers consumption-based pricing, auto-tuning and auto-scaling, and high availability, all without requiring a dedicated administrator to manage the environment.

Informatica is the Enterprise Cloud Data Management leader, and as data management evolves toward serverless, Informatica helps you future-proof your solutions using IICS and data engineering solutions. We have many enhancements planned for CDI Elastic (CDIE), including integration with the intelligent CLAIRE engine. Please stay tuned for more blog posts on this topic.

To find out if Informatica’s CDIE service (formerly Integration at Scale) is right for you, try CDIE free for 30 days.

To learn more about serverless, read How to go Hadoop-less with Informatica Data Engineering and Databricks.
