Data analytics with Amazon EMR Serverless

Neha Tomar
2 min read · Dec 27, 2021


Tens of thousands of customers use Amazon EMR to run open-source frameworks such as Apache Spark, Hive, and Presto for large-scale distributed data processing, interactive SQL queries, and machine learning applications. With Amazon EMR Serverless, customers simply specify the framework they want to run, and the service provisions, manages, and scales compute and memory resources up and down as workload needs change. To get started, choose an open-source framework and submit a job using the Amazon EMR APIs, the AWS Command Line Interface (AWS CLI), or the AWS Management Console.

Amazon EMR Serverless is a serverless option in Amazon EMR that lets data analysts and engineers run open-source big data analytics frameworks without having to configure, manage, or scale clusters or servers. All of Amazon EMR’s features and benefits are available without needing experts to plan and manage clusters.

Benefits:

Easily run frameworks: Simply select the open-source framework you want to run for your application, such as Apache Spark, Hive, or Presto, and EMR Serverless automatically provisions and manages the underlying compute and memory resources.

Scale on demand: Run analytics workloads at any scale with automatic on-demand scaling that resizes resources in seconds to meet changing data volumes and processing requirements.

Cost optimization: EMR Serverless automatically scales up and down to provide the right amount of capacity for your application. You pay only for the resources you use, so you can worry less about over- or under-provisioning.

How it works:

Create your application: Choose the open-source framework and version you want to use.

Submit jobs: Submit jobs to your application via the APIs, the AWS Management Console, or EMR Studio. You can also submit jobs using workflow orchestration services like AWS Step Functions, Apache Airflow, or Amazon Managed Workflows for Apache Airflow (Amazon MWAA).

Debug jobs: Use familiar open-source tools such as Spark UI and Tez UI to monitor and debug jobs.
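The create-and-submit steps above can be sketched in Python. This is a minimal illustration, not an official recipe: it assumes the boto3 "emr-serverless" client and its create_application, start_job_run, and get_job_run operations as they exist in the generally available API, which may differ from what was offered when this post was written. The helper names, the release label, the role ARN, and the S3 path are hypothetical placeholders.

```python
import time

# Job-run states after which there is nothing left to wait for.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def build_application_request(name, release_label, framework="SPARK"):
    """Keyword arguments for create_application: framework and version."""
    return {"name": name, "releaseLabel": release_label, "type": framework}


def build_job_request(application_id, role_arn, entry_point, args=()):
    """Keyword arguments for start_job_run for a Spark script."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args),
            }
        },
    }


def run_job(client, app_request, role_arn, entry_point, args=(), poll_seconds=30):
    """Create an application, submit one job, and poll until it finishes.

    `client` is expected to behave like boto3.client("emr-serverless").
    Returns the final job-run state as a string.
    """
    app_id = client.create_application(**app_request)["applicationId"]
    job_request = build_job_request(app_id, role_arn, entry_point, args)
    run_id = client.start_job_run(**job_request)["jobRunId"]
    while True:
        job_run = client.get_job_run(applicationId=app_id, jobRunId=run_id)
        state = job_run["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

With real credentials, the call would look something like run_job(boto3.client("emr-serverless"), build_application_request("demo-app", "emr-6.6.0"), role_arn, "s3://my-bucket/job.py"), where every argument value is a placeholder to replace with your own. Keeping the request builders separate from the network calls makes it easy to check the request shape without touching AWS.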

Use cases:

Variable workloads: As workload demands change, scale application resources seamlessly without having to preconfigure how much compute and memory you need.

Interactive data analysis: Choose the option to pre-initialize application resources and enable sub-second response time for interactive data analysis in EMR Studio.

Development and test environments: Spin up a development and test environment quickly and easily, automatically scale with unpredictable usage, and get products to market faster.
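To make the capacity-related use cases above more concrete, here is a rough sketch of the settings that might be passed to create_application alongside the name, release label, and type. It assumes the initialCapacity, maximumCapacity, and autoStopConfiguration parameters of the generally available EMR Serverless API. The "DRIVER" and "EXECUTOR" worker-type keys follow the examples I have seen in the AWS user guide and should be verified against the current API reference. The function names and the sizes used are hypothetical.

```python
def worker_spec(count, vcpus, memory_gb):
    """Describe a group of identical pre-initialized workers."""
    return {
        "workerCount": count,
        "workerConfiguration": {
            "cpu": f"{vcpus}vCPU",
            "memory": f"{memory_gb}GB",
        },
    }


def build_capacity_settings(drivers, executors, max_vcpus, max_memory_gb,
                            idle_minutes=15):
    """Capacity-related keyword arguments for a Spark application.

    - initialCapacity keeps workers warm for interactive analysis.
    - maximumCapacity caps how far the application may scale.
    - autoStopConfiguration stops an idle application to limit cost.
    """
    return {
        # Assumption: worker-type keys for Spark are "DRIVER" and "EXECUTOR".
        "initialCapacity": {
            "DRIVER": worker_spec(drivers, vcpus=4, memory_gb=16),
            "EXECUTOR": worker_spec(executors, vcpus=4, memory_gb=16),
        },
        "maximumCapacity": {
            "cpu": f"{max_vcpus}vCPU",
            "memory": f"{max_memory_gb}GB",
        },
        "autoStopConfiguration": {
            "enabled": True,
            "idleTimeoutMinutes": idle_minutes,
        },
    }
```

A development or test application might omit initialCapacity entirely and rely on on-demand scaling, while an interactive application would keep a few workers pre-initialized and accept the cost of holding them.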
