“I went through the AWS documentation to take a deep dive into the Amazon EMR (Elastic MapReduce) platform running on Amazon EC2 instances. In terms of cost, you pay for the EMR service, for the Amazon EC2 instances, and for the storage and data transferred in and out of the S3 bucket.”
Amazon EMR (Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
In this post, you will take a deep dive into the Amazon EMR platform running on Amazon EC2 instances. I created an Amazon EMR cluster together with IAM roles, a key pair, and an S3 bucket.
Architecture Overview
The architecture diagram shows the overall deployment architecture with the data flow, Amazon EMR, the S3 bucket, the IAM service role, and the EC2 instances.
Solution Overview
The blog post consists of the following phases:
Create an Amazon EMR cluster with the required configuration
Submit a Spark application as a step and review the cluster output
Phase 1: Create an Amazon EMR Cluster with the Required Configuration
- Create a key pair, the IAM roles, and an S3 bucket containing the required data. Open the Amazon EMR console and create a cluster using the Amazon EMR running on Amazon EC2 option. Name the cluster Emr Cluster and choose the required parameters: applications, instance type, EBS volume, networking, S3 location for cluster logs, key pair, EMR service role, and EC2 instance profile.
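The same choices made in the console can also be written down as a request for the boto3 EMR client. The sketch below only builds the request dictionary and does not call AWS. The function name build_cluster_request, the release label emr-6.15.0, the single core node, and all values in the usage note are assumptions for illustration, not settings taken from this example.

```python
def build_cluster_request(log_uri, key_name, service_role, instance_profile,
                          release_label="emr-6.15.0",  # assumed release label
                          instance_type="m5.xlarge",
                          core_count=1):                # assumed number of core nodes
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    # One primary (master) node is always required.
    groups = [{
        "Name": "Primary",
        "InstanceRole": "MASTER",
        "InstanceType": instance_type,
        "InstanceCount": 1,
        "Market": "ON_DEMAND",
    }]
    # Core nodes are optional; skip the group for a single-node cluster.
    if core_count > 0:
        groups.append({
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
            "Market": "ON_DEMAND",
        })
    return {
        "Name": "Emr Cluster",
        "LogUri": log_uri,                    # S3 location for cluster logs
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],  # application choice
        "Instances": {
            "InstanceGroups": groups,
            "Ec2KeyName": key_name,           # key pair for SSH access
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "ServiceRole": service_role,          # EMR service role
        "JobFlowRole": instance_profile,      # EC2 instance profile
    }
```

With boto3 installed and credentials configured, the request could be sent with boto3.client("emr").run_job_flow(**request), where request comes from a call such as build_cluster_request("s3://example-bucket/logs/", "example-key", "EMR_DefaultRole", "EMR_EC2_DefaultRole"). The bucket and key pair names here are placeholders.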
Phase 2: Submit a Spark Application as a Step and Review the Cluster Output
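For readers who prefer scripting over the console, a Spark step can be described as a small dictionary and passed to the boto3 EMR client. This is a minimal sketch, assuming the Spark application is a script stored in S3. The function name build_spark_step, the step name, and the script location in the usage note are hypothetical.

```python
def build_spark_step(script_uri, name="Spark application", app_args=()):
    """Build one step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        # Keep the cluster running if the step fails, so logs can be inspected.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *app_args],
        },
    }
```

A step built this way, for example build_spark_step("s3://example-bucket/scripts/app.py"), could be submitted with boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step]), where cluster_id is the ID of the running cluster. The step output and logs then appear under the S3 log location configured for the cluster.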
Clean-up
Delete the Amazon EMR cluster, IAM roles, S3 bucket, and key pair.
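The clean-up can be scripted as well. The sketch below assumes the three clients are created with boto3.client("emr"), boto3.client("s3"), and boto3.client("ec2"), and that the bucket is not versioned. The function name clean_up and its parameters are hypothetical. IAM roles are left out on purpose, because their policies and instance profiles have to be detached first, and that depends on how the roles were created.

```python
def clean_up(emr, s3, ec2, cluster_id, bucket_name, key_name):
    """Terminate the cluster, empty and delete the bucket, delete the key pair."""
    # Request termination of the EMR cluster.
    emr.terminate_job_flows(JobFlowIds=[cluster_id])

    # A bucket must be empty before it can be deleted, so remove objects page by page.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
    s3.delete_bucket(Bucket=bucket_name)

    # Remove the EC2 key pair used for SSH access to the cluster nodes.
    ec2.delete_key_pair(KeyName=key_name)
```

If the bucket also holds the cluster logs, it may be safer to wait until the cluster has fully terminated before deleting the bucket, since the cluster can still write logs while it shuts down.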
Pricing
Below are the pricing and the estimated cost of this example.
Cost of Amazon EMR = $0.048 per hour for EMR on m5.xlarge = $(0.048 × 1.086) = $0.05
Cost of Amazon Elastic Compute Cloud = $0.192 per On-Demand Linux m5.xlarge instance hour = $(0.192 × 1.334) = $0.26
Cost of Amazon Simple Storage Service = $0.0
Total Cost = $0.31
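The arithmetic above can be checked with a few lines of Python. This sketch assumes that the second factor in each product (1.086 and 1.334) is the number of billed hours. The variable names are illustrative only.

```python
# Hourly rates and usage figures taken from the cost lines above.
emr_rate, emr_hours = 0.048, 1.086   # EMR charge for m5.xlarge (hours assumed)
ec2_rate, ec2_hours = 0.192, 1.334   # On-Demand Linux m5.xlarge (hours assumed)

emr_cost = emr_rate * emr_hours
ec2_cost = ec2_rate * ec2_hours
s3_cost = 0.0                        # S3 cost as stated in this example

total = emr_cost + ec2_cost + s3_cost
print(f"EMR ${emr_cost:.2f}, EC2 ${ec2_cost:.2f}, S3 ${s3_cost:.2f}, total ${total:.2f}")
# prints: EMR $0.05, EC2 $0.26, S3 $0.00, total $0.31
```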
Summary
In this post, I showed how to take a deep dive into the Amazon EMR platform running on Amazon EC2 instances.
For more details on Amazon EMR, check out Get started with Amazon EMR and open the Amazon EMR console. To learn more, read the Amazon EMR documentation.
Thanks for reading!
Connect with me: LinkedIn