
Amazon EMR - Management Guide

Amazon EMR: Management Guide

Copyright 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

What Is Amazon EMR?

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as


Table of Contents

What Is Amazon EMR?
Overview
Understanding Clusters and Nodes
Submitting Work to a Cluster
Processing Data
Understanding the Cluster Lifecycle
Cost Savings
AWS Integration
Deployment
Scalability and Flexibility
Reliability
Security
Management Interfaces
Architecture
Cluster Resource Management
Data Processing Frameworks
Applications and Programs
Getting Started
Step 1: Set Up Prerequisites
Sign Up for AWS
Create an Amazon S3 Bucket
Create an Amazon EC2 Key Pair
Step 2: Launch The …
Launch the Sample …
Summary of Quick Options
Step 3: Allow SSH Access
Step 4: Run a Hive Script to Process Data
Understanding The Data And …
Submit the Hive Script as a Step
View the Results
Step 5: Clean Up Resources
Using EMR Notebooks
Creating a Notebook
Creating Clusters for Notebooks
Notebook Limits per …
Creating a Cluster When You Create a Notebook
Using an Existing Amazon EMR Cluster
Working with Notebooks
Understanding Notebook …
Working with the Notebook Editor
Changing …
Deleting Notebooks and Notebook …
Sharing Notebook …
Setting Up Spark User Impersonation
Using the Spark Job Monitoring …
Security
IAM Policy Actions for EMR Notebooks
Using Tags to Control User Permissions
Specifying EC2 Security Groups
Specifying the AWS Service Role
Plan and Configure Clusters
Configure Cluster Location and Data Storage
Choose an AWS Region
Work with Storage and File Systems
Prepare Input Data
Configure an Output Location
Use EMR File System (EMRFS)
Consistent View
Authorizing Access to EMRFS Data in Amazon S3
Specifying Amazon S3 Encryption Using EMRFS Properties
Control Cluster Termination
Configuring a Cluster to Auto-Terminate or Continue
Using Termination Protection
Working with AMIs
Using the Default AMI
Using a Custom AMI
Specifying the Amazon EBS Root Device Volume Size
Configure Cluster Software
Create Bootstrap Actions to Install Additional Software
Configure Cluster Hardware and Networking
Understand Node Types
Configure EC2 Instances
Configure Networking
Configure Instance Fleets or Instance Groups
Guidelines and Best Practices
Configure Cluster Logging and Debugging
Default Log Files
Archive Log Files to Amazon S3
Enable the Debugging Tool
Debugging Option …
Tag Clusters
Tag Restrictions
Tag Resources for Billing
Add Tags to a New Cluster
Adding Tags to an Existing Cluster
View Tags on a Cluster
Remove Tags from a Cluster
Drivers and Third-Party Application Integration
Use Business Intelligence Tools with Amazon EMR
Security
Use IAM Policies to Allow and Deny User Permissions
Amazon EMR Actions in User-Based IAM Policies
Use Managed Policies for User Access
Use Inline Policies for User Permissions
Use Cluster Tagging with IAM Policies for Cluster-Specific Control
Use Kerberos Authentication
Supported Applications
Configure Kerberos
Configure a Cluster-Dedicated KDC
Configure a Cross-Realm Trust
Use an Amazon EC2 Key Pair for SSH Credentials
Encrypt Data in Transit and At Rest
Encryption Options
Create Keys and Certificates for Data Encryption
Configure IAM Roles for EMRFS
How IAM Roles for EMRFS Work
Set Up a Security Configuration with IAM Roles for EMRFS
Control Network Traffic with Security Groups
Working With Amazon EMR-Managed Security Groups
Working With Additional Security Groups
Specifying Security Groups
Use Security Configurations to Set Up Cluster Security
Create a Security Configuration
Specify a Security Configuration for a Cluster
Configure IAM Roles for Amazon EMR Permissions to AWS Services
EMR Role
EMR Role for EC2
Automatic Scaling Role
Service-Linked Role
Use Default IAM Roles and Managed Policies
Allow Users and Groups to Create and Modify Roles
Customize IAM Roles
Resource-Based Policies for AWS Glue
Use IAM Roles with Applications That Call AWS Services Directly
Using the Service-Linked Role
Manage Clusters
View and Monitor a Cluster
View Cluster Status and Details
Enhanced Step Debugging
View Application History
View Log Files
View Cluster Instances in Amazon EC2
CloudWatch Events and Metrics
View Cluster Application Metrics with …
Logging Amazon EMR API Calls in AWS CloudTrail
Connect to the Cluster
Connect to the Master Node Using SSH
View Web Interfaces Hosted on Amazon EMR Clusters
Terminate a Cluster
Terminate a Cluster Using the Console
Terminate a Cluster Using the AWS CLI
Terminate a Cluster Using the API
Scaling Cluster Resources
Using Automatic Scaling in Amazon EMR
Manually Resizing a Running Cluster

