menu
arrow_back

Analyze Big Data with Hadoop

Analyze Big Data with Hadoop

1 hour 1 Credit

SPL-166 - Version 1.0.4

© 2019 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited.

Errors or corrections? Email us at aws-course-feedback@amazon.com.

Other questions? Contact us at https://aws.amazon.com/contact-us/aws-training/

Overview

Amazon EMR is a managed service that makes it fast, easy, and cost-effective to run Apache Hadoop and Spark to process vast amounts of data. Amazon EMR also supports powerful and proven Hadoop tools such as Presto, Hive, Pig, HBase, and more.

In this lab, you will deploy a fully functional Hadoop cluster, ready to analyze log data in just a few minutes. You will start by launching an Amazon EMR cluster and then use a HiveQL script to process sample log data stored in an Amazon S3 bucket. HiveQL is a SQL-like scripting language for data warehousing and analysis. You can then use a similar setup to analyze your own log files.

This lab is based on the Analyze Big Data with Hadoop project.

Topics Covered

By the end of this lab, you will be able to:

  • Launch a fully functional Hadoop cluster using Amazon EMR
  • Define the schema and create a table for sample log data stored in Amazon S3
  • Analyze the data using a HiveQL script and write the results back to Amazon S3
  • Download and view the results on your computer

Lab Pre-requisites

  • IT Experience: Prior experience with Hadoop is recommended, but not required, to complete this lab
  • AWS Experience: Basic familiarity with Amazon S3 and Amazon EC2 key pairs is suggested, but not required, to complete this project

Duration

The lab will take approximately 20 minutes to complete.

Start Lab

Notice the lab properties below the lab title:

  • setup - The estimated time to set up the lab environment
  • access - The time the lab will run before automatically shutting down
  • completion - The estimated time the lab should take to complete
  1. At the top of your screen, launch your lab by clicking Start Lab

If you are prompted for a token, use the one distributed to you (or credits you have purchased).

A status bar shows the progress of the lab environment creation process. The AWS Management Console is accessible during lab resource creation, but your AWS resources may not be fully available until the process is complete.

  1. Open your lab by clicking Open Console

This will automatically log you into the AWS Management Console.

Please do not change the Region unless instructed.

Common login errors

Error : Federated login credentials

If you see this message:

  • Close the browser tab to return to your initial lab window
  • Wait a few seconds
  • Click Open Console again

You should now be able to access the AWS Management Console.

Error: You must first log out

If you see the message, You must first log out before logging into a different AWS account:

  • Click click here
  • Close your browser tab to return to your initial Qwiklabs window
  • Click Open Console again

Join Qwiklabs to read the rest of this lab...and more!

  • Get temporary access to the Amazon Web Services Console.
  • Over 200 labs from beginner to advanced levels.
  • Bite-sized so you can learn at your own pace.
Join to Start This Lab
home
Home
school
Catalog
menu
More
More