Cloudera Developer for Apache Hadoop Training Course

Course Summary

This four-day training course is for developers who want to learn to program and use Apache Hadoop to build powerful data processing applications.

Duration

4 days.

Objectives

  • How MapReduce and the Hadoop Distributed File System work
  • How to write MapReduce code in Java or other programming languages
  • What issues to consider when developing MapReduce jobs
  • How to implement common algorithms in Hadoop
  • Best practices for Hadoop development and debugging
  • How to leverage other projects such as Apache Hive, Apache Pig, Sqoop, and Oozie
  • Advanced Hadoop API topics required for real-world data analysis

Prerequisites

This course is designed for developers with some programming experience (preferably in Java). Existing knowledge of Hadoop is not required.

Additional Notes

Download the full agenda for Cloudera's Developer Training for Apache Hadoop.

Hands-On Exercises

Throughout the course, students write Hadoop code and complete other hands-on exercises to solidify their understanding of the concepts being presented.

Certification Exam

Following the training, attendees will be given a voucher good for one certification exam attempt to become a Cloudera Certified Developer for Apache Hadoop (CCDH). Learn more about the CCDH Certification Exam here: http://university.cloudera.com/certification.html

Outline

Introduction

The Motivation For Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Writing a MapReduce Program

  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
  • Using Eclipse for Rapid Development
  • Hands-On Exercise
  • The New MapReduce API

Integrating Hadoop Into The Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMSs With Sqoop
  • Hands-On Exercise
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

Delving Deeper Into The Hadoop API

  • More about ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning With Mahout
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise

Using Hive and Pig

  • Hive Basics
  • Pig Basics
  • Hands-On Exercise

Practical Development Tips and Techniques

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode For Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats
  • Hands-On Exercise

Joining Data Sets in MapReduce

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins

Graph Manipulation in Hadoop

  • Introduction to graph techniques
  • Representing graphs in Hadoop
  • Implementing a sample algorithm: Single Source Shortest Path

Creating Workflows With Oozie

  • The Motivation for Oozie
  • Oozie’s Workflow Definition Format
  • Hands-On Exercise

Training Schedule

Location                      May 2013   Jun 2013          Jul 2013         Aug 2013
Training Choice - Melbourne                                Jul 9 - Jul 12
Training Choice - Sydney                 Jun 11 - Jun 14                    Aug 6 - Aug 9

Classes in bold are guaranteed to run!