To earn the CCDH certification (Cloudera Certified Developer for Apache Hadoop CDH4), candidates must pass the following exam:
Test Name: Cloudera Certified Developer for Apache Hadoop CDH4 (CCD-410)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 67%
English Release Date: November 1, 2012
The Cloudera Certified Developer for Apache Hadoop exam (CCD-410) is designed to test a candidate’s fluency with the concepts and skills required in the following areas:
Core Hadoop Concepts (25%)
Recognize and identify Apache Hadoop daemons and how they function in both data storage and data processing under CDH3 and CDH4. Understand how Apache Hadoop exploits data locality. Given a big data scenario, determine the challenges it poses to large-scale computational models and how distributed systems attempt to overcome them. Identify the role and use of both MapReduce v1 (MRv1) and MapReduce v2 (MRv2 / YARN) daemons.
Storing Files in Hadoop (7%)
Analyze the benefits and challenges of the HDFS architecture, including how HDFS implements file sizes, block sizes, and block abstraction. Understand default replication values and storage requirements for replication. Determine how HDFS stores, reads, and writes files. Given a sample architecture, determine how HDFS handles hardware failure.
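As a rough illustration of how a client interacts with HDFS programmatically, the sketch below uses the Java FileSystem API to copy a local file into HDFS and print its replication factor and block size. The class name and paths are placeholders, and it assumes the cluster's configuration files are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml if present
        FileSystem fs = FileSystem.get(conf);              // connects to the default filesystem (HDFS)

        Path src = new Path("sales.txt");                  // local source file; path is an assumption
        Path dst = new Path("/user/examples/sales.txt");   // HDFS destination; path is an assumption
        fs.copyFromLocalFile(src, dst);                    // programmatic equivalent of "hadoop fs -put"

        FileStatus status = fs.getFileStatus(dst);
        System.out.println("Replication: " + status.getReplication());
        System.out.println("Block size:  " + status.getBlockSize());
    }
}
```

With default settings the replication factor printed would be 3, and the block size would be 64 MB or 128 MB depending on the cluster's dfs.blocksize configuration.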
Job Configuration and Submission (7%)
Construct proper job configuration parameters, including the use of JobConf and appropriate properties. Identify the correct procedures for MapReduce job submission, and understand how to use commands such as “hadoop jar” when submitting a job.
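For example, a minimal driver using the older mapred API might look like the sketch below. It relies on the library TokenCountMapper and LongSumReducer classes so that it is self-contained, and the class name WordCountDriver is only illustrative.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class); // the JAR containing this class is shipped to the cluster
        conf.setJobName("wordcount");

        conf.setMapperClass(TokenCountMapper.class);       // library mapper: emits (token, 1) for each token
        conf.setReducerClass(LongSumReducer.class);        // library reducer: sums the counts per token

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);                            // submits the job and blocks until it finishes
    }
}
```

Packaged into a JAR, a driver like this would typically be submitted with a command along the lines of hadoop jar wordcount.jar WordCountDriver <input> <output>.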
Job Execution Environment (10%)
Given a MapReduce job, determine the lifecycle of a Mapper and the lifecycle of a Reducer. Understand the key fault tolerance principles at work in a MapReduce job. Identify the role of Apache Hadoop Classes, Interfaces, and Methods. Understand how speculative execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
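The sketch below, written against the newer org.apache.hadoop.mapreduce API, shows where the lifecycle methods of a Mapper fit; the class name and the line-tokenizing logic are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Lifecycle of a Mapper: setup() once per task, map() once per input record, cleanup() once at the end.
public class LineLengthMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();
    private final IntWritable length = new IntWritable();

    @Override
    protected void setup(Context context) {
        // Runs once before any records are processed, e.g. to read configuration values.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            length.set(token.length());
            context.write(word, length);   // emit (token, length) for each token in the line
        }
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once after the last record, e.g. to flush state or release resources.
    }
}
```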
Input and Output (6%)
Given a sample job, analyze and determine the correct InputFormat and OutputFormat to select based on job requirements. Understand the role of the RecordReader, and of sequence files and compression.
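As one possible illustration, the driver sketch below selects TextInputFormat for input and a block-compressed SequenceFile for output; the class name, codec choice, and paths are assumptions rather than requirements.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class FormatDemoDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "format-demo");
        job.setJarByClass(FormatDemoDriver.class);

        // TextInputFormat's RecordReader turns each line into a (byte offset, line text) pair.
        job.setInputFormatClass(TextInputFormat.class);

        // Write the output as a compressed SequenceFile instead of plain text.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```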
Job Lifecycle (18%)
Analyze the order of operations in a MapReduce job, how data moves from place to place, how partitioners and combiners function, and the sort and shuffle process.
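For instance, a custom partitioner such as the sketch below controls which reducer each intermediate key is routed to during the shuffle; the first-letter scheme and the class name are purely illustrative.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Decides which reducer each intermediate (key, value) pair is sent to during the shuffle.
// Keys assigned the same partition number arrive at the same reducer, where they are sorted and grouped.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        char first = Character.toLowerCase(key.toString().charAt(0));
        return first % numPartitions;   // simple bucketing on the first character of the key
    }
}
```

A partitioner like this would be registered with job.setPartitionerClass(...); a combiner, registered with job.setCombinerClass(...), runs on map-side output before it is shuffled to the reducers.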
Data Processing (6%)
Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the sorting of values. Given sample input data, identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from each Reducer and the number and contents of the output file(s).
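As a small illustration of this key/value contract, the reducer sketch below receives each distinct intermediate key together with all of its values and emits exactly one output record per key; the class name and the summing logic are assumptions for the example.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Each reduce() call receives one distinct intermediate key with all of its values, so a reducer
// that writes exactly one record per call emits one output record per distinct key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();          // e.g. for input ("cat", [1, 1, 1]) the sum is 3
        }
        total.set(sum);
        context.write(key, total);       // one (key, sum) pair per distinct key, written to part-r-NNNNN
    }
}
```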
Key and Value Types (6%)
Given a scenario, analyze and determine which of Hadoop’s data types for keys and values are appropriate for the job. Understand common key and value types in the MapReduce framework and the interfaces they implement.
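A custom composite key, for example, must implement WritableComparable so Hadoop can serialize it between tasks and sort it during the shuffle, whereas values only need to implement Writable. A minimal sketch, with an assumed year/month key chosen only for illustration, might look like this:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class YearMonthKey implements WritableComparable<YearMonthKey> {

    private int year;
    private int month;

    public YearMonthKey() { }                        // no-arg constructor required for deserialization

    public YearMonthKey(int year, int month) {
        this.year = year;
        this.month = month;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);                          // serialization used between map and reduce tasks
        out.writeInt(month);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        month = in.readInt();
    }

    @Override
    public int compareTo(YearMonthKey other) {       // defines the sort order of keys in the shuffle
        if (year != other.year) {
            return year < other.year ? -1 : 1;
        }
        if (month != other.month) {
            return month < other.month ? -1 : 1;
        }
        return 0;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearMonthKey)) return false;
        YearMonthKey other = (YearMonthKey) o;
        return year == other.year && month == other.month;
    }

    @Override
    public int hashCode() {                          // used by the default HashPartitioner
        return 31 * year + month;
    }
}
```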
Common Algorithms and Design Patterns (7%)
Evaluate whether an algorithm is well-suited for expression in MapReduce. Understand the implementation of, limitations of, and strategies for joining datasets in MapReduce. Analyze the role of the DistributedCache and of Counters.
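As one illustration of Counters, the mapper sketch below tallies malformed records instead of failing the whole job; the class name, the comma-delimited format, and the expected field count are all assumptions made for the example.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Uses custom counters to track records that fail validation; counter totals are aggregated
// across all tasks and reported with the job's final statistics.
public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    enum RecordQuality { VALID, MALFORMED }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length == 3) {                                     // expected column count is an assumption
            context.getCounter(RecordQuality.VALID).increment(1);
            context.write(value, NullWritable.get());
        } else {
            context.getCounter(RecordQuality.MALFORMED).increment(1); // skip the bad record but keep a tally
        }
    }
}
```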
The Hadoop Ecosystem (8%)
Analyze a workflow scenario and determine how and when to leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop, and Oozie. Understand how Hadoop Streaming might apply to a job workflow.
You use the hadoop fs -put command to add sales.txt to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes within your cluster. When and how will the cluster handle replication following the failure of one of these nodes?
A. The cluster will make no attempt to re-replicate this block.
B. This block will be immediately re-replicated and all other HDFS operations on the cluster will halt while this is in progress.
C. The block will remain under-replicated until the administrator manually deletes and recreates the file.
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes.
You need to write code to perform a complex calculation that takes several steps. You have decided to chain these jobs together and develop a custom composite class for the key that stores the results of intermediate calculations. Which interface must this key implement?
You are developing an application that uses a year for the key. Which Hadoop-supplied data type would be most appropriate for a key that represents a year?
E. None of these would be appropriate. You would need to implement a custom key.