1. Consider the following statements:
Statement 1: The Job Tracker is hosted inside the master and it receives the job execution request from the client.
Statement 2: The Task Tracker is the MapReduce component on the slave machines; because there are multiple slave machines, there are multiple Task Trackers.
A. Only statement 1 is true
B. Only statement 2 is true
C. Both statements are true
D. Both statements are false
Answer: C) Both statements are true
2. _______________ is the slave/worker node and holds the user data in the form of data blocks.
A. NameNode
B. Data block
C. Replication
D. DataNode
Answer: D) DataNode
Explanation: The NameNode acts as the master server: it manages the file system namespace, regulates client access to files, and keeps track of which DataNodes hold each block. The DataNode, on the other hand, is a slave/worker node that stores user data in the form of data blocks.
3. ________ works as a master server that manages the file system namespace and regulates access to files from clients; it also keeps track of where the data is on the Data Nodes and how the blocks are distributed.
A. Name Node
B. Data block
C. Replication
D. Data Node
Answer: A) Name Node
Explanation: The Name Node, as the master server, manages the file system namespace and regulates client access to files. It also tracks where the data is located on the Data Nodes and how the blocks are distributed. Data Nodes, on the other hand, are slave/worker nodes that hold user data in the form of data blocks.
4. The number of maps in MapReduce is usually driven by the total size of
A. Inputs
B. Outputs
C. Tasks
D. None of the mentioned
Answer: A) Inputs
Explanation: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files. The map function, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key and passes them to the reduce function.
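As a rough sketch of why input size drives the map count: Hadoop typically creates about one map task per input split, and a split usually corresponds to one HDFS block. The helper name estimate_map_tasks and the 128 MB block size are assumptions made for illustration; the real count depends on the InputFormat and the cluster configuration.

```python
import math

BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)


def estimate_map_tasks(file_sizes_bytes, block_size=BLOCK_SIZE_BYTES):
    """Estimate map tasks as one per block-sized split of each input file.

    Hypothetical helper: it ignores InputFormat details and split tuning.
    """
    # Each file is split independently, so round up per file.
    return sum(
        math.ceil(size / block_size) for size in file_sizes_bytes if size > 0
    )


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(estimate_map_tasks([one_gb]))                    # 8
    print(estimate_map_tasks([one_gb, 200 * 1024 ** 2]))   # 10
```

A 1 GB file yields 8 splits of 128 MB, and a 200 MB file adds 2 more, so the estimate grows with the total input size rather than with the output size or the number of reducers.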
5. True or False?
The main duties of the task tracker are to break down the received job (a big computation) into small parts, allocate the partial computations (tasks) to the slave nodes, and monitor the progress and reports of task execution from the slaves.
A. True
B. False
Answer: B) False
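Explanation: These are the duties of the job tracker, not the task tracker. The task tracker executes the tasks assigned to it, then communicates the progress and reports the results to the job tracker.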
6. Point out the correct statement in context of YARN:
A. YARN is highly scalable.
B. YARN enhances a Hadoop compute cluster in many ways
C. YARN extends the power of Hadoop to incumbent and new technologies found within the data center
D. All of the mentioned
Answer: D) All of the mentioned
7. Consider the pseudo-code for MapReduce's WordCount example (not shown here). Let's now assume that you want to determine the frequency of phrases consisting of 3 words each instead of determining the frequency of single words. Which part of the (pseudo-)code do you need to adapt?
A. Only map()
B. Only reduce()
C. map() and reduce()
D. The code does not have to be changed
Answer: A) Only map()
Explanation: The map function takes a value and outputs key:value pairs.
For instance, if we define a map function that takes a string and outputs the length of the word as the key and the word itself as the value, then map(steve) would return 5:steve and map(savannah) would return 8:savannah.
This allows us to run the map function against values in parallel. To count 3-word phrases, map() must emit each phrase (instead of each single word) as the key; reduce() still only sums the counts for each key, so only the map() function of the pseudo-code has to be adapted.
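The change can be sketched in plain Python as follows. This is a minimal in-memory illustration, not Hadoop code: the names map_trigrams, reduce_counts, and run_job are hypothetical, and run_job merely stands in for the shuffle phase that the framework would normally perform. Phrases that span two input lines are ignored for simplicity.

```python
from collections import defaultdict


def map_trigrams(line):
    """Emit (phrase, 1) for every run of three consecutive words in the line."""
    words = line.lower().split()
    for i in range(len(words) - 2):
        yield " ".join(words[i:i + 3]), 1


def reduce_counts(key, values):
    """Unchanged from single-word WordCount: sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Tiny stand-in for the shuffle phase: group mapped values by key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_trigrams(line):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the quick brown dog"]))
```

For the two sample lines, "the quick brown" is counted twice, while "quick brown fox" and "quick brown dog" are counted once each. Only the map side differs from ordinary WordCount; reduce_counts is the same summing function.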
8. The namenode knows that the datanode is active using a mechanism known as
A. Heartbeats
B. Datapulse
C. h-signal
D. Active-pulse
Answer: A) Heartbeats
Explanation: Heartbeats are used for communication between the Hadoop NameNode and the DataNodes. A heartbeat is a signal that a DataNode sends to the NameNode at a regular time interval to indicate its presence, i.e. to indicate that it is alive.
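The idea can be illustrated with a toy liveness tracker. This is only a sketch of the mechanism, not Hadoop's implementation: the class HeartbeatMonitor, its method names, and the 30-second timeout are all assumptions chosen for the example.

```python
import time


class HeartbeatMonitor:
    """Toy NameNode-side view of DataNode liveness (not Hadoop's actual code)."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout_seconds = timeout_seconds  # assumed value for illustration
        self.last_seen = {}  # datanode id -> timestamp of its last heartbeat

    def receive_heartbeat(self, datanode_id, now=None):
        """Record that a heartbeat arrived from the given DataNode."""
        self.last_seen[datanode_id] = time.monotonic() if now is None else now

    def is_alive(self, datanode_id, now=None):
        """A node counts as alive if its last heartbeat is within the timeout."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(datanode_id)
        return last is not None and (now - last) <= self.timeout_seconds


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=10.0)
    monitor.receive_heartbeat("dn1", now=0.0)
    print(monitor.is_alive("dn1", now=5.0))    # True
    print(monitor.is_alive("dn1", now=11.0))   # False
```

Passing explicit timestamps through the optional now parameter keeps the example deterministic; a node that has never sent a heartbeat is treated as not alive.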
9. True or False?
HDFS performs replication, although it results in data redundancy?
A. True
B. False
Answer: A) True
Explanation: Once data has been written to HDFS, it is replicated across the cluster so that the different copies are stored on different DataNodes. The default replication factor is 3, which is meant to keep the data from being either under-replicated or over-replicated. The redundancy is deliberate: it lets the data survive the failure of a DataNode.
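To make the trade-off concrete, the sketch below places each replica of a block on a different DataNode and computes the raw storage consumed. The function names and the round-robin placement are assumptions for illustration only; real HDFS uses a rack-aware placement policy rather than this simple scheme.

```python
def place_replicas(block_id, datanodes, replication_factor=3):
    """Pick distinct DataNodes for one block, round-robin starting at block_id.

    Simplified stand-in for HDFS block placement (which is rack-aware).
    """
    if replication_factor > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication factor")
    start = block_id % len(datanodes)
    return [
        datanodes[(start + i) % len(datanodes)]
        for i in range(replication_factor)
    ]


def raw_storage_bytes(file_size_bytes, replication_factor=3):
    """Redundancy cost: every byte is stored once per replica."""
    return file_size_bytes * replication_factor


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    print(place_replicas(0, nodes))        # ['dn1', 'dn2', 'dn3']
    print(place_replicas(3, nodes))        # ['dn4', 'dn1', 'dn2']
    print(raw_storage_bytes(100))          # 300
```

With a replication factor of 3, a file occupies three times its size in raw storage, but any single DataNode can fail without losing a block.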
10. _____________ function processes a key/value pair to generate a set of intermediate key/value pairs.
A. Map
B. Reduce
C. Both Map and Reduce
D. None of the mentioned
Answer: A) Map
Explanation: Map is an individual task that transforms input data records into intermediate data records. Reduce then processes and merges all intermediate values associated with each key.