1. _____________is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.
A. Hadoop Common
B. Hadoop Distributed File System (HDFS)
C. Hadoop YARN
D. Hadoop MapReduce
Answer: C) Hadoop YARN
Explanation:
Hadoop Common: contains the libraries and utilities needed by the other Hadoop modules.
HDFS: a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
Hadoop YARN: a resource-management platform responsible for managing compute resources in the cluster and using them to schedule users' applications. YARN allocates system resources to the various applications running in a Hadoop cluster and schedules the tasks to be executed on different cluster nodes.
Hadoop MapReduce: a programming model for processing large data sets in parallel across many processes.
2. Which of the following tools is designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases?
A. Pig
B. Mahout
C. Apache Sqoop
D. Flume
Answer: C) Apache Sqoop
Explanation: Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
3. ________________is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
A. Flume
B. Apache Sqoop
C. Pig
D. Mahout
Answer: A) Flume
Explanation: Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications.
4. _____________refers to the connectedness of big data.
A. Value
B. Veracity
C. Velocity
D. Valence
Answer: D) Valence
Explanation: Valence refers to the connectedness of big data, such as data in the form of graph networks.
5. Consider the following statements:
Statement 1: Volatility refers to the data velocity relative to timescale of event being studied
Statement 2: Viscosity refers to the rate of data loss and stable lifetime of data
A. Only statement 1 is true
B. Only statement 2 is true
C. Both statements are true
D. Both statements are false
Answer: D) Both statements are false
Explanation: The correct statements are:
Statement 1: Viscosity refers to the data velocity relative to the timescale of the event being studied.
Statement 2: Volatility refers to the rate of data loss and the stable lifetime of data.
6. ___________ refers to the biases, noise, and abnormality in data; the trustworthiness of data.
A. Value
B. Veracity
C. Velocity
D. Volume
Answer: B) Veracity
Explanation: Veracity refers to the biases, noise, and abnormality in data, i.e., the trustworthiness of the data.
7. ___________ brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries to data stored in HDFS or HBase without requiring a lot of data movement and manipulation.
A. Apache Sqoop
B. Mahout
C. Flume
D. Impala
Answer: D) Impala
Explanation: Impala is a query engine that runs on Apache Hadoop, originally developed by Cloudera. The project was announced in late 2012 and became a publicly available open-source distribution. Impala brings scalable parallel database technology to Hadoop, allowing users to issue low-latency queries against data stored in HDFS or HBase without requiring extensive data movement and manipulation.
8. True or False ?
NoSQL databases store unstructured data with no particular schema
A. True
B. False
Answer: A) True
Explanation: Traditional SQL databases can handle large amounts of structured data effectively, but NoSQL ("not only SQL") databases are needed to handle unstructured data. NoSQL databases store unstructured data without a particular schema.
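The schemaless idea above can be illustrated with plain Python dictionaries as a hypothetical in-memory stand-in for a document store (the field names here are made up for illustration):

```python
# A "collection" in a document-oriented NoSQL store: records in the
# same collection may have entirely different fields, because no
# schema is enforced.
collection = [
    {"name": "Alice", "email": "alice@example.com"},
    {"name": "Bob", "tags": ["hadoop", "nosql"], "age": 30},
]

# A query simply inspects whichever fields each record happens to have,
# e.g. find all records that carry a "tags" field.
tagged = [doc for doc in collection if "tags" in doc]
```

In a relational database, both rows would have to fit one table schema; here each record carries only the fields it needs.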
9. _____________ is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leader election, work queues, etc.
A. Apache Sqoop
B. Mahout
C. ZooKeeper
D. Flume
Answer: C) ZooKeeper
Explanation: ZooKeeper is a centralized key-value store that distributed systems can use to coordinate. Because it must be able to handle heavy load, ZooKeeper runs replicated across many machines.
10. True or False ?
MapReduce is a programming model and an associated implementation for processing and generating large data sets.
A. True
B. False
Answer: A) True
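Explanation: In the MapReduce model, a map function transforms input records into intermediate key/value pairs, which are then grouped by key and combined by a reduce function. A minimal word-count sketch of the model in plain Python (not Hadoop's actual Java API) looks like this:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit an intermediate (word, 1) pair for every word
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data", "big cluster"]
result = reduce_phase(map_phase(docs))
# result == {"big": 2, "data": 1, "cluster": 1}
```

In a real Hadoop cluster the map and reduce phases run in parallel across many nodes, with the framework handling the grouping (shuffle) step between them.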