Sunday, October 24, 2021

Big Data Computing: Quiz Assignment-VIII Solutions (Week-8)

1. Which of the following are provided by the Spark API for graph-parallel computations?
i. joinVertices
ii. subgraph
iii. aggregateMessages
A. Only (i)
B. Only (i) and (ii)
C. Only (ii) and (iii)
D. All of the mentioned
Answer: D) All of the mentioned


2. Which of the following statement(s) is/are true in the context of Apache Spark GraphX operators?
S1: Property operators modify the vertex or edge properties using a user-defined map function and produce a new graph.
S2: Structural operators operate on the structure of an input graph and produce a new graph.
S3: Join operators add data to graphs and produce new graphs.
A. Only S1 is true
B. Only S2 is true
C. Only S3 is true
D. All of the mentioned
Answer: D) All of the mentioned


3. True or False?
The outerJoinVertices() operator joins the input RDD data with the vertices and returns a new graph. The vertex properties are obtained by applying the user-defined map() function to all vertices, including those that are not present in the input RDD.
A. True
B. False
Answer: A) True
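
The behaviour described in this question can be modelled in plain Python, outside Spark. The sketch below is not GraphX code: outer_join_vertices is a hypothetical helper that mimics the operator's semantics on dictionaries, and the sample data is invented. It passes None to the map function for vertices that have no entry in the input data, where GraphX would pass an empty Scala Option.

```python
def outer_join_vertices(vertices, other, map_func):
    """Model of outerJoinVertices semantics on plain dicts (not Spark code).

    vertices: dict of vertex_id -> property
    other:    dict of vertex_id -> extra value (some vertices may be missing)
    map_func: called as map_func(vertex_id, prop, extra_or_None)
    """
    # Every vertex is visited, including those absent from `other`.
    return {vid: map_func(vid, prop, other.get(vid))
            for vid, prop in vertices.items()}


# Hypothetical data: vertex names, with out-degrees known only for some vertices.
vertices = {1: "alice", 2: "bob", 3: "carol"}
out_degrees = {1: 2, 3: 1}  # vertex 2 is absent from the input data

joined = outer_join_vertices(
    vertices,
    out_degrees,
    lambda vid, name, deg: (name, deg if deg is not None else 0),
)
print(joined)
```

Vertex 2 still appears in the result, with the default value supplied by the map function.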


4. Which of the following statements are true ?
S1: Apache Spark GraphX provides the following property operators - mapVertices(), mapEdges(), mapTriplets()
S2: RDDs in Spark depend on one or more other RDDs. The representation of the dependencies between RDDs is known as the lineage graph. Lineage graph information is used to compute each RDD on demand, so that whenever part of a persisted RDD is lost, the lost data can be recovered using the lineage graph information.
A. Only S1 is true
B. Only S2 is true
C. Both S1 and S2 are true
D. None of the mentioned
Answer: C) Both S1 and S2 are true


5. GraphX provides an API for expressing graph computation that can model the ________ abstraction.
A. GaAdt
B. Pregel
C. Spark Core
D. None of the mentioned
Answer: B) Pregel


6. Match the following:
A. Dataflow Systems          i. Vertex Programs
B. Graph Systems             ii. Parameter Servers
C. Shared Memory Systems     iii. Guinea Pig
A. A:ii, B: i, C: iii
B. A:iii, B: i, C: ii
C. A:ii, B: iii, C: i
D. A:iii, B: ii, C: i
Answer: B) A:iii, B: i, C: ii


7. Which of the following statement(s) is/are true in the context of Parameter Servers?
S1: It is a machine learning framework.
S2: It distributes a model over multiple machines.
S3: It offers two operations: (i) Pull, to query parts of the model; (ii) Push, to update parts of the model.
A. Only S1 is true
B. Only S2 is true
C. Only S3 is true
D. All of the mentioned
Answer: D) All of the mentioned
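
The pull and push operations from S3 can be illustrated with a toy, single-process stand-in. This is only a sketch under assumptions: the ParameterServer class, its shard layout, and the additive update rule are hypothetical choices for illustration, not the interface of any particular parameter-server system.

```python
class ParameterServer:
    """Toy single-process stand-in for a parameter server (illustrative only)."""

    def __init__(self, num_shards):
        # Each shard plays the role of one machine holding part of the model.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Assumed placement rule: hash the parameter name to pick a shard.
        return self.shards[hash(key) % len(self.shards)]

    def pull(self, keys):
        """Query the current values of a subset of the model's parameters."""
        return {k: self._shard_for(k).get(k, 0.0) for k in keys}

    def push(self, updates):
        """Apply additive updates (for example, scaled gradients) to the model."""
        for k, delta in updates.items():
            shard = self._shard_for(k)
            shard[k] = shard.get(k, 0.0) + delta


server = ParameterServer(num_shards=2)
server.push({"w0": 0.5, "w1": -1.0})  # one worker sends its update
server.push({"w0": 0.25})             # another worker updates a single parameter
weights = server.pull(["w0", "w1", "w2"])
print(weights)
```

Workers only ever touch the parts of the model they name in a pull or push, which is what lets the model be spread over several machines.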


8. [Figure: directed graph on the vertices A, B, C and D]

What is the PageRank score of vertex B after the second iteration? (Without damping factor)
Hint: The basic PageRank formula is:

PR_{t+1}(u) = \sum_{v \to u} \frac{PR_t(v)}{C(v)}

where
PR_{t+1}(u): PageRank of the node u under consideration
PR_t(v): previous PageRank of a node v pointing to node u
C(v): out-degree of vertex v
A. 1/6
B. 1.5/12
C. 2.5/12
D. 1/3 

Answer: A) 1/6

Explanation: The PageRank score of each vertex is calculated as follows. After the second iteration, B has a score of 2/12 = 1/6.

 

Vertex   Iteration 0   Iteration 1   Iteration 2   Page Rank
A        1/4           1/12          1.5/12        1
B        1/4           2.5/12        2/12          2
C        1/4           4.5/12        4.5/12        4
D        1/4           4/12          4/12          3
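
The iteration values can be checked with a short Python sketch using exact fractions. Note the assumption: the graph figure is not available here, so the edge list below is a reconstruction chosen because it reproduces the values above; it may differ from the figure in the original quiz. The names edges and pagerank_step are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: the original figure is missing. This edge list was chosen because
# it reproduces the iteration values in the explanation; it may not be the
# exact graph shown in the quiz.
edges = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def pagerank_step(ranks, edges):
    """One iteration of PR_{t+1}(u) = sum over v -> u of PR_t(v) / C(v)."""
    new_ranks = {u: Fraction(0) for u in ranks}
    for v, targets in edges.items():
        share = ranks[v] / len(targets)  # PR_t(v) / C(v)
        for u in targets:
            new_ranks[u] += share
    return new_ranks


ranks = {v: Fraction(1, len(edges)) for v in edges}  # iteration 0: 1/4 each
history = [ranks]
for _ in range(2):
    ranks = pagerank_step(ranks, edges)
    history.append(ranks)

for t, r in enumerate(history):
    print(t, {v: str(p) for v, p in r.items()})
```

Under this assumed graph, B receives 1/24 from A and 1/8 from C in the second iteration, giving 1/6.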

 

Friday, October 15, 2021

Big Data Computing: Quiz Assignment-VII Solutions (Week-7)

1. Suppose you are using a bagging-based algorithm, such as Random Forest, in model building. Which of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest
A. Only 1
B. Only 2
C. Both 1 and 2
D. None of these
Answer: A) Only 1
Explanation: Since Random Forest aggregates the results of multiple weak learners, we want as many trees as possible when building the model. Random Forest is a black-box model, so you lose interpretability after using it.
 
2. When applying bagging to regression trees, which of the following is/are true?
1. We build N regression trees from N bootstrap samples
2. We take the average of the N regression trees
3. Each tree has a high variance with low bias
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1,2 and 3
Answer: D) 1,2 and 3
Explanation: All of the options are correct and self-explanatory.
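
The first two statements can be sketched in plain Python. This is an illustration under assumptions, not a regression tree implementation: a 1-nearest-neighbour predictor stands in for a fully grown tree (both follow the training points closely, so each single model has low bias and high variance), and the data and helper names are invented.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) points from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_nearest_neighbour(sample):
    """Stand-in for a fully grown regression tree: predict the y of the closest x."""
    def predict(x):
        return min(sample, key=lambda point: abs(point[0] - x))[1]
    return predict


def bagged_predict(models, x):
    """Bagging prediction for regression: the average over all N models."""
    return statistics.fmean([model(x) for model in models])


rng = random.Random(0)
# Hypothetical noisy data drawn around the line y = 2x.
data = [(x, 2 * x + rng.gauss(0, 1)) for x in range(20)]

n_models = 25  # N bootstrap samples, one model per sample
models = [fit_nearest_neighbour(bootstrap_sample(data, rng)) for _ in range(n_models)]

print(bagged_predict(models, 10.5))
```

Averaging leaves the low bias of the individual models roughly unchanged while smoothing out their sample-to-sample variation.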
 
3. In which of the following scenarios is gain ratio preferred over Information Gain?
A. When a categorical variable has a very small number of categories
B. The number of categories is not the reason
C. When a categorical variable has a very large number of categories
D. None of the mentioned
Answer: C) When a categorical variable has a very large number of categories
Explanation: For high-cardinality variables, gain ratio is preferred over Information Gain, because Information Gain is biased towards attributes with many categories.
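
A small calculation makes the difference concrete. The sketch below uses invented data: a unique row id (eight categories) and a two-valued weather attribute. The function names are illustrative, and gain ratio is computed as information gain divided by split information.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain divided by the split information (entropy of the feature)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Hypothetical data: 8 rows, a unique row id versus a two-valued attribute.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
row_id = list(range(8))  # 8 distinct categories, one per row
weather = ["sun", "sun", "sun", "sun", "rain", "rain", "rain", "sun"]

for name, feature in [("row_id", row_id), ("weather", weather)]:
    print(name, round(information_gain(feature, labels), 3),
          round(gain_ratio(feature, labels), 3))
```

On this data, Information Gain ranks the meaningless row id highest, because every single-row split is pure, while gain ratio penalises its large split information and ranks weather higher.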
 
4. Which of the following is/are true about Random Forest and Gradient Boosting ensemble methods?
1. Both methods can be used for classification task
2. Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression task
A. 1 and 2
B. 2 and 3
C. 2 and 4
D. 1 and 4
Answer: D) 1 and 4
Explanation: Both algorithms are designed for classification as well as regression tasks.
 
5. True or False ?
Bagging provides an averaging over a set of possible datasets, removing noisy and non-stable parts of models.
A. True
B. False
Answer: A) True
 
6. Hundreds of trees can be aggregated to form a Random forest model. Which of the following is true about any individual tree in Random Forest?
1. Individual tree is built on a subset of the features
2. Individual tree is built on all the features
3. Individual tree is built on a subset of observations
4. Individual tree is built on full set of observations
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Answer: A) 1 and 3
Explanation: Random Forest is based on the bagging concept, which uses a fraction of the samples and a fraction of the features to build each individual tree.
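
The sampling step can be sketched as follows. This is a simplified illustration with a hypothetical helper name: it picks one feature subset per tree, as the question describes, whereas many implementations instead draw a fresh feature subset at every split.

```python
import random


def sample_tree_inputs(n_rows, n_features, max_features, rng):
    """Pick the row indices and feature columns one tree would be trained on."""
    # Bootstrap: draw row indices with replacement, so some rows repeat
    # and others are left out of this tree's training set.
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Feature subset: draw distinct column indices without replacement.
    features = sorted(rng.sample(range(n_features), max_features))
    return rows, features


rng = random.Random(42)
# Hypothetical dataset shape: 10 observations, 6 features, 2 features per tree.
for tree_index in range(3):
    rows, features = sample_tree_inputs(n_rows=10, n_features=6, max_features=2, rng=rng)
    print(tree_index, sorted(set(rows)), features)
```

Each tree sees a different subset of the observations and a different subset of the features, which is what decorrelates the trees.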
 
7. Boosting algorithms build on weak learners. Which of the following is the main reason for using weak learners?
Reason I: To prevent overfitting
Reason II: To prevent underfitting
A. Reason I
B. Reason II
C. Both Reason I and Reason II
D. None of the Reasons
Answer: A) Reason I
Explanation: To prevent overfitting, because the overall complexity of the learner increases with each step. Starting with weak learners means the final classifier is less likely to overfit.
