1. Suppose you are using a bagging-based algorithm, say a Random Forest, for model building. Which of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest
A. Only 1
B. Only 2
C. Both 1 and 2
D. None of these
Answer: A) Only 1
Explanation: Since Random Forest aggregates the results of many weak learners, we would like as many trees as is practical when building the model. Random Forest is a black-box model, so you lose interpretability after using it.
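The benefit of adding trees can be illustrated with a small calculation. The sketch below assumes, for simplicity, that every tree votes independently and is correct with the same probability; real trees in a forest are correlated, so the gains level off sooner in practice. The function name `majority_vote_accuracy` and the value 0.6 are illustrative assumptions, not part of the original question.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a strict majority of independent voters is correct.

    Assumption: each of the n_trees voters is right with probability
    p_correct, independently of the others.
    """
    needed = n_trees // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


# Accuracy of the vote for a growing (odd) number of weak voters.
for n in (1, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the vote becomes more reliable as trees are added, which is the intuition behind statement 1.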
2. When applying bagging to regression trees, which of the following is/are true?
1. We build N regression trees from N bootstrap samples
2. We take the average of the N regression trees
3. Each tree has high variance and low bias
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1,2 and 3
Answer: D) 1,2 and 3
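Explanation: All three statements are correct. Bagging fits one regression tree per bootstrap sample and averages their predictions, and each fully grown tree has low bias but high variance.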
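The three statements can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the inputs are one-dimensional, and the helper names (`fit_tree`, `predict_tree`, `bagged_fit`, `bagged_predict`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_tree(xs, ys, min_leaf=2):
    """Grow a 1-D regression tree: a float leaf or (threshold, left, right)."""
    if len(xs) <= min_leaf or len(set(xs)) == 1:
        return mean(ys)
    best = None
    cuts = sorted(set(xs))
    for t in cuts[:-1]:  # split as x <= t versus x > t; both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    lx = [x for x in xs if x <= t]
    ly = [y for x, y in zip(xs, ys) if x <= t]
    rx = [x for x in xs if x > t]
    ry = [y for x, y in zip(xs, ys) if x > t]
    return (t, fit_tree(lx, ly, min_leaf), fit_tree(rx, ry, min_leaf))


def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def bagged_fit(xs, ys, n_trees, rng):
    """Statement 1: fit one tree per bootstrap sample (drawn with replacement)."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def bagged_predict(trees, x):
    """Statement 2: average the predictions of the individual trees."""
    return mean(predict_tree(tree, x) for tree in trees)


rng = random.Random(0)
xs = [i / 10 for i in range(30)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]  # noisy toy target
trees = bagged_fit(xs, ys, n_trees=25, rng=rng)
print(bagged_predict(trees, 1.5))
```

The small `min_leaf` lets each tree follow its own bootstrap sample closely (statement 3: low bias, high variance); averaging across trees is what smooths out that variance.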
3. In which of the following scenarios is gain ratio preferred over information gain?
A. When a categorical variable has a very small number of categories
B. The number of categories is not the reason
C. When a categorical variable has a very large number of categories
D. None of the mentioned
Answer: C) When a categorical variable has a very large number of categories
Explanation: For high-cardinality categorical variables, gain ratio is preferred over information gain. Information gain favors attributes with many distinct values, while gain ratio divides the gain by the split information, which penalizes such attributes.
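A small worked example makes the difference concrete. The sketch below uses a toy dataset invented for illustration: `row_id` is a hypothetical high-cardinality attribute (unique per row) and `weather` is a hypothetical two-category attribute; both separate the labels perfectly.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(attribute, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [y for a, y in zip(attribute, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(attribute, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(attribute)
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0


labels = ["yes"] * 4 + ["no"] * 4
row_id = list(range(8))                     # 8 distinct values, one per row
weather = ["sunny"] * 4 + ["rainy"] * 4     # 2 distinct values

for name, attr in (("row_id", row_id), ("weather", weather)):
    print(name, information_gain(attr, labels), gain_ratio(attr, labels))
```

Both attributes reach the same information gain, so information gain alone cannot tell the useless identifier from the meaningful attribute; the gain ratio of `row_id` is lower because its split information is larger.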
4. Which of the following is/are true about Random Forest and Gradient Boosting ensemble methods?
1. Both methods can be used for classification tasks
2. Random Forest is used for classification, whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression, whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression tasks
A. 1 and 2
B. 2 and 3
C. 2 and 4
D. 1 and 4
Answer: D) 1 and 4
Explanation: Both algorithms are designed for classification as well as regression tasks.
5. True or False?
Bagging provides an averaging over a set of possible datasets, removing noisy and unstable parts of models.
A. True
B. False
Answer: A) True
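The averaging effect can be written as a short worked equation. As an assumption for illustration (the symbols are not part of the original question), let the B bagged models have predictions with a common variance \sigma^2 and pairwise correlation \rho at a point x. The variance of their average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right)
  = \frac{1}{B^2}\left[B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

The second term shrinks as B grows, which is the sense in which averaging removes the noisy, unstable part of the individual models; the first term remains and depends on how correlated the models are.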
6. Hundreds of trees can be aggregated to form a Random Forest model. Which of the following is true about any individual tree in a Random Forest?
1. An individual tree is built on a subset of the features
2. An individual tree is built on all the features
3. An individual tree is built on a subset of observations
4. An individual tree is built on the full set of observations
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
Answer: A) 1 and 3
Explanation: Random Forest is based on the bagging concept, which uses a fraction of the samples and a fraction of the features to build each individual tree.
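The sampling step can be sketched as follows. This is a simplified illustration: the function name `draw_tree_inputs` and its parameters are hypothetical, and the feature subset is drawn once per tree here, whereas standard Random Forest implementations typically draw a fresh feature subset at every split.

```python
import random


def draw_tree_inputs(n_rows, n_features, max_features, rng):
    """Pick the rows and features one tree would be trained on.

    Rows are a bootstrap sample (drawn with replacement), so some rows
    repeat and others are left out. Features are drawn without replacement.
    """
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    features = sorted(rng.sample(range(n_features), max_features))
    return rows, features


rng = random.Random(42)
for tree_id in range(3):
    rows, features = draw_tree_inputs(n_rows=10, n_features=6, max_features=2, rng=rng)
    print(tree_id, "distinct rows:", len(set(rows)), "features:", features)
```

Because each tree sees a different mix of observations and features, the trees disagree with one another, and that diversity is what the forest averages over.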
7. Any boosting algorithm is built from weak learners. Which of the following is the main reason behind using weak learners?
Reason I: To prevent overfitting
Reason II: To prevent underfitting
A. Reason I
B. Reason II
C. Both Reason I and Reason II
D. None of the Reasons
Answer: A) Reason I
Explanation: To prevent overfitting, because the complexity of the overall learner increases with each step. Starting with weak learners means the final model is less likely to overfit.
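The step-by-step growth in complexity can be seen in a minimal boosting sketch. It assumes squared-error loss, one-dimensional inputs, and decision stumps (one-split trees) as the weak learners; the names `fit_stump`, `boost`, and `predict`, the learning rate, and the toy data are illustrative assumptions rather than any particular library's API.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Least-squares stump: returns (threshold, left_value, right_value)."""
    best = None
    cuts = sorted(set(xs))
    for t in cuts[:-1]:  # split as x <= t versus x > t
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]


def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Add one stump per round, each fitted to the current residuals."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, ml, mr = fit_stump(xs, residuals)
        stumps.append((t, ml, mr))
        preds = [p + learning_rate * (ml if x <= t else mr) for x, p in zip(xs, preds)]
        history.append(mean((y - p) ** 2 for y, p in zip(ys, preds)))  # training MSE
    return base, stumps, history


def predict(base, stumps, x, learning_rate=0.5):
    """Sum the base value and the shrunken contribution of every stump."""
    return base + sum(learning_rate * (ml if x <= t else mr) for t, ml, mr in stumps)


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
base, stumps, history = boost(xs, ys, n_rounds=20)
print(history[0], history[-1])
```

Each round adds one more stump, so the ensemble keeps getting more flexible and the training error keeps falling. Keeping every individual learner weak means that flexibility grows in small increments, which makes it easier to stop before the model overfits.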