
# K-Fold Cross Validation: Develop a random forest to classify the MNIST dataset.

INSTRUCTIONS TO CANDIDATES

My homework assignment: Develop a random forest to classify the MNIST dataset. Here are the basic requirements:

1. This is a multi-class problem: the random forest should classify all 10 digits.
2. I want you to use k-fold validation to determine the optimal number of trees used in the forest.

I attached the lecture notes related to this assignment so you can see what functions the teacher uses.

Objective

The objective of this session is to train models using different algorithms and evaluate the performance of each classifier using the k-fold cross-validation method. We will use the digits data from sklearn for this exercise. We use the same data and the same split (using the train test split method), then train a different classifier each time and measure its score.
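A minimal sketch of this setup, assuming scikit-learn's `load_digits` data. The three classifiers, `test_size=0.3`, and `random_state=0` are illustrative assumptions, not choices taken from the lecture notes:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the 8x8 digit images (10 classes) and make one fixed split
# that every classifier shares.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

# Illustrative (assumed) set of classifiers to compare.
models = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=40, random_state=0),
}

# Fit each model on the same training data and score it on the same test data.
split_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    split_scores[name] = model.score(X_test, y_test)
    print(f"{name}: {split_scores[name]:.3f}")
```

Scores from a single split like this depend on which samples happen to land in the test set, which is the motivation for the cross-validation discussion that follows.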

K-Fold Cross Validation

When evaluating different settings (“hyperparameters”) for estimators, such as the C setting that must be manually set for an SVM, there is still a risk of overfitting on the test set because the parameters can be tweaked until the estimator performs optimally. This way, knowledge about the test set can “leak” into the model and evaluation metrics no longer report on generalization performance. To solve this problem, yet another part of the dataset can be held out as a so-called “validation set”: training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.
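The hold-out scheme described above could be sketched as follows. The split fractions and the candidate values of `C` are assumptions chosen for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# First hold out the final test set, then carve a validation set
# out of the remaining data.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Tune C using the validation set only; the test set is never consulted here.
val_scores = {}
for C in (0.1, 1.0, 10.0):  # assumed candidate values
    val_scores[C] = SVC(C=C).fit(X_train, y_train).score(X_val, y_val)
best_C = max(val_scores, key=val_scores.get)

# Final evaluation happens once, on the untouched test set.
final_model = SVC(C=best_C).fit(X_train, y_train)
test_score = final_model.score(X_test, y_test)
print(f"best C = {best_C}, test accuracy = {test_score:.3f}")
```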

However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.

A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called k-fold CV, the training set is split into k smaller sets (other approaches exist, but generally follow the same principles). The following procedure is followed for each of the k “folds”:

- A model is trained using k-1 of the folds as training data;
- the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
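The per-fold procedure can be written out by hand with scikit-learn's `KFold`. This is a sketch under assumed settings (`k = 5`, a 20-tree random forest, fixed seeds), not the code from the lecture notes:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split

X, y = load_digits(return_X_y=True)

# Keep a test set aside; the folds are built from the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(X_train):
    # Train on k-1 folds ...
    model = RandomForestClassifier(n_estimators=20, random_state=0)
    model.fit(X_train[train_idx], y_train[train_idx])
    # ... and validate on the one fold that was left out.
    fold_scores.append(model.score(X_train[val_idx], y_train[val_idx]))

mean_score = float(np.mean(fold_scores))
print("fold accuracies:", np.round(fold_scores, 3))
print(f"mean accuracy: {mean_score:.3f}")
```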

The performance measure reported by k-fold cross-validation is then the average of the values computed in the loop. This approach can be computationally expensive, but does not waste too much data (as is the case when fixing an arbitrary validation set), which is a major advantage in problems such as inverse inference where the number of samples is very small.
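Applied to the assignment, the averaged CV score can be used to compare forests of different sizes. The sketch below uses `cross_val_score`, which runs the fold loop and returns one score per fold. The candidate tree counts and `cv=5` are assumptions, and the small sklearn digits data stands in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

candidate_trees = [5, 10, 20, 40]  # hypothetical grid of forest sizes
cv_means = {}
for n in candidate_trees:
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    # For a classifier, an integer cv uses stratified k-fold splitting.
    scores = cross_val_score(forest, X_train, y_train, cv=5)
    cv_means[n] = scores.mean()
    print(f"n_estimators={n}: mean CV accuracy = {cv_means[n]:.3f}")

# Choose the forest size with the best average CV score, then evaluate
# once on the held-out test set.
best_n = max(cv_means, key=cv_means.get)
final = RandomForestClassifier(n_estimators=best_n, random_state=0)
final.fit(X_train, y_train)
test_acc = final.score(X_test, y_test)
print(f"best n_estimators = {best_n}, test accuracy = {test_acc:.3f}")
```

The test set plays no part in choosing `best_n`, so the final accuracy remains an estimate of generalization performance.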
