

INSTRUCTIONS TO CANDIDATES
ANSWER ALL QUESTIONS

The goal of this assignment is to develop a machine learning system that can predict whether a hotel reservation will be cancelled or not. Towards this goal, you must select the “best” preprocessing techniques, features, model, and model parameters. Apply 5-fold cross-validation to train and evaluate your models using the area under the ROC curve (AUC).

1. Exploratory Data Analysis (EDA) [10 pts]

Report the most relevant insights that might be useful to improve the AUC performance of the predictive model.

2. Baseline System [10 pts]

Start by creating a simple baseline system based on logistic regression. Select some features that you think have predictive power (for example, based on your EDA), then train and cross-validate your model.

a. Report the average AUC with the standard deviation.
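As a rough illustration of the reporting format asked for here, the following is a minimal sketch of a logistic regression baseline evaluated with 5-fold cross-validated AUC. It assumes scikit-learn is available and uses a synthetic dataset from make_classification as a stand-in, since the hotel reservation data and its column names are not shown in this document.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the hotel reservation data (assumption: the real
# dataset is not part of this document).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# A fixed, shuffled 5-fold split so later experiments can reuse the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

baseline = LogisticRegression(max_iter=1000)

# One AUC value per fold.
auc_scores = cross_val_score(baseline, X, y, cv=cv, scoring="roc_auc")

print(f"AUC: {auc_scores.mean():.3f} +/- {auc_scores.std():.3f}")
```

Fixing random_state in the splitter is one way to keep "the same five fold CV" across the later sections, so that differences in AUC come from the modelling choices rather than from the split.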

3. Preprocessing [20 pts]

Apply appropriate preprocessing techniques to improve the average AUC of your baseline system (using the same 5-fold CV), for instance:

Replacing missing values

Removing outliers

Encoding categorical variables (large number of categories might require binning/grouping)

Scaling: Normalization/standardization

Report:

a. Your best/selected preprocessing techniques

b. The average AUC with the standard deviation of your baseline model (after preprocessing).
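One possible way to combine these preprocessing steps is a scikit-learn Pipeline, so that imputation, encoding, and scaling are refit inside each CV fold. This is only a sketch: the column names (lead_time, avg_price, meal_plan) and the toy data are hypothetical, because the dataset schema is not given here, and outlier handling is left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400

# Hypothetical toy data standing in for the hotel reservation table.
df = pd.DataFrame({
    "lead_time": rng.exponential(60, n),
    "avg_price": rng.normal(100, 25, n),
    "meal_plan": rng.choice(["none", "breakfast", "full"], n),
})
# Toy target: longer lead times are more likely to be "cancelled".
y = (df["lead_time"] + rng.normal(0, 30, n) > 60).astype(int)

# Inject some missing values so the imputers have something to do.
df.loc[rng.choice(n, 30, replace=False), "avg_price"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "meal_plan"] = np.nan

numeric_cols = ["lead_time", "avg_price"]
categorical_cols = ["meal_plan"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, df, y, cv=cv, scoring="roc_auc")
print(f"AUC after preprocessing: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping the preprocessing inside the pipeline means the imputation statistics and scaling parameters are learned only from each fold's training part, which avoids leaking information from the validation part.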

 

4. Feature Selection [20 pts]

The goal of this step is to simplify the system by dropping less important features for the selected model without significantly decreasing the AUC value. To find the more (or less) important features:

a. Apply the recursive feature elimination with cross-validation (RFECV) to select the number of features

b. Based on the above two feature selection techniques, what are the top five features (most powerful predictors)? Do they make sense?

c. Report the percentage reduction in feature size and the average AUC with the standard deviation for the selected feature subset using the same baseline model
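The RFECV step and the percentage reduction in feature count could be sketched as follows. This assumes scikit-learn and uses synthetic data in place of the preprocessed hotel features; the estimator settings are illustrative, not prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed feature matrix (assumption).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=4, n_redundant=2,
    random_state=0,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Recursive feature elimination, choosing the number of features by CV AUC.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=cv,
    scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)

n_total = X.shape[1]
n_selected = selector.n_features_
reduction_pct = 100.0 * (1 - n_selected / n_total)

# Re-evaluate the baseline model on the selected subset only.
X_selected = X[:, selector.support_]
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=cv, scoring="roc_auc"
)

print(f"Selected {n_selected} of {n_total} features "
      f"({reduction_pct:.1f}% reduction)")
print(f"AUC on subset: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Note that selecting features on the full dataset and then cross-validating on the same data gives a slightly optimistic AUC; placing the selector inside a Pipeline is one way to reduce that bias.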

 

5. Model Optimization and Selection [20 pts]

Based on the subset of features and the preprocessing techniques selected in the previous steps, train KNN and SVM (with an RBF kernel) models and optimize their parameters (gamma and C for the SVM) using 5-fold cross-validation.

For each model, report:

a. The average AUC with the standard deviation of the best model parameters

b. Plot the (five) ROC curves of the final best model
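A minimal sketch of the tuning and per-fold ROC computation is shown below, assuming scikit-learn and synthetic data. The parameter grids are illustrative guesses rather than recommended values, and the plotting call itself is left out so that the example has no dependency beyond scikit-learn and NumPy.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected feature subset (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative grids only; sensible ranges depend on the real data.
searches = {
    "knn": GridSearchCV(
        Pipeline([("scale", StandardScaler()),
                  ("clf", KNeighborsClassifier())]),
        {"clf__n_neighbors": [3, 7, 15, 31]},
        cv=cv, scoring="roc_auc",
    ),
    "svm": GridSearchCV(
        Pipeline([("scale", StandardScaler()),
                  ("clf", SVC(kernel="rbf"))]),
        {"clf__C": [0.1, 1, 10], "clf__gamma": [0.01, 0.1, 1]},
        cv=cv, scoring="roc_auc",
    ),
}

for name, search in searches.items():
    search.fit(X, y)
    i = search.best_index_
    mean_auc = search.cv_results_["mean_test_score"][i]
    std_auc = search.cv_results_["std_test_score"][i]
    print(f"{name}: {search.best_params_} AUC {mean_auc:.3f} +/- {std_auc:.3f}")

# Recompute one ROC curve per fold for the better of the two tuned models.
best_name = max(searches, key=lambda k: searches[k].best_score_)
best_model = searches[best_name].best_estimator_

fold_curves = []  # list of (fpr, tpr, auc), one entry per fold
for train_idx, test_idx in cv.split(X, y):
    model = clone(best_model).fit(X[train_idx], y[train_idx])
    # SVC exposes decision_function; KNN only exposes predict_proba.
    if hasattr(model, "decision_function"):
        y_score = model.decision_function(X[test_idx])
    else:
        y_score = model.predict_proba(X[test_idx])[:, 1]
    fpr, tpr, _ = roc_curve(y[test_idx], y_score)
    fold_curves.append((fpr, tpr, roc_auc_score(y[test_idx], y_score)))
```

Each (fpr, tpr) pair in fold_curves can then be passed to a plotting library, for example matplotlib's plt.plot(fpr, tpr), to draw the five curves on one figure.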

 

6. Model Optimization and Selection [10 pts]

a. Which preprocessing techniques, features, and model do you select for deployment (operation)?

b. What recommendations do you provide?

c. If your boss gives you an extra month to further improve this project, what would you do?

 

Notebook Organization and Code Structure [5 pts]

Good notebook structure, organization, comments, and coding style are expected (similar to the labs; also consult the rubric for the project's code on Moodle).

Deliverable [5 pts]:

Two files: one notebook (also exported as HTML), divided into the above 6 sections (and subsections of your choice), named:

File 1: uid_firstname_lastname_assign3.ipynb (wk47_wael_khreich_assign3.ipynb) [1 pt]

File 2: uid_firstname_lastname_assign3.html (wk47_wael_khreich_assign3.html)  [1 pt]

You can follow this link or this link to convert ipynb -> html from Colab.

Do not include irrelevant experiments and outputs in your notebook, only the final and relevant code/results that are sufficient to answer the question (the more concise your notebook the better). Describe (in writing) additional experiments if needed. [3 pts]

 
