Imports Math Data 1 and Math Data 2

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Part A:

Develop a data mining process that:

1) Imports Math Data 1 and Math Data 2

2) Appends the two datasets and removes duplicate examples (if any)

3) Imputes the missing values using KNN (keep the parameters on default)

4) Stores the dataset as “student performance”

5) Creates a pivot table as below:

romantic | average(G3 = Low)_Female | average(G3 = Low)_Male
Yes | |

6) After developing steps 1-5, deactivate the Pivot operator and select the below attributes to use

7) Consider the attribute G1 as a Label and use Linear regression (keep the parameters on default) to predict the label using cross-validation (10-fold)

8) Interpret the linear regression Model (you do not need to interpret the performance). Only interpret the coefficient column for those attributes with asterisks
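Steps 2 and 3 of Part A can be sketched outside the tool to check what the process should produce. The following is a minimal, standard-library Python sketch, not the RapidMiner operators themselves. The rows, the choice of `G1` and `G2` as distance features, and `k=2` are illustrative assumptions; the tool's default k may differ, and the sketch assumes the distance features have no missing values.

```python
import math

def append_and_dedupe(first, second):
    """Concatenate two lists of row dicts and drop exact duplicates (first occurrence wins)."""
    seen, unique = set(), []
    for row in first + second:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def knn_impute(rows, column, features, k=2):
    """Fill None in `column` with the mean of that column over the k nearest complete rows.

    Distance is Euclidean over `features`, which are assumed to be fully observed.
    """
    donors = [r for r in rows if r[column] is not None]
    filled = []
    for r in rows:
        if r[column] is not None:
            filled.append(dict(r))
            continue
        point = [r[f] for f in features]
        nearest = sorted(donors, key=lambda d: math.dist(point, [d[f] for f in features]))[:k]
        filled.append({**r, column: sum(d[column] for d in nearest) / len(nearest)})
    return filled

# Hypothetical stand-ins for Math Data 1 and Math Data 2.
math_data_1 = [
    {"sex": "F", "G1": 8, "G2": 9, "G3": 9},
    {"sex": "M", "G1": 14, "G2": 15, "G3": 15},
    {"sex": "F", "G1": 10, "G2": 11, "G3": None},  # missing value to impute
]
math_data_2 = [
    {"sex": "M", "G1": 14, "G2": 15, "G3": 15},  # duplicate of a row in the first file
    {"sex": "F", "G1": 11, "G2": 11, "G3": 12},
]

combined = append_and_dedupe(math_data_1, math_data_2)
student_performance = knn_impute(combined, "G3", ["G1", "G2"], k=2)
print(len(combined), student_performance[2]["G3"])
```

The duplicate row is kept once, and the missing G3 is replaced by the average G3 of the two rows closest in (G1, G2).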

Part B:

Develop a data mining process that:

1) Uses “student performance” dataset

2) Selects numeric attributes only

3) Considers the attribute of G3 as a Label and uses Neural networks (keep the parameters on default) to predict the label using cross-validation (5-fold)

4) Considers the attribute of G3 as a Label and uses Neural networks (keep the parameters on default) to predict the label using cross-validation (5-fold). This time wrap the cross-validation inside a Backward elimination (keep the parameters on default).

5) Using steps 3 and 4, fill in the following table and in a few sentences say what you understand from these findings (interpret the table)

Model | root_mean_squared_error | squared_correlation
Neural networks | |
Neural networks with feature selection (backward elimination) | |
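The idea behind step 4, wrapping cross-validation inside backward elimination, can be illustrated with a small standard-library Python sketch. It is not the RapidMiner implementation: a 1-nearest-neighbour regressor is swapped in for the neural network to keep the code short, folds are assigned by row index rather than by random sampling, the RMSE is pooled over all held-out rows, and the data are made up (feature 0 drives the label, feature 1 is noise).

```python
import math

def predict_1nn(train_X, train_y, x):
    """Stand-in learner: return the label of the closest training row."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def cv_rmse(X, y, features, k=5):
    """k-fold cross-validated RMSE using only the given feature indices.

    Row i goes to fold i % k; errors are pooled over all held-out rows.
    """
    proj = [[row[f] for f in features] for row in X]
    squared_error = 0.0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        train_X = [proj[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            squared_error += (predict_1nn(train_X, train_y, proj[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(X))

def backward_elimination(X, y, k=5):
    """Start from all features; repeatedly drop the one whose removal gives the
    lowest cross-validated RMSE, stopping as soon as every removal makes it worse."""
    features = list(range(len(X[0])))
    best = cv_rmse(X, y, features, k)
    while len(features) > 1:
        score, drop = min(
            (cv_rmse(X, y, [f for f in features if f != d], k), d) for d in features
        )
        if score > best:
            break
        best = score
        features.remove(drop)
    return features, best

# Hypothetical data: the label depends on feature 0 only; feature 1 is noise.
X = [[i, (i * 7) % 5] for i in range(10)]
y = [2 * i for i in range(10)]

rmse_all = cv_rmse(X, y, [0, 1])
selected, rmse_selected = backward_elimination(X, y)
print(rmse_all, selected, rmse_selected)
```

Because every candidate subset is scored by a full cross-validation, the selection never accepts a removal that raises the validated error, which is the comparison the table in step 5 asks for.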

Part C:

Develop a data mining process that:

1) Uses “student performance” dataset

2) Does NOT use the following attributes:

3) Uses “Discretize by User Specification” to create two classes (Low and High) for the attribute of G3 (and assign it as the label). The upper limit for “Low” is 10 and for “High” is Infinity

4) Maps “Low” to positive class

5) Uses “Weight by Information Gain” and “Select by Weights” to select the top 10 attributes

6) After completing steps 1-5, develop three models to predict the classes using cross-validation (10-fold). The three models are: Neural networks, Rule induction, and KNN (all using default parameters).

7) Use Performance (binominal) to measure the performance of the three models (using metrics of accuracy, recall, precision, and AUC). Fill in the following table and, in a few sentences, say what you understand from these findings (interpret the table)

Model | Accuracy | Precision | Recall | AUC
Neural networks | | | |
KNN | | | |
Rule Induction | | | |
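To make the role of the positive class in steps 3, 4 and 7 concrete, here is a minimal standard-library Python sketch of the discretization and of three of the four metrics (AUC is omitted because it needs prediction confidences). It assumes the upper limit of 10 is inclusive for "Low"; the G3 values and the predictions are invented for illustration and do not come from any model.

```python
def discretize_g3(g3, low_upper=10):
    """Map a numeric grade to 'Low' (assumed: g3 <= low_upper) or 'High'."""
    return "Low" if g3 <= low_upper else "High"

def binary_metrics(actual, predicted, positive="Low"):
    """Accuracy, precision and recall with `positive` treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical grades and hypothetical model output.
g3_values = [5, 10, 11, 15, 8, 12]
actual = [discretize_g3(v) for v in g3_values]
predicted = ["Low", "High", "High", "Low", "Low", "High"]

metrics = binary_metrics(actual, predicted, positive="Low")
print(actual, metrics)
```

With "Low" as the positive class, recall answers "what share of low-performing students did the model catch?" and precision answers "what share of students flagged as Low really were Low?". Swapping the positive class changes both numbers while leaving accuracy unchanged.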

Part D:

Develop a data mining process that:

1) Uses “student performance” dataset

2) Uses the following attributes only:

3) Dummy codes the nominal attribute

4) Normalizes the data using Z-transformation

5) Creates 4 clusters using the NumericalMeasure of EuclideanDistance. Use a Bar chart to interpret the clusters (to make the interpretation easier, use De-normalization so you can have the original dataset and the label representing the clusters).

6) Of these clusters, only select those students that belong to “cluster_3” and perform association rule analysis (FP-Growth min support = 0.3; keep all parameters as default for the “Create Association Rules”). Interpret 5 interesting AND highly reliable rules.
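When choosing rules for step 6, it helps to recall how the usual rule measures are computed. The sketch below, in standard-library Python, computes support, confidence and lift for a single rule. Reading "highly reliable" as high confidence and "interesting" as lift above 1 is an interpretation, not something the brief states, and the transactions and the `internet` item are made up for illustration.

```python
def rule_metrics(transactions, premise, conclusion):
    """Return (support, confidence, lift) for the rule premise -> conclusion.

    Each transaction is a set of items; premise and conclusion are sets of items.
    """
    n = len(transactions)
    both = sum(premise <= t and conclusion <= t for t in transactions)
    with_premise = sum(premise <= t for t in transactions)
    with_conclusion = sum(conclusion <= t for t in transactions)
    support = both / n
    confidence = both / with_premise if with_premise else 0.0
    lift = confidence / (with_conclusion / n) if with_conclusion else 0.0
    return support, confidence, lift

# Hypothetical students from one cluster, one set of attribute=value items each.
cluster_students = [
    {"romantic=yes", "internet=yes"},
    {"romantic=yes", "internet=yes"},
    {"romantic=no", "internet=yes"},
    {"romantic=no", "internet=no"},
]

support, confidence, lift = rule_metrics(
    cluster_students, {"romantic=yes"}, {"internet=yes"}
)
print(support, confidence, lift)
```

Here the rule holds for half of the students (support 0.5), always holds when the premise is true (confidence 1.0), and the conclusion is more likely given the premise than overall (lift above 1).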
