Part A:
Develop a data mining process that:
1) Imports Math Data 1 and Math Data 2
2) Appends the two datasets and removes duplicate examples (if any)
3) Imputes the missing values using KNN (keep the parameters on default)
4) Stores the dataset as “student performance”
5) Creates a pivot table as below:
romantic | average(G3 = Low)_Female | average(G3 = Low)_Male
Yes      |                          |
No       |                          |
6) After developing steps 1-5, deactivate the Pivot operator and select the attributes below to use
7) Considers the attribute G1 as a Label and uses Linear regression (keep the parameters on default) to predict the label using cross-validation (10-fold)
8) Interprets the linear regression model (you do not need to interpret the performance). Interpret only the coefficient column for those attributes marked with asterisks
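For orientation only, the logic behind steps 2 and 5 (append, remove duplicates, pivot) can be sketched in plain Python. This is a minimal sketch, not the RapidMiner process itself. The toy rows, the helper names append_and_dedupe and pivot_average, and the 0/1 indicator column G3_low (standing in for "G3 = Low") are all assumptions made for illustration.

```python
from collections import defaultdict


def append_and_dedupe(first, second):
    """Concatenate two lists of row dicts and drop exact duplicate rows."""
    seen = set()
    result = []
    for row in first + second:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def pivot_average(rows, group_by, column_by, value):
    """Average `value` for every (group_by, column_by) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = (row[group_by], row[column_by])
        sums[cell] += row[value]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical toy rows; G3_low is an assumed 0/1 indicator for "G3 = Low".
math_data_1 = [
    {"sex": "F", "romantic": "yes", "G3_low": 1},
    {"sex": "M", "romantic": "no", "G3_low": 0},
]
math_data_2 = [
    {"sex": "M", "romantic": "no", "G3_low": 0},  # exact duplicate of a row above
    {"sex": "F", "romantic": "yes", "G3_low": 0},
    {"sex": "M", "romantic": "no", "G3_low": 1},
]

student_performance = append_and_dedupe(math_data_1, math_data_2)
pivot = pivot_average(student_performance, "romantic", "sex", "G3_low")
```

Each cell of the pivot is then the share of students in that romantic/sex group whose G3 is Low, which is how the averages in the table above can be read.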
Part B:
Develop a data mining process that:
1) Uses “student performance” dataset
2) Selects numeric attributes only
3) Considers the attribute of G3 as a Label and uses Neural networks (keep the parameters on default) to predict the label using cross-validation (5-fold)
4) Considers the attribute of G3 as a Label and uses Neural networks (keep the parameters on default) to predict the label using cross-validation (5-fold). This time wrap the cross-validation inside a Backward elimination (keep the parameters on default).
5) Using steps 3 and 4, fill in the following table and in a few sentences say what you understand from these findings (interpret the table)
Model                                                         | root_mean_squared_error | squared_correlation
Neural networks                                               |                         |
Neural networks with feature selection (backward elimination) |                         |
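When interpreting the table, it may help to recall what the two metrics measure. The following is a minimal sketch in plain Python, assuming that squared_correlation is the squared Pearson correlation between the label and the prediction. The function names and the toy grades are hypothetical and are not RapidMiner output.

```python
import math


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between label and prediction."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def squared_correlation(actual, predicted):
    """Squared Pearson correlation between label and prediction."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)


# Hypothetical G3 labels and predictions, for illustration only.
actual = [10, 12, 8, 15]
predicted = [11, 11, 9, 14]

rmse = root_mean_squared_error(actual, predicted)
r2 = squared_correlation(actual, predicted)
```

A lower root_mean_squared_error means predictions sit closer to the true grades (in grade points), while a squared_correlation closer to 1 means predictions track the ordering and spread of the true grades more closely.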
Part C:
Develop a data mining process that:
1) Uses “student performance” dataset
2) Does NOT use the following attributes:
3) Uses “Discretize by User Specification” to create two classes (Low and High) for the attribute of G3 (and assign it as the label). The upper limit for “Low” is 10 and for “High” is Infinity
4) Maps “Low” to positive class
5) Uses “Weight by Information Gain” and “Select by Weights” to select the top 10 attributes
6) After completing steps 1-5, develop three models to predict the classes using cross-validation (10-fold). The three models are: Neural networks, Rule induction, and KNN (all using default parameters).
7) Uses Performance (binominal) to measure the performance of the three models (using the metrics accuracy, recall, precision, and AUC). Fill in the following table and, in a few sentences, say what you understand from these findings (interpret the table)
Model           | Accuracy | Precision | Recall | AUC
Neural networks |          |           |        |
KNN             |          |           |        |
Rule Induction  |          |           |        |
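To make the role of the positive class concrete, the sketch below discretizes G3 and computes accuracy, precision, and recall with "Low" as the positive class. It is a minimal plain-Python illustration, assuming that the upper limit of 10 is inclusive. The function names, grades, and predictions are hypothetical, and AUC is omitted because it requires confidence scores rather than class predictions.

```python
def discretize_g3(grade, low_upper_limit=10):
    """Map a numeric G3 grade to "Low" (up to the limit, assumed inclusive) or "High"."""
    return "Low" if grade <= low_upper_limit else "High"


def binominal_performance(actual, predicted, positive="Low"):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical grades and model predictions, for illustration only.
grades = [8, 10, 11, 15, 9, 14]
actual = [discretize_g3(g) for g in grades]
predicted = ["Low", "High", "High", "Low", "Low", "High"]

metrics = binominal_performance(actual, predicted)
```

With "Low" as the positive class, recall is the share of truly low-performing students that the model catches, and precision is the share of students flagged as "Low" who really are.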
Part D:
Develop a data mining process that:
1) Uses “student performance” dataset
2) Uses the following attributes only:
3) Dummy codes the nominal attribute
4) Normalizes the data using Z-transformation
5) Creates 4 clusters using the NumericalMeasure of EuclideanDistance. Use a Bar chart to interpret the clusters (to make the interpretation easier, use De-normalization so you can have the original dataset and the label representing the clusters).
6) Of these clusters, only select those students that belong to “cluster_3” and perform association rule analysis (FP-Growth min support = 0.3; keep all parameters as default for the “Create Association Rules”). Interpret 5 interesting AND highly reliable rules.
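When judging which rules are interesting and reliable, the usual measures are support, confidence, and lift. The sketch below computes them in plain Python for a single rule. It is an illustration under stated assumptions: the item names (such as "internet=yes") and the four toy transactions are hypothetical, and the helper names are not RapidMiner operators.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, premise, conclusion):
    """Support, confidence, and lift of the rule premise -> conclusion."""
    s_both = support(transactions, set(premise) | set(conclusion))
    s_premise = support(transactions, premise)
    s_conclusion = support(transactions, conclusion)
    return {
        "support": s_both,
        "confidence": s_both / s_premise,
        "lift": s_both / (s_premise * s_conclusion),
    }


# Hypothetical dummy-coded students from one cluster, for illustration only.
transactions = [
    {"internet=yes", "higher=yes"},
    {"internet=yes", "higher=yes", "paid=yes"},
    {"internet=yes"},
    {"higher=yes", "paid=yes"},
]

rule = rule_metrics(transactions, {"internet=yes"}, {"higher=yes"})
```

A rule with high confidence is reliable in the sense that the conclusion usually holds when the premise does; a lift above 1 suggests the premise and conclusion occur together more often than chance would predict, which is one common reading of "interesting".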