new bank called Universal Bank is looking for ways to convert an abundance of liability customers

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Step 1: Business Problem Understanding

A small, new bank called Universal Bank is looking for ways to convert an abundance of liability customers into personal loan customers, and they have collected a decent amount of records with various attributes. The goal of this case study is to utilize the provided records to see what attributes or combination of attributes would make someone more likely to accept a personal loan.

Step 2: Data Understanding and Collection

For this study, the bank has provided 5,000 records with 14 variables: ID, Age, Experience, Income, ZIPCode, Family, CCAvg, Education, Mortgage, Personal Loan, Securities Account, CD Account, Online, and CreditCard. All of the variables in the dataset are numerical; however, several of them (Personal Loan, Securities Account, CD Account, Online, and CreditCard) are intended to indicate whether something is true or false rather than to measure an amount. ID is intended to connect a customer to a record and is not intended to measure anything or influence results. The Personal Loan variable is the special attribute, and it records whether a customer accepted a personal loan. A couple of attributes in the dataset are less self-explanatory and need to be explained. Family measures how large the customer's family is, while Education is recorded from 1-3, with 1 representing that the customer has an undergraduate degree, 2 a graduate degree, and 3 an advanced degree.

Step 3: Data Preparation and Feature Selection

To prepare the data, I first checked for missing values by loading the dataset and inspecting the Missing column in the Statistics tab; no data was missing. Then, I used the set role operator to make the ID variable an ‘ID’ in the records so RapidMiner would not use that variable in calculations, which would falsely influence the results. Additionally, I searched for outliers by using the ‘detect outlier’ operator and then filtered the examples to keep only the useful data in the model. As a result, 10 rows of data were removed as outliers, and 4,990 records remained. The cleaned data was then written to a new Excel file that will be used in the model. Finally, I used a correlation matrix to see if any variables were correlated or measured similar things. I noticed that Age and Experience were highly correlated, but I do not believe they are identical, so I will leave them in the model for now. Additionally, Income and CCAvg are moderately correlated, but the correlation is not strong enough to consider removing either of them.
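The missing-value check and the correlation check described above were done with RapidMiner operators. As a rough illustration of what those two checks compute, here is a minimal Python sketch. The helper names `count_missing` and `pearson` are hypothetical, and the five records are made-up values, not rows from the bank's dataset.

```python
import math

def count_missing(rows):
    """Count missing (None) values per column across a list of dict records."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative records (assumption): NOT taken from the bank's data.
records = [
    {"Age": 25, "Experience": 1, "Income": 49},
    {"Age": 45, "Experience": 19, "Income": 34},
    {"Age": 39, "Experience": 15, "Income": 11},
    {"Age": 35, "Experience": 9, "Income": 100},
    {"Age": 53, "Experience": 27, "Income": 72},
]

missing = count_missing(records)
age_exp_corr = pearson(
    [r["Age"] for r in records],
    [r["Experience"] for r in records],
)

print(missing)                 # per-column count of missing values
print(round(age_exp_corr, 3))  # close to 1 means the two columns move together
```

A coefficient near 1 or -1 suggests two variables carry overlapping information, which is the reasoning behind watching Age and Experience.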

Step 4: Modeling Development

The creation of this model required three processes in order to get results that were accurate and valuable to the goal of the study. First, I will explain the problem with the first model, which made it necessary to add a separate process that created another dataset with an equal number of True and False records for the Personal Loan attribute. The first logistic regression model I created used the entirety of the clean data, and the results were very unbalanced: the data contained an abundance of records where Personal Loan was False, so the model was heavily biased toward predicting that almost everything was False. To fix this, I had to create a dataset that had an equal number of customers who accepted and declined the Personal Loan offer. I started by placing two copies of the clean dataset separately in the process and filtering each one with a filter examples operator. One filter kept the True records and the other kept the False records. Then, the False set was sampled down to 479 records, which is the number of positive records in the dataset. To finish this process, the two filtered datasets were combined into a single dataset of 958 records with an equal number of True and False records, and this dataset was written to a new Excel sheet that was used in the remaining processes. After the dataset was clean and even, I created another process, an ROC comparison, to see which type of model operator would produce the best results. Within the ROC operator I included a logistic regression, deep learning, and decision tree operator and ran the process. The decision tree produced the best results, and this was the operator I chose for the last process.
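The balancing step described above (filter the True and False records separately, sample the False set down to the size of the True set, then combine) can be sketched in plain Python as follows. This is only an illustration of the idea, not the RapidMiner process itself. The function name `balance_by_undersampling`, the fixed seed, and the synthetic records are assumptions; only the class counts (479 positives out of 4,990 cleaned records) come from the text.

```python
import random

def balance_by_undersampling(records, label_key, seed=0):
    """Return a dataset with equal numbers of label 1 and label 0 records.

    Assumes label 0 is the majority class, as in the case study.
    """
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]

    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    # Sample without replacement as many negatives as there are positives.
    sampled_negatives = rng.sample(negatives, len(positives))

    balanced = positives + sampled_negatives
    rng.shuffle(balanced)  # avoid all positives sitting at the top
    return balanced

# Synthetic stand-in (assumption): 479 positives and 4,990 - 479 = 4,511 negatives.
records = [{"ID": i, "Personal Loan": 1 if i < 479 else 0} for i in range(4990)]

balanced = balance_by_undersampling(records, "Personal Loan")
n_true = sum(r["Personal Loan"] for r in balanced)
print(len(balanced), n_true, len(balanced) - n_true)
```

Undersampling discards most of the False records, so the balanced set no longer reflects the real acceptance rate; that trade-off is worth keeping in mind when reading the accuracy figures.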
Then, I created the final modeling process, which started by utilizing the select attributes operator to include the CCAvg, CD Account, CreditCard, Education, Family, ID, Income, Online, and Personal Loan variables, since their p-values were under 0.05, which is within the acceptable range. I then used the set role operator to set the role of the Personal Loan variable to a ‘label’ and the ID variable to an ‘ID’ in this separate process. I also included the numerical to binomial operator to convert the Personal Loan variable into a two-class label so the model would be able to function. Finally, the cross-validation operator is included and set to perform 100 folds. The decision tree operator is on the training side of the cross-validation operator, and the apply model and performance operators are on the testing side. The decision tree operator was also edited to change the criterion to information_gain since it yielded the best results for the model.
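For readers unfamiliar with the information_gain criterion, the general idea is that a decision tree prefers the split that most reduces the entropy of the label. The sketch below shows that textbook calculation in Python; it is not RapidMiner's implementation, which may differ in detail. The function names and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`.

    `groups` is a partition of `labels` produced by a candidate split.
    """
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups if g)
    return entropy(labels) - weighted

# Toy balanced label set (assumption): 1 = accepted the loan, 0 = did not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# A hypothetical split that separates the classes perfectly...
perfect_gain = information_gain(labels, [[1, 1, 1, 1], [0, 0, 0, 0]])
# ...and one that leaves both branches as mixed as the parent.
useless_gain = information_gain(labels, [[1, 1, 0, 0], [1, 1, 0, 0]])

print(perfect_gain, useless_gain)
```

The first split removes all uncertainty about the label and the second removes none, so a tree using this criterion would choose the first.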

Step 5: Model Evaluation and Interpretation

The model that was created in Step 4 performed with an accuracy of 96.26%, meaning that it classified 96.26% of the records correctly during cross-validation. The model produced 457 True Positives, 465 True Negatives, 14 False Positives, and 22 False Negatives, which corresponds to a misclassification rate of 3.74%. This means that the model is able to predict whether a customer is going to accept a personal loan with high accuracy on the balanced dataset, although how well it performs in practice will depend on how the model is used.
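The reported figures can be checked directly from the four confusion-matrix counts. The short Python sketch below does that arithmetic and also derives precision and recall, which the write-up does not report. Pooling the counts gives an accuracy of about 96.24%, slightly below the reported 96.26%; one possible explanation (an assumption, not something stated in the study) is that the reported value is an average of per-fold accuracies rather than a figure computed from the pooled counts.

```python
# Confusion-matrix counts as reported in Step 5.
tp, tn, fp, fn = 457, 465, 14, 22

total = tp + tn + fp + fn
accuracy = (tp + tn) / total            # share of all records classified correctly
misclassification = (fp + fn) / total   # share classified incorrectly
precision = tp / (tp + fp)              # of predicted acceptors, how many accepted
recall = tp / (tp + fn)                 # of actual acceptors, how many were found

print(f"records:           {total}")
print(f"accuracy:          {accuracy:.2%}")
print(f"misclassification: {misclassification:.2%}")
print(f"precision:         {precision:.2%}")
print(f"recall:            {recall:.2%}")
```

The counts sum to 958, matching the balanced dataset, and the True and False classes each account for 479 records.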
