[Solved] Load the LoanData.csv Data Set into R: Outcomes of 5611 Loans, Variables Including Loan Status (Q37180247)
1 A. Load the LoanData.csv data set into R. It lists the outcome of 5611 loans. The data variables include loan status (current, late or in default), credit grade (from the best rating, AA, to the worst one, HC for heavy risk), loan amount, loan age (in months), borrower’s interest rate and the debt-to-income ratio. Code loan status as a binary outcome (0 for current loans, 1 for late or default loans). Code the debt-to-income ratio into three levels (‘low’ for ratio < 10%, ‘medium’ for ratio between 10% and 30%, ‘high’ for ratio above 30%).
B. Fit the recoded data set using logistic regression. Use Credit.Grade, Amount, Age, Borrower.Rate and Debt to Income Ratio (recoded) as the explanatory variables. Copy the glm summary output from R and paste it below.
C. Evaluate the in-sample fit of your logistic regression model using 0.5 as the cutoff probability. Display the confusion matrix below.
D. The cutoff probability should be around 92.43% with symmetric costs of misclassification. Why? Display the confusion matrix using the updated cutoff probability below. What’s the overall in-sample misclassification rate in this case?
E. Randomly select 4611 out of the 5611 loans as your training set. Apply the fitted logistic model to the 1000 loans from your test set. Choose the appropriate cutoff probability assuming symmetric costs of misclassification [see step D]. What’s your out-of-sample prediction accuracy rate based on the test set’s confusion matrix?
F. Sort the 1000 loans in your test set according to the predicted default probabilities in decreasing order. Use a FOR loop to calculate the lift. Then plot the lift chart for your test set.
G. Calculate the out-of-sample prediction accuracy rate for 20 random test samples (sample size = 1000). Display the 20 accuracy rates and their mean below.
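The recoding and in-sample steps in parts A through D can be sketched in R roughly as follows. This is only a sketch under assumptions: the column names `Status` and `Debt.To.Income.Ratio`, the status strings, the intermediate credit grades, and the ratio being stored as a fraction are all guesses about LoanData.csv, and the simulated data frame merely stands in for the real file so the code runs on its own.

```r
# Sketch of parts A-D. Credit.Grade, Amount, Age and Borrower.Rate are named in
# the question; Status, Debt.To.Income.Ratio and the status strings are ASSUMED.

recode_loans <- function(loans) {
  # Part A: 0 for current loans, 1 for late or default loans.
  loans$Default <- as.integer(loans$Status != "Current")
  # Part A: three levels; the ratio is ASSUMED to be a fraction (0.10 = 10%).
  dti <- loans$Debt.To.Income.Ratio
  loans$DTI.Level <- factor(
    ifelse(dti < 0.10, "low", ifelse(dti > 0.30, "high", "medium")),
    levels = c("low", "medium", "high"))
  loans$Credit.Grade <- factor(loans$Credit.Grade)
  loans
}

# Part B: logistic regression on the recoded variables.
fit_loans <- function(loans) {
  glm(Default ~ Credit.Grade + Amount + Age + Borrower.Rate + DTI.Level,
      data = loans, family = binomial)
}

# Parts C and D: confusion matrix at a given cutoff probability.
confusion <- function(actual, prob, cutoff) {
  predicted <- factor(as.integer(prob > cutoff), levels = c(0, 1))
  table(Actual = factor(actual, levels = c(0, 1)), Predicted = predicted)
}

misclassification_rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)

# Simulated stand-in data; with the real file use read.csv("LoanData.csv").
set.seed(1)
n <- 500
demo <- data.frame(
  Status = sample(c("Current", "Late", "Default"), n, replace = TRUE,
                  prob = c(0.8, 0.1, 0.1)),
  Credit.Grade = sample(c("AA", "A", "B", "C", "D", "E", "HC"), n, replace = TRUE),
  Amount = round(runif(n, 1000, 25000)),
  Age = sample(1:36, n, replace = TRUE),
  Borrower.Rate = runif(n, 0.05, 0.30),
  Debt.To.Income.Ratio = runif(n, 0, 0.6)
)

loans <- recode_loans(demo)
fit <- fit_loans(loans)
print(summary(fit))                                    # part B output

cm_half <- confusion(loans$Default, fitted(fit), cutoff = 0.5)
print(cm_half)                                         # part C

cm_d <- confusion(loans$Default, fitted(fit), cutoff = 0.9243)  # value quoted in part D
print(cm_d)
print(misclassification_rate(cm_d))                    # part D rate
```

Keeping the cutoff as an argument of `confusion` means parts C and D differ only in the value passed in. The numbers printed for the simulated data say nothing about the real loans.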
Expert Answer
Answer to 1 A. Load the LoanData.csv data set into R. It lists the outcome of 5611 loans. The data variables include loan status (…
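For the out-of-sample parts E through G, one possible sketch in R is shown here. It assumes a data frame that has already been recoded, with hypothetical column names `Default` and `DTI.Level`; the simulated data is a stand-in for LoanData.csv, so the demo holds out 100 of 600 rows where the question asks for 1000 of 5611. "Lift" is taken here as cumulative defaults among the top-ranked loans relative to a random-ordering baseline, which is one common convention and may differ from the course's definition. The cutoff is left as an argument, and 0.5 appears only as a placeholder.

```r
# Sketch of parts E-G. Column names Default and DTI.Level are ASSUMED, and the
# simulated data frame only stands in for the recoded LoanData.csv.

model_formula <- Default ~ Credit.Grade + Amount + Age + Borrower.Rate + DTI.Level

# Part E: fit on a random training set, then score the held-out test set.
split_and_score <- function(loans, n_test) {
  test_idx <- sample(nrow(loans), n_test)
  fit <- glm(model_formula, data = loans[-test_idx, ], family = binomial)
  test <- loans[test_idx, ]
  data.frame(actual = test$Default,
             prob = predict(fit, newdata = test, type = "response"))
}

accuracy <- function(scored, cutoff) {
  mean(as.integer(scored$prob > cutoff) == scored$actual)
}

# Part F: sort by predicted probability (decreasing) and accumulate defaults
# with a for loop; baseline is the count expected under random ordering.
lift_curve <- function(scored) {
  ordered <- scored$actual[order(scored$prob, decreasing = TRUE)]
  n <- length(ordered)
  base_rate <- mean(ordered)
  cum_defaults <- numeric(n)
  running <- 0
  for (i in seq_len(n)) {
    running <- running + ordered[i]
    cum_defaults[i] <- running
  }
  baseline <- base_rate * seq_len(n)
  data.frame(cases = seq_len(n), cum_defaults = cum_defaults,
             baseline = baseline, lift = cum_defaults / baseline)
}

# Simulated, already-recoded stand-in data.
set.seed(42)
n <- 600
demo <- data.frame(
  Credit.Grade = factor(sample(c("AA", "A", "B", "C", "D", "E", "HC"), n, replace = TRUE)),
  Amount = round(runif(n, 1000, 25000)),
  Age = sample(1:36, n, replace = TRUE),
  Borrower.Rate = runif(n, 0.05, 0.30),
  DTI.Level = factor(sample(c("low", "medium", "high"), n, replace = TRUE),
                     levels = c("low", "medium", "high"))
)
demo$Default <- rbinom(n, 1, plogis(-3 + 10 * demo$Borrower.Rate))

scored <- split_and_score(demo, n_test = 100)   # n_test = 1000 for the real data
print(accuracy(scored, cutoff = 0.5))           # part E; 0.5 is a placeholder

lift <- lift_curve(scored)                      # part F
if (interactive()) {
  plot(lift$cases, lift$cum_defaults, type = "l",
       xlab = "Number of cases", ylab = "Cumulative defaults", main = "Lift chart")
  lines(lift$cases, lift$baseline, lty = 2)     # random-ordering baseline
}

# Part G: accuracy over 20 random test samples, and their mean.
accuracies <- replicate(20, accuracy(split_and_score(demo, n_test = 100), cutoff = 0.5))
print(round(accuracies, 3))
print(mean(accuracies))
```

Each call to `split_and_score` draws a fresh random split and refits the model, so the 20 accuracy rates in part G come from 20 independent train/test partitions rather than from one fitted model.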