compsci 2000

Question 1. A bank is predicting the likelihood of default for each customer using an imbalanced dataset. There are 1,000 customers in the database: the “No Default” cases make up 80% of the data (800 customers) and the “Default” cases the remaining 20% (200 customers). The confusion matrix for the model is:

(a) Which group (“Default” or “No Default”) would you label as the positive class?

(b) Calculate the following:

(i) Plain accuracy

(ii) Error rate

(iii) True positive rate (sensitivity)

(iv) False positive rate

(v) Specificity
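One way to check the hand calculations in (b) is a small helper that computes all five rates from the four confusion-matrix counts. This is a minimal sketch that assumes “Default” is the positive class; the counts in the usage line are hypothetical placeholders (chosen only so that they sum to 200 defaults and 800 non-defaults) and are not the model's actual matrix.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return the rates asked for in part (b) from the four confusion-matrix counts.

    Assumes "Default" is the positive class; swap the counts if you choose otherwise.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
        "tpr": tp / (tp + fn),             # true positive rate / sensitivity
        "fpr": fp / (fp + tn),             # false positive rate
        "specificity": tn / (fp + tn),     # equals 1 - fpr
    }


# Hypothetical counts only, NOT the assignment's confusion matrix.
m = classification_metrics(tp=150, fn=50, fp=100, tn=700)
print(m)
```

Note that the denominators differ: the TP rate divides by the actual positives (TP + FN), while the FP rate and specificity divide by the actual negatives (FP + TN).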

(c) Calculate the overall expected value.
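The overall expected value weights the benefit (or cost) of each confusion-matrix cell by the estimated probability of that cell, that is, EV = Σ p(cell) × benefit(cell), where p(cell) = count / total. The sketch below assumes a cost-benefit value is available for each cell; the counts and dollar values in the usage lines are hypothetical and only illustrate the arithmetic.

```python
def expected_value(counts, benefits):
    """Overall expected value per customer: sum over cells of p(cell) * benefit(cell).

    counts and benefits are dicts keyed by "tp", "fn", "fp", "tn".
    """
    total = sum(counts.values())
    return sum(counts[cell] / total * benefits[cell] for cell in counts)


# Hypothetical counts and hypothetical per-customer benefits (negative = cost).
counts = {"tp": 150, "fn": 50, "fp": 100, "tn": 700}
benefits = {"tp": 0.0, "fn": -500.0, "fp": -50.0, "tn": 100.0}
ev = expected_value(counts, benefits)
print(ev)
```

With these placeholder numbers the sum is 0.15(0) + 0.05(−500) + 0.10(−50) + 0.70(100) = 40 per customer; the real answer depends on the cost-benefit values given for the question.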

(d) Assume the same target percentage as in the first table. Write down the confusion matrix for a random classifier.
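A random classifier labels each customer “Default” with some probability q, independently of the customer's true class, so its expected counts are TP = qP, FN = (1 − q)P, FP = qN and TN = (1 − q)N, where P and N are the numbers of actual positives and negatives. The sketch below assumes that “the same target percentage” means q = 0.2 (the 20% default rate); if it is meant to be the model's own predicted-positive rate, substitute that value for q.

```python
def random_classifier_matrix(n_pos, n_neg, q):
    """Expected confusion-matrix counts for a classifier that predicts
    positive with probability q, independently of the true class."""
    return {
        "tp": q * n_pos,
        "fn": (1 - q) * n_pos,
        "fp": q * n_neg,
        "tn": (1 - q) * n_neg,
    }


# 1,000 customers: 200 "Default" (positive) and 800 "No Default" (negative).
# Assumption: q = 0.2, i.e. the classifier flags 20% of customers as defaults.
rc = random_classifier_matrix(n_pos=200, n_neg=800, q=0.2)
print(rc)
```

Under this assumption the expected matrix is TP = 40, FN = 160, FP = 160, TN = 640. Its TP rate and FP rate both equal q, which is why a random classifier lies on the diagonal of ROC space.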

(e) Calculate the overall expected value for the random classifier in (d).

Question 2. Two classifiers – A and B – are used to predict the probability of an increase in the Fed Funds rate. The predicted probabilities over the past 6 quarters are shown in the following table:

(a) Plot the ROC curves for the two classifiers and for the random classifier. [Compute the TP and FP rates at the cutoff values 0, 0.2, 0.4, 0.5, 0.6, 0.8 and 1.]
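Each point on an ROC curve is the pair (FP rate, TP rate) obtained by thresholding the predicted probabilities at one cutoff. This sketch assumes a quarter is predicted as an “increase” when its probability is greater than or equal to the cutoff, and that the actual outcome is coded 1 for an increase and 0 otherwise. The scores and labels below are hypothetical six-quarter data, not the table from the question.

```python
def roc_points(scores, labels, cutoffs):
    """(FPR, TPR) at each cutoff; a case is predicted positive when score >= cutoff.

    labels: 1 for an actual rate increase (positive), 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


cutoffs = [0, 0.2, 0.4, 0.5, 0.6, 0.8, 1]
# Hypothetical predicted probabilities and outcomes, NOT the question's data.
scores_a = [0.9, 0.7, 0.55, 0.45, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
pts_a = roc_points(scores_a, labels, cutoffs)
print(pts_a)
```

Running the same function on each classifier's probabilities gives the points to plot (for example with `matplotlib.pyplot.plot`); the random classifier is the diagonal from (0, 0) to (1, 1).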

(b) Compare the two models. Which one performs better?
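A common single-number summary for comparing ROC curves is the area under the curve (AUC), which can be approximated from the plotted points with the trapezoidal rule. This is a minimal sketch; the points in the usage lines are hypothetical, not results for classifier A or B.

```python
def auc_trapezoid(points):
    """Area under an ROC curve given as (FPR, TPR) points, by the trapezoidal rule."""
    pts = sorted(points)  # order by FPR, then TPR, so the curve is traced left to right
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points for one classifier, plus the random-classifier diagonal.
model_pts = [(1.0, 1.0), (2/3, 1.0), (1/3, 1.0), (1/3, 2/3),
             (0.0, 2/3), (0.0, 1/3), (0.0, 0.0)]
random_pts = [(0.0, 0.0), (1.0, 1.0)]
print(auc_trapezoid(model_pts), auc_trapezoid(random_pts))
```

The random classifier scores 0.5. A curve that lies above another at every FP rate (and so has the larger AUC) is better at every operating point; if the curves cross, the preferred model depends on which range of FP rates matters to the user.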

