# Titanic 3 – Model Evaluation

This part directly follows on from the Titanic Logistic Regression model we built, so you need to work through that part first.

NOTE: Add the code from this part at the bottom of the code you have already written in that part.

## Load the package

For this we are going to use the scikit-learn metrics package, loaded as shown below. The code says: from the metrics package, load everything (*). It is better practice to bring in only the functions we need, but it is simpler to bring them all in.

```python
from sklearn.metrics import *
```

You can put the above code where you left off, but I always prefer to put it in the cell with all the other packages. So my code looks like this…

```python
import pandas as pd
import numpy as np

pd.set_option("display.max_rows",10)
pd.set_option("display.max_columns",101)

import statsmodels.api as sm

import matplotlib.pyplot as plt

# For the evaluation
from sklearn.metrics import *

%matplotlib inline
```

If you do it my way, remember to re-run the cell. Otherwise the package will not load.
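As mentioned above, importing only what you use is the better practice. A minimal sketch of that alternative, assuming you only need the four functions this part calls (`confusion_matrix`, `accuracy_score`, `roc_curve` and `auc`):

```python
# Import only the metric functions used in this part,
# instead of everything in sklearn.metrics.
from sklearn.metrics import confusion_matrix, accuracy_score, roc_curve, auc
```

Either form works for the rest of this part. The explicit version just makes it easier to see where each function comes from.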

## Predict all of *x*

Previously we predicted one row at a time. Now we are going to predict the whole dataset, using the same predict call as before.

```python
pred = result.predict(x)
```

This uses the model called ‘result’ to predict the survival probability for every passenger we have data for.

If we run ‘pred’ we can see the predictions as values between 0 and 1.

```python
pred
```

## Confusion Matrix

Now we can start looking at how good our model is by comparing the true survival outcomes against what the model predicted. A nice way of doing this is using a confusion matrix. Below is an example of what a confusion matrix looks like.

The confusion matrix can give us a lot of information. At first glance, it shows the number of correct predictions and the number of incorrect predictions. Let's assume the example above shows survival (yes, they survived; no, they didn’t survive). We can see the following…

- 100 people who were predicted to survive actually survived (Correct classification)
- 50 people who were predicted to not survive actually didn’t survive (Correct classification)
- 10 people were predicted to survive but actually died (False Positive / Type 1 error)
- 5 people were predicted to die but actually survived (False Negative / Type 2 error)

Some other useful measures can be calculated from the confusion matrix, as discussed on Wikipedia, including:

- Accuracy
- Positive Predictive Value (PPV)
- Negative Predictive Value (NPV)
- Precision
- Recall
- Sensitivity
- Specificity
- Etc, etc, etc
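To make these measures concrete, here is a small sketch that calculates several of them from the example counts listed above (100, 50, 10 and 5). The variable names are my own choice for illustration. Note that precision is another name for PPV, and recall is another name for sensitivity.

```python
# Counts taken from the example confusion matrix described above.
tp, tn, fp, fn = 100, 50, 10, 5

total = tp + tn + fp + fn

accuracy = (tp + tn) / total   # share of all predictions that were correct
ppv = tp / (tp + fp)           # precision: predicted survivors who did survive
npv = tn / (tn + fn)           # predicted deaths who did die
sensitivity = tp / (tp + fn)   # recall: actual survivors the model identified
specificity = tn / (tn + fp)   # actual deaths the model identified

print(f"Accuracy:    {accuracy:.3f}")
print(f"PPV:         {ppv:.3f}")
print(f"NPV:         {npv:.3f}")
print(f"Sensitivity: {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")
```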

So now let's look at our confusion matrix. In our code we use np.round(pred, 0), which rounds the prediction score to either 0 or 1. This is important because the confusion matrix compares classifications, that is, whether the person survived or not. By doing this we assume that any prediction over 0.5 means they survived and anything below means they did not. 0.5 is the common cut-off; however, the optimal cut-off can be calculated using Youden's J index (I will create a tutorial at some point).

```python
confusion_matrix(y, np.round(pred,0))
```

This should give us this…

```
array([[363,  61],
       [ 80, 210]])
```

This is not pretty, and you need to be careful when interpreting it. Our first job is to identify which are the true positives and which are the true negatives, then the false positives and the false negatives.
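One way to label the four cells is to flatten the matrix. In scikit-learn's layout the rows are the true classes and the columns are the predicted classes, both in label order (0 then 1). The sketch below types the matrix in by hand so it runs on its own; in the notebook you would use the output of `confusion_matrix` instead.

```python
import numpy as np

# The matrix printed above, typed in by hand for illustration.
cm = np.array([[363, 61],
               [80, 210]])

# Rows are the true classes (0 = died, 1 = survived) and columns are the
# predicted classes, so flattening row by row gives TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print("True negatives: ", tn)
print("False positives:", fp)
print("False negatives:", fn)
print("True positives: ", tp)
```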

A nicer way of viewing this is…

```python
pd.crosstab(y.Survived, np.round(pred,0), rownames=['True'], colnames=['Predicted'])
```

This gives us…

Here we can see the performance of our model as…

- correctly identifying 210 people who survived
- correctly identifying 363 people who did not survive
- incorrectly predicting that 61 people would survive when they actually died
- incorrectly predicting that 80 people would die when they actually survived
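These counts depend on the 0.5 cut-off that np.round applies. The sketch below uses made-up probabilities (not values from the Titanic model) to show what the rounding does, next to an explicit threshold that is easier to change later. One detail worth knowing: NumPy rounds exact halves to the nearest even number, so a score of exactly 0.5 becomes 0.

```python
import numpy as np

# Hypothetical predicted probabilities, for illustration only.
scores = np.array([0.12, 0.50, 0.73, 0.49, 0.91])

# np.round sends exact halves to the nearest even number, so 0.50 becomes 0.
rounded = np.round(scores, 0)

# An explicit cut-off makes the rule visible and easy to adjust.
threshold = 0.5
classified = (scores >= threshold).astype(int)

print(rounded)     # [0. 0. 1. 0. 1.]
print(classified)  # [0 1 1 0 1]
```

The two approaches only disagree on the score of exactly 0.5, which will rarely matter with real model output.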

Next we want to find out how accurate the model was. We can calculate this manually, or we can quickly run this code…

```python
accuracy_score(y, np.round(pred,0))
```

My model has an accuracy of 0.80252100…, meaning that it classified about **80%** of the passengers correctly.
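For the manual route, accuracy is the number of correct predictions divided by the total. The sketch below works it out from the confusion matrix counts shown earlier, typed in by hand. As an extra check that is not part of the original walkthrough, it also compares against a simple baseline: always predicting the most common outcome.

```python
import numpy as np

# Confusion matrix from earlier (rows = true class, columns = predicted class).
cm = np.array([[363, 61],
               [80, 210]])

# Correct predictions sit on the diagonal of the matrix.
accuracy = np.trace(cm) / cm.sum()

# Baseline: always predict the most common true class (here, "did not survive").
baseline = cm.sum(axis=1).max() / cm.sum()

print(f"Model accuracy:    {accuracy:.4f}")  # 0.8025
print(f"Baseline accuracy: {baseline:.4f}")  # 0.5938
```

So the model's 80% is a clear improvement over the roughly 59% you would get by predicting that nobody survived.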

## ROC Plot

The Receiver Operating Characteristic (ROC) plot is a popular method of presenting the performance of a classifier. For this to work your predictions need to be on a scale of 0 to 1, and not just 0’s or 1’s. The plot shows the trade-off between the sensitivity and specificity of the model as the threshold changes. Sensitivity is also referred to as the ‘true positive rate’ (TPR), and 1 – specificity as the ‘false positive rate’ (FPR).

To produce this we need to calculate the TPR and FPR at different thresholds. scikit-learn does this for us…

```python
fpr, tpr, thresholds = roc_curve(y, pred)
```
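To see what is happening at each threshold, here is a rough sketch that calculates a single (FPR, TPR) point by hand. The helper `roc_point` and the toy data are hypothetical, made up for illustration. It assumes scores at or above the threshold are classed as positive, which is how I understand roc_curve to treat its thresholds.

```python
import numpy as np

def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are classed as positive."""
    y_true = np.asarray(y_true)
    predicted = np.asarray(scores) >= threshold
    tp = np.sum(predicted & (y_true == 1))
    fp = np.sum(predicted & (y_true == 0))
    fn = np.sum(~predicted & (y_true == 1))
    tn = np.sum(~predicted & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Hypothetical toy data: four cases with their predicted scores.
y_toy = [0, 0, 1, 1]
scores_toy = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold moves the point up and to the right on the ROC plot.
for t in (0.8, 0.4, 0.35, 0.1):
    fpr_t, tpr_t = roc_point(y_toy, scores_toy, t)
    print(f"threshold {t}: FPR = {fpr_t:.2f}, TPR = {tpr_t:.2f}")
```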

Next we simply need to plot the FPR and TPR.

```python
plt.plot(fpr, tpr)

# Add the labels
plt.ylabel("Sensitivity")
plt.xlabel("1 - Specificity")
```

This gives us…

Note: FPR is 1 – Specificity. Sensitivity and Specificity are more commonly known than TPR and FPR.

The plot shows a smooth curve, which is good. It shows that we can adjust the threshold to increase sensitivity at the cost of specificity, and vice versa. For example…

- If we have a sensitivity of 0.8 we have a specificity of about 0.78 (1 – 0.22)
- If we change the threshold to increase sensitivity to 0.9 we have a specificity of around 0.4 (1 – 0.6).
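One way to compare operating points like these is Youden's J index mentioned earlier, which is sensitivity + specificity – 1 (the same as TPR – FPR). A minimal sketch using the two approximate points read off the plot; `youden_j` is a helper name of my own:

```python
# (sensitivity, specificity) pairs read off the ROC plot; approximate values.
operating_points = {
    "sensitivity 0.8": (0.8, 0.78),
    "sensitivity 0.9": (0.9, 0.40),
}

def youden_j(sensitivity, specificity):
    """Youden's J: sensitivity + specificity - 1 (equivalently TPR - FPR)."""
    return sensitivity + specificity - 1

for label, (sens, spec) in operating_points.items():
    print(f"{label}: J = {youden_j(sens, spec):.2f}")
```

Of these two points, the first has the higher J (0.58 against 0.30), so by this measure it is the better balance of sensitivity and specificity.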

From this we can also calculate the Area Under the Curve (AUC). This is a good measure of performance. An AUC of 0.5 means the model is no better than a 50/50 guess. If we have an AUC of less than 0.5 then something went wrong. In clinical papers that look at predicting life/death or illness/no-illness, an AUC greater than 0.7 is generally considered good and an AUC greater than 0.8 very good.

We calculate ours using…

```python
auc(fpr, tpr)
```

Note: we use the TPR and FPR from the ROC plot.

This gives us an AUC of 0.86328887…, or about **86%**, which is pretty good.
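If you are curious what auc is doing, it applies the trapezoidal rule to the points on the curve. The sketch below does the same calculation by hand on a few made-up ROC points (not the Titanic model's values), so you can check the arithmetic yourself.

```python
import numpy as np

# Hypothetical ROC points, for illustration only.
fpr_toy = np.array([0.0, 0.0, 0.5, 0.5, 1.0])
tpr_toy = np.array([0.0, 0.5, 0.5, 1.0, 1.0])

# Trapezoidal rule: the width of each step along the x-axis
# times the average height of its two ends.
widths = np.diff(fpr_toy)
heights = (tpr_toy[1:] + tpr_toy[:-1]) / 2
auc_toy = np.sum(widths * heights)

print(auc_toy)  # 0.75
```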