Linear Regression

Well, I’m back from a fantastic course at the University of Hull Scarborough campus titled Statistical Programming in R, and thought it was about time I shared a tutorial. So let’s have a look at Linear Regression; next we can look in more depth at Logistic Regression (and maybe Logistic Regression Classifiers).

For this we’ll be using a dataset from the UCI Machine Learning Repository (also see: all data sets). Since I’m looking at Heart Failures we might as well use the Heart Disease set.

Load the data

As shown, there are a couple of ways to load the data. We’ll also add the column names (info).
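A minimal sketch of one way to do this in base R. The file URL is an assumption (the processed Cleveland file from the Heart Disease set), and `load_heart`, `heart_cols` and `heart_url` are just illustrative names. The column names follow the UCI attribute list, so they may differ slightly from the names used elsewhere in this post.

```r
# Column names taken from the UCI attribute documentation
# (assumed to match the 14-column processed Cleveland file).
heart_cols <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num")

# Assumed location of the file; check the repository page if it has moved.
heart_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"

# Read a headerless comma-separated file (URL or local path) and name the columns.
# Missing values are coded as "?" in this file, so they are mapped to NA.
load_heart <- function(src = heart_url) {
  read.csv(src, header = FALSE, col.names = heart_cols, na.strings = "?")
}

# Way 1: straight from the URL
# heart <- load_heart()
# Way 2: from a copy downloaded beforehand
# heart <- load_heart("processed.cleveland.data")
```

After loading, `str(heart)` and `summary(heart)` give a quick check that the 14 columns came through with sensible types.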

Inspect the data
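A quick way to eyeball every variable at once is a grid of density plots. This is a rough base R sketch, not necessarily the plotting code behind the figure; `plot_densities` is an illustrative helper name, and it assumes the data sits in a data frame called `heart`.

```r
# Draw one kernel density panel per numeric column of a data frame.
plot_densities <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  # Square-ish grid big enough to hold every panel
  n_side <- max(1, ceiling(sqrt(length(num_cols))))
  old_par <- par(mfrow = c(n_side, n_side), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore the previous layout afterwards
  for (col in num_cols) {
    plot(density(df[[col]], na.rm = TRUE), main = col)
  }
  invisible(num_cols)
}

# plot_densities(heart)
```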

Here is what the density plot should look like.

Heart Disease - Density
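Histograms are worth a look too, since they make it easier to spot variables that only take a handful of values. Again a base R sketch with an illustrative helper name (`plot_histograms`), assuming the data frame is called `heart`.

```r
# Draw one histogram panel per numeric column of a data frame.
plot_histograms <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  n_side <- max(1, ceiling(sqrt(length(num_cols))))
  old_par <- par(mfrow = c(n_side, n_side), mar = c(2, 2, 2, 1))
  on.exit(par(old_par))  # restore the previous layout afterwards
  for (col in num_cols) {
    hist(df[[col]], main = col, xlab = "")  # hist() drops NA values itself
  }
  invisible(num_cols)
}

# plot_histograms(heart)
```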


Giving us this…

Heart Disease - Histogram

I should really be looking at what the inputs are, but this is just a quick demo. My first thoughts: oldpeak looks like a right-skewed distribution (most of the values sit near zero, with a long tail to the right), which we can maybe fix with a log transform. exang, slope, num, restcd, fbs, sex and cp look more like factors, so I’ll just leave them. (NOTE: I should really be looking at what the data is.)

Log Transforms
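One wrinkle: oldpeak contains zeros, so a plain `log()` would turn those into `-Inf`. The sketch below uses `log1p()`, i.e. log(1 + x), as one way around that. This is an assumption about how to handle the zeros rather than the only option, and `add_log_oldpeak` and `log_oldpeak` are illustrative names.

```r
# Add a log-transformed copy of oldpeak to the data frame.
# log1p(x) computes log(1 + x), so a zero maps to 0 instead of -Inf.
add_log_oldpeak <- function(df) {
  df$log_oldpeak <- log1p(df$oldpeak)
  df
}

# heart <- add_log_oldpeak(heart)
# plot(density(heart$log_oldpeak, na.rm = TRUE), main = "log(1 + oldpeak)")
```

Keeping the original column alongside the transformed one makes it easy to compare the two distributions before deciding which to use in the model.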

Heart Disease - Log Oldpeak


So the data is ready (ish).


