Outlier removal in R using IQR rule

In short, outliers can be a bit of a pain and can skew your results. Grubbs (1969) states an outlier “is an observation point that is distant from other observations”. They can usually be seen when we plot the data. Below we can see one, maybe two, outliers in the density plot: 2.5 is a clear outlier, and 2.0 may or may not be.

[Figure: Density Plot (outlier)]
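
A density plot like this one can be drawn with base R. The snippet below is only a sketch: the data are simulated for illustration (they are not the data behind the original figure), and the variable names `x` and `d` are assumptions.

```r
# Simulated data, for illustration only: 50 values around 1, plus two distant points
set.seed(42)
x <- c(rnorm(50, mean = 1, sd = 0.2), 2.0, 2.5)

# Kernel density estimate; distant points tend to show up as small bumps in the tail
d <- density(x)
plot(d, main = "Density Plot (outlier)")
rug(x)  # tick marks along the x-axis mark the individual observations
```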

 

So, one of the ways we can identify outliers is the Interquartile Range Rule (IQR Rule). This sets a min and max value for the acceptable range, based on the 1st and 3rd quartiles.

Step 1, get the Interquartile Range

IQR = Q3 - Q1 (where Q1 and Q3 are the 1st and 3rd quartiles)

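Step 2, calculate the upper and lower values (1.5 is the conventional multiplier; Q1 and Q3 are the 1st and 3rd quartiles)

MinIQR = Q1 - 1.5 × IQR

MaxIQR = Q3 + 1.5 × IQR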

Step 3, remove anything greater than MaxIQR or less than MinIQR.

Step 4, enjoy…!
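
To make the steps concrete, here is a small worked example in R. It is a sketch only: the values in `x` are made up, the variable names are assumptions, and it assumes the conventional 1.5 multiplier and R's default quantile method.

```r
# Made-up sample values, for illustration only
x <- c(10, 11, 12, 13, 14, 15, 16, 25)

# Step 1: quartiles and interquartile range
q1  <- unname(quantile(x, 0.25))
q3  <- unname(quantile(x, 0.75))
iqr <- q3 - q1                     # same value as IQR(x)

# Step 2: lower and upper limits, assuming the conventional 1.5 multiplier
min_iqr <- q1 - 1.5 * iqr
max_iqr <- q3 + 1.5 * iqr

# Step 3: keep only the values that fall inside the limits
x_clean <- x[x >= min_iqr & x <= max_iqr]
```

With these values the limits come out at 6.5 and 20.5, so only the 25 is dropped.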

Doing it in R

I get bored repeating processes over and over again, so I sort of automated it in R. Let's have a look at the code…
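
The original script is not included in this text, so the block below is only a minimal sketch of how the steps might be wrapped up for a data frame. The function name `remove_outliers_iqr`, the data frame `df`, and the column `value` are all hypothetical, and the 1.5 multiplier is the conventional default rather than something taken from the original code.

```r
# Hypothetical helper: drop rows whose value in `column` lies outside the IQR limits.
# Rows with a missing value in `column` are dropped as well.
remove_outliers_iqr <- function(data, column, k = 1.5) {
  values <- data[[column]]

  # 1st and 3rd quartiles, ignoring missing values
  q   <- quantile(values, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]

  # Lower and upper limits of the IQR rule
  lower <- q[[1]] - k * iqr
  upper <- q[[2]] + k * iqr

  keep <- !is.na(values) & values >= lower & values <= upper
  data[keep, , drop = FALSE]
}

# Usage with a made-up data frame
df <- data.frame(id = 1:8, value = c(10, 11, 12, 13, 14, 15, 16, 25))
df_clean <- remove_outliers_iqr(df, "value")
```

Passing the column name as a string keeps the helper reusable across datasets, which is also where a misspelled column name would show up as an error.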

That is it. This script should remove all the outliers for you. Just be aware of the names of the datasets, and make sure you spell the column names correctly.
