I feel like I’ve been staring at RStudio for way too long, so I’ve decided to give Python (scikit-learn) another go. I really do recommend Anaconda Python for this; it contains everything you need for scientific Python coding, including…

- Python 2.7 (3.x is available)
- SciKit-Learn – Everything machine learning related
- Pandas – Dataframes (all kinds of dataframe stuff)
- Matplotlib – The plotting library
- IPython Notebook – A web based IDE, I like using this
- Spyder – A nice IDE, I’m still getting to grips with it

Seriously, just get Anaconda Python, it is FREE.

I have done a previous post on the exact same problem; however, this version uses DataFrames and is hopefully a little neater. There are also some tweaks to plotting the hyperplane / decision boundary.

# Let's get started

```python
# Load the libraries you will need
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import perceptron
from pandas import *

# You only need this if using Notebook
%matplotlib inline
```

## Load in the data

Wow, look at the DataFrame in action. We have three columns: A, B and Targets. A and B are just the input values. The target is a dichotomous value of 0 or 1, which could represent No or Yes, Product A or Product B, Dead or Alive, etc.

```python
# Put some data into a dataframe
inputs = DataFrame({
    'A'       : [2, 1, 2, 5, 7, 2, 3, 6, 1, 2, 5, 4, 6, 5],
    'B'       : [2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7],
    'Targets' : [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]
})
```

## Plot the data (optional)

I always like to plot the data; I think it's good practice to see what you are doing.

```python
# Set an array of colours, we could call it
# anything but here we call it colormap
# It sounds more awesome
colormap = np.array(['r', 'k'])

# Plot the data, A is x axis, B is y axis
# and the colormap is applied based on the Targets
plt.scatter(inputs.A, inputs.B, c=colormap[inputs.Targets], s=40)
```

You should see this. If not, you might have forgotten the `%matplotlib inline` line (above) or your install of Python is missing something.

## Build the model and train

```python
# Create the perceptron object (net)
# (n_iter was renamed max_iter in newer scikit-learn releases)
net = perceptron.Perceptron(n_iter=100, verbose=0, random_state=None,
                            fit_intercept=True, eta0=0.002)

# Train the perceptron object (net)
net.fit(inputs[['A', 'B']], inputs['Targets'])
```

## View the coefficients (optional)

I like to see what is going on.

```python
# Output the coefficients
print "Coefficient 0 " + str(net.coef_[0,0])
print "Coefficient 1 " + str(net.coef_[0,1])
print "Bias " + str(net.intercept_)
```

## Plot the hyperplane / decision boundary

```python
# Plot the original data
plt.scatter(inputs.A, inputs.B, c=colormap[inputs.Targets], s=40)

# Calc the hyperplane (decision boundary)
ymin, ymax = plt.ylim()
w = net.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(ymin, ymax)
yy = a * xx - (net.intercept_[0]) / w[1]

# Plot the hyperplane
plt.plot(xx, yy, 'k-')
plt.ylim([0, 8])  # Limit the y axis size
```

You should see this… The line is where `w0*A + w1*B + bias = 0`; rearranged, that is `B = -(w0/w1)*A - bias/w1`, which is exactly what the code above computes.

## Using the system to make a prediction (and a confusion matrix)

Really we should be passing unseen data here, but this shows the code to use the perceptron to predict the outcome based only on the inputs (in our case A and B).

```python
# You will need this import for the confusion matrix
from sklearn.metrics import confusion_matrix

# Do a prediction
pred = net.predict(inputs[['A', 'B']])
print pred

# Confusion Matrix
confusion_matrix(pred, inputs['Targets'])
```

The code also outputs a confusion matrix, although the default unlabelled output looks horrible.
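If you want something more readable than the bare array, `pandas.crosstab` will label the rows and columns for you. A sketch (the labelling is my own addition, not part of the original post, and it uses the modern scikit-learn API):

```python
import pandas as pd
from sklearn.linear_model import Perceptron

inputs = pd.DataFrame({
    'A':       [2, 1, 2, 5, 7, 2, 3, 6, 1, 2, 5, 4, 6, 5],
    'B':       [2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7],
    'Targets': [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]
})

net = Perceptron(max_iter=100, tol=None, random_state=0)
net.fit(inputs[['A', 'B']], inputs['Targets'])
pred = net.predict(inputs[['A', 'B']])

# Cross-tabulate predictions against the true targets,
# with named axes so the table is self-explanatory
cm = pd.crosstab(pd.Series(pred, name='Predicted'),
                 inputs['Targets'].rename('Actual'))
print(cm)
```

Each cell counts how many points fell into that predicted/actual combination, and the cells sum to the 14 rows of data.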

# Homework

Try the data below to…

- Use it in a prediction with this model; how well does the system perform?
- Rebuild the full model using this data.
- See how the hyperplane has moved.

```python
# Different Data
inputs = DataFrame({
    'A'       : [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    'B'       : [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    'Targets' : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
})
```