Machine Learning: Step-Wise Classification


Github Repo


This program is a machine learning algorithm that attempts to classify images of human faces as either smiling or not smiling. The program is implemented by hand using NumPy and Matplotlib. In this project, I train the model using training sizes of 400, 800, 1200, 1600, and 2000 images. In the results we can see how the different training sizes affect the overall accuracy of the model on the testing set.

Built Using

Built Using Python, Numpy, Matplotlib

Some Background

Machine learning is a very powerful tool when a model is given a lot of past data to make informed decisions in the future. Python is a great language to implement machine learning projects in, as it is an interpreted language, has a high level of abstraction, and has excellent library support. The one major drawback of Python is that it is slow, especially when iterating via a loop. However, NumPy is a great Python library that uses vectorization to run these tasks much faster.

When building a machine that can detect whether an individual is smiling, we need a set of ground truths. For this project, we have a set of 2000 training faces and a set of corresponding training labels, each marked true or false depending on whether the image is considered smiling. With this data we can create our smile detection model.


Machine learning is all about optimization: we want to maximize the accuracy of our model while minimizing the loss. In this problem we are going to use step-wise classification to select the best features myopically (greedily).

Essentially, the machine operates as follows. We look at two pixels from every image in the training set and compare them using a simple greater-than operation. If p1 > p2, the machine returns true, meaning it considers the image to be smiling. Now, this method of classification immediately seems far too trivial for image classification, right? How can we possibly expect the machine to understand what a smile looks like if it can only look at two pixels and return true when one is greater than the other? Well, due to the continuously optimizing nature of machine learning, this method actually does return positive results. Though not a perfect model, even using two pixels and a simple greater-than comparison, we can find positive results.
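In code, a single such comparison classifies every image at once. Here is a minimal sketch, assuming the images arrive as flattened 24x24 = 576-element rows (the array and function names below are illustrative, not the project's actual identifiers):

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((10, 576))  # stand-in for flattened 24x24 training images

def predict_single_pair(faces, p1, p2):
    """Return True for each image where pixel p1 is brighter than pixel p2."""
    return faces[:, p1] > faces[:, p2]

preds = predict_single_pair(faces, 5, 7)  # one boolean prediction per image
```

The whole "model" for one feature is just a boolean column comparison, which is why the comparison can be evaluated for all images in a single vectorized operation.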

In this algorithm we are going to find the best pixel comparison out of all of the pixels in the image. Each image is simplified to a 24x24 array, thus there are 576 * 575 different ordered pixel pairs to be tested. Once we have found the pixel pair that returns the highest accuracy on the training set, we then find another pixel pair and repeat the process, now keeping the original pair in the model.
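The exhaustive search over every pair can be sketched with broadcasting, shown here on a small stand-in dataset (names, sizes, and labels are made up; scoring the full 2000-image set this way would need chunking to keep memory reasonable):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 576                    # small stand-in for the real training set
faces = rng.random((n, d))
labels = rng.random(n) > 0.5       # stand-in smiling / not-smiling labels

# comps[i, p1, p2] is True when pixel p1 > pixel p2 in image i.
comps = faces[:, :, None] > faces[:, None, :]        # shape (n, d, d)

# Fraction of images each candidate pair classifies correctly.
acc = (comps == labels[:, None, None]).mean(axis=0)  # shape (d, d)

# The greedily chosen first feature is the pair with the highest accuracy.
best = np.unravel_index(np.argmax(acc), acc.shape)
```

One broadcasted comparison scores every pair at once; the diagonal (p1 == p2) simply scores as always-false predictions and never wins.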

This concept may seem a little bizarre, so let's run through a numerical example. Say we have our training set of 2000 face images and 2000 corresponding labels. We task our model with finding the first pixel pair that correctly identifies which images are smiling using a simple greater-than comparison. For the sake of example, say the model returns pixel 5 and pixel 7 as the best pair, with an accuracy of 65.5%. This means that across all of the face images, checking whether pixel 5 > pixel 7 correctly determined whether the image was smiling 65.5% of the time. With that pair fixed, we then go on to find the next pixel pair while continuing to use pixel 5 and pixel 7. Say our model now returns pixel 10 and pixel 25 with an accuracy of 72%. That means that when we combined pixel 5 > pixel 7 and pixel 10 > pixel 25 across all of our face images, the model correctly identified a smile 72% of the time.

For this project we repeat this process for five rounds, until we have 5 pixel pairs. We classify an image as smiling if at least 3 of the 5 pixel comparisons return true.
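The 3-of-5 majority vote can be sketched like this (the specific pixel pairs below are invented for illustration, not the pairs the trained model actually selected):

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((10, 576))  # stand-in flattened test images

# Hypothetical (p1, p2) pairs found by the greedy search.
pairs = np.array([[5, 7], [10, 25], [100, 40], [300, 310], [500, 90]])

# votes[i, j] is True when pair j fires for image i.
votes = faces[:, pairs[:, 0]] > faces[:, pairs[:, 1]]  # shape (10, 5)

# An image is classified as smiling when at least 3 of the 5 pairs agree.
preds = votes.sum(axis=1) >= 3
```

Fancy indexing with the two pair columns evaluates all five comparisons for every image in two array operations, so the final classifier stays fully vectorized.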

In the code it is very important that we vectorize this process. When I originally wrote the code, looping through all of the images in the training set, it took hours to train the model. Once I vectorized the code, the model could be trained in a matter of minutes.
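To illustrate the difference, here is the same single-pair accuracy computed two ways on stand-in data: the loop touches each image individually, while the vectorized version is one boolean reduction over the whole set (all names and data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((500, 576))
labels = rng.random(500) > 0.5

def accuracy_loop(p1, p2):
    """Slow: score one candidate pair by looping over every image."""
    correct = 0
    for img, label in zip(faces, labels):
        correct += (img[p1] > img[p2]) == label
    return correct / len(faces)

def accuracy_vec(p1, p2):
    """Fast: the identical computation as a single vectorized expression."""
    return np.mean((faces[:, p1] > faces[:, p2]) == labels)

assert np.isclose(accuracy_loop(5, 7), accuracy_vec(5, 7))
```

Both functions compute the same number; the vectorized one pushes the per-image work into NumPy's compiled internals, which is where the hours-to-minutes speedup comes from.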


After calculating the training accuracy and testing accuracy for the various training sizes, the results are as follows:

N      Training Accuracy    Testing Accuracy
400    0.825                0.72374
800    0.80625              0.742888
1200   0.79416              0.742888
1600   0.788125             0.748358
2000   0.786                0.764221

We can see from the table that as the number of samples (n) in the training set increases, the training accuracy decreases while the testing accuracy increases. We expect the testing accuracy to improve with more samples because a larger training set gives the model more information about what characterizes a smile. Accordingly, when n is equal to 2000, we get our best testing accuracy of 0.764221.

As the value of n increases we can also see how the training accuracy decreases. The training accuracy is always maximized over the samples in the training set. When the training set is smaller, the model does not see as many different images of people smiling; thus, it does not have to account for as much variation. When there is less variation between images, it is easier to train the model and find similarities between images. That causes the training accuracy to be higher; however, as we can see, this does not translate to a higher testing accuracy. The testing accuracy is maximized when there is more data in the training set, because that brings more variation in images.

These trends make sense if we stop and consider what the model truly is. At the end of the day, we are trying to classify images as smiling or not smiling based on the comparison between one pixel and another. Even if we do this for five pairs of pixels, this basis of classification is decidedly weak. I am surprised we were even able to get a testing accuracy of 0.764221 using such a method. In a perfect world we would look at many more features, but that would be extremely costly.

Here we can view the various pixel pairs that each size of training data deemed the most accurate at predicting smiles.