I was recently playing with Google’s new Photos application. It’s an incredibly powerful little application, yet it has an incredibly simple UI.

Google has gone through my entire library of 30k photos and classified them against its boatloads of data.

It’s always nice to get a visual sense of what machine learning is doing, and after playing with the photo app it’s quite clear how it classifies photos.

For example, if I search for ‘festival’, it finds photos of all sorts of festivals, from Glastonbury:

Glastonbury

To the Soccer World Cup:

South Africa world cup

To the Notting Hill Carnival:

Notting Hill Carnival

To a few incorrect ones like these:

Vietnam

You can understand why machine learning might classify these as festivals, and it’s not really a problem. As more data comes in, those results will keep getting better, and accuracy will almost certainly improve over time.

So, what’s the problem?

The problem is this: search for something like ‘desert’ and I get this result:

namibia desert

Now search for ‘dessert’ and I get this result:

namibia desert

So, through our own stupidity, we have trained the machines to think ‘dessert’ is the same as ‘desert’.
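
Here is a minimal sketch of how that can happen, assuming the classifier learns partly from crowd-sourced tags (the data below is entirely made up): if enough people mistype ‘desert’ as ‘dessert’ on photos of sand dunes, the two labels end up surrounded by the same visual context, and the model has little reason to keep them apart.

```python
from collections import Counter

# Hypothetical crowd-sourced tags: a few users mistype "desert" as "dessert",
# so the typo ends up attached to photos of sand dunes.
photo_tags = [
    ["sand", "dunes", "desert"],
    ["sand", "desert", "sunset"],
    ["dunes", "dessert"],          # typo: the photo is actually a desert
    ["sand", "dunes", "dessert"],  # typo again
    ["cake", "cream", "dessert"],  # a genuine dessert photo
]

def context_counts(label):
    """Count which other tags co-occur with a given label."""
    counts = Counter()
    for tags in photo_tags:
        if label in tags:
            counts.update(t for t in tags if t != label)
    return counts

print(context_counts("desert"))   # dominated by sand/dunes
print(context_counts("dessert"))  # also mostly sand/dunes, thanks to the typos
```

From the model’s point of view, the two spellings now describe much the same kind of picture, which is exactly what the search results above suggest.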

A trivial example, but at a time when we are ramping up our dependence on machine learning, we need to remember that the machine is not always correct, and for the same reason that the crowd is not always correct.