
AWS Rekognition

A few weeks ago, I took part in a hackathon at Cloudreach. If you are a returning reader, you'll probably know about it already. If not, check it out here. As part of the hackathon, we had to use a new technology that had been launched since AWS Re:Invent 2016 and we decided to use AWS Rekognition to process images being uploaded through Monz.

This post aims to go into a little more detail about what Rekognition is and what you can do with it.

Rekognition

Rekognition is an image analysis service from AWS. Through its simple-to-use APIs, you can quickly get results about what is going on in your images. We used this in Monz to figure out whether an image attached to a transaction was, in fact, a receipt, with the aim of advising the user to upload a better image if it was less than 50% likely to be one.
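The exact check we used in Monz isn't shown here, but a minimal sketch of the idea, working on the 'Labels' list from a detect_labels response (the label names and confidences below are hypothetical), might look like this:

```python
def is_likely_receipt(labels, threshold=50.0):
    """Return True if a 'Receipt' label meets the confidence threshold.

    `labels` is the 'Labels' list from a Rekognition detect_labels response.
    """
    return any(
        label['Name'] == 'Receipt' and label['Confidence'] >= threshold
        for label in labels
    )

# Hypothetical response fragment for a blurry photo
labels = [
    {'Name': 'Paper', 'Confidence': 91.0},
    {'Name': 'Receipt', 'Confidence': 42.3},
]

if not is_likely_receipt(labels):
    print("Please upload a clearer photo of your receipt")
```

The nice thing about a threshold parameter is that the 50% cut-off becomes a tuning knob rather than a hard-coded rule.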

The use cases for Rekognition range from identifying items in an image to detailed facial recognition.

I'll go into a little more detail on each of these areas and give some code examples to see how you can use Rekognition in your own projects.

Facial Analysis

One of the more powerful features of Rekognition is the ability to detect and analyse faces within an image. This feature is scarily accurate and detailed and can identify a lot of characteristics about individual faces. Check out what it can show:

  • The gender of the person
  • Age range
  • Are they wearing glasses (or "eye glasses", as Rekognition describes them) or sunglasses?
  • Is the face smiling?
  • Does it have a beard or moustache... or both?
  • and a lot more...

The awesome thing about this is how easy it is to get said information. Take a look at the code example below where I use an image of Donald Trump and his family:
Our example image

"""Rekognition Example."""

import boto3
import json

rek = boto3.client('rekognition') # Setup Rekognition
s3 = boto3.resource('s3') # Setup S3
print "Getting Image"
image = s3.Object('my-cool-bucket','trumprecognition.png') # Get an Image from S3
img_data = image.get()['Body'].read() # Read the image

print "Image retrieved"
print "Sending to Rekognition"

# Detect the items in the image
print "Image retrieved"
print "Sending to Rekognition"
results = rek.detect_faces(
    Image={
        'Bytes': img_data

    },
    Attributes=['ALL']
)


# Print the result
print json.dumps(
    results['Labels'],
    indent=2
)

# Print a message for each item
for face in results["FaceDetails"]:
    msg = "I found a {gender} who is {emot}".format(gender=face['Gender']['Value'], emot=face['Emotions'][0]['Type'].lower())

    if face['Smile']['Value'] is False:
        msg += " but they are not smiling"
    else:
        msg += " and they are smiling"
    print msg

Running this code will give us the following output:

As you can see from this, Rekognition has identified 7 people in the image and our code will print out each person's gender and their emotion. We have access to a lot more information but I'll leave that part to you to play with.

One piece of information we do get is the bounding-box position of each face, as well as points of interest (or "landmarks", as Rekognition calls them). This lets us show where each face is in the image, along with each part that has been recognised, such as the eyes, mouth and nose. Producing an image with the bounding boxes drawn on is out of the scope of this post, but here is what it can look like:
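Rekognition returns bounding boxes as ratios of the image's width and height rather than pixels, so the first step towards drawing them is a conversion like this (image dimensions here are made up):

```python
def box_to_pixels(bounding_box, image_width, image_height):
    """Convert a Rekognition ratio-based BoundingBox to pixel coordinates.

    Returns (left, top, width, height) in pixels.
    """
    return (
        int(bounding_box['Left'] * image_width),
        int(bounding_box['Top'] * image_height),
        int(bounding_box['Width'] * image_width),
        int(bounding_box['Height'] * image_height),
    )

# A face box covering the middle of a hypothetical 1000x800 image
box = {'Left': 0.25, 'Top': 0.25, 'Width': 0.5, 'Height': 0.5}
print(box_to_pixels(box, 1000, 800))  # (250, 200, 500, 400)
```

From there, a library like Pillow can draw the rectangles onto the image.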
The output of the same image when viewed in the AWS Rekognition console

Looking closer at the results, I found something particularly amusing, and maybe a hint of the accuracy limitations of Rekognition... Let us take a look at what we get back for Trump's son, Barron Trump:
Little Trump being recognised as a girl...

As you can see... Rekognition has amusingly detected Barron as a girl... with 100% confidence. Perhaps Rekognition knows more than we do (#FakeNews), but it is more likely just an inaccuracy in Rekognition's capability.
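Each attribute in a FaceDetails entry carries its own confidence score, so you can surface it alongside the value and judge results like this one for yourself; a small sketch using a hypothetical FaceDetails fragment:

```python
def describe_gender(face):
    """Format the gender attribute of a Rekognition FaceDetails entry."""
    gender = face['Gender']
    return "{} ({:.0f}% confidence)".format(gender['Value'], gender['Confidence'])

# Hypothetical FaceDetails fragment, mirroring the Barron result
face = {'Gender': {'Value': 'Female', 'Confidence': 100.0}}
print(describe_gender(face))  # Female (100% confidence)
```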

Item Detection

As well as facial recognition, AWS Rekognition can also detect items in an image and return a confidence rating for each. In the example below, I will use an image of a picnic, which I have passed into AWS Rekognition using Boto3, the Python module that provides API access to AWS.

So, using the image below...
Example of Label Detection

and this code...

"""Rekognition Example."""

import boto3
import json

rek = boto3.client('rekognition') # Setup the Rekognition Client
s3 = boto3.resource('s3') # Setup the S3 Resource

print "Getting Image" 
image = s3.Object('my-amazing-bucket', 'picnic.png') # Get an image to work with
img_data = image.get()['Body'].read() # Read the image

print "Image retrieved"
print "Sending to Rekognition"

# Detect the items in the image
results = rek.detect_labels(
    Image={
        'Bytes': img_data

    }
)

print "Rekognition done"

# Print the result
print json.dumps(
    results['Labels'],
    indent=2
)

# Print a message for each item
for label in results["Labels"]:
    print "I am {}% confident of of the image having a {} in it".format(
        int(label['Confidence']),
        label['Name'],
    )

We get something like:
Example of Label Detection

Unlike the facial analysis, we are unable to get much more information other than what has been found and the confidence rating for each item. It would be cool for bounding boxes to be included here too, and maybe some characteristics of the item found, such as colour, size or orientation... Maybe this is coming?

What I find very impressive about the detection API is how Rekognition is able to see the scene as a picnic as well as the items in the picnic. It is more confident of it being a picnic than anything else. This is cool, as it could be used in photo album apps to tag images not only with the items in the image but also with what the image is of overall: in this case a picnic, rather than just random plates and cups.
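Tagging an album this way could be as simple as picking the highest-confidence label as the overall description (the label names and confidences below are hypothetical stand-ins for the picnic response):

```python
def top_label(labels):
    """Return the name of the highest-confidence label."""
    return max(labels, key=lambda label: label['Confidence'])['Name']

# Hypothetical confidences for the picnic image
labels = [
    {'Name': 'Picnic', 'Confidence': 98.5},
    {'Name': 'Plate', 'Confidence': 80.1},
    {'Name': 'Cup', 'Confidence': 75.2},
]

print("This looks like a {}".format(top_label(labels)))  # This looks like a Picnic
```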

What's next?

So that was just a simple dive into AWS Rekognition. I think computer vision combined with machine learning/AI and the cloud is a match made in heaven that all the big players are investing in and making readily available. You can see this in practice in Apple Photos and a bunch of Google services, including Google Photos, their Rekognition equivalent in Google Cloud, and some very exciting and interesting work around neural networks.

I think when this technology can also be applied to video and grows in accuracy, we will see a bunch of powerful, and even somewhat creepy, applications come into the world. Obvious ideas would be things like Facebook tagging for video, or YouTube detecting product placement and automatically creating a connection with the makers of said products to set up new areas of ad-based revenue.

The technology could also cause new laws and forms of paranoia to develop. How do you stop someone running Rekognition on all publicly available images of a person and using the results to build up "profiles" of that person? If Amazon starts recommending me 50% off Magners because of all the party pics I have on Facebook... I'll be a little bit freaked out (as well as excited for the savings). On the flip side, that sort of interaction with technology could be pretty cool: how about recommending me new shoes or a haircut based on my Snapchat selfies or Facebook videos?

If this is something I can give my permission for and control, then I am all for it. The next obvious step past video is instant, live video: how about using this technology to detect someone stealing in a shop, or someone needing help to get to their car? The future of computer vision, AI and neural networks is exciting!

Let me know what you think and how you could see this technology being used!

Neil Stewart

Cloud Systems Developer at Cloudreach. Passion for DevOps and Cloud Infrastructure development. AWS Certified in 5 areas and working towards Chef certification. Apple lover and general tech enthusiast