It’s been a while since I last posted something. I will fill the gap with a quick post about rekognition.

rekognition is a service from AWS that is described as:

Deep learning-based image recognition

Search, verify, and organize millions of images

In this light post, I will present a simple method to grab a picture from my webcam, send it to rekognition and display the result.

The part of the result I will focus on is the emotion. In other words, I will ask Amazon: “Am I happy?”.

Getting the picture from the webcam

I am using the package github.com/blackjack/webcam to grab the picture.

Capabilities of the webcam and image format

My webcam supports the MJPEG format. Therefore, after creating a cam object and setting it up to grab MJPEG, I can read each frame as a JPEG:

// ...
cam, err := webcam.Open("/dev/video0") // Open webcam
// ...
// Setting the format:
// err is already declared above, so a plain assignment is used here
_, _, _, err = cam.SetImageFormat(format, uint32(size.MaxWidth), uint32(size.MaxHeight))

format is of type uint32 and can be computed from the information in /usr/include/linux/videodev2.h

MJPEG is: 1196444237

Note: To be honest, I did not evaluate the FOURCC method; I queried the formats supported by my webcam, along with their descriptions :)
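For reference, here is a minimal sketch of the FOURCC computation, assuming the little-endian packing used by the v4l2_fourcc macro in videodev2.h. The function name fourcc is mine and is not part of the webcam package.

```go
package main

import "fmt"

// fourcc packs four characters into a uint32, least significant byte first,
// mirroring the v4l2_fourcc macro from videodev2.h.
func fourcc(a, b, c, d byte) uint32 {
	return uint32(a) | uint32(b)<<8 | uint32(c)<<16 | uint32(d)<<24
}

func main() {
	// 'M', 'J', 'P', 'G' is the code used for MJPEG.
	fmt.Println(fourcc('M', 'J', 'P', 'G')) // prints 1196444237
}
```

The printed value matches the MJPEG number given above.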

Grabbing the picture

In an endless for loop, the program waits for a frame and then reads it with a call to ReadFrame:

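for {
    timeout := uint32(5) // 5 seconds
    if err := cam.WaitForFrame(timeout); err != nil {
        continue // no frame ready (for example a timeout): try again
    }
    frame, err := cam.ReadFrame()
    // ... check err, then use frame
}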
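Before sending a frame anywhere, it can be worth checking that it really looks like a JPEG. The sketch below is my own helper (looksLikeJPEG is a hypothetical name, not part of the webcam package); it only tests for the JPEG start-of-image marker 0xFF 0xD8 and does not validate the rest of the data.

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeJPEG reports whether frame begins with the JPEG
// start-of-image marker (0xFF 0xD8). It is a cheap sanity check only.
func looksLikeJPEG(frame []byte) bool {
	return bytes.HasPrefix(frame, []byte{0xFF, 0xD8})
}

func main() {
	// Made-up byte slices standing in for frames read from the webcam.
	fmt.Println(looksLikeJPEG([]byte{0xFF, 0xD8, 0xFF, 0xE0}))
	fmt.Println(looksLikeJPEG([]byte("YUYV")))
}
```

In the loop above, such a check would sit right after ReadFrame, skipping any frame for which it returns false.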

AWS

The package to import to use the service is github.com/aws/aws-sdk-go/service/rekognition and is documented here: http://docs.aws.amazon.com/sdk-for-go/api/service/rekognition/

The operation that I am using to detect the emotion is DetectFaces, which takes a pointer to a DetectFacesInput, which in turn holds a pointer to an Image.

Creating the input

The first thing that needs to be created is the Image object from our frame:

if len(frame) != 0 {
    image := &rekognition.Image{ // Required
        Bytes: frame,
    }

Then we create the DetectFacesInput:

params := &rekognition.DetectFacesInput{
        Image: image,
        Attributes: []*string{
                aws.String("ALL"), 
        },
}

The ALL attribute is set because otherwise AWS returns only a default subset of the face attributes instead of the complete description of what it has found.

Sending the query

Pricing notice and warning

The price of the service as of today is 1 dollar per 1,000 requests. That sounds cheap, but at 25 FPS the cost adds up quickly. Therefore, I have added a blocking read on standard input, so that a picture is only processed when we press enter:

bufio.NewReader(os.Stdin).ReadBytes('\n')
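To put a number on the warning, here is a back-of-the-envelope sketch, assuming the price quoted above (1 dollar per 1,000 requests) and one request per frame. The function costPerHour is only an illustration.

```go
package main

import "fmt"

// costPerHour estimates the hourly bill in dollars when every frame
// is sent as one request, given a frame rate and a price per 1,000 requests.
func costPerHour(fps, pricePerThousand float64) float64 {
	requests := fps * 3600 // frames sent in one hour
	return requests / 1000 * pricePerThousand
}

func main() {
	fmt.Printf("%.2f dollars per hour\n", costPerHour(25, 1)) // prints 90.00 dollars per hour
}
```

At 25 FPS that is 90,000 requests per hour, which is why the program waits for a key press instead of sending every frame.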

Session

As usual, to query AWS we need to create a session:

var err error
sess, err = session.NewSession(&aws.Config{Region: aws.String("us-east-1")})
if err != nil {
    fmt.Println("failed to create session,", err)
    return
}
svc = rekognition.New(sess)

Note: The session package takes care of connection information such as credentials, which it can read from environment variables like:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
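A forgotten variable is easy to miss, so here is a small sketch that reports which of AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are unset. The helper missingVars is hypothetical and not part of the SDK. The SDK can also find credentials elsewhere (for example in a shared credentials file), so an empty environment is a hint rather than an error.

```go
package main

import (
	"fmt"
	"os"
)

// credentialVars lists the environment variables checked by this sketch.
var credentialVars = []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

// missingVars returns the names in vars for which lookup reports no value
// or an empty one. lookup has the same signature as os.LookupEnv, which
// makes the function easy to exercise with a fake environment.
func missingVars(vars []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range vars {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if m := missingVars(credentialVars, os.LookupEnv); len(m) > 0 {
		fmt.Println("not set in the environment:", m)
	}
}
```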

Query and result

Simply send the query:

resp, err := svc.DetectFaces(params)

if err != nil {
        fmt.Println(err.Error())
        return
}

The result is of type DetectFacesOutput. This type holds a slice of FaceDetails, because obviously there can be more than one person per image. So we will loop and display the emotions for each face detected:

for i, fd := range resp.FaceDetails {
        fmt.Printf("The person %v is ", i)
        for _, e := range fd.Emotions {
                fmt.Printf("%v, ", *e.Type)
        }
        fmt.Printf("\n")
}
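The loop above prints every emotion type returned for a face. Each emotion also carries a confidence, so a possible refinement is to keep only the most likely one. The sketch below uses a local emotion struct as a stand-in for the SDK's rekognition.Emotion (Type and Confidence pointers), so that it runs without the SDK. The sample values are made up.

```go
package main

import "fmt"

// emotion is a local stand-in for rekognition.Emotion.
type emotion struct {
	Type       *string
	Confidence *float64
}

// dominant returns the type of the emotion with the highest confidence,
// or an empty string when the slice holds no usable entry.
func dominant(emotions []*emotion) string {
	best, bestConf := "", -1.0
	for _, e := range emotions {
		if e == nil || e.Type == nil || e.Confidence == nil {
			continue // skip incomplete entries
		}
		if *e.Confidence > bestConf {
			best, bestConf = *e.Type, *e.Confidence
		}
	}
	return best
}

// Small helpers to build pointers for the sample data.
func str(s string) *string    { return &s }
func f64(f float64) *float64 { return &f }

func main() {
	// Made-up values for illustration, not real service output.
	sample := []*emotion{
		{Type: str("CALM"), Confidence: f64(4.2)},
		{Type: str("HAPPY"), Confidence: f64(93.5)},
	}
	fmt.Println(dominant(sample)) // prints HAPPY
}
```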

Run:

Conclusion

That’s all folks. The full code can be found here.

In the tests I made, I was always happy. I’ve tried to be angry or sad, without success… Maybe I have a happy face. I should try with someone else.

The service is nice and opens the door to a lot of applications: for example, monitoring my home and sending an alert if someone is in my place who is not from my family (and is not the cat :).