Go celebrity spotting with the Twilio API for WhatsApp, AWS Rekognition and Ruby

Phil Nash - Apr 2 '19 - Dev Community

Did you know you can send and receive media using the Twilio API for WhatsApp? When I found out, I wanted to make something fun with it, so why not combine it with AWS Rekognition to work out whether I look like any celebrities?

By the end of this post, you'll know how to build an app that lets you send an image to a WhatsApp number, download the image, analyse the image with the AWS Rekognition API and respond to say whether there are any celebrities in the picture.

What you'll need

To build this application you'll need a few things: a Twilio account, an AWS account, Ruby and Bundler installed, and ngrok.

Got all that? Let's get started then.

Application basics

When Twilio receives a WhatsApp message it will send an HTTP request, a webhook, to a URL we provide. We need to build an application that can receive those webhooks, process the image using the AWS Rekognition service and then send a message back in the response to Twilio.
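As a rough illustration of what arrives at that URL: Twilio sends the webhook parameters as a form-encoded POST body. The sketch below decodes such a body with Ruby's standard library. The sample values (phone number, media URL) are made up, and `parse_webhook_body` is a hypothetical helper; in the app itself Sinatra does this parsing and exposes the result as `params`.

```ruby
require "uri"

# A made-up example of a form-encoded webhook body. The parameter names
# (From, Body, NumMedia, MediaUrl0) are ones Twilio sends; the values are
# placeholders for illustration only.
SAMPLE_BODY = "From=whatsapp%3A%2B15550001111&Body=Hi&NumMedia=1" \
              "&MediaUrl0=https%3A%2F%2Fexample.com%2Fphoto.jpg"

# Decode the body into a Hash of parameter names to values.
def parse_webhook_body(body)
  URI.decode_www_form(body).to_h
end

params = parse_webhook_body(SAMPLE_BODY)
puts params["From"]      # the sender's WhatsApp address
puts params["MediaUrl0"] # where the first attached image can be fetched
```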

Create yourself a directory to build your application in and initialize a new Gemfile with bundler:

mkdir celebrity-spotting
cd celebrity-spotting
bundle init

Open up the Gemfile and add the gems we're going to use for this application:

# frozen_string_literal: true

source "https://rubygems.org"

gem "sinatra", require: "sinatra/base"
gem "aws-sdk"
gem "envyable"
gem "down"
gem "twilio-ruby"

We're going to use Sinatra as the web framework to receive the incoming webhooks from Twilio. We'll need the AWS SDK to communicate with the Rekognition service. Envyable is to store our credentials in environment variables in development. Down is a gem that makes it really easy to download files. And the twilio-ruby gem will be used to generate TwiML so that we can communicate back to Twilio in the response.

Run bundle install to install the gems, then create the other files we'll need for this app: app.rb, config.ru and config/env.yml. With the preparation complete, let's start building the application.

Building the app

We'll use config.ru to load and run the application. Add the following code to config.ru:

require "bundler"
Bundler.require

Envyable.load("./config/env.yml") unless ENV["RACK_ENV"] == "production"

require "./app.rb"

run CelebritySpotting

This requires all the dependencies defined in the Gemfile, loads our config into the environment using Envyable and then loads and runs the application. Next, let's create the CelebritySpotting app.

Open app.rb and create a new class:

class CelebritySpotting < Sinatra::Base

end

We need a path to an endpoint that we can provide as our webhook URL. By default Twilio makes a POST request, so our endpoint will respond to POST requests:

class CelebritySpotting < Sinatra::Base
  post "/messages" do

  end
end

We're going to be returning TwiML, so we'll create a new Twilio::TwiML::MessagingResponse and set the content type header to application/xml:

class CelebritySpotting < Sinatra::Base
  post "/messages" do
    content_type "application/xml"
    twiml = Twilio::TwiML::MessagingResponse.new
  end
end

To make sure this is working so far, let's add a message, return the TwiML as XML and test it out:

class CelebritySpotting < Sinatra::Base
  post "/messages" do
    content_type "application/xml"
    twiml = Twilio::TwiML::MessagingResponse.new
    twiml.message body: "Hello! Just testing here."
    twiml.to_xml
  end
end

Start the application on the command line with:

bundle exec rackup

The application will start on http://localhost:9292. There's no interface, so we can test it using curl to see if it is acting correctly.

$ curl -d "" http://localhost:9292/messages
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Message>Hello! Just testing here.</Message>
</Response>

We can see that the message is being returned in the TwiML so let's hook it up to the Twilio API for WhatsApp.

Connecting to the Twilio API for WhatsApp

Twilio provides a sandbox to test your WhatsApp integrations without waiting for a Twilio number to be approved by WhatsApp. Log in to your Twilio console and follow the instructions to set up your WhatsApp sandbox.

Once you have it set up, you need to define a webhook URL so that you can configure your WhatsApp sandbox number.

Our app currently runs on our own machine, so we need a tunnel to it from the public internet. That's where ngrok comes in. Start ngrok by running:

ngrok http 9292

Executing this command will give you a public URL that looks like https://RANDOM_STRING.ngrok.io. Take that ngrok URL, add the /messages path to it and enter it in your WhatsApp sandbox settings as the URL to call when a message comes in from WhatsApp.


Save your settings for the WhatsApp sandbox and send the sandbox number a message. You should get your testing message back.


We have WhatsApp connected and we can send messages back and forth. This builds the foundation to work with the included images and analyse them with AWS Rekognition.

Receiving and downloading images

Earlier we included the Down gem in the application. We're going to use it to download the images sent to our WhatsApp number.

Returning to app.rb we're going to test whether our incoming message has any images and if it does, download the first one.

Twilio sends all the information we need in the body of the webhook request. We're going to look for the NumMedia parameter to tell whether there is any media. If there is, the image URL will be in the MediaUrl0 parameter.
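To make the parameter naming concrete, here is a minimal sketch that collects media URLs from a params Hash. `media_urls` is a hypothetical helper rather than part of the app, and the sample Hashes are invented; it assumes the media parameters are numbered from MediaUrl0 up to one less than NumMedia.

```ruby
# Collect every media URL from a webhook params Hash.
# The parameters are numbered MediaUrl0, MediaUrl1, ... up to NumMedia - 1.
def media_urls(params)
  count = params["NumMedia"].to_i # nil.to_i is 0, so a missing value is safe
  (0...count).map { |index| params["MediaUrl#{index}"] }.compact
end

# Invented sample parameters, for illustration only.
with_image = { "NumMedia" => "1", "MediaUrl0" => "https://example.com/photo.jpg" }
text_only  = { "NumMedia" => "0", "Body" => "Hello" }

p media_urls(with_image)
p media_urls(text_only)
```

The app in this post only needs the first image, so it reads MediaUrl0 directly.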

With that MediaUrl0 parameter we can use Down to download the image. When you download an image with Down it gives you a Tempfile. We can read that file or the various properties of it.

Once we are done with the tempfile we should close and unlink it with the close! method so that it doesn't linger on the file system. We also need to handle the case where no image is sent; for this we can reply with a message asking for a picture.
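The cleanup pattern can be tried on its own with Ruby's standard Tempfile class, without downloading anything. This is only a sketch: the file contents are a placeholder standing in for a downloaded image.

```ruby
require "tempfile"

# Create a throwaway file to stand in for a downloaded image.
tempfile = Tempfile.new("image")
tempfile.write("not really an image")
tempfile.flush

path = tempfile.path # remember the path; it is not available after unlinking
begin
  puts "The file is #{tempfile.size} bytes"
ensure
  tempfile.close! # closes the file handle and deletes the file from disk
end

puts File.exist?(path) # false once close! has run
```

Putting close! in an ensure clause means the file is removed even if the code working with it raises an error.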

Delete the testing message and add the following code:

  post "/messages" do
    content_type "application/xml"
    twiml = Twilio::TwiML::MessagingResponse.new
    if params["NumMedia"].to_i > 0
      tempfile = Down.download(params["MediaUrl0"])
      begin
        twiml.message body: "Thanks for the image! It's #{tempfile.size} bytes large."
      ensure
        tempfile.close!
      end
    else
      twiml.message body: "I can't look for celebrities if you don't send me a picture!"
    end
    twiml.to_xml
  end

Restart your app and send yourself a couple more test messages with and without images and make sure the result is what you expect.

Now it's time to start searching for celebrities in the images, so let's dig into AWS Rekognition.

AWS Rekognition

Before we make any API calls to AWS we'll need to get an access key and secret. In your AWS console, create a user with the AmazonRekognitionFullAccess policy.

There are many ways to create users and give them permissions within AWS. The following is one way that will give you an API user that can access the Rekognition service.

Start in the AWS console home and search for and select IAM in the "Find Services" box.


In the IAM section, click on the "Users" menu in the left navigation, then click the "Add user" button.


Give your user a name, check the box for "Programmatic access", and then click "Next: Permissions".


Choose "Attach existing policies directly" and you will see a table of policies. Search the policies for "Rekognition". You will see three policies; select the AmazonRekognitionFullAccess policy, which has the description "Access to all Amazon Rekognition APIs".


Now click "Next" until you see the success message.


On the success page you will see your "Access key ID" and "Secret access key". Save them both in config/env.yml along with an AWS region where Rekognition is available, like "us-east-1". If you want to find out more about this process, check out the documentation on authentication and access control for Rekognition.

AWS_ACCESS_KEY_ID: YOUR_KEY_ID
AWS_SECRET_ACCESS_KEY: YOUR_SECRET_KEY
AWS_REGION: us-east-1

Now, to spot celebrities in our pictures we need to create a client to use the AWS API and send the image to the recognizing celebrities endpoint. Within the begin block add the following code:

      begin
        client = Aws::Rekognition::Client.new
        response = client.recognize_celebrities image: { bytes: tempfile.read }
      ensure
        tempfile.close!
      end

The Ruby AWS SDK automatically picks up your credentials from the environment. We then read the image we downloaded and send it as bytes to the recognize_celebrities method of the client.
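Because the credentials come from the environment, a missing value in config/env.yml tends to show up only when the first API call fails. One option is a small check at startup, sketched below. `missing_aws_settings` and `REQUIRED_AWS_SETTINGS` are hypothetical names, and the check assumes the three variables used in this post; it only looks at environment variables, not at any other place the SDK might find credentials.

```ruby
# The settings this post stores in config/env.yml.
REQUIRED_AWS_SETTINGS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION].freeze

# Return the names of any required settings that are missing or blank.
# Accepts any Hash-like object so it can be tried without touching ENV.
def missing_aws_settings(env = ENV)
  REQUIRED_AWS_SETTINGS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_aws_settings
warn "Missing AWS settings: #{missing.join(', ')}" unless missing.empty?
```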

The response will have all the details about the faces that were detected and whether they are likely to be celebrities. You can then build up your response however you like. I chose to report on the celebrities in the picture if there were any and, if there weren't, to report back how many faces were detected:

        if response.celebrity_faces.any?
          if response.celebrity_faces.count == 1
            celebrity = response.celebrity_faces.first
            twiml.message body: "Ooh, I am #{celebrity.match_confidence}% confident this looks like #{celebrity.name}."
          else
            twiml.message body: "I found #{response.celebrity_faces.count} celebrities in this picture. Looks like #{to_sentence(response.celebrity_faces.map { |face| face.name }) } are in the picture."
          end
        else
          case response.unrecognized_faces.count
          when 0
            twiml.message body: "I couldn't find any faces in that picture. Maybe try another pic?"
          when 1
            twiml.message body: "I found 1 face in that picture, but it didn't look like any celebrity I'm afraid."
          else
            twiml.message body: "I found #{response.unrecognized_faces.count} faces in that picture, but none of them look like celebrities."
          end
        end

I also added a short helper function here to turn a list of names into a readable sentence:

def to_sentence(array)
  return array.first.to_s if array.length <= 1
  "#{array[0..-2].join(", ")} and #{array[-1]}"
end
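To try out reply wording without calling AWS, one option is to stand in for the Rekognition response with plain Structs. The sketch below is an illustration, not the SDK's own classes: FakeFace and FakeResponse are hypothetical stand-ins exposing only the fields used in this post, and `celebrity_summary` and `names_sentence` are made-up helpers with simplified wording. It also rounds the confidence, since that value is a floating point number.

```ruby
# Hypothetical stand-ins for the parts of the response used in this post.
FakeFace = Struct.new(:name, :match_confidence)
FakeResponse = Struct.new(:celebrity_faces, :unrecognized_faces)

# Join names as "A, B and C"; a single name is returned unchanged.
def names_sentence(names)
  return names.first.to_s if names.length <= 1
  "#{names[0..-2].join(', ')} and #{names[-1]}"
end

# Build a short reply from a response-like object.
def celebrity_summary(response)
  faces = response.celebrity_faces
  case faces.count
  when 0
    "No celebrities found among #{response.unrecognized_faces.count} face(s)."
  when 1
    "#{faces.first.match_confidence.round}% confident this is #{faces.first.name}."
  else
    "Found #{faces.count} celebrities: #{names_sentence(faces.map(&:name))}."
  end
end

# Invented data, for illustration only.
response = FakeResponse.new(
  [FakeFace.new("Celebrity A", 98.6), FakeFace.new("Celebrity B", 91.2), FakeFace.new("Celebrity C", 87.0)],
  []
)
puts celebrity_summary(response)
```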

Restart your app once more and send an image to the WhatsApp number. It turned out I didn't look enough like any celebrities to get a match from Rekognition so I thought I'd try with some celebrities too. I sent myself a few celebrity pictures, like this one, to see the results.

When sending the Ellen DeGeneres Oscars photo, full of celebrities, Rekognition spots Bradley Cooper, Ellen DeGeneres and Kevin Spacey.

There are a few more than that, Rekognition!

WhatsApp, Images, AWS, and celebrities

In this post we've seen how to receive images sent to a WhatsApp number using the Twilio API for WhatsApp, download the images with Down and then search for celebrities in them using AWS Rekognition. You can see all the code from this post in this GitHub repo.

This is just the start though. Rekognition gives you a bunch of tools for analysing images, including recognising objects and scenes, text, and even nude or suggestive content.

This is a small Sinatra app, but you could implement this in Rails too. Downloading images and calling the Rekognition API can take a while, so you might want to move those calls into a background job with ActiveJob and respond using the REST API instead. Response times are worth considering because Twilio webhooks only wait 15 seconds before they time out.
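Short of moving to background jobs, a simple way to stay inside that window is to bound the slow work and fall back to an apology. The sketch below uses Ruby's standard Timeout module; `with_deadline` is a hypothetical helper, and the 10 second budget is an assumption chosen to leave some headroom. Timeout interrupts the block wherever it happens to be, so a background job remains the more robust choice.

```ruby
require "timeout"

# Run the given block, but give up after `seconds` and return `fallback`.
def with_deadline(seconds, fallback)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end

reply = with_deadline(10, "Sorry, that took too long. Please try again.") do
  # The download and the Rekognition call would go here.
  "I found 1 face in that picture."
end
puts reply
```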

Have you built anything cool with image analysis? I'd love to hear about your image hacks in the comments or on Twitter at @philnash.
