Introducing 🕷 Spitey-Sense: Simulate conversations with Spider-Man using 🪄 GPT-3 and Deepgram ☘️

Mohit Yadav - Apr 11 '22 - Dev Community

🚀 Introduction

Hey there! 👋 Have you ever wanted to talk with your favorite superheroes, only to find that they're unreachable or too busy shooting their next blockbuster?

Well, with 🕷 Spitey-Sense, you can interact with a simulated version of Peter Parker! The idea is simple: OpenAI's GPT-3 was fine-tuned on scripts from various Spider-Man movies so that it could pick up Peter Parker's personality. As a result, it can generate a plausible guess at what Peter Parker would say in reply to a given question.

For example, the model picks up on Peter's gentle sense of humor and his strong sense of responsibility, so the responses it generates read much like something Peter Parker would say.

The goal of this project was twofold: build an interface that communicates seamlessly with the AI engine and shows its responses in real time over WebSockets, and train the model to behave like the character.

🪄 Demo

As soon as users open the website, they're greeted with the landing page, from which they can open the menu or dive right into the fun part (talking).

Upon navigating to the /chat route, users are asked to grant the app microphone permission so that they can talk to the AI. A WebSocket connection is opened to the backend, and the audio is monitored continuously.

If the user doesn't speak for a few moments, the conversation so far is sent to the OpenAI model, and the AI replies the way Peter Parker/Spider-Man might.
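
The pause-then-reply behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `TurnBuffer` class, its method names, and the 2-second threshold are all hypothetical.

```python
import time
from typing import Callable, List, Optional


class TurnBuffer:
    """Collects transcript fragments and releases them after a pause (hypothetical helper)."""

    def __init__(self, silence_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.silence_seconds = silence_seconds  # assumed threshold, not from the article
        self._clock = clock                     # injectable so the logic can be tested
        self._fragments: List[str] = []
        self._last_speech: Optional[float] = None

    def add_fragment(self, text: str) -> None:
        """Store a non-empty transcript fragment and restart the silence timer."""
        text = text.strip()
        if text:
            self._fragments.append(text)
            self._last_speech = self._clock()

    def poll(self) -> Optional[str]:
        """Return the buffered utterance once the speaker has paused, else None."""
        if not self._fragments or self._last_speech is None:
            return None
        if self._clock() - self._last_speech < self.silence_seconds:
            return None
        utterance = " ".join(self._fragments)
        self._fragments.clear()
        self._last_speech = None
        return utterance


# Demo with a fake clock so the timing is deterministic.
now = [0.0]
buffer = TurnBuffer(silence_seconds=2.0, clock=lambda: now[0])
buffer.add_fragment("Hey Peter,")
now[0] = 1.0
buffer.add_fragment("how was school?")
now[0] = 2.5
print(buffer.poll())  # None: only 1.5 s since the last fragment
now[0] = 3.0
print(buffer.poll())  # Hey Peter, how was school?
```

In a real backend, `poll()` would be called on a timer, and any returned utterance would be appended to the conversation before it is sent to the model.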

πŸ‹οΈβ€β™€οΈ Training/Fine-Tuning the AI Model

As for the model, I used the Text-DaVinci-001 model, since Davinci was the most capable general-purpose model in OpenAI's GPT-3 family at the time, and the largest by parameter count. It's also pretty expensive, but worth it.

For the data, I scoured the internet for scripts and tried, fruitlessly, to extract dialogue from the PDF script files I got from movie databases. It was a nightmare to classify and sort the data according to the requirements: the number of spaces, tabs, etc. had to be taken into account, and these often overlapped. So I continued my search for a better solution.

💿 Getting access to a dataset

After visiting many movie database sites, I came across an article where someone had built a character simulation using a dataset from Kaggle. I had found just what I needed! I quickly downloaded the dataset, which contained script data for hundreds of titles. Once I found a superhero movie in it, it was time to extract the data from the .csv files.

For this purpose, I used 🐼 pandas to load the data from the multiple files, which were partitioned like a relational database, with unique and foreign keys for each movie, character, etc. I started from the conversations file, then extracted the plain-text lines along with the characters speaking them. In the end, I had around 80-90 lines with either Peter Parker or Spider-Man as the active speaker.
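
To make the join-and-filter step concrete, here is a small sketch using pandas. The table layout and column names (`character_id`, `character_name`, `text`, and so on) are assumptions made for illustration; the real Kaggle files may be structured differently, and real code would load them with `pd.read_csv` rather than build them inline.

```python
import pandas as pd

# Hypothetical stand-ins for two of the dataset's CSV files.
characters = pd.DataFrame({
    "character_id": ["u1", "u2", "u3"],
    "character_name": ["PETER PARKER", "MARY JANE", "SPIDER-MAN"],
    "movie_id": ["m1", "m1", "m1"],
})
lines = pd.DataFrame({
    "line_id": ["L1", "L2", "L3", "L4"],
    "character_id": ["u2", "u1", "u2", "u3"],  # foreign key into `characters`
    "text": [
        "You're late.",
        "Sorry, I got held up.",
        "Who are you?",
        "Just your friendly neighborhood Spider-Man.",
    ],
})


def lines_spoken_by(lines: pd.DataFrame, characters: pd.DataFrame,
                    names: list) -> pd.DataFrame:
    """Join lines to their speakers and keep only the requested characters."""
    wanted = {name.upper() for name in names}
    # Resolve the foreign key: attach the speaker's name to every line.
    merged = lines.merge(characters, on="character_id", how="inner")
    mask = merged["character_name"].str.upper().isin(wanted)
    return merged.loc[mask, ["line_id", "character_name", "text"]].reset_index(drop=True)


peter_lines = lines_spoken_by(lines, characters, ["Peter Parker", "Spider-Man"])
print(peter_lines)
```

Matching on upper-cased names is a design choice for this sketch; it avoids missing lines when the same character is spelled with different capitalization across files.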

I fed them into the model, and after a wait of a few hours, I finally had the custom model up and running.
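
For readers curious what "feeding lines into the model" might look like, the sketch below turns a dialogue into prompt/completion pairs and serializes them as JSON Lines, which is the general shape the legacy GPT-3 fine-tuning workflow expected. The pairing rule (the line before each Peter line becomes the prompt), the `Peter:` suffix, and the function names are assumptions for illustration, not the author's actual preprocessing.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Speakers whose lines become completions (names taken from the article).
TARGET_SPEAKERS = {"PETER PARKER", "SPIDER-MAN"}


def build_examples(dialogue: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Pair each target-speaker line with the line spoken just before it."""
    examples: List[Dict[str, str]] = []
    previous = None
    for speaker, text in dialogue:
        is_target = speaker.upper() in TARGET_SPEAKERS
        if previous is not None and is_target and previous[0].upper() not in TARGET_SPEAKERS:
            examples.append({
                "prompt": f"{previous[1]}\n\nPeter:",  # assumed prompt layout
                "completion": f" {text}\n",            # leading space, newline as stop marker
            })
        previous = (speaker, text)
    return examples


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


dialogue = [
    ("MARY JANE", "You're late."),
    ("PETER PARKER", "Sorry, I got held up."),
    ("MARY JANE", "Again?"),
    ("PETER PARKER", "It won't happen again."),
]
examples = build_examples(dialogue)
print(to_jsonl(examples))
```

With only 80-90 usable lines, every pair counts, so it is worth inspecting the generated file by hand before uploading it.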

πŸ† Deepgram's Role

Deepgram plays an essential role in the whole lifecycle of the app. It is the user's first point of contact with the app: Deepgram transcribes audio in real time with almost unnoticeable latency and adds quotation marks, punctuation, etc., so that the AI model can pick up on the tone of each sentence.

Therefore, Deepgram is an integral part of the app: without it, there would be no voice interface at all.
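
As a rough idea of how a backend might consume streamed transcription results, the sketch below pulls the finalized transcript out of one JSON message. It assumes a message shape with an `is_final` flag and a `channel.alternatives[].transcript` field, which is how I understand Deepgram's live transcription results to look; treat the field names as assumptions and check them against the current API reference. The function name is hypothetical.

```python
import json
from typing import Optional


def extract_final_transcript(raw_message: str) -> Optional[str]:
    """Return the transcript from a finalized result message, or None otherwise."""
    message = json.loads(raw_message)
    # Interim results are skipped; only finalized text is used here.
    if not message.get("is_final"):
        return None
    alternatives = message.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    transcript = alternatives[0].get("transcript", "").strip()
    return transcript or None  # treat empty transcripts (silence) as "nothing said"


# Hand-written sample message following the assumed shape.
sample = json.dumps({
    "is_final": True,
    "channel": {"alternatives": [{"transcript": "Hey Peter, how are you?", "confidence": 0.98}]},
})
print(extract_final_transcript(sample))  # Hey Peter, how are you?
```

Returning `None` for interim and empty results keeps the calling code simple: it only has to handle text that is ready to be added to the conversation.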

☘️ Submission Category

Analytics Ambassador: The app uses Deepgram to analyze expression in users' voices (it enhances inputs with punctuation), and it also lets people with low vision or motor disabilities interact with the AI without typing a word. The recording is sent in real time, and the backend converts it into a text transcript.

🥳 Useful Links

Here are some useful links you may want to access:

πŸ– Main Links

🦾 AI Links

🏡 Miscellaneous Links:
