This tutorial was originally posted to the Twitter Developer Blog
Pairing your Tweets with a sentiment score gives you a quantitative gauge of how they read. To put some data behind the question of how you are feeling, you can use Python, Twitter’s Recent Search Endpoint to explore your Tweets from the past 7 days, and Microsoft Azure’s Text Analytics Cognitive Service to detect languages and determine sentiment scores. This tutorial walks you through writing code that pulls your Tweets from the past 7 days and gives you a score that tells you how your week has been. You can reference the full version of the code.
Setting up
Before you can get started you will need to make sure you have the following:
- Python 3 installed.
- Twitter Developer account: if you don’t have one already, you can apply for one.
- A Twitter developer app, which can be created in your Twitter developer account.
- A bearer token for your app
- Enrollment in Twitter Developer Labs
- Your app will also need to be enrolled in the preview for Recent Search.
- An account with Microsoft Azure’s Text Analytics Cognitive Service and an endpoint created. You can check out Microsoft’s quick start guide on how to call the Text Analytics API.
You will need to create a directory for this project. In your terminal, type the following commands, which create a new directory and change from your current directory into the one you just created. You’ll also create a new Python file and a YAML configuration file that will be used to store your tokens and secrets.
mkdir how-positive-was-your-week
cd how-positive-was-your-week
touch week.py
touch config.yaml
Using the text editor of your choosing, you can now set up your configuration file config.yaml. You will want to replace the x’s with your own bearer token and subscription key.
search_tweets_api:
    bearer_token: xxxxxxxxxxxxxxxxxxxxxxx
azure:
    subscription_key: xxxxxxxxxxxxxxxxxxxxxxx
You will also need to install the libraries Requests, PyYAML, and pandas. Requests will be used to make HTTP requests to the Twitter and Azure endpoints. PyYAML allows you to parse the .yaml file where you will be storing your keys and tokens. pandas will be used to manipulate and shape the data.
Open the file week.py and import all the libraries you’ll use. In addition to Requests, pandas, and PyYAML, you will want to import json and ast, which are part of Python’s standard library, so you don’t need to install them ahead of time. You will import pandas using the alias pd so you don’t have to type the full word each time you want to call the library.
import requests
import pandas as pd
import json
import ast
import yaml
Creating the URL
Before you can connect to the Twitter API, you’ll need to set up the URL so it has the right fields and you get the right data back. First, create a function called create_twitter_url. In this function you’ll declare a variable for your handle; you can replace jessicagarson with your own handle. The max_results value can be anywhere from 1 to 100. If you are using a handle that would have more than 100 Tweets in a given week, you may want to build in some logic to handle pagination or use a library such as searchtweets-labs. The URL needs to contain the maximum number of results and a query saying that you are looking for Tweets from a specific handle. You’ll return the formatted URL in a variable called url, since you will need it to make a GET request later.
def create_twitter_url():
    handle = "jessicagarson"
    max_results = 100
    mrf = "max_results={}".format(max_results)
    q = "query=from:{}".format(handle)
    url = "https://api.twitter.com/labs/2/tweets/search?tweet.fields=lang&{}&{}".format(
        mrf, q
    )
    return url
The URL you are creating is:
https://api.twitter.com/labs/2/tweets/search?tweet.fields=lang&max_results=100&query=from:jessicagarson
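As a variation on the string formatting above, the sketch below builds the same URL with urlencode from the standard library, so characters such as the colon in the query are percent-encoded. The function name create_twitter_url_encoded and its parameters are hypothetical and not part of the original code.

```python
from urllib.parse import urlencode

# Same Labs endpoint the tutorial uses.
SEARCH_URL = "https://api.twitter.com/labs/2/tweets/search"


def create_twitter_url_encoded(handle, max_results=100):
    """Build the Recent Search URL with percent-encoded parameters.

    Hypothetical helper: a variation on create_twitter_url, not a replacement
    prescribed by the tutorial.
    """
    params = {
        "tweet.fields": "lang",
        "max_results": max_results,
        "query": "from:{}".format(handle),
    }
    return "{}?{}".format(SEARCH_URL, urlencode(params))


print(create_twitter_url_encoded("jessicagarson"))
```

Passing the handle as an argument also makes it easier to reuse the function for more than one account.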
You can adjust your query if you wanted to exclude retweets or Tweets that contain media. You can make adjustments to the data that is returned by the Twitter API by adding additional fields and expansions to your query. Using a REST client such as Postman or Insomnia can be helpful for seeing what data you get back and making adjustments before you start writing code. There is a Postman collection for Labs endpoints as well.
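To illustrate adjusting the query, here is a minimal sketch that assembles a query string with optional exclusions. It assumes the endpoint accepts the negated operators -is:retweet and -has:media; check the Recent Search documentation before relying on them. The helper build_query is hypothetical.

```python
from urllib.parse import quote


def build_query(handle, exclude_retweets=False, exclude_media=False):
    """Assemble a search query for one handle, with optional exclusions.

    The operators used here are assumptions about what the endpoint supports.
    """
    parts = ["from:{}".format(handle)]
    if exclude_retweets:
        parts.append("-is:retweet")
    if exclude_media:
        parts.append("-has:media")
    return " ".join(parts)


query = build_query("jessicagarson", exclude_retweets=True)
print(query)
# Spaces and colons should be percent-encoded before the query goes in a URL.
print(quote(query))
```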
Setting up your main function
At the bottom of the file, you can start to set up the main function that you will use to call all of the functions that you create. You can add the function you just created and call main using an if __name__ == "__main__" statement.
def main():
    url = create_twitter_url()


if __name__ == "__main__":
    main()
Authenticating and Connecting to the Twitter API
To access config.yaml, the configuration file you created while setting up, you can define a function called process_yaml, which reads in the YAML file and returns its contents.
def process_yaml():
    with open("config.yaml") as file:
        return yaml.safe_load(file)
In your main function, you can save this to a variable named data. Your main function should now have two variables, one for url and one for data.
def main():
    url = create_twitter_url()
    data = process_yaml()
To access the bearer token from your config.yaml file, you can use the following function.
def create_bearer_token(data):
    return data["search_tweets_api"]["bearer_token"]
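If a key is missing from config.yaml, or still holds the placeholder x’s, the lookup above fails with a bare KeyError or sends an invalid token. The sketch below shows one way to surface a clearer message. The helper require_setting is hypothetical, and it works on the dictionary returned by process_yaml.

```python
def require_setting(config, section, key):
    """Return config[section][key], or raise a readable error.

    Hypothetical helper: treats a value made up only of the letter x as an
    unfilled placeholder from the sample config.yaml.
    """
    try:
        value = config[section][key]
    except (KeyError, TypeError):
        raise KeyError("config.yaml is missing {}.{}".format(section, key)) from None
    if not value or set(str(value)) == {"x"}:
        raise ValueError("config.yaml still has a placeholder for {}.{}".format(section, key))
    return value


# Example with an in-memory dictionary shaped like the parsed config.yaml.
config = {"search_tweets_api": {"bearer_token": "example-token"}, "azure": {}}
print(require_setting(config, "search_tweets_api", "bearer_token"))
```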
Just as you did earlier, you can add a variable called bearer_token to your main function.
def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
To connect to the Twitter API, you’ll create a function called twitter_auth_and_connect that takes your bearer_token and url. It formats the authorization header with your bearer token and then connects to the Twitter API by using the Requests package to make a GET request.
def twitter_auth_and_connect(bearer_token, url):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    response = requests.request("GET", url, headers=headers)
    return response.json()
The object you are returning, in this function, is a payload that looks like this:
{'data': [{'id': '1272881032308629506', 'text': '@nomadaisy @kndl I just want to do deals with you'}, {'id': '1272880943687258112', 'text': '@nomadaisy @kndl I live too far away to hang responsibly with y’all 😬😭'}, {'id': '1272711045606408192', 'text': '@Babycastles https://t.co/Yfj8SJAnpG'}, {'id': '1272390182231330816', 'text': '@replylord Haha, I broke a glass in your honor today and all so I think I do read your Tweets'}, {'id': '1271810907274915840', 'text': '@replylord I like that I’m the only like here.'}, {'id': '1271435152183476225', 'text': '@Arfness @ChicagoPython @codewithbri @WeCodeDreams @agfors The video seems to be available https://t.co/GojUGdulkP'}, {'id': '1271111488024064001', 'text': 'RT @TwitterDev: Tune in tonight and watch as @jessicagarson takes us through running your favorite Python package in R. 🍿\n\nLearn how to use…'}, {'id': '1270794941892046848', 'text': 'RT @ChicagoPython: Chicago Python will be live-streaming tmrw night!\n\nOur talks:\n- How to run your favorite Python package in R by @jessica…'}, {'id': '1270485552488427521', 'text': "Speaking virtually at @ChicagoPython's __main__ meeting on Thursday night. I'll be showing how to run your favorite Python package in R. https://t.co/TnqgO80I3t"}], 'meta': {'newest_id': '1272881032308629506', 'oldest_id': '1270485552488427521', 'result_count': 9}}
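Later steps index the payload with res_json["data"]. If a handle has no Tweets in the past 7 days, the payload may come back with only a meta object and no data key; that is an assumption about the endpoint’s behavior and worth confirming against a real response. A small guard such as the hypothetical tweets_from_payload below avoids a KeyError in that case.

```python
def tweets_from_payload(res_json):
    """Return the list of Tweets in a Recent Search payload, or [] if none.

    Hypothetical helper; assumes an empty result omits the "data" key.
    """
    return res_json.get("data", [])


empty_payload = {"meta": {"result_count": 0}}
one_tweet_payload = {
    "data": [{"id": "1272711045606408192", "text": "@Babycastles https://t.co/Yfj8SJAnpG"}],
    "meta": {"result_count": 1},
}
print(len(tweets_from_payload(empty_payload)))
print(len(tweets_from_payload(one_tweet_payload)))
```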
You can now update your main function so it looks as follows:
def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
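For handles with more than 100 Tweets in a week, the pagination logic mentioned earlier could look roughly like the sketch below. It assumes each page reports a meta.next_token value that can be sent back to request the following page; verify this against the endpoint documentation. Both collect_all_tweets and the fetch_page callable are hypothetical, and the example uses canned pages instead of real requests.

```python
def collect_all_tweets(fetch_page, max_pages=10):
    """Gather Tweets across pages.

    fetch_page is a hypothetical callable: it takes a token (None for the
    first page) and returns one parsed payload. The meta.next_token field is
    an assumption about the response shape.
    """
    tweets = []
    next_token = None
    for _ in range(max_pages):
        page = fetch_page(next_token)
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if not next_token:
            break
    return tweets


# Canned pages standing in for real API responses.
fake_pages = {
    None: {"data": [{"id": "1", "text": "first"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "2", "text": "second"}], "meta": {}},
}
print(collect_all_tweets(fake_pages.get))
```

In real use, fetch_page would wrap twitter_auth_and_connect and add the token to the URL when one is present. The max_pages cap keeps a bug in the loop from making unbounded requests.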
Generating languages
While it is possible to get languages from the Recent Search payload (there is a version of this code that uses that method), Azure also offers an endpoint that will estimate the language for you. Before you can use it, you will need to make sure your data is in the right shape for the detect languages endpoint, so you’ll format the data to match the format outlined in Azure’s quick start guide. To do so, separate the object stored under the key called data, which contains the Tweets and ids, into a variable called data_only. You will need to do some string formatting to get the Tweet data into the right format, and then convert the resulting string into a dictionary. You can use the json and ast libraries to assist in this conversion.
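def lang_data_shape(res_json):
    # Keep only the list of Tweets (ids and text) from the Twitter payload.
    data_only = res_json["data"]
    # Build the text of a dictionary with "documents" as its only key.
    doc_start = '"documents": {}'.format(data_only)
    str_json = "{" + doc_start + "}"
    dump_doc = json.dumps(str_json)
    doc = json.loads(dump_doc)
    # Evaluate that text to turn it back into a Python dictionary.
    return ast.literal_eval(doc)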
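Since res_json["data"] is already a Python list, the same shape can also be produced by wrapping it in a dictionary directly. The sketch below is an alternative, not the tutorial’s method; lang_data_shape_simple is a hypothetical name, and the comparison reproduces the string-based route on one Tweet from the sample payload.

```python
import ast


def lang_data_shape_simple(res_json):
    """Wrap the Tweets in the {"documents": [...]} shape without string formatting."""
    return {"documents": res_json["data"]}


sample = {
    "data": [
        {"id": "1271810907274915840", "text": "@replylord I like that I’m the only like here."}
    ],
    "meta": {"result_count": 1},
}

# Reproduce the string-based route from the tutorial for comparison.
string_route = ast.literal_eval("{" + '"documents": {}'.format(sample["data"]) + "}")
print(lang_data_shape_simple(sample) == string_route)
```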
To connect to Azure, you will need to set up the URLs you’ll call, in a similar way to how you did with the Twitter API URL. You can set up your URLs for retrieving data from both the languages and sentiment endpoints. Your credentials will be parsed from your config.yaml and passed in to authenticate to the Azure endpoints.
def connect_to_azure(data):
    azure_url = "https://week.cognitiveservices.azure.com/"
    language_api_url = "{}text/analytics/v2.1/languages".format(azure_url)
    sentiment_url = "{}text/analytics/v2.1/sentiment".format(azure_url)
    subscription_key = data["azure"]["subscription_key"]
    return language_api_url, sentiment_url, subscription_key
Additionally, you will create a function that builds the header for connecting to Azure by putting your subscription key into the format needed to make your request.
def azure_header(subscription_key):
    return {"Ocp-Apim-Subscription-Key": subscription_key}
At this point, you are now ready to make a POST request to the Azure API to generate languages for your Tweets.
def generate_languages(headers, language_api_url, documents):
    response = requests.post(language_api_url, headers=headers, json=documents)
    return response.json()
You should get back a JSON response that looks similar to the response below.
{'documents': [{'id': '1272881032308629506', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272880943687258112', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272711045606408192', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1272390182231330816', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271810907274915840', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271435152183476225', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1271111488024064001', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1270794941892046848', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}, {'id': '1270485552488427521', 'detectedLanguages': [{'name': 'English', 'iso6391Name': 'en', 'score': 1.0}]}], 'errors': []}
You will also want to update your main function to include the new functions you created. It should now look similar to this.
def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)
Obtaining sentiment scores
Before you can use Azure’s endpoint for generating sentiment scores, you will need to combine the Tweet data with the data that contains the generated languages. You can use pandas to assist in this data conversion process. First, convert the JSON object with detected languages into a data frame. Since you only want the abbreviations of the languages, you can use a list comprehension to get the iso6391Name, which contains those abbreviations. The iso6391Name is contained inside of a dictionary, which is inside of a list, and the list is inside of the data frame with language data. You can also turn the Tweet data into a data frame and attach the language abbreviations for your Tweets to that same data frame. From there, you can convert that Tweet data to JSON.
def combine_lang_data(documents, with_languages):
    langs = pd.DataFrame(with_languages["documents"])
    lang_iso = [x.get("iso6391Name")
                for d in langs.detectedLanguages if d for x in d]
    data_only = documents["documents"]
    tweet_data = pd.DataFrame(data_only)
    tweet_data.insert(2, "language", lang_iso, True)
    json_lines = tweet_data.to_json(orient="records")
    return json_lines
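The list comprehension above relies on every Tweet having exactly one detected language; a Tweet with none (or several) would leave the list a different length than the data frame. As an illustration only, the sketch below matches languages to Tweets by id using plain dictionaries. The helper attach_languages is hypothetical and is not part of the original code.

```python
def attach_languages(tweets, with_languages):
    """Return copies of the Tweets with a "language" key, matched by Tweet id."""
    lang_by_id = {}
    for doc in with_languages["documents"]:
        detected = doc.get("detectedLanguages") or []
        # Take the first detection; None marks Tweets with no detected language.
        lang_by_id[doc["id"]] = detected[0]["iso6391Name"] if detected else None
    return [dict(tweet, language=lang_by_id.get(tweet["id"])) for tweet in tweets]


tweets = [
    {"id": "1272881032308629506", "text": "@nomadaisy @kndl I just want to do deals with you"},
    {"id": "1272711045606408192", "text": "@Babycastles https://t.co/Yfj8SJAnpG"},
]
with_languages = {
    "documents": [
        {"id": "1272881032308629506",
         "detectedLanguages": [{"name": "English", "iso6391Name": "en", "score": 1.0}]},
        # Hypothetical case: no language detected for this Tweet.
        {"id": "1272711045606408192", "detectedLanguages": []},
    ],
    "errors": [],
}
for row in attach_languages(tweets, with_languages):
    print(row["id"], row["language"])
```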
Just as you did for the languages endpoint, you will need to get the data into a dictionary format, with the word documents as the key in front of your payload, before you can obtain the sentiment scores.
def add_document_format(json_lines):
    docu_format = '"' + "documents" + '"'
    json_docu_format = "{}:{}".format(docu_format, json_lines)
    docu_align = "{" + json_docu_format + "}"
    jd_align = json.dumps(docu_align)
    jl_align = json.loads(jd_align)
    return ast.literal_eval(jl_align)
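Because json_lines is a JSON string, values such as null, true, or false in it are not valid Python literals, and ast.literal_eval would fail on them. A sketch of an alternative is to parse the string with json.loads and wrap the result. The function name add_document_format_json is hypothetical, and the sample string is made up to show a null language and escaped slashes.

```python
import json


def add_document_format_json(json_lines):
    """Parse the JSON records and wrap them under a "documents" key."""
    return {"documents": json.loads(json_lines)}


# Made-up record: a null language and escaped slashes in a JSON string.
json_lines = r'[{"id":"1","text":"https:\/\/t.co\/x","language":null}]'
print(add_document_format_json(json_lines))
```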
Now, your data should be in the right format to call Azure’s sentiment endpoint. You can make a POST request to the sentiment endpoint you defined in the connect_to_azure function.
def sentiment_scores(headers, sentiment_url, document_format):
    response = requests.post(
        sentiment_url, headers=headers, json=document_format)
    return response.json()
The JSON response you will get returned should look similar to the payload below.
{'documents': [{'id': '1272881032308629506', 'score': 0.18426942825317383}, {'id': '1272880943687258112', 'score': 0.0031259357929229736}, {'id': '1272711045606408192', 'score': 0.7015109062194824}, {'id': '1272390182231330816', 'score': 0.8754926323890686}, {'id': '1271810907274915840', 'score': 0.19140595197677612}, {'id': '1271435152183476225', 'score': 0.7853382229804993}, {'id': '1271111488024064001', 'score': 0.7884223461151123}, {'id': '1270794941892046848', 'score': 0.8826596736907959}, {'id': '1270485552488427521', 'score': 0.8784275054931641}], 'errors': []}
Your main function should now look similar to the following.
def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)
    json_lines = combine_lang_data(documents, with_languages)
    document_format = add_document_format(json_lines)
    sentiments = sentiment_scores(headers, sentiment_url, document_format)
Getting the average sentiment score
To get the average sentiment score, you can turn the JSON response from the Azure sentiment endpoint into a data frame and calculate the mean of the column named score.
def mean_score(sentiments):
    sentiment_df = pd.DataFrame(sentiments["documents"])
    return sentiment_df["score"].mean()
After you have the average score, you can write a conditional statement to let you know how positive your week was.
def week_logic(week_score):
    if week_score >= 0.75:
        print("You had a positive week")
    elif week_score >= 0.45:
        print("You had a neutral week")
    else:
        print("You had a negative week, I hope it gets better")
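To see how the average and the thresholds interact, here is a small sketch that uses three of the scores from the sample sentiment payload above. It uses the standard library’s mean instead of pandas, and the hypothetical classify_week returns a label instead of printing, which makes the thresholds easier to check.

```python
from statistics import mean


def classify_week(week_score):
    """Map an average score to a label using the tutorial's thresholds."""
    if week_score >= 0.75:
        return "positive"
    if week_score >= 0.45:
        return "neutral"
    return "negative"


# Three of the scores from the sample sentiment payload.
sentiments = {
    "documents": [
        {"id": "1272881032308629506", "score": 0.18426942825317383},
        {"id": "1272711045606408192", "score": 0.7015109062194824},
        {"id": "1272390182231330816", "score": 0.8754926323890686},
    ],
    "errors": [],
}
week_score = mean(doc["score"] for doc in sentiments["documents"])
print(round(week_score, 4), classify_week(week_score))
```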
The final version of the main statement for your file should look like this:
def main():
    url = create_twitter_url()
    data = process_yaml()
    bearer_token = create_bearer_token(data)
    res_json = twitter_auth_and_connect(bearer_token, url)
    documents = lang_data_shape(res_json)
    language_api_url, sentiment_url, subscription_key = connect_to_azure(data)
    headers = azure_header(subscription_key)
    with_languages = generate_languages(headers, language_api_url, documents)
    json_lines = combine_lang_data(documents, with_languages)
    document_format = add_document_format(json_lines)
    sentiments = sentiment_scores(headers, sentiment_url, document_format)
    week_score = mean_score(sentiments)
    print(week_score)
    week_logic(week_score)
You should now be able to run your code by typing the following into your terminal:
python3 week.py
Depending on your sentiment score you should see something that looks similar to this in your terminal output:
0.6470708809792995
You had a neutral week
Next steps
This code sample could be easily extended to let you know which Tweets were the most positive, or the most negative, or to track changes week by week with a visualization.
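As one possible starting point for the first extension, the sketch below joins sentiment scores to Tweet text by id and sorts the result, so the most positive and most negative Tweets are at either end. The helper rank_tweets is hypothetical, and the data is a three-Tweet subset of the sample payloads shown earlier.

```python
def rank_tweets(tweets, sentiments):
    """Return Tweets with their scores attached, most positive first."""
    score_by_id = {doc["id"]: doc["score"] for doc in sentiments["documents"]}
    scored = [
        dict(tweet, score=score_by_id[tweet["id"]])
        for tweet in tweets
        if tweet["id"] in score_by_id
    ]
    return sorted(scored, key=lambda tweet: tweet["score"], reverse=True)


tweets = [
    {"id": "1272881032308629506", "text": "@nomadaisy @kndl I just want to do deals with you"},
    {"id": "1272880943687258112", "text": "@nomadaisy @kndl I live too far away to hang responsibly with y’all 😬😭"},
    {"id": "1272390182231330816", "text": "@replylord Haha, I broke a glass in your honor today and all so I think I do read your Tweets"},
]
sentiments = {
    "documents": [
        {"id": "1272881032308629506", "score": 0.18426942825317383},
        {"id": "1272880943687258112", "score": 0.0031259357929229736},
        {"id": "1272390182231330816", "score": 0.8754926323890686},
    ],
    "errors": [],
}
ranked = rank_tweets(tweets, sentiments)
print("Most positive:", ranked[0]["text"])
print("Most negative:", ranked[-1]["text"])
```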
Let us know on the forums if you run into any troubles along the way or Tweet us at @TwitterDev if this inspires you to create anything. I used several libraries and tools beyond the Twitter API to make this tutorial, but you may have different needs and requirements and should evaluate whether those tools are right for you. Twitter does not operate or manage the third party services mentioned above and those services may have separate terms regarding use of any tools and other features.