Are you taking data from an API in the format the web service gives it to you? You shouldn't let an API provider dictate how data is structured inside your application. Instead, you can take advantage of Python's list manipulation techniques to sort, filter, and reorganize data in ways that best suit your needs.
In this article, we'll explore a few methods of using native Python features to sort, filter, and remap data from an external REST API that you can then use within your application, or pass down to a client as part of your own API. Looking for techniques to do this in JavaScript? We wrote about that in a previous article.
To follow along, you'll need to have Python 3 installed and be familiar with running Python code.
Retrieving data from an API
To get started, we'll need some data to manipulate. For the examples in this article, we can use GitHub's v3 REST API and its `search/repositories` endpoint. The endpoint takes a few parameters, but we'll use `q` to pass in the `bearer` search query, and `per_page` to limit the results to 10. This makes it easier to follow along if you are printing results to the console, but feel free to increase the results per page if you want a larger dataset to work with. You can also change the search query to anything you like, as it won't affect the manipulations that we'll perform.
You can use any HTTP client, but our examples will use Requests. Begin by installing `requests` and importing it into your project. To install it, use `pip` (or `pip3`, depending on your local setup):
```shell
pip install requests
```
Next, import requests and set up the API call.
```python
import requests

def get_repos():
    res = requests.get('https://api.github.com/search/repositories?q=bearer&per_page=10')
    return res.json()['items'] # Returns the value of the 'items' key from the JSON response body
```
The code above defines a function, `get_repos`, that makes the API call and returns only the part of the response that we are interested in: in this case, the `items` list from the JSON response. From this point forward, whenever we need the items data, we can assign it to a variable with `x = get_repos()`. This leaves us with a `list` of `dict` (dictionary) data types.
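As an aside, rather than hand-building the query string, requests can assemble it for you through the `params` argument. Here's a minimal sketch of an equivalent `get_repos` (the default query and per-page values are the same ones used above; the keyword arguments are our own addition):

```python
import requests

def get_repos(query='bearer', per_page=10):
    # requests URL-encodes the params dict into ?q=...&per_page=...
    res = requests.get('https://api.github.com/search/repositories',
                       params={'q': query, 'per_page': per_page})
    res.raise_for_status()  # Raise an exception on HTTP error statuses
    return res.json()['items']
```

This keeps the URL readable and handles encoding of special characters in the query for you.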
Now with the setup out of the way, we can begin manipulating the data from the API.
Sorting results in Python
While many APIs allow you to sort the results in some way, rarely do you have access to sort by any property. GitHub, for example, allows sorting by stars, forks, help-wanted-issues, and how recently a repo has been updated. If you want a different sorting method, you'll need to handle it yourself. One thing to keep in mind is that we are only going to sort the results we retrieve. This means that if you capture the first ten results, sort them, and then capture the next ten results, they will not all be in the right order relative to each other. If you're working with large datasets, it's best to capture all the data before sorting to avoid repeating the work on the same set of data multiple times.
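To illustrate that capture-then-sort approach, here is a small sketch. `fetch_page` is a hypothetical stand-in for a real paginated API call, backed by inline sample data so the example runs offline:

```python
def sort_all_pages(fetch_page, pages, key):
    """Collect every page first, then sort the combined list once."""
    all_items = []
    for page in range(1, pages + 1):
        all_items.extend(fetch_page(page))
    return sorted(all_items, key=key)

# Stand-in for a real paginated API call
fake_pages = {1: [{'open_issues_count': 5}, {'open_issues_count': 1}],
              2: [{'open_issues_count': 3}]}

result = sort_all_pages(lambda p: fake_pages[p], 2,
                        key=lambda repo: repo['open_issues_count'])
print([r['open_issues_count'] for r in result])  # [1, 3, 5]
```

Sorting once over the combined list guarantees a globally correct order, which sorting page-by-page cannot.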
Python offers the `sorted` built-in function to make sorting lists easier. `sorted` takes the list you want to sort, and a `key` representing the value to use when sorting the results. Let's take the following example: sort the repositories by open issue count, from fewest issues to most issues.
```python
# This line can be used in each example, but you only need it once in your file
repos = get_repos()

fewest_issues_sort = sorted(repos, key=lambda repo: repo['open_issues_count'])
```
The code above passes `repos` into the `sorted` function, and then uses a lambda function for the `key`. Lambda functions in Python are anonymous functions that you can use in-line without defining them elsewhere. They are great for use cases like ours. This lambda is saying: "pass each item of `repos` to this function as `repo`, then return its `open_issues_count`." The `sorted` function uses the open issues count as the key. The result is a new list of dictionaries sorted from fewest issues to most issues.
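If you'd rather avoid lambdas, the standard library's `operator.itemgetter` does the same job. A quick sketch with inline sample data (so it runs without an API call):

```python
from operator import itemgetter

sample_repos = [{'name': 'b', 'open_issues_count': 2},
                {'name': 'a', 'open_issues_count': 7}]

# itemgetter('open_issues_count') returns each dict's value for that key
fewest_first = sorted(sample_repos, key=itemgetter('open_issues_count'))
print([r['name'] for r in fewest_first])  # ['b', 'a']
```

Both approaches are equivalent here; `itemgetter` can read as cleaner when you're only looking up a key.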
What if you want to reverse the order to show repos with the highest issue count first? You can achieve this with the `reverse` argument in `sorted`. By setting `reverse` to `True`, the same logic will apply, but in reverse.
```python
most_issues_sort = sorted(repos, key=lambda repo: repo['open_issues_count'], reverse=True)
```
You can use the same technique to sort by any value.
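You can even sort by several values at once by returning a tuple from the key function. A sketch with inline sample data, sorting by stars descending and breaking ties alphabetically by name:

```python
sample_repos = [{'name': 'b', 'stargazers_count': 10},
                {'name': 'a', 'stargazers_count': 10},
                {'name': 'c', 'stargazers_count': 99}]

# Negating the count sorts it descending; ties fall back to the name
by_stars = sorted(sample_repos, key=lambda r: (-r['stargazers_count'], r['name']))
print([r['name'] for r in by_stars])  # ['c', 'a', 'b']
```

Tuples compare element by element, so the second value only matters when the first values are equal.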
Filtering list data with Python
Filters pair well with sorting. They allow you to reduce a list down to only the entries that matter for your needs. There are a variety of ways to filter a list, but they all use the same concept of building a new list from a subset of the original list's entries.
One of the most effective ways to do this is using list comprehension. List comprehension is a method for creating lists, but it can also simplify complex loops into shorter expressions. Let's take an example where we only want repositories in our list that have a description filled out.
```python
# Fetch the repos if you haven't already
repos = get_repos()

repos_with_descriptions = [repo for repo in repos if repo['description'] is not None]
```
The code above does a lot with a single line, so let's break it down. It assigns `repos_with_descriptions` to a new list, `[ ]`. The first `repo`, just before the `for`, represents what we want to add to the new list. The next part, `repo in repos`, should look familiar from any for-in loop. Finally, the end is our condition: `if repo['description'] is not None`. When a description is not set, the GitHub API returns `null` in the JSON response. Python interprets this as `None`, so we check that the description is not `None`, since we only want repositories with a description.
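Note that `is not None` is deliberately narrow: a plain truthiness check (`if repo['description']`) would also drop empty strings. A quick sketch with inline sample data showing the difference:

```python
sample_repos = [{'name': 'a', 'description': None},
                {'name': 'b', 'description': ''},
                {'name': 'c', 'description': 'A repo'}]

# Keeps empty strings, drops only missing descriptions
with_desc = [r for r in sample_repos if r['description'] is not None]
print([r['name'] for r in with_desc])  # ['b', 'c']

# Drops both missing and empty descriptions
truthy_only = [r for r in sample_repos if r['description']]
print([r['name'] for r in truthy_only])  # ['c']
```

Pick whichever behavior matches your definition of "has a description."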
What if we want two conditions? For example, only repositories that have a description and a homepage URL set.
```python
repos_with_desc_and_home = [repo for repo in repos if repo['description'] is not None and repo['homepage'] is not None]
```
As with our previous example, this code says: "Make a new list made up of each `repo` in `repos`, but only include those where the description is not `None` and the homepage is not `None`."
In both examples we've used the `is not` operator. Let's look at a more conventional condition and only include repositories with 100 or more stars.
```python
popular_repos = [repo for repo in repos if repo['stargazers_count'] >= 100]
```
The code again uses list comprehension, but this time includes a more traditional greater-than-or-equal condition to compare values. You could also implement any of these examples with your preferred looping and list-generation methods.
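For comparison, here is the same filter written two other ways, using the built-in `filter` function and a plain loop, with inline sample data so the sketch runs on its own:

```python
sample_repos = [{'name': 'a', 'stargazers_count': 5},
                {'name': 'b', 'stargazers_count': 150}]

# Built-in filter() equivalent; wrap in list() to materialize the iterator
popular = list(filter(lambda r: r['stargazers_count'] >= 100, sample_repos))

# Plain loop equivalent
popular_loop = []
for repo in sample_repos:
    if repo['stargazers_count'] >= 100:
        popular_loop.append(repo)

print(popular == popular_loop)  # True
```

All three forms produce the same list; the comprehension is usually considered the most idiomatic.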
Remapping or normalizing data
Sometimes an API returns more information than you need. GraphQL solves this by allowing the client to specify exactly the data it needs, but for now most APIs are still REST-like. If you need to reduce the data down to only a few properties before you interact with it or send it on to other parts of your app, you can do so by iterating over the list and building a new one with only the parts you want.
Loops or list comprehension can do this, but we will use Python's `map` function. `map` applies a function to every item in an iterable (and, in Python 3, returns a lazy iterator). Let's look at an example where we take the original `repos` list and simplify it into fewer key/value pairs with `map`.
```python
def simplify(repo):
    return {
        'url': repo['html_url'],
        'name': repo['name'],
        'owner': repo['owner']['login'],
        'description': repo['description'],
        'stars': repo['stargazers_count']
    }

# Wrap in list() because map returns a one-shot iterator in Python 3
new_repos = list(map(simplify, repos))
```
Here, `map` receives the `simplify` function, which runs over each item in `repos`. You can use this same method to manipulate the data further, normalize values, and even strip sensitive information.
Beyond the basics
The techniques mentioned in this article are a great starting point for the majority of your use cases. APIs mostly return JSON, which Python can convert into dictionaries. In our examples, we captured the `items` list from the response dictionary and used it alongside many built-in Python functions to iterate over the data.
Whether the data coming from an API is the core of your product offering, or only a part of the value you provide to your users, you need to be able to manipulate it to best suit your needs. You also need to know that the source of this data is reliable.
How vital is this API data to the success of your app? What happens when the APIs experience performance problems or outages? There are safeguards you can put in place to avoid and manage downtimes and protect your integrations from failure. At Bearer we're building a solution that automatically monitors API performance, notifies you of problems, and can even intelligently react when a problem happens. Learn more about Bearer and give it a try today.