Eager Loading VS Lazy Loading in SQLAlchemy

Chidiebere Ogujeiofor - Feb 12 '20 - - Dev Community

I have heard of eager loading data relationships a lot in the past, but it was only recently that I became more conscious of it in my APIs. I will be showing you guys what it is and why you should bother about it in this post.

Prerequisites

The prerequisites for this post is a basic understanding of SQL Joins, SQLAlchemy and Python

Definitions of Eager Loading VS Lazy Loading

Eager loading is a technique of retrieving a model's relationship data while querying the model either through a JOIN or subquery.

Lazy loading, on the other hand, retrieves the related models only when an attempt is made to retrieve the relationship field by emitting SELECT statements.

I think examples would help make this clearer.

Examples

Let's say we have three models, namely User, Membership and Company, such that there is a many-to-many relationship between users and companies like so:

Alt text of image

In SQLAlchemy models:



class User(db.Model):
   id = db.Column(db.Integer, primary_key=True, autoincrement=True)
   username = db.Column(
     db.String(20),
     nullable=False,
   )
   password_hash = db.Column(db.VARCHAR(130), nullable=False)
   memberships = db.relationship('Membership',
                                 back_populates='member')


class Company(db.Model):
   id = db.Column(db.Integer, primary_key=True, autoincrement=True)
   name = db.Column(db.String(120), nullable=False)
   website = db.Column(db.String(), nullable=False)
   address = db.Column(db.String(), nullable=False)
   memberships = db.relationship('Membership',
                                 back_populates='company')


class Membership(db.Model):
   id = db.Column(db.Integer, primary_key=True, autoincrement=True)
   user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
   company_id = db.Column(db.Integer,
                               db.ForeignKey('company.id'))
   role = db.Column(db.String, default='REGULAR_USER')
   member = db.relationship("User", back_populates="memberships")
   company = db.relationship('Company', back_populates="memberships")



Enter fullscreen mode Exit fullscreen mode

The Lazy Problem

Now if we query a membership from our database, like so:



>>> membership = Membership.query.first()


Enter fullscreen mode Exit fullscreen mode

By default, SQLAlchemy retrieves only the data contained in the Membership table by running the following SQL query:



INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role
FROM membership
 LIMIT %(param_1)s


Enter fullscreen mode Exit fullscreen mode

Now when you try to access the member.username contained in the membership object:



>>> membership.member.username


Enter fullscreen mode Exit fullscreen mode

it makes another DB call to retrieve the member from the User's table which then contains the username:



INFO:sqlalchemy.engine.base.Engine:SELECT "user".id AS user_id, "user".username AS user_username, "user".password_hash AS user_password_hash
FROM "user"
WHERE "user".id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}


Enter fullscreen mode Exit fullscreen mode

Similarly, when we attempt to retrieve name of the company, it results in another DB call:



>>> membership.company.name


Enter fullscreen mode Exit fullscreen mode


INFO:sqlalchemy.engine.base.Engine:SELECT company.id AS company_id, company.name AS company_name, company.website AS company_website, company.address AS company_address
FROM company
WHERE company.id = %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}


Enter fullscreen mode Exit fullscreen mode

When you retrieve multiple rows(say 50) from a table using lazy loading and you run a loop to access a related field, SQLAlchemy would emit SELECT statements for each of the (50) models you retrieved.

Note that looping through each model described here is how marshmallow dumps nested fields to JSON.

That is lazy-loading relationship fields work. You try to retrieve data only when you need them.

The Eager Solution

When we Eager-Load relationship fields, we tell SQLAlchemy to retrieve those fields when the first query is being made either through a JOIN or a subquery. In general, JOINs tend to be more efficient than subqueries.

We do this by using sqlalchemy.orm module like so:



>>> from sqlalchemy import orm
>>> Membership.query.options(orm.joinedload('company')).first()


Enter fullscreen mode Exit fullscreen mode

This generates the following SQL:



INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role, company_1.id AS company_1_id, company_1.name AS company_1_name, company_1.website AS company_1_website, company_1.address AS company_1_address
FROM membership LEFT OUTER JOIN company AS company_1 ON company_1.id = membership.company_id
 LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}


Enter fullscreen mode Exit fullscreen mode

As you can see a join is made to retrieve the company relationship thus trying to access the company name



>>> membership.company.name
'Ade Store1'


Enter fullscreen mode Exit fullscreen mode

does not lead to any extra DB call.

Eager Loading multiple fields

What if we wanted to return nested data from both the company and member fields, how do we eager load both? We can do that easily by adding more arguments to the options method call like so:



membership = Membership.query.options(
    orm.joinedload('company'), orm.joinedload('member')
).first()


Enter fullscreen mode Exit fullscreen mode

This generates the following SQL statement:



INFO:sqlalchemy.engine.base.Engine:SELECT membership.id AS membership_id, membership.user_id AS membership_user_id, membership.company_id AS membership_company_id, membership.role AS membership_role, user_1.id AS user_1_id, user_1.username AS user_1_username, user_1.password_hash AS user_1_password_hash, company_1.id AS company_1_id, company_1.name AS company_1_name, company_1.website AS company_1_website, company_1.address AS company_1_address
FROM membership LEFT OUTER JOIN "user" AS user_1 ON user_1.id = membership.user_id LEFT OUTER JOIN company AS company_1 ON company_1.id = membership.company_id
 LIMIT %(param_1)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': 1}


Enter fullscreen mode Exit fullscreen mode

With that accessing either the company or member field does not lead to any extra SQL statement being run.

Takeaways

You should eager-load relationship fields only when you know that those fields would be used in your code else you would fall into a situation where you are retrieving values you don't actually need which can make your API a little slower. Also, be sure to avoid lazy-loading fields that you would need for your logic

One other thing to note is that this concept is not tied to only SQLAlchemy. It also exists in Django ORM and some other non-python ORMs.

You can get more info on this topic from the SQLAlchemy Docs

Conclusion

I would like to end by saying that the syntax for the eager-loading doesn't quite cumbersome. You can make this easier by putting the logic in a BaseModel as I did in the blog post below

Finally, you can view all the code(including setup) can be gotten the gist below:

Thanks for your time

Alt text of image

. . . . . . . . . . . . . . .