In this article, we’re going to cover quite an advanced topic in Django ORM. And after reading the article, you’re going to understand how Django ORM works a little bit better, especially how it handles joins.
Here’s the video if you prefer watching over reading:
Let’s say we have a Django project with these two simple models:
from django.core.validators import MinValueValidator, MaxValueValidator from django.db import models class Course(models.Model): title = models.CharField(max_length=255) price = models.DecimalField(max_digits=10, decimal_places=2) class Review(models.Model): course = models.ForeignKey( 'Course', related_name='reviews', on_delete=models.CASCADE ) value = models.PositiveSmallIntegerField( validators=[MinValueValidator(1), MaxValueValidator(5)] ) date = models.DateField()
Now, let’s play a little bit with Django ORM and let’s use the filter method.
I’m gonna open the shell right now that prints SQL queries as they’re executed and this shell is not built into Django. You will need to install django-extensions
library if you wanna have the same shell as I do:
$ pip install django-extensions
Let’s just filter courses only by their own fields. It’s the most basic filter we can do:
>>> Course.objects.filter(title__contains='Course', price__gte=20) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" WHERE ("courses_course"."price" >= '20' AND "courses_course"."title" LIKE '%Course%' ESCAPE '\\\\') LIMIT 21 Execution time: 0.000410s [Database: default] <QuerySet [<Course: Course object (1)>, <Course: Course object (2)>]> >>> Course.objects.filter(title__contains='Course').filter(price__gte=20) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" WHERE ("courses_course"."title" LIKE '%Course%' ESCAPE '\\\\' AND "courses_course"."price" >= '20') LIMIT 21 Execution time: 0.000513s [Database: default] <QuerySet [<Course: Course object (1)>, <Course: Course object (2)>]>
If you look at the queries that were generated, they look absolutely the same. So, it doesn’t matter if we chain filters or if we pass all conditions to a single filter call. The result will be the same in both cases.
Now, things start to become tricky when we filter objects based on a ManyToManyField
or based on a reverse ForeignKey
.
Just to illustrate this, let’s try to filter courses by reviews:
>>> Course.objects.filter(reviews__value=5, reviews__date__year=2020) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") WHERE ("courses_review"."date" BETWEEN '2020-01-01' AND '2020-12-31' AND "courses_review"."value" = 5) LIMIT 21 Execution time: 0.000248s [Database: default] <QuerySet [<Course: Course object (1)>]> >>> Course.objects.filter(reviews__value=5).filter(reviews__date__year=2020) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") INNER JOIN "courses_review" T3 ON ("courses_course"."id" = T3."course_id") WHERE ("courses_review"."value" = 5 AND T3."date" BETWEEN '2020-01-01' AND '2020-12-31') LIMIT 21 Execution time: 0.000254s [Database: default] <QuerySet [<Course: Course object (1)>, <Course: Course object (2)>]>
As you can see now, the results of filtering courses are different. The first filter only found one course, and the second filter found two courses.
Also, the queries that were generated and sent to the database are different. The query for the second filter as you can see is a little bit more complicated.
In order to understand why these filters produced different results, first of all, let me show you the data that we have right now in the database:
We have course and review tables. What’s interesting for us right now is review
table. It has 8 rows. And every course has 2 reviews except for the course with id 3. It doesn’t have any reviews.
Also, I’ve highlighted some important cells with green and blue colors. And these cells are important because of the values that they have. We used these values when we were filtering courses in the examples that I showed you before.
Let’s get back to our examples. Let’s take a look at our first filter again:
>>> Course.objects.filter(reviews__value=5, reviews__date__year=2020) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") WHERE ("courses_review"."date" BETWEEN '2020-01-01' AND '2020-12-31' AND "courses_review"."value" = 5) LIMIT 21 Execution time: 0.000248s [Database: default] <QuerySet [<Course: Course object (1)>]>
Here we’re joining course table with review table, and then we apply “WHERE” clause. Basically, we’re just trying to get courses that have at least one review that at the same time has the value equals to 5, and the year when it was created is 2020.
And we only have one review that has 5 and 2020 at the same time and it’s the first review. This review is attached to the course with id 1, and that’s why we had only this course in a QuerySet.
Now, I would say that this filter is quite intuitive and easy to understand. So, let’s take a look at chaining filters:
>>> Course.objects.filter(reviews__value=5).filter(reviews__date__year=2020) SELECT "courses_course"."id", "courses_course"."title", "courses_course"."price" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") INNER JOIN "courses_review" T3 ON ("courses_course"."id" = T3."course_id") WHERE ("courses_review"."value" = 5 AND T3."date" BETWEEN '2020-01-01' AND '2020-12-31') LIMIT 21 Execution time: 0.000254s [Database: default] <QuerySet [<Course: Course object (1)>, <Course: Course object (2)>]>
Chaining filters are trickier. As you can see, we’re joining course table with review table two times, and if you look at “WHERE” clause, you’ll see that the first condition uses the first joined table, and the second condition uses the second joined table.
It’s quite difficult to understand this query, but let’s try to visualize it. Let’s take a look at the result of a similar SELECT if it didn’t have “WHERE” clause:
SELECT "courses_course"."id" as "T1 id", "courses_course"."title" as "T1 title", "courses_course"."price" as "T1 price", "courses_review"."value" as "T2 value", "courses_review"."date" as "T2 date", "T3"."value" as "T3 value", "T3"."date" as "T3 date" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") INNER JOIN "courses_review" T3 ON ("courses_course"."id" = T3."course_id")
As you can see, every single course has 4 rows. Basically, we have all possible combinations of values. And if we try to add “WHERE” clause to our SELECT:
SELECT "courses_course"."id" as "T1 id", "courses_course"."title" as "T1 title", "courses_course"."price" as "T1 price", "courses_review"."value" as "T2 value", "courses_review"."date" as "T2 date", "T3"."value" as "T3 value", "T3"."date" as "T3 date" FROM "courses_course" INNER JOIN "courses_review" ON ("courses_course"."id" = "courses_review"."course_id") INNER JOIN "courses_review" T3 ON ("courses_course"."id" = T3."course_id") WHERE ("courses_review"."value" = 5 AND T3."date" BETWEEN '2020-01-01' AND '2020-12-31')
We will end up with this result:
That’s because we have only two rows where “T2 value” column has 5 and “T3 date” column has 2020.
Basically, when we chain filters, these multiple filter calls are applied independently. We start from the first filter. When we apply this first filter, we get courses that have reviews with value equal to 5. And we have 3 courses with such reviews. Courses with id 1, 2, and 5.
Then we apply another filter. It filters by date 2020. And we only have two courses that have reviews with value
equal to 5, and reviews with date
2020. The first and the second course. The course with id 5 doesn’t have reviews with date
2020, that’s why we don’t have it in the result.
Conclusion
In Django, if we want to filter data, we use filter
method. This filter method works differently depending on how we use it and depending on what relationships our models have.
If we filter our Course
model by its own fields, by one-to-one relationship, or by a foreign key, the result will be the same no matter how we use it. We can chain filters or we can pass all conditions to a single filter method call. It doesn’t matter, the result will be the same.
However, when we start to filter by a reverse foreign key, or by many-to-many relationship, then things start to become trickier.
When we pass all conditions to a single filter method call, these conditions are applied simultaneously. When we chain filters, these multiple filter calls are applied independently.
“So, it doesn’t matter if we chain filters or if we pass all conditions to a single filter call.” I concur purely from an effective standpoint, but are you sure this is also the case from an efficiency standpoint? Isn’t the sql query filtering on the price first more efficient as integer comparison is faster than string matching?
In general SQL databases will execute a query with optimizations to cover basic situations like this (ordering of joins, which predicates to apply first, appropriate use of indexes). So in all probability you can safely forget worries about this kind and focus on what you want the query to return rather than how it will execute in the database internals.
The “tricky” case seems more like a bug than a feature to me. I would expect a chained filter to be further filtering and apply to members of the first filtered result only – this seems to be neither expected nor desirable behavior.