How to use Elasticsearch with Django

Paweł Młynarek

25 January 2023, 11 min read

Are you building a Django application that needs to search through a massive data set? You might consider using a traditional relational database, but you’ll quickly discover that this solution can be slow and problematic when handling advanced search requirements. This is where Elasticsearch comes in.

Here’s an Elasticsearch tutorial for Django to help you make the most of this handy search engine in your project.

What is Elasticsearch?

Elasticsearch is an open-source, RESTful, distributed search and analytics engine written in Java and built on Apache Lucene. Since its release in 2010, Elasticsearch has become one of the most popular search engines. It’s commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.

Companies like eBay, Facebook, Uber, and GitHub use Elasticsearch to build their products' search, aggregation, analytics, and other functionalities.

Why use Elasticsearch?

Applications that have search engines operating on massive databases often face this problem: retrieving product information takes way too long. This, in turn, leads to a poor user experience that harms the app's potential as a digital product.

Most of the time, the lag in search originates in the relational database the development team used for building the application, where data is scattered among many different tables. To retrieve meaningful information, the system needs to fetch the data from these tables, which can take longer than some users are prepared to wait.

A relational database may work very slowly when retrieving data and fetching search results through database queries. That's why developers have been busy looking for alternative approaches to accelerate data retrieval.

That's where Elasticsearch comes in. It's a distributed, document-oriented NoSQL datastore designed to handle the demanding workloads of applications that need real-time search.

Who uses Elasticsearch?

  • eBay - with countless business-critical text search and analytics use cases that utilize Elasticsearch as the backbone, eBay has created a custom 'Elasticsearch as a Service’ platform to allow easy Elasticsearch cluster provisioning on their internal OpenStack-based cloud platform.

  • Facebook - has been using Elasticsearch for 3+ years, having gone from a simple enterprise search to over 40 tools across multiple clusters with 60+ million daily queries and growing.

  • Uber - Elasticsearch plays a crucial role in Uber’s Marketplace Dynamics core data system, aggregating business metrics to control critical marketplace behaviors like dynamic (surge) pricing and supply positioning and assess overall marketplace diagnostics – all in real-time.

  • GitHub - uses Elasticsearch to index over 8 million code repositories and critical event data.

  • Microsoft - uses Elasticsearch to power search and analytics across various products, including MSN, Microsoft Social Listening, and Azure Search.

  • Just Eat - Elasticsearch increases delivery radius accuracy as it can be used to define more complex delivery routes and provides real-time updates whenever a restaurant makes a change.

Elasticsearch - some basic concepts

  • Index - a collection of different types of documents and document properties. For example, an index may contain the data of a social networking application.

  • Type/Mapping - a collection of documents sharing a set of common fields in the same index. For example, if an index contains data from a social networking application, there can be one type for user profile data, another for messaging data, and yet another for comments data. Note that mapping types were deprecated in Elasticsearch 6 and removed in later releases, so a current index holds a single kind of document described by one mapping.

  • Document - a collection of fields expressed in JSON. Every document resides inside an index (and, in older versions, belongs to a type). Every document is associated with a unique identifier (its _id).

  • Field - Elasticsearch fields can include multiple values of the same type (essentially a list). In SQL, on the other hand, a column can contain exactly one value of the said type.
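To make these concepts more concrete, here is a minimal sketch of a single document as it might sit in an index. The index name, identifier, and field values are made up for illustration; only the general shape (an index holding JSON documents made of fields, where a field may hold several values) comes from the definitions above.

```python
import json

# Hypothetical example data: one document from a social networking app.
index_name = "social-network"           # the index the document lives in
document_id = "42"                      # the document's unique identifier
document = {
    "username": "jane",                 # a field with a single value
    "interests": ["django", "search"],  # a field with several values of one type
    "profile": {"city": "Warsaw"},      # a nested object with its own fields
}

# Documents are exchanged with Elasticsearch as JSON.
payload = json.dumps(document, sort_keys=True)
print(f"{index_name}/{document_id}: {payload}")
```

Note how `interests` holds a list without any extra table or join, which is the difference from an SQL column described above.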

Using Elasticsearch with Django

To take advantage of Elasticsearch with Django, we’ll use a few beneficial packages:

  • Elasticsearch DSL - a high-level library that helps write and run queries against Elasticsearch. It’s built on top of the official low-level client (elasticsearch-py).

  • Django Elasticsearch DSL - a package that allows easy integration and configuration of Elasticsearch with Django. It’s built as a thin wrapper around elasticsearch-dsl-py so that you can use all the features developed by the elasticsearch-dsl-py team.

  • Django Elasticsearch DSL DRF - integrates Elasticsearch DSL with the Django REST framework, so we can expose Elasticsearch-backed search through an API with little code.

Haystack vs. Elasticsearch DSL

Haystack is an excellent open-source package that provides a modular search for Django. Unfortunately, it doesn’t fully support the most recent versions of Elasticsearch or more complicated queries. Also, the configuration is minimal and highly restricted.

Set up Elasticsearch

We first need to download, install, and run the Elasticsearch server on a machine. Fortunately, Elasticsearch is available for multiple platforms. You can see how this process works in the official Elasticsearch installation documentation.

In most cases, we can install Elasticsearch more conveniently, for example, by using a package manager. In our testing project, we will use a Docker image.

Once we’ve installed the Elasticsearch server, we can configure it by putting settings in elasticsearch.yml, using specific keys with the correct options. Some settings can also be changed dynamically on the running server; you just need to send a request to Elasticsearch’s API.

You can check if everything works correctly via curl:

curl -X GET localhost:9200/_cluster/health
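If you'd rather run the same check from Python, a minimal sketch using only the standard library could look like the one below. The function names are made up for this example, and the sample response is shortened and invented so the snippet runs without a server; the only thing taken from Elasticsearch itself is that the health response carries a status of green, yellow, or red.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Call the cluster health endpoint and return the decoded JSON body."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def is_cluster_usable(health):
    """Treat 'green' and 'yellow' as usable; 'red' means primary shards are missing."""
    return health.get("status") in {"green", "yellow"}


# A shortened, made-up response so the sketch runs without a live server.
sample_health = {"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}
print(is_cluster_usable(sample_health))
```

With a running server, you would pass the result of fetch_cluster_health() to is_cluster_usable() instead of the sample. A single-node development setup often reports yellow when replicas cannot be assigned, which is fine for local testing.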

Now set up Elasticsearch in Django

To configure Elasticsearch, we first need to add connections config to settings.py. Django needs to know where the Elasticsearch server is:

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'elasticsearch:9200'
    },
}

'hosts': 'elasticsearch:9200' - this is the address of the Elasticsearch host we run with Docker. The hostname elasticsearch is the name of the service in the docker-compose.yml file. Read more about services in the Docker Compose documentation.
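If the same settings.py has to work both inside and outside Docker, one option is to read the host from the environment. This is only a sketch: ELASTICSEARCH_HOST is a variable name invented for this example, not something the packages look for on their own.

```python
import os

# ELASTICSEARCH_HOST is a hypothetical variable name chosen for this sketch.
# The fallback matches the Docker service name used in this article.
ELASTICSEARCH_DSL = {
    'default': {
        'hosts': os.environ.get('ELASTICSEARCH_HOST', 'elasticsearch:9200'),
    },
}
```

Running the project outside Docker could then be a matter of setting ELASTICSEARCH_HOST=localhost:9200 before starting Django.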

Let’s add apps to INSTALLED_APPS in settings.py:

  • 'rest_framework'
  • 'django_elasticsearch_dsl'
  • 'django_elasticsearch_dsl_drf'

These are our models. To integrate data with Elasticsearch, we’ll create ‘documents’.

models.py

from django.db import models
from django.utils.translation import gettext_lazy as _


class Manufacturer(models.Model):
    name = models.CharField(
        _('name'),
        max_length=100,
    )
    country_code = models.CharField(
        _('country code'),
        max_length=2,
    )
    created = models.DateField(
        _('created'),
    )


class Car(models.Model):
    TYPES = [
        (1, 'Sedan'),
        (2, 'Truck'),
        (3, 'SUV'),
    ]

    class Meta:
        verbose_name = _('Car')
        verbose_name_plural = _('Cars')

    name = models.CharField(
        _('name'),
        max_length=100,
    )
    color = models.CharField(
        _('color'),
        max_length=30,
    )
    description = models.TextField(
        _('description'),
    )
    type = models.IntegerField(
        _('type'),
        choices=TYPES,
    )
    manufacturer = models.ForeignKey(
        Manufacturer,
        on_delete=models.CASCADE,
        verbose_name=_('manufacturer'),
    )

    def __str__(self):
        return self.name

    def get_auction_title(self):
        return '{} - {}'.format(self.name, self.color)

documents.py

We’ll use this document in the following examples. The index settings at the top configure the number of shards and replicas.

from django_elasticsearch_dsl import (
    DocType,
    fields,
    Index,
)

from cars.models import (
    Car,
    Manufacturer,
)

car_index = Index('cars')

car_index.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@car_index.doc_type
class CarDocument(DocType):
    name = fields.TextField(
        attr='name',
        fields={
            'suggest': fields.Completion(),
        }
    )
    manufacturer = fields.ObjectField(
        properties={
            'name': fields.TextField(),
            'country_code': fields.TextField(),
        }
    )
    auction_title = fields.TextField(attr='get_auction_title')
    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == 'silver':
            return 2
        return 1

    class Meta:
        model = Car
        fields = [
            'id',
            'color',
            'description',
            'type',
        ]

        related_models = [Manufacturer]

    def get_queryset(self):
        return super().get_queryset().select_related(
            'manufacturer'
        )

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Manufacturer):
            return related_instance.car_set.all()

But what exactly is going on here?

    name = fields.TextField(
        attr='name',
        fields={
            'suggest': fields.Completion(),
        }
    )

We’re getting names from the database and putting these values into documents, which are stored in our index. We use attr to specify the model field from which the value should be taken. In this case, we also set up an additional sub-field, name.suggest, for suggestions.

    manufacturer = fields.ObjectField(
        properties={
            'name': fields.TextField(),
            'country_code': fields.TextField(),
        }
    )

We can add information from related models to our document. ObjectField keeps that related data grouped as a nested object, which makes the document easier to read.

Expert tip: Avoid making your document too large and include only the values you need. When it comes to just serving data, the database is still the preferred method.

    auction_title = fields.TextField(attr='get_auction_title')

You can use the existing model’s method to create a custom field.

    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == 'silver':
            return 2
        return 1

Django Elasticsearch DSL lets you prepare custom values: a prepare_<field name> method on the document computes the value stored for that field, so you can index anything you need.
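To see what all of these field definitions add up to, here is a hand-written sketch of the data that would end up in the index for one car. It uses plain dataclasses as stand-ins for the Django models and a made-up build_car_document helper; it is not how the library builds documents internally, just an illustration of the resulting shape under those assumptions.

```python
from dataclasses import dataclass


@dataclass
class Manufacturer:  # stand-in for the Django model
    name: str
    country_code: str


@dataclass
class Car:  # stand-in for the Django model
    id: int
    name: str
    color: str
    description: str
    type: int
    manufacturer: Manufacturer

    def get_auction_title(self):
        return '{} - {}'.format(self.name, self.color)


def build_car_document(car):
    """Hypothetical helper mirroring what CarDocument derives from a Car."""
    return {
        'id': car.id,
        'color': car.color,
        'description': car.description,
        'type': car.type,
        'name': car.name,                             # attr='name'
        'manufacturer': {                             # ObjectField properties
            'name': car.manufacturer.name,
            'country_code': car.manufacturer.country_code,
        },
        'auction_title': car.get_auction_title(),     # attr pointing at a method
        'points': 2 if car.color == 'silver' else 1,  # same rule as prepare_points
    }


car = Car(1, 'Corolla', 'silver', 'A compact sedan.', 1, Manufacturer('Toyota', 'JP'))
doc = build_car_document(car)
print(doc['auction_title'], doc['points'])
```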

How does Elasticsearch know how to insert data into the index?

Everything works thanks to Django signals (post_save and post_delete), so data integration is near real-time. With the default configuration, Django Elasticsearch DSL automatically synchronizes all new data. In more complex cases, or when the signals simply aren't sent (for example, when you call update on a queryset instead of save on an instance), you can trigger post_save or post_delete manually.

Triggering signal post_save:

from django.db.models.signals import post_save

post_save.send(MyModel, instance=instance, created=False)

If you have existing data, use the search_index command to create the Elasticsearch index and mapping and populate it:

python manage.py search_index --rebuild

Examples of usage

Simple filtering for our documents:

cars = CarDocument.search().query('match', color='black')
for car in cars:
    print(car.color)

Sorting:

cars = CarDocument.search().sort('id')
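Behind the scenes, each of these calls ends up as a JSON body sent to Elasticsearch's search endpoint. The sketch below writes out, by hand, roughly what the two searches above correspond to; the exact body may differ between library versions, and you can inspect the real one by calling to_dict() on the search object.

```python
import json

# Roughly equivalent to CarDocument.search().query('match', color='black')
match_body = {'query': {'match': {'color': 'black'}}}

# Roughly equivalent to CarDocument.search().sort('id')
sort_body = {'sort': ['id']}

print(json.dumps(match_body))
print(json.dumps(sort_body))
```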

Get suggestions for typed text under our named suggestion:

cars = CarDocument.search()
cars = cars.suggest('name_suggestions', 'cor', completion={'field': 'name.suggest'})
suggestions = cars.execute()

Note that we take index 0 because the response holds a list of entries under each suggestion name, and our single input text produces a single entry. The options of that entry are the suggestions themselves:

suggestions = suggestions.suggest.name_suggestions[0]['options']
for suggestion in suggestions:
    print(suggestion['text'])
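The same lookup can be sketched against a raw response dictionary, which makes the nesting easier to see. The helper name and the sample response are made up and heavily shortened; real responses carry more keys per option.

```python
def extract_suggestion_texts(response, suggestion_name):
    """Return the option texts of the first entry under a named suggestion."""
    entries = response.get('suggest', {}).get(suggestion_name, [])
    if not entries:
        return []
    return [option['text'] for option in entries[0]['options']]


# Made-up, shortened response for the prefix 'cor'.
sample_response = {
    'suggest': {
        'name_suggestions': [
            {
                'text': 'cor',
                'offset': 0,
                'length': 3,
                'options': [
                    {'text': 'Corolla', '_score': 1.0},
                    {'text': 'Corsa', '_score': 1.0},
                ],
            },
        ],
    },
}
print(extract_suggestion_texts(sample_response, 'name_suggestions'))
```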

Aggregations work in a similar way:

  1. Create a search on our document class and make sure no documents are returned (size=0), because in this case we only want the aggregation results.
  2. Add a rule saying that we want a terms aggregation on the points field.
  3. Execute the query against Elasticsearch.
  4. Show the result.

cars = CarDocument.search().extra(size=0)
cars.aggs.bucket('points_count', 'terms', field='points')
result = cars.execute()
for point in result.aggregations.points_count:
    print(point)
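For reference, here is a hand-written sketch of roughly what this aggregation request and its result look like as plain dictionaries. The bucket_counts helper and the sample numbers are invented for illustration; each bucket pairs a distinct value of the points field with the number of documents that have it.

```python
# Roughly the body built by .extra(size=0) plus the 'points_count' terms aggregation.
aggregation_body = {
    'size': 0,
    'aggs': {'points_count': {'terms': {'field': 'points'}}},
}


def bucket_counts(response, aggregation_name):
    """Map each bucket key to its document count."""
    buckets = response['aggregations'][aggregation_name]['buckets']
    return {bucket['key']: bucket['doc_count'] for bucket in buckets}


# Made-up response: no hits (size=0), two buckets.
sample_response = {
    'hits': {'hits': []},
    'aggregations': {
        'points_count': {
            'buckets': [
                {'key': 1, 'doc_count': 7},
                {'key': 2, 'doc_count': 3},
            ],
        },
    },
}
print(bucket_counts(sample_response, 'points_count'))
```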

Examples of usage - DRF

A simple serializer for CarDocument:

from django_elasticsearch_dsl_drf.serializers import DocumentSerializer

from cars.documents import CarDocument


class CarDocumentSerializer(DocumentSerializer):
    class Meta:
        document = CarDocument
        fields = (
            'id',
            'name',
            'type',
            'description',
            'points',
            'color',
            'auction_title',
            'manufacturer',
        )

A simple ViewSet for CarDocument:

from django_elasticsearch_dsl_drf.constants import (
    LOOKUP_FILTER_RANGE,
    LOOKUP_QUERY_GT,
    LOOKUP_QUERY_GTE,
    LOOKUP_QUERY_IN,
    LOOKUP_QUERY_LT,
    LOOKUP_QUERY_LTE,
    SUGGESTER_COMPLETION,
)
from django_elasticsearch_dsl_drf.filter_backends import (
    DefaultOrderingFilterBackend,
    FacetedSearchFilterBackend,
    FilteringFilterBackend,
    SearchFilterBackend,
    SuggesterFilterBackend,
)
from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

from cars.documents import CarDocument
from cars.serializers import CarDocumentSerializer


class CarViewSet(DocumentViewSet):
    document = CarDocument
    serializer_class = CarDocumentSerializer
    ordering = ('id',)
    lookup_field = 'id'

    filter_backends = [
        DefaultOrderingFilterBackend,
        FacetedSearchFilterBackend,
        FilteringFilterBackend,
        SearchFilterBackend,
        SuggesterFilterBackend,
    ]

    search_fields = (
        'name',
        'description',
    )

    filter_fields = {
        'id': {
            'field': 'id',
            'lookups': [
                LOOKUP_FILTER_RANGE,
                LOOKUP_QUERY_IN,
                LOOKUP_QUERY_GT,
                LOOKUP_QUERY_GTE,
                LOOKUP_QUERY_LT,
                LOOKUP_QUERY_LTE,
            ],
        },
        'name': 'name',
    }

    suggester_fields = {
        'name_suggest': {
            'field': 'name.suggest',
            'suggesters': [
                SUGGESTER_COMPLETION,
            ],
        },
    }

Django Elasticsearch DSL DRF - examples of usage

  • display all cars

http://localhost:8000/cars/

  • display cars which contain the letter ‘a’ in their name

http://localhost:8000/cars/?name__wildcard=*a*

  • search and display cars which contain the word ‘is’ in the description

http://localhost:8000/cars/?search=description|is

  • filter and get only the cars which have an ‘id’ greater than or equal to 7

http://localhost:8000/cars/?id__gte=7

  • filter and get only the cars with ids 7 and 8

http://localhost:8000/cars/?id__in=7__8

  • get suggestions for the word ‘cor’

http://localhost:8000/cars/suggest/?name_suggest__completion=cor
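If you build these URLs in code, for example in tests or in a small client, a minimal sketch with the standard library could look like this. The build_cars_url helper is made up for this example, and the base URL assumes the local development server used above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://localhost:8000/cars/'


def build_cars_url(params=None, path=''):
    """Join the base URL, an optional sub-path, and an encoded query string."""
    url = BASE_URL + path
    if params:
        # Keep '*' and '|' readable; they are part of the filter syntax above.
        url += '?' + urlencode(params, safe='*|')
    return url


print(build_cars_url({'name__wildcard': '*a*'}))
print(build_cars_url({'id__in': '7__8'}))
print(build_cars_url({'name_suggest__completion': 'cor'}, path='suggest/'))
```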

Final remarks

This guide helps you to use Elasticsearch with Django.

If you’d like to see how Elasticsearch works in practice, download my example project and test Elasticsearch locally using Docker. Be sure to follow the instructions in the readme.

As you can see, Elasticsearch is not as scary as it seems. Thanks to the REST API, every web developer can quickly get familiar with it. Its structure resembles that of a database, which makes the concepts of node, index, type, and document easier to understand.

Contact us

At Sunscrapers, we have a team of developers and software engineers with outstanding technical knowledge and experience. We can build your project successfully.

Talk with us about hiring Python and Django developers and work with the best. Sunscrapers can help you make the most out of Python, Django, and other technologies to put your project to the next level.

Contact us at hello@sunscrapers.com
