Are you building a Django application that needs to search through a massive data set? You might be considering using a standard relational database, but you’ll quickly find that this solution can be slow and problematic when handling advanced requirements. That’s where Elasticsearch comes in.

Here’s an Elasticsearch tutorial for Django to help you make the most of this handy search engine in your project.

What is Elasticsearch?

Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene and written in Java. Since its release in 2010, Elasticsearch has quickly become one of the most popular search engines. It’s commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.

Companies like eBay, Facebook, Uber, and GitHub use Elasticsearch to build search, aggregation, analytics, and other functionality into their products.


Why use Elasticsearch?

Applications whose search engines operate on massive databases often face the same problem: retrieving product information takes far too long. This, in turn, leads to a poor user experience that hurts the app’s potential as a digital product.

Most of the time, the lag in search originates in the relational database the development team used for building the application, where data is scattered among many different tables. To retrieve meaningful information, the system needs to fetch the data from these tables – and that can take more time than some users are prepared to wait.

A relational database may work very slowly when retrieving data and fetching search results through database queries. That’s why developers have been busy looking for alternative approaches that would accelerate the process of data retrieval.

That’s where Elasticsearch comes in. It’s a distributed NoSQL datastore built on a flexible document-oriented model, designed to match the demanding workloads of applications that require real-time engagement.


Who uses Elasticsearch?

  • eBay – with countless business-critical text search and analytics use cases that rely on Elasticsearch as the backbone, eBay has created a custom ‘Elasticsearch as a Service’ platform that allows easy Elasticsearch cluster provisioning on its internal OpenStack-based cloud platform.
  • Facebook – has been using Elasticsearch for 3+ years, having gone from a simple enterprise search to over 40 tools across multiple clusters with 60+ million queries a day and growing.
  • Uber – Elasticsearch plays a key role in Uber’s Marketplace Dynamics core data system, aggregating business metrics to control critical marketplace behaviors like dynamic (surge) pricing and supply positioning, and to assess overall marketplace diagnostics – all in real time.
  • GitHub – uses Elasticsearch to index over 8 million code repositories, as well as critical event data.
  • Microsoft – uses Elasticsearch to power search and analytics across various products, including MSN, Microsoft Social Listening, and Azure Search.
  • Just Eat – Elasticsearch increases delivery radius accuracy, as it can be used to define more complex delivery routes and provides real-time updates whenever a restaurant makes a change.


Elasticsearch – some basic concepts

  • Index – a collection of different types of documents and document properties. For example, a document set may contain the data of a social networking application.
  • Type/Mapping – a collection of documents sharing a set of common fields present in the same index. For example, an index contains data of a social networking application; there can be a specific type for user profile data, another type for messaging data, and yet another one for comments data.
  • Document – a collection of fields defined in the JSON format in a specific manner. Every document belongs to a type and resides inside an index. Every document is associated with a unique identifier, called the UID.
  • Field – Elasticsearch fields can include multiple values of the same type (essentially a list). In SQL, on the other hand, a column can contain exactly one value of the said type.
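To make these concepts more concrete, here is a minimal sketch in plain Python of what a single document might look like. The field names and values are made up for illustration, and nothing here talks to a real Elasticsearch server.

```python
import json

# Hypothetical document for an index named "cars" (all names and values are illustrative).
car_document = {
    "name": "Corolla",
    "color": "silver",
    # A field may hold several values of the same type (essentially a list).
    "tags": ["sedan", "hybrid"],
    # Nested objects are allowed as well.
    "manufacturer": {"name": "Toyota", "country_code": "JP"},
}

# Documents are exchanged with Elasticsearch as JSON.
payload = json.dumps(car_document, indent=2)
print(payload)
```

Note how the `tags` field holds a list, which a single SQL column would not.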


Using Elasticsearch with Django

To take advantage of Elasticsearch with Django, we’ll use a few very helpful packages:

  • Elasticsearch DSL – a high-level library that helps with writing and running queries against Elasticsearch. It’s built on top of the official low-level client (elasticsearch-py).
  • Django Elasticsearch DSL – a package that allows easy integration and configuration of Elasticsearch with Django. It’s built as a thin wrapper around elasticsearch-dsl-py, so you can use all the features developed by the elasticsearch-dsl-py team.
  • Django Elasticsearch DSL DRF – integrates Elasticsearch DSL with the Django REST framework, giving us an efficient way to expose Elasticsearch through an API.

Haystack vs. Elasticsearch DSL

Haystack is a great open-source package that provides modular search for Django. Unfortunately, it doesn’t fully support the most recent versions of Elasticsearch or more complicated queries. Also, the configuration is minimal and highly restricted.


Set up Elasticsearch

The first thing we need to do is download, install, and run the Elasticsearch server on a machine. Fortunately, Elasticsearch is available for multiple platforms. You can have a look at how this process works here.

Luckily, in most cases we can install Elasticsearch more conveniently – for example, by using a package manager. In our test project, we will use a Docker image.

Once we’ve installed the Elasticsearch server, we can configure it by putting settings in elasticsearch.yml. We can manage Elasticsearch from this file, using specific keys with the correct options. Some settings can also be changed dynamically on a running server; you just need to send a request to Elasticsearch’s API.

You can check if everything works correctly via curl:

curl -X GET localhost:9200/_cluster/health
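
The same check can be done from Python. The sketch below uses only the standard library and assumes the server is reachable at http://localhost:9200 without authentication; the helper names and the sample response body are made up for illustration (a real response contains more keys).

```python
import json
from urllib.request import urlopen


def parse_cluster_status(body: str) -> str:
    """Return the "status" value ("green", "yellow" or "red") from a health response body."""
    return json.loads(body)["status"]


def fetch_cluster_status(base_url: str = "http://localhost:9200") -> str:
    """Call the cluster health endpoint and return its status (needs a running server)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=5) as response:
        return parse_cluster_status(response.read().decode("utf-8"))


# Illustrative, shortened response body, so the parsing can be tried without a server.
sample_body = '{"cluster_name": "example-cluster", "status": "green"}'
print(parse_cluster_status(sample_body))
```

Once the server is running, calling `fetch_cluster_status()` should return the live status.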


Now set up Elasticsearch in Django

To configure Elasticsearch, we first need to add the connection config to settings.py. Django needs to know where the Elasticsearch server is:

ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'elasticsearch:9200'
    },
}

‘hosts’: ‘elasticsearch:9200’ – this is where our Elasticsearch host runs under Docker; ‘elasticsearch’ is the name of the service in the docker-compose.yml file. Read more about Docker services here.
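
If the host differs between environments (for example, Docker versus a local install), one option is to read it from an environment variable. This is a minimal sketch; the variable name ELASTICSEARCH_HOST is chosen here for illustration and is not something the packages require.

```python
import os

# ELASTICSEARCH_HOST is a hypothetical variable name; fall back to the Docker service name.
ELASTICSEARCH_DSL = {
    'default': {
        'hosts': os.environ.get('ELASTICSEARCH_HOST', 'elasticsearch:9200'),
    },
}
```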

Let’s add apps to INSTALLED_APPS in settings.py:

  • ‘rest_framework’
  • ‘django_elasticsearch_dsl’
  • ‘django_elasticsearch_dsl_drf’
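
In settings.py, this might look like the fragment below. The `cars` entry is an assumption based on the `cars.models` import used later in this tutorial; your own list of apps will differ.

```python
INSTALLED_APPS = [
    # ... Django's default apps go here ...
    'rest_framework',
    'django_elasticsearch_dsl',
    'django_elasticsearch_dsl_drf',
    'cars',  # assumed name of the project app that holds the models
]
```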

These are our models. To integrate data with Elasticsearch, we’ll create ‘documents’.

# models.py

from django.db import models
# gettext_lazy replaces ugettext_lazy, which was removed in Django 4.0.
from django.utils.translation import gettext_lazy as _


class Manufacturer(models.Model):
    name = models.CharField(
        _('name'),
        max_length=100,
    )
    country_code = models.CharField(
        _('country code'),
        max_length=2,
    )
    created = models.DateField(
        _('created'),
    )


class Car(models.Model):
    TYPES = [
        (1, 'Sedan'),
        (2, 'Truck'),
        (3, 'SUV'),
    ]

    class Meta:
        verbose_name = _('Car')
        verbose_name_plural = _('Cars')

    name = models.CharField(
        _('name'),
        max_length=100,
    )
    color = models.CharField(
        _('color'),
        max_length=30,
    )
    description = models.TextField(
        _('description'),
    )
    type = models.IntegerField(
        _('type'),
        choices=TYPES,
    )
    manufacturer = models.ForeignKey(
        Manufacturer,
        on_delete=models.CASCADE,
        verbose_name=_('manufacturer'),
    )

    def __str__(self):
        return self.name

    def get_auction_title(self):
        return '{} - {}'.format(self.name, self.color)

# documents.py

This is the document we’ll be using in the next examples. The index settings at the top configure shards and replicas.

from django_elasticsearch_dsl import (
    DocType,
    fields,
    Index,
)

from cars.models import (
    Car,
    Manufacturer,
)

car_index = Index('cars')

car_index.settings(
    number_of_shards=1,
    number_of_replicas=0
)


@car_index.doc_type
class CarDocument(DocType):
    name = fields.TextField(
        attr='name',
        fields={
            'suggest': fields.Completion(),
        }
    )
    manufacturer = fields.ObjectField(
        properties={
            'name': fields.TextField(),
            'country_code': fields.TextField(),
        }
    )
    auction_title = fields.TextField(attr='get_auction_title')
    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == 'silver':
            return 2
        return 1

    class Meta:
        model = Car
        fields = [
            'id',
            'color',
            'description',
            'type',
        ]

        related_models = [Manufacturer]

    def get_queryset(self):
        return super().get_queryset().select_related(
            'manufacturer'
        )

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Manufacturer):
            return related_instance.car_set.all()

But what exactly is going on here?

    name = fields.TextField(
        attr='name',
        fields={
            'suggest': fields.Completion(),
        }
    )

We’re taking names from the database and putting these values into documents, which are stored in our index. The attr argument specifies the model field the value should be taken from. In this case, we also set up an additional suggest sub-field used for suggestions.

    manufacturer = fields.ObjectField(
        properties={
            'name': fields.TextField(),
            'country_code': fields.TextField(),
        }
    )

We can add information from related models to our document. ObjectField is really helpful in making our data look clearer.

Expert tip: Avoid making your documents too large; index only the values you need. When it comes to simply serving data, the database is still the preferred method.

    auction_title = fields.TextField(attr='get_auction_title')

You can use the existing model’s method to create a custom field.

    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == 'silver':
            return 2
        return 1

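Django Elasticsearch DSL also lets you prepare custom values: define a prepare_<field name> method on the document, and its return value is stored in that field. With this, you can compute whatever you need.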
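
The naming convention can be illustrated without Elasticsearch at all. The sketch below is a simplified stand-in written for this tutorial, not the library's actual implementation: a hypothetical `build_document` helper looks for a `prepare_<field>` method first and otherwise reads an attribute from the instance, calling it if it is a method (as with `get_auction_title`).

```python
from types import SimpleNamespace


class CarPreparer:
    """Toy stand-in for a document class; field names mirror the example above."""

    field_names = ['color', 'auction_title', 'points']
    # Maps a document field to the model attribute it reads from (like attr=...).
    attrs = {'auction_title': 'get_auction_title'}

    def prepare_points(self, instance):
        return 2 if instance.color == 'silver' else 1

    def build_document(self, instance):
        document = {}
        for name in self.field_names:
            prepare = getattr(self, f'prepare_{name}', None)
            if prepare is not None:
                # A prepare_<field> method takes precedence.
                document[name] = prepare(instance)
                continue
            value = getattr(instance, self.attrs.get(name, name))
            # Methods such as get_auction_title are called to obtain the value.
            document[name] = value() if callable(value) else value
        return document


# A fake model instance with made-up values.
car = SimpleNamespace(
    name='Corolla',
    color='silver',
    get_auction_title=lambda: 'Corolla - silver',
)
print(CarPreparer().build_document(car))
```

Here `points` comes from `prepare_points`, `auction_title` from the method named in `attrs`, and `color` straight from the attribute.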


How does Elasticsearch know how to insert data to index?

Everything works thanks to Django signals (post_save and post_delete). Data integration is near real-time: with the default configuration, Django Elasticsearch DSL automatically synchronizes all new data. In more difficult cases, or when those signals simply aren’t fired (for example, if you’re using update instead of save), you can trigger post_save or post_delete manually.

Triggering signal post_save:

post_save.send(MyModel, instance=instance, created=False)

If you have some existing data, use the search_index command to create and populate the Elasticsearch index and mapping:

python manage.py search_index --rebuild


Examples of usage:

Simple filtering for our documents:

cars = CarDocument.search().query('match', color='black')
for car in cars:
    print(car.color)
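
Behind the scenes, a search like this is sent to Elasticsearch as a JSON request body. As a rough sketch in plain Python (no client library involved), the match query above corresponds to a body along these lines:

```python
import json

# Request body for a match query on the "color" field.
query_body = {
    "query": {
        "match": {"color": "black"},
    },
}
print(json.dumps(query_body))
```

With the library installed, calling `.to_dict()` on a search object is a handy way to inspect the body it would send.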

Sorting:

cars = CarDocument.search().sort('id')

Get suggestions for typed text under our named suggestion:

cars = CarDocument.search()
cars = cars.suggest('name_suggestions', 'cor', completion={'field': 'name.suggest'})
suggestions = cars.execute()

Note that we take index 0 because we only asked for suggestions for one piece of text, so the result list holds a single entry. We can get a lot of things done at the same time:

suggestions = suggestions.suggest.name_suggestions[0]['options']
for suggestion in suggestions:
    print(suggestion['text'])

Aggregations:

  1. Create a search on our document class and set size=0 so that no documents are returned, because in this case we only want information about aggregations.
  2. Add a rule saying that we want aggregations for the points field.
  3. Execute the query against Elasticsearch.
  4. Show the result.

cars = CarDocument.search().extra(size=0)
cars.aggs.bucket('points_count', 'terms', field='points')
result = cars.execute()
for point in result.aggregations.points_count:
    print(point)
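
For reference, here is a sketch in plain Python of roughly what this aggregation request looks like as a JSON body, together with the general shape of a terms aggregation result. The bucket counts are made up, and a real response carries additional keys.

```python
# size=0 suppresses document hits; only the aggregation results come back.
aggregation_body = {
    "size": 0,
    "aggs": {
        "points_count": {
            "terms": {"field": "points"},
        },
    },
}

# Made-up response fragment showing the shape of a terms aggregation result.
sample_aggregations = {
    "points_count": {
        "buckets": [
            {"key": 1, "doc_count": 7},
            {"key": 2, "doc_count": 3},
        ],
    },
}

# Each bucket pairs a distinct field value with the number of matching documents.
for bucket in sample_aggregations["points_count"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```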


Examples of usage – DRF:

A simple serializer for CarDocument:

from django_elasticsearch_dsl_drf.serializers import DocumentSerializer

from cars.documents import CarDocument


class CarDocumentSerializer(DocumentSerializer):
    class Meta:
        document = CarDocument
        fields = (
            'id',
            'name',
            'type',
            'description',
            'points',
            'color',
            'auction_title',
            'manufacturer',
        )

A simple ViewSet for CarDocument:

from django_elasticsearch_dsl_drf.constants import (
    LOOKUP_FILTER_RANGE,
    LOOKUP_QUERY_GT,
    LOOKUP_QUERY_GTE,
    LOOKUP_QUERY_IN,
    LOOKUP_QUERY_LT,
    LOOKUP_QUERY_LTE,
    SUGGESTER_COMPLETION,
)
from django_elasticsearch_dsl_drf.filter_backends import (
    DefaultOrderingFilterBackend,
    FacetedSearchFilterBackend,
    FilteringFilterBackend,
    SearchFilterBackend,
    SuggesterFilterBackend,
)
from django_elasticsearch_dsl_drf.viewsets import DocumentViewSet

from cars.documents import CarDocument
from cars.serializers import CarDocumentSerializer


class CarViewSet(DocumentViewSet):
    document = CarDocument
    serializer_class = CarDocumentSerializer
    ordering = ('id',)
    lookup_field = 'id'

    filter_backends = [
        DefaultOrderingFilterBackend,
        FacetedSearchFilterBackend,
        FilteringFilterBackend,
        SearchFilterBackend,
        SuggesterFilterBackend,
    ]

    search_fields = (
        'name',
        'description',
    )

    filter_fields = {
        'id': {
            'field': 'id',
            'lookups': [
                LOOKUP_FILTER_RANGE,
                LOOKUP_QUERY_IN,
                LOOKUP_QUERY_GT,
                LOOKUP_QUERY_GTE,
                LOOKUP_QUERY_LT,
                LOOKUP_QUERY_LTE,
            ],
        },
        'name': 'name',
    }

    suggester_fields = {
        'name_suggest': {
            'field': 'name.suggest',
            'suggesters': [
                SUGGESTER_COMPLETION,
            ],
        },
    }


Django Elasticsearch DSL DRF examples of usage:

  • http://localhost:8000/cars/ – display all cars.
  • http://localhost:8000/cars/?name__wildcard=*a* – display cars that contain the letter ‘a’ in their name.
  • http://localhost:8000/cars/?search=description|is – search for and display cars that contain the word ‘is’ in the description.
  • http://localhost:8000/cars/?id__gte=7 – filter and get only the cars with an ‘id’ greater than or equal to 7.
  • http://localhost:8000/cars/?id__in=7__8 – filter and get only the cars with ids 7 and 8.
  • http://localhost:8000/cars/suggest/?name_suggest__completion=cor – get suggestions for the word ‘cor’.
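
When calling these endpoints from code rather than typing them into a browser, it helps to let a library build the query string. A minimal sketch using only the standard library; the `cars_url` helper is a hypothetical name, and the base URL is the one from the examples above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'http://localhost:8000/cars/'  # same host and path as the examples above


def cars_url(**params):
    """Build a query URL; urlencode percent-encodes characters such as '*' and '|'."""
    return f'{BASE_URL}?{urlencode(params)}' if params else BASE_URL


url = cars_url(name__wildcard='*a*', id__gte=7)
print(url)
# Decoding the query string gives back the original values.
print(parse_qs(urlsplit(url).query))
```

The printed URL will look different from the hand-written ones because special characters are percent-encoded, but it decodes to the same parameters.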


See how Elasticsearch works

I hope this guide helps you to use Elasticsearch with Django.

If you’d like to see how Elasticsearch works in practice, download my example project and test Elasticsearch locally using Docker. Be sure to follow the instructions in the README.

As you can see, Elasticsearch is not as scary as it seems. Thanks to the REST API, every web developer can quickly get familiar with this solution. Its structure resembles familiar database concepts, which makes it easier to understand what a node, index, type, and document are.

If you have any questions about using Elasticsearch with Django or Python, please share them in the comments section!

Patryk Młynarek
Backend Engineer

Patryk is a Python backend engineer at Sunscrapers. He specializes in developing web applications using Python and Django. In his free time, Patryk enjoys playing board games and motorcycling.
