How to Use Elasticsearch with Django?

Patryk Młynarek - Backend Engineer

30 November 2023, 11 min read

What's inside

  1. What is Elasticsearch?
  2. Why Use Elasticsearch?
  3. Who Uses Elasticsearch?
  4. Elasticsearch - Some Basic Concepts
  5. Using Elasticsearch with Django
  6. Haystack vs. Elasticsearch DSL
  7. Set Up Elasticsearch
  8. How Does Elasticsearch Know How to Insert Data to Index?
  9. Examples of Usage
  10. Examples of Usage - DRF
  11. Django Elasticsearch DSL DRF - Examples of Usage
  12. Final Remarks
  13. Contact Us

Are you building a Django application that needs to search through a massive data set? You might consider using a traditional relational database. You’ll quickly discover that this solution can be slow and problematic when handling advanced requirements. Luckily for you, this is where Elasticsearch comes in.  

Here are tips and best practices on how to use Elasticsearch with Python.

What is Elasticsearch?

Elasticsearch is an open-source, RESTful, distributed search and analytics engine written in Java and built on Apache Lucene. Since its release in 2010, it has become one of the most popular search engines. It’s commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.

Companies like eBay, Facebook, Uber, and GitHub use Elasticsearch to build search, aggregation, analytics, and other functionality into their products.

Why Use Elasticsearch?

The need for efficient data retrieval is a constant challenge in application development, particularly when dealing with extensive databases and search functionality. Slow data retrieval can lead to a suboptimal user experience, potentially hampering the success of a digital product.

This issue frequently arises with traditional relational databases, where data is distributed across multiple tables. Accessing meaningful information means querying several of these tables, which can lead to frustrating delays for users.

Relational databases often prove sluggish regarding data retrieval and search operations. Consequently, developers have sought alternative solutions to expedite these processes and enhance system performance.

This is where Elasticsearch comes into play. Elasticsearch is a distributed, document-oriented NoSQL data store designed for near real-time search. Its flexible document model makes it an invaluable resource for addressing the data retrieval and search challenges of traditional databases.

Who Uses Elasticsearch?

  • eBay - with countless business-critical text search and analytics use cases that utilize Elasticsearch as the backbone, eBay has created a custom ‘Elasticsearch as a Service’ platform to allow easy Elasticsearch cluster provisioning on their internal OpenStack-based cloud platform.

  • Facebook - has been using Elasticsearch for 3+ years, having gone from a simple enterprise search to over 40 tools across multiple clusters with 60+ million daily queries and growing.

  • Uber - Elasticsearch plays a crucial role in Uber’s Marketplace Dynamics core data system, aggregating business metrics to control critical marketplace behaviors like dynamic (surge) pricing and supply positioning and assess overall marketplace diagnostics – all in real-time.

  • GitHub - uses Elasticsearch to index over 8 million code repositories and critical event data.

  • Microsoft - uses Elasticsearch to power search and analytics across various products, including MSN, Microsoft Social Listening, and Azure Search.

  • Just Eat - Elasticsearch increases delivery radius accuracy as it can be used to define more complex delivery routes and provides real-time updates whenever a restaurant makes a change.

Elasticsearch - Some Basic Concepts

  • Index: a collection of different types of documents and document properties. For example, a document set may contain the data of a social networking application.

  • Type/Mapping: a collection of documents sharing a set of standard fields in the same index. For example, an index contains data from a social networking application; there can be a specific type for user profile data, a different kind for messaging data, and yet another for comments data. Note that mapping types were deprecated in Elasticsearch 6 and removed in later releases, so in current versions an index holds a single kind of document described by one mapping.

  • Document: a collection of fields defined in the JSON format in a specific manner. Every document resides inside an index (and, in older versions, belongs to a type) and has a unique identifier, stored in the _id field.

  • Field: Elasticsearch fields can include multiple values of the same type (essentially a list). In SQL, on the other hand, a column can contain exactly one value of the said type.
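
To make these terms concrete, here is a small sketch of how a single document might look before it is sent to an index. The index name social_network, the identifier, and all field names are made up for illustration; only the general shape (a JSON object whose fields may hold lists) comes from the concepts above.

```python
import json

# Hypothetical index name and document ID, chosen only for this illustration.
index_name = "social_network"
document_id = "42"

# A document is a JSON object. A field may hold several values of the
# same type, which a single SQL column could not do.
document = {
    "username": "jane",
    "interests": ["django", "search", "motorcycles"],
    "profile": {"country_code": "PL"},
}

# Elasticsearch receives the document serialized as JSON.
payload = json.dumps(document)
print(f"{index_name}/{document_id}: {payload}")
```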

Using Elasticsearch with Django

When integrating Elasticsearch with your Django project, you have a range of valuable packages at your disposal. These packages can be used independently or in combination, depending on your specific needs:

  • Elasticsearch-py is the official low-level client library for Elasticsearch in Python, serving as the foundation for other Elasticsearch-related packages.

  • Elasticsearch DSL is a high-level library that helps write and run queries against Elasticsearch. It’s built on top of the official low-level client (elasticsearch-py).

  • Django Elasticsearch DSL is a package that allows easy integration and configuration of Elasticsearch with Django. It’s built as a thin wrapper around elasticsearch-dsl-py, so you can use all the features developed by the elasticsearch-dsl-py team.

Haystack vs. Elasticsearch DSL

While Haystack is a notable open-source package offering modular search capabilities for Django, it has some limitations compared to Elasticsearch DSL.

Haystack doesn't offer full support for the latest versions of Elasticsearch and may not be suitable for handling more complex and advanced search queries. Additionally, its configuration options are relatively minimal and restricted, potentially limiting the flexibility and capabilities available to developers.

In contrast, Elasticsearch DSL provides a more robust and feature-rich approach to integrating Elasticsearch with Django, offering greater support for the latest Elasticsearch features and the ability to handle intricate queries. This makes Elasticsearch DSL a compelling choice for projects where advanced search capabilities are essential.

Set Up Elasticsearch

We first need to download, install, and run the Elasticsearch server on a machine. Fortunately, Elasticsearch is available for multiple platforms. You can have a look at how this process works here.

In most cases, we can install Elasticsearch more conveniently, for example by using a package manager. In our testing project, we will use a Docker image.

Once we’ve installed the Elasticsearch server, we can configure it by putting settings in the elasticsearch.yml file, using the documented keys and options. Some settings can also be changed dynamically on a running server; you just need to send a request to Elasticsearch’s API.
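
As a rough sketch of the dynamic route, the snippet below builds a request to the cluster settings endpoint (_cluster/settings) using only the Python standard library. The base URL and the chosen setting (cluster.routing.allocation.enable) are assumptions for illustration; check which settings are dynamic in your Elasticsearch version before sending anything.

```python
import json
import urllib.request


def build_settings_request(base_url, settings):
    """Build (but do not send) a PUT request that updates cluster settings."""
    body = json.dumps({"persistent": settings}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local address; adjust to wherever your server is running.
request = build_settings_request(
    "http://localhost:9200",
    {"cluster.routing.allocation.enable": "all"},
)
print(request.get_method(), request.full_url)

# To actually apply the change on a running server:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(json.load(response))
```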

You can check if everything works correctly via curl:

curl -X GET localhost:9200/_cluster/health
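
The same check can be scripted from Python, which is handy in a container entrypoint or a test setup. This is a minimal sketch using only the standard library; the URL is an assumption matching the curl call above, and the sample payload is a trimmed, illustrative response rather than captured output.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200", timeout=5):
    """Call the cluster health endpoint and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as response:
        return json.load(response)


def is_usable(health):
    """Treat green and yellow as usable; red means some primary shards are unassigned."""
    return health.get("status") in {"green", "yellow"}


# Illustrative payload only; a real response contains more keys.
sample_health = {"cluster_name": "docker-cluster", "status": "yellow", "number_of_nodes": 1}
print(is_usable(sample_health))
```

A single-node development cluster may report yellow when an index asks for replicas that have no second node to live on, which is why the sketch accepts yellow as well as green.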

Now set up Elasticsearch in Django

To configure Elasticsearch, we must first add the connection settings to settings.py. Django needs to know where the Elasticsearch server is:

ELASTICSEARCH_DSL = {
    "default": {
        "hosts": ["http://elasticsearch:9200"],
    },
}

hosts: elasticsearch:9200 points at the Elasticsearch container that we run with Docker. Here, elasticsearch is the name of the service in the docker-compose.yml file. Read more about Docker services here.
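
Hard-coding the host works for Docker Compose, but you may want a different address locally or in production. One possible approach, sketched below, reads it from an environment variable. The variable name ELASTICSEARCH_HOSTS and the helper function are assumptions for illustration, not something Django Elasticsearch DSL requires.

```python
import os


def build_elasticsearch_dsl_settings(environ):
    """Return the ELASTICSEARCH_DSL dict, taking hosts from the environment."""
    # ELASTICSEARCH_HOSTS is a hypothetical variable name: a comma-separated
    # list of URLs, defaulting to the Docker Compose service used above.
    raw_hosts = environ.get("ELASTICSEARCH_HOSTS", "http://elasticsearch:9200")
    hosts = [host.strip() for host in raw_hosts.split(",") if host.strip()]
    return {"default": {"hosts": hosts}}


# In settings.py:
ELASTICSEARCH_DSL = build_elasticsearch_dsl_settings(os.environ)
```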

Let’s add apps to INSTALLED_APPS in settings.py:

  • rest_framework
  • django_elasticsearch_dsl

These are our models. To integrate data with Elasticsearch, we’ll create ‘documents’.

  1. models.py

from django.db import models
from django.utils.translation import gettext_lazy as _


class Manufacturer(models.Model):
    name = models.CharField(
        _("name"),
        max_length=100,
    )
    country_code = models.CharField(
        _("country code"),
        max_length=2,
    )
    created = models.DateField(
        _("created"),
    )

    def __str__(self):
        return self.name


class Car(models.Model):
    TYPES = [
        (1, "Sedan"),
        (2, "Truck"),
        (3, "SUV"),
    ]

    class Meta:
        verbose_name = _("Car")
        verbose_name_plural = _("Cars")

    name = models.CharField(
        _("name"),
        max_length=100,
    )
    color = models.CharField(
        _("color"),
        max_length=30,
    )
    description = models.TextField(
        _("description"),
    )
    type = models.IntegerField(
        _("type"),
        choices=TYPES,
    )
    manufacturer = models.ForeignKey(
        Manufacturer,
        on_delete=models.CASCADE,
        verbose_name=_("manufacturer"),
    )

    def __str__(self):
        return self.name

    def get_auction_title(self):
        return f"{self.name} - {self.color}"

  2. documents.py

We’ll use this document in the following examples. For the index settings it uses, see Shards & Replicas.

from django_elasticsearch_dsl import Document
from django_elasticsearch_dsl import fields
from django_elasticsearch_dsl.registries import registry

from cars.models import Car
from cars.models import Manufacturer


@registry.register_document
class CarDocument(Document):
    name = fields.TextField(
        attr="name",
        fields={
            "suggest": fields.Completion(),
        },
    )
    manufacturer = fields.ObjectField(
        properties={
            "name": fields.TextField(),
            "country_code": fields.TextField(),
        }
    )
    auction_title = fields.TextField(attr="get_auction_title")
    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == "silver":
            return 2
        return 1

    class Index:
        name = "cars"
        settings = {"number_of_shards": 1, "number_of_replicas": 0}

    class Django:
        model = Car
        fields = [
            "id",
            "color",
            "description",
            "type",
        ]

        related_models = [Manufacturer]

    def get_queryset(self):
        return super().get_queryset().select_related("manufacturer")

    def get_instances_from_related(self, related_instance):
        if isinstance(related_instance, Manufacturer):
            return related_instance.car_set.all()

But what exactly is going on here?

    name = fields.TextField(
        attr="name",
        fields={
            "suggest": fields.Completion(),
        },
    )

We’re getting names from the database and putting these values into documents. They’re stored in our index, and we use attr to specify the model field from which the value should be taken. In this case, we also set up an additional suggest sub-field of type Completion, which is used for suggestions.

    manufacturer = fields.ObjectField(
        properties={
            "name": fields.TextField(),
            "country_code": fields.TextField(),
        }
    )

We can add information from related models to our document. ObjectField helps make our data look clearer.

Expert tip: Avoid making your document too large and include only the values you need. When it comes to just serving data, the database is still the preferred method.

    auction_title = fields.TextField(attr='get_auction_title')

You can use an existing model method to populate a custom field.

    points = fields.IntegerField()

    def prepare_points(self, instance):
        if instance.color == 'silver':
            return 2
        return 1

Django Elasticsearch DSL lets you compute custom values with a prepare_<field name> method, so you can put any derived value you need into the document.
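
To build some intuition for how attr and prepare_<field> decide what ends up in a document, here is a framework-free toy model of that lookup. It is a deliberately simplified illustration written for this article, not the library's actual implementation; ToyCar, ToyCarDocument, field_attrs, and to_dict are hypothetical names.

```python
class ToyCar:
    """Stand-in for the Car model, without Django."""

    def __init__(self, name, color):
        self.name = name
        self.color = color

    def get_auction_title(self):
        return f"{self.name} - {self.color}"


class ToyCarDocument:
    # Maps each document field to the model attribute (attr) it reads from.
    field_attrs = {"name": "name", "auction_title": "get_auction_title", "points": None}

    def prepare_points(self, instance):
        return 2 if instance.color == "silver" else 1

    def to_dict(self, instance):
        data = {}
        for field, attr in self.field_attrs.items():
            prepare = getattr(self, f"prepare_{field}", None)
            if prepare is not None:
                # A prepare_<field> method takes priority over attr.
                data[field] = prepare(instance)
                continue
            value = getattr(instance, attr)
            # Model methods are called; plain attributes are used as-is.
            data[field] = value() if callable(value) else value
        return data


print(ToyCarDocument().to_dict(ToyCar("Corolla", "silver")))
```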

How Does Elasticsearch Know How to Insert Data to Index?

Everything works thanks to Django signals (post_save and post_delete). Data integration is near real-time: with the default configuration, Django Elasticsearch DSL automatically synchronizes all new data. In more complex cases, or when these signals are not sent (for example, QuerySet.update() does not call save(), so post_save never fires), you can trigger post_save or post_delete manually.

Triggering signal post_save:

post_save.send(MyModel, instance=instance, created=False)

If you have existing data, use the search_index command to create and populate the Elasticsearch index and mapping:

python manage.py search_index --rebuild

Examples of Usage

Simple filtering for our documents:

cars = CarDocument.search().query("match", color="black")
for car in cars:
    print(car.color)

Sorting:

cars = CarDocument.search().sort("id")

Get suggestions for typed text under our named suggestion:

cars = cars.suggest(
    "my_suggestion",
    "cor",
    completion={
        "field": "name.suggest",
    },
)
suggestions = cars.execute()
suggestions = suggestions.suggest.my_suggestion[0]["options"]
for suggestion in suggestions:
    print(suggestion["text"])

Aggregate documents based on the points field:

  1. Create a search on our document class and set size=0 so that no documents are returned, because in this case we only want the aggregation results.
  2. Add a terms aggregation on the points field.
  3. Execute the query against Elasticsearch.
  4. Show the result.

cars = CarDocument.search().extra(size=0)
cars.aggs.bucket('points_count', 'terms', field='points')
result = cars.execute()
for point in result.aggregations.points_count:
    print(point)
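
For reference, the search above corresponds roughly to the request body below, and a terms aggregation answers with one bucket per distinct value. The sketch reproduces that counting locally with made-up points values, so it runs without a server; the sample data is purely illustrative.

```python
from collections import Counter

# Approximate request body for the aggregation-only search.
request_body = {
    "size": 0,
    "aggs": {"points_count": {"terms": {"field": "points"}}},
}

# Hypothetical points values of five indexed cars.
indexed_points = [1, 2, 1, 1, 2]

# A terms aggregation returns buckets of {"key": value, "doc_count": n},
# ordered by document count (highest first) by default.
buckets = [
    {"key": key, "doc_count": count}
    for key, count in Counter(indexed_points).most_common()
]
for bucket in buckets:
    print(bucket["key"], bucket["doc_count"])
```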

Examples of Usage - DRF

A simple serializer for CarDocument:

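from rest_framework import serializers
from rest_framework.serializers import ModelSerializer

from cars.models import Car
from cars.models import Manufacturer


class ManufacturerSerializer(ModelSerializer):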
    class Meta:
        model = Manufacturer
        fields = (
            "name",
            "country_code",
        )


class CarSerializer(ModelSerializer):
    manufacturer = ManufacturerSerializer()
    points = serializers.IntegerField(
        required=False,
        default=1,
    )
    auction_title = serializers.CharField()

    class Meta:
        model = Car
        fields = (
            "id",
            "name",
            "type",
            "description",
            "points",
            "color",
            "auction_title",
            "manufacturer",
        )

A simple View for CarDocument:

from elasticsearch_dsl import Q


class CarSearchAPIView(ElasticSearchAPIView):
    serializer_class = CarSerializer
    document_class = CarDocument

    def elasticsearch_query_expression(self, query):
        return Q(
            "bool",
            should=[
                Q("match", name=query),
                Q("match", color=query),
                Q("match", description=query),
            ],
            minimum_should_match=1,
        )
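
It can help to see the JSON this Q() expression stands for. The helper below is a hypothetical, library-free sketch that builds the equivalent bool query as a plain dictionary in Elasticsearch's query DSL; in the real view, elasticsearch-dsl produces this structure for you.

```python
def build_bool_should_query(query, fields=("name", "color", "description")):
    """Build a bool query that matches `query` in at least one of `fields`."""
    return {
        "bool": {
            "should": [{"match": {field: query}} for field in fields],
            "minimum_should_match": 1,
        }
    }


print(build_bool_should_query("corolla"))
```

Because the clauses sit under should with minimum_should_match set to 1, a car is returned when any one of the three fields matches the query.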

CarSearchAPIView subclasses ElasticSearchAPIView, a base view that searches through documents:

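import abc
from typing import Any, Dict

from elasticsearch_dsl import Search
from elasticsearch_dsl.query import Bool
from elasticsearch_dsl.response import Response
from rest_framework import status
from rest_framework.response import Response as DRFResponse
from rest_framework.views import APIView

# SearchQuerySerializer and APIViewError come from the example project
# and are not shown in this article.


class ElasticSearchAPIView(APIView):
    serializer_class = None
    document_class = None
    query_serializer_class = SearchQuerySerializer

    @abc.abstractmethod
    def elasticsearch_query_expression(self, query):
        """This method should be overridden and return a Q() expression."""

    def get(self, request):
        search_query: SearchQuerySerializer = self.query_serializer_class(data=request.GET.dict())
        if not search_query.is_valid():
            return DRFResponse(f"Validation error: {search_query.errors}", status=status.HTTP_400_BAD_REQUEST)

        query_data: Dict[str, Any] = search_query.data
        try:
            expression: Bool = self.elasticsearch_query_expression(query_data["query"])
            search: Search = self.document_class.search().query(expression)

            # Slices are [start:stop]. Assuming limit is a page size (not an
            # end index), return `limit` hits starting at `offset`.
            offset, limit = query_data["offset"], query_data["limit"]
            search = search[offset : offset + limit]
            response: Response = search.execute()

            serializer = self.serializer_class(list(response.hits), many=True)
            return DRFResponse(serializer.data, status=status.HTTP_200_OK)
        except APIViewError:
            return DRFResponse("Error during fetching data", status=status.HTTP_500_INTERNAL_SERVER_ERROR)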
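
SearchQuerySerializer is not shown in this article. As a hedged, framework-free sketch of what such validation might do, the function below checks the query, limit, and offset parameters and translates them into Elasticsearch's from and size values. The defaults (limit=10, offset=0) and the upper bound are assumptions, not values taken from the example project.

```python
MAX_LIMIT = 100  # assumed upper bound to keep responses small


def parse_search_params(params):
    """Validate raw query-string values and return cleaned search parameters."""
    query = params.get("query", "").strip()
    if not query:
        raise ValueError("query is required")
    try:
        limit = int(params.get("limit", 10))
        offset = int(params.get("offset", 0))
    except ValueError:
        raise ValueError("limit and offset must be integers") from None
    if not 1 <= limit <= MAX_LIMIT or offset < 0:
        raise ValueError("limit or offset out of range")
    # Elasticsearch paginates with `from` (hits to skip) and `size` (hits to return).
    return {"query": query, "from": offset, "size": limit}


print(parse_search_params({"query": "black", "limit": "2", "offset": "0"}))
```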

Django Elasticsearch DSL DRF - Examples of Usage

  • display all cars that match the corolla query:

http://localhost:8000/cars/?query=corolla

  • display all cars that match the black query and paginate the results:

http://localhost:8000/cars/?query=black&limit=2&offset=0

Final Remarks

This guide helps you to use Elasticsearch with Django.

If you’d like to see how Elasticsearch works in practice, download my example project and test Elasticsearch locally using Docker. Be sure to follow the instructions in the readme.

As you can see, Elasticsearch is not as scary as it seems. Thanks to the REST API, every web developer can quickly get familiar with this solution. Its structure resembles familiar database concepts, which makes the node, index, type, and document easier to understand.

Contact Us

At Sunscrapers, we have a team of developers and software engineers with outstanding technical knowledge and experience. We can build your project successfully.

Talk with us about hiring Python and Django developers and work with the best. Sunscrapers can help you make the most out of Python, Django, and other technologies to take your project to the next level.

Patryk is an experienced Senior Python Developer who puts business value first. A web application enthusiast, he is involved from initial development to server maintenance, ensuring the entire process runs smoothly. In his free time, Patryk enjoys playing board games and motorcycling.

Tags

python
django