Elasticsearch is an open-source distributed search server that comes in handy for building applications with full-text search capabilities. While its core implementation is in Java, it provides a REST interface that allows developers to interact with Elasticsearch using any programming language – including Python.

In this article, I show some essential best practices for using Elasticsearch with Python in any project.

Do you work with Django? Here’s an article on how to use Elasticsearch with Django.

Getting started

To run the examples below, you’ll first need a running Elasticsearch instance. The easiest way to get one is to run Elasticsearch in a Docker container.

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" \
    docker.elastic.co/elasticsearch/elasticsearch:7.1.0


1. Always set mapping explicitly

In general, setting the mapping is not required. It can be done automatically when inserting the first document. But automatically generated types might not be what you want exactly. Consider this example:

from datetime import date, timedelta

from elasticsearch import Elasticsearch

es = Elasticsearch()


doc = {
   'internal_id': 123,
   'name': 'Jazz concert',
   'category': 'concert',
   'where': 'Warsaw',
   'when:': date.today(),
   'duration_hours': 2,
   'is_free': False,
}


es.index(index='events', body=doc)

es.indices.get_mapping(index='events')

In this example, the mapping is:

{
 "events": {
   "mappings": {
     "properties": {
       "category": {
         "type": "text",
         "fields": {
           "keyword": {
             "type": "keyword",
             "ignore_above": 256
           }
         }
       },
       "duration_hours": {
         "type": "long"
       },
       "internal_id": {
         "type": "long"
       },
       "is_free": {
         "type": "boolean"
       },
       "name": {
         "type": "text",
         "fields": {
           "keyword": {
             "type": "keyword",
             "ignore_above": 256
           }
         }
       },
       "when:": {
         "type": "date"
       },
       "where": {
         "type": "text",
         "fields": {
           "keyword": {
             "type": "keyword",
             "ignore_above": 256
           }
         }
       }
     }
   }
 }
}

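Dynamic mapping inferred the "when:" field as a date, so inserting a new document that stores a date range string in that field will result in a mapper exception: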

doc = {
   'internal_id': 124,
   'name': 'Some performance',
   'category': 'performance',
   'where': 'Cracow',
   'when:': f'{date.today()} - {date.today() + timedelta(days=3)}',
   'duration_hours': None,
   'is_free': True,
}


es.index(index='events', body=doc)

Note: The internal_id key can be mapped as a keyword. Numeric fields are optimized for range queries while keywords are better for term queries. It’s not likely this field will be used for range queries so keyword may be a better choice.
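
To make this tip concrete, here is a minimal sketch of creating the index with an explicit mapping before the first document is inserted. The field types are my assumptions based on the example document (with the date field written as when), and create_events_index is a hypothetical helper. It relies on indices.create(index=..., body=...) from the 7.x elasticsearch-py client.

```python
# Explicit mapping for the example events index.
# The field types below are assumptions, not taken from a real project.
EVENTS_MAPPING = {
    "mappings": {
        "properties": {
            "internal_id": {"type": "keyword"},  # term lookups, no ranges
            "name": {"type": "text"},
            "category": {"type": "keyword"},
            "where": {"type": "text"},
            "when": {"type": "date"},
            "duration_hours": {"type": "float"},
            "is_free": {"type": "boolean"},
        }
    }
}


def create_events_index(es, index_name="events"):
    """Create the index up front so dynamic mapping never guesses the types.

    `es` is expected to be an elasticsearch-py 7.x client instance.
    """
    return es.indices.create(index=index_name, body=EVENTS_MAPPING)
```

Calling create_events_index(Elasticsearch()) once, before any es.index(...) call, should be enough for documents to be validated against these types.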

2. Use the right library

Python developers can now use an official low-level client for Elasticsearch: elasticsearch-py.

It’s a good choice for simple cases. But if you’re facing a more complex search, it’s better to use elasticsearch-dsl, which is built on top of this library. It lets you write queries in a more Pythonic way, is less prone to syntax errors, and makes queries easier to modify.

Look at this simple query written with elasticsearch-py:

es.search(index='events', body={
   "query": {
       "bool": {
           "must": [
               {
                   "match": {
                       "is_free": True
                   }
               },
               {
                   "match": {
                       "category": "performance"
                   }
               }
           ],
           "filter": [
               {
                   "match": {
                       "where": "Kraków"
                   }
               }
           ]
       }
   }
})

And the equivalent using elasticsearch-dsl:

from elasticsearch_dsl import Search, Q

Search(using=es, index="events").query(
   Q(
       'bool',
       must=[
           Q('match', is_free=True),
           Q('match', category='performance')
       ],
       filter=[Q('match', where='Kraków')])
)

If you work with Django, you can use django-elasticsearch-dsl, which is built on elasticsearch-dsl. In this case, you can create a model that looks like this:

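from django.db import models


class Event(models.Model):
   # CharField requires max_length; 255 is an arbitrary example value.
   name = models.CharField(max_length=255)
   category = models.CharField(max_length=255)
   is_free = models.BooleanField()
   where = models.CharField(max_length=255)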

And the document:

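# Imports for the older django-elasticsearch-dsl API (DocType with Meta).
from django_elasticsearch_dsl import DocType, Index

from .models import Event  # assumes Event lives in this app's models.py

events = Index('events_dj')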

@events.doc_type
class EventDocument(DocType):
   class Meta:
       model = Event

       fields = [
           'name',
           'category',
           'is_free',
           'where'
       ]

The query looks similar to the one in elasticsearch-dsl:

EventDocument.search().query(
   Q(
       'bool',
       must=[
           Q('term', is_free=False),
           Q('match', category='concert')
       ],
       filter=[Q('match', where='Warsaw')])
)

The Elasticsearch index is automatically updated when objects are created, updated, or deleted. It works thanks to the Django signals post_save and post_delete.

3. Bulk helpers

Performing operations on a massive document set one by one is just inefficient. You’d have to make a request every single time. That’s why it’s smart to use bulk helpers instead.

Here’s how bulk helpers work:

from datetime import date

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import Document, Text, Date

es = Elasticsearch()

class EventDocument(Document):
   # Fields must be instances, not classes, to be registered in the mapping.
   name = Text()
   when = Date()

   class Index:
       name = 'events'


docs = []
for i in range(100):
   document = EventDocument(name=f'Sample event {i}', when=date.today())
   docs.append(document.to_dict(include_meta=True))


bulk(es, docs)
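
The bulk helper accepts any iterable of actions, so for large data sets you don't have to build the whole list in memory first. Below is a minimal sketch of that idea using plain dictionaries. The generate_actions and index_events names are hypothetical, and the document fields simply mirror the example above.

```python
from datetime import date


def generate_actions(count, index_name="events"):
    """Yield one bulk action per event instead of building a full list."""
    for i in range(count):
        yield {
            "_index": index_name,
            "_source": {
                "name": f"Sample event {i}",
                "when": date.today().isoformat(),
            },
        }


def index_events(es, count=100):
    """Stream the generated actions to Elasticsearch with the bulk helper."""
    # Imported here so the sketch loads even without the client installed.
    from elasticsearch.helpers import bulk

    return bulk(es, generate_actions(count))
```

With a running instance, index_events(Elasticsearch()) would send the same 100 documents as the example above, just without the intermediate docs list.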


4. Take advantage of aliases

Another good practice is to refer to aliases rather than directly to indices. The mapping of existing fields in an index can’t be changed, so when a mapping has to change, you create a new index with the new mapping and point the alias at it. Code that queries the alias doesn’t need to change.

Here’s an example:

from datetime import date

from elasticsearch_dsl import Document, Text, Date
from elasticsearch import Elasticsearch

es = Elasticsearch()

class EventDocument(Document):
   name = Text()
   when = Date()

   class Index:
       name = 'events_v1'
       using = es


EventDocument.init()
event = EventDocument(name='Some event', when=date.today())
event.save(es)
EventDocument._index.put_alias(name='events_today')


class NewEventDocument(Document):
   name = Text()
   when = Text()

   class Index:
       name = 'events_v2'
       using = es


NewEventDocument.init()
new_event = NewEventDocument(name='New type event', when='May')
new_event.save(es)


NewEventDocument._index.put_alias(name='events_today')
EventDocument._index.delete_alias(name='events_today')

Aliases also come in handy when you need to switch the index with a new one that contains new data.
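
In the example above, the alias briefly points at both indices between the put_alias and delete_alias calls. If that matters in your project, the aliases API can apply the remove and add actions in a single atomic request. Here is a minimal sketch, assuming the 7.x client's indices.update_aliases(body=...). The build_alias_swap and swap_alias helpers are hypothetical names.

```python
def build_alias_swap(alias, old_index, new_index):
    """Build an update_aliases body that moves `alias` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def swap_alias(es, alias, old_index, new_index):
    """Apply the swap; `es` is expected to be an elasticsearch-py 7.x client."""
    return es.indices.update_aliases(
        body=build_alias_swap(alias, old_index, new_index)
    )
```

For the indices used above, the call would be swap_alias(es, 'events_today', 'events_v1', 'events_v2').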


5. Asciifolding

When storing searchable data with non-ASCII characters such as “ą,” “č,” or “ė,” it’s a good idea to use the ASCII Folding Token Filter. Users tend to write queries without diacritical marks, so a search for “Krakow” should still find “Kraków.”

If you want to show results for such queries, add asciifolding to the analyzer’s filters. Here’s how:

from datetime import date

from elasticsearch_dsl import Document, Text, Date, analyzer, Search
from elasticsearch import Elasticsearch

es = Elasticsearch()

folding_analyzer = analyzer('folding_analyzer',
                           tokenizer="standard",
                           filter=["lowercase", "asciifolding"]
                           )

class EventDocument(Document):
   name = Text()
   when = Date()
   where = Text(analyzer=folding_analyzer)

   class Index:
       name = 'events_v3'
       using = es


EventDocument.init()

event = EventDocument(name='Sample event', when=date.today(), where='Kraków')
event.save()

event._index.refresh()
search = Search(using=es, index='events_v3').query("match", where="Krakow")

search.execute()
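
To get a feel for what the lowercase and asciifolding filters do to a term, here is a rough pure-Python approximation. It is not the filter's actual implementation, and fold is a hypothetical helper: it only strips combining marks after Unicode decomposition, so characters without a decomposition (such as “ł”) pass through unchanged, while the real filter covers more cases.

```python
import unicodedata


def fold(text):
    """Lowercase text and strip combining marks (rough asciifolding stand-in)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The indexed value and the user's query end up as the same term.
print(fold("Kraków") == fold("Krakow"))  # True
```

Because both the stored value and the query are reduced to the same term, the match query for “Krakow” finds the document stored as “Kraków.”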


6. Auto-generated ids

If an indexed document has an explicitly set id, Elasticsearch has to check whether a document with that id already exists in the same shard. It’s an expensive operation, especially when your index is big. With auto-generated ids, Elasticsearch skips this check, which speeds up indexing.
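
In practice, this just means leaving the id out when you index. Here is a small sketch for bulk actions, where to_bulk_action is a hypothetical helper: when no id is passed, the action carries no _id key and Elasticsearch generates one.

```python
def to_bulk_action(doc, index_name, doc_id=None):
    """Build a bulk action; leave out `_id` so Elasticsearch generates one."""
    action = {"_index": index_name, "_source": doc}
    if doc_id is not None:
        # Only set an explicit id when you really need a stable one.
        action["_id"] = doc_id
    return action


auto_id_action = to_bulk_action({"name": "Jazz concert"}, "events")
fixed_id_action = to_bulk_action({"name": "Jazz concert"}, "events", doc_id="123")
```

The same applies to single documents: es.index(index='events', body=doc), as used earlier in this article, doesn't pass an id either.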


7. Use copy_to to speed up search

The more fields a multi_match query includes, the slower the search is. That’s why it’s smart to query as few fields as possible. Add the copy_to parameter to the field mappings to copy their values into a single field, and run the search against that field instead.

Here’s how it works:

from elasticsearch_dsl import Document, Text, Search
from elasticsearch import Elasticsearch


es = Elasticsearch()

class EventDocument(Document):
   name = Text(copy_to='search')
   category = Text(copy_to='search')
   where = Text(copy_to='search')

   class Index:
       name = 'events_v4'
       using = es


EventDocument.init()

event = EventDocument(name='Rock band', category='concert', where='Warsaw')

event.save()
event._index.refresh()
search = Search(using=es).query('match', search='Rock band concert Warsaw')
search.execute()
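
If you use the low-level client instead of elasticsearch-dsl, the same idea can be expressed as a raw mapping and query body. This is a minimal sketch with hypothetical helper names (build_copy_to_mapping, build_search_body). Declaring the target field explicitly is my own choice here, so that its type isn't left to dynamic mapping.

```python
def build_copy_to_mapping(source_fields, target="search"):
    """Return a mapping where each source field copies into `target`."""
    properties = {
        field: {"type": "text", "copy_to": target} for field in source_fields
    }
    # Declare the combined field explicitly instead of relying on guessing.
    properties[target] = {"type": "text"}
    return {"mappings": {"properties": properties}}


def build_search_body(text, target="search"):
    """A single-field match query against the combined field."""
    return {"query": {"match": {target: text}}}


mapping = build_copy_to_mapping(["name", "category", "where"])
query = build_search_body("Rock band concert Warsaw")
```

The mapping dictionary could be passed as the body when creating the index, and query as the body of es.search(...).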

I hope these tips and best practices help you make the most of Elasticsearch in your Python project.

If you’d like to learn more about Python best practices, check out the Python category on our blog – we publish learning resources, Python and Django tutorials, and step-by-step guides to help the Python community grow.


Karol Kostrzewa
Backend Engineer

Karol is a Python backend engineer specializing in web development using Python, Django, and Flask. He likes to work on projects that broaden his horizons - and he sure gets plenty of that at Sunscrapers! When not busy coding, Karol enjoys hiking, cycling, and cooking.
