This presentation is a part of our weekly talks. This week, our CTO Przemek talked to us about the use of PostgreSQL and JSON in combination with Python.
Check out the content of the presentation below or watch the video (in Polish). Przemek’s presentation slides are available here.
Why use JSON in our databases at all?
Database experts want order in their databases. JSON comes in handy for managing data that has no schema but we still want to include it in our database for later organization.
Sometimes we might have data where the schema is unknown – for example, when we download undocumented data from API. That’s also where JSON can help.
It’s also a good solution for storing documents we need in a single place to avoid expensive joins later on.
In some applications, the user establishes their own schema – for example, the ecommerce context the user may want to define product characteristics. There exist solutions for making that work in a relational database, but doing it in JSON simply makes our lives easier.
PostgreSQL has been introducing novel solutions on a regular basis. Since version 8.2, it introduced the possibility of storing XML and HStore documents in databases.
XML: PostgreSQL just checks whether XML documents are correct and validates them. Users can also search through XML.
HStore: it’s a simple key value storage, both keys and values that end up in HStore become strings.
JSON: available since version 9.2, JSON ended up being quite similar to XML – JSON type is a string and the only thing users can do is validate whether its syntax is correct.
JSONB: available since version 9.4, full implementation of JSON, data is recorded binarily.
HStore vs JSON vs JSONB
Essentially, these 3 data types are continuous steps of the evolution of PostgreSQL’s ability to store unstructured data. There are several thing you should consider when using either of these options.
- Accepts only simple key/value pairs,
- Includes only strings,
- Doesn’t allow nesting (at least not directly, but you can always use another key as value since they’re all strings),
- Allows GiST index which is suitable for traversing hierarchical data.
- Behaves like a text type that validates whether the text has an adequate structure.
- Retains the exact structure of input document, including unsorted and duplicated keys.
- The most sophisticated option out of these three,
- Operates on all JSON data types,
- Allows nesting,
- Allows GIN indexing, suitable for traversing key/value documents.
JSON can be cast to JSONB and vice versa, so choosing between them is also a matter of deciding if you need faster writes (JSON) or faster queries (JSONB).
JSONB is a full JSON implementation that offers a binary data storage. Note that it doesn’t provide developers with a key order or duplicate preservation. There’s also no date type. On the other hand, JSONB allows quick access operations and easy indexing.
SQLAlchemy is a Python library for database service that allows using the three types of data mentioned above. It supports developers in writing JSON/JSONB queries with the help of Python. Have a look at this documentation to learn more about SQLAlchemy.
So how does the situton look like in Django? For a long time, developers had no support for PostgreSQL. There were some libraries available like Django HStore that were external applications. The community created support for PostgreSQL and these data types. At some point, the Django community decided to create something similar to SQLAlchemy. In Django 1.8, we had HStoreField and since 1.9, JSONField which uses JSONB.
Have you got any questions about making PostgreSQL and JSON work with Python?
Leave us a comment; we’re always happy to share our knowledge and start a conversation about best practices in the community.