The GraphQL hype train is going full speed. Everybody is talking about it, tons of tutorials are being written, the tooling is developing rapidly, and many teams are considering switching from REST to GraphQL.
In this article, I’m going to use a small example application to show you the main benefits of GraphQL in practice, along with some of its flaws. That’s the best way to tell whether GraphQL is the right technology choice for your project.
Project setup
First, let's set up the project. Because SQLite can be problematic when altering columns, we’ll use PostgreSQL. To keep things simple, we can take advantage of PostgreSQL's “peer” authentication, in which the database user has the same name as the OS user.
$ sudo -u postgres psql -c 'CREATE user yourusername'
$ sudo -u postgres psql -c 'CREATE DATABASE graphql_demo WITH owner yourusername'
Now, let's configure the backend:
$ git clone INSERT HERE THE NAME OF PROJECT
$ cd graphql-presentation/project
$ virtualenv --python=/usr/bin/python3.6 .virtualenv
$ source .virtualenv/bin/activate
$ cd backend
$ pip install -e .
$ alembic upgrade head
$ psql graphql_demo < sakila_simplified.sql
$ cd ..
$ ./runserver
Then, in another terminal tab, start the frontend:
$ cd frontend
$ yarn codegen
$ yarn start
Advantages of GraphQL
A self-documenting integrated schema
When working with REST APIs, we often encounter poor documentation, or documentation that is out of sync with the API. REST itself defines no standard for documenting an API, so users can’t count on any automated help.
Let's see what the situation looks like with GraphQL. Point your browser to http://localhost:5000/graphql.
You’ll see GraphiQL, a graphical tool for designing GraphQL queries. If the API you’re planning to use doesn't expose it, you can also install GraphiQL locally.
Now you can start creating your query. Type “query {” and press Ctrl+Space, and GraphiQL will offer code completion. You can also hover over the identifiers in your query to see a tooltip with their documentation. With REST, you can’t take any of this for granted, because there is no standard for how an API should publish its documentation. Also, note that the documentation is part of the API’s schema itself, so there’s no reason to worry that the hints are outdated.
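GraphiQL gets all of this from GraphQL's standard introspection system: the schema can be queried through the API itself via the __schema field. Below is a minimal Python sketch that assumes the endpoint above accepts a JSON POST body with a "query" key, which is the usual convention. build_introspection_request and list_type_names are hypothetical helper names and are not part of the project.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"  # endpoint from the setup above

# __schema belongs to GraphQL's standard introspection system, so any
# compliant server can describe itself without separate documentation.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType { name }
    types { name description }
  }
}
"""


def build_introspection_request(url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Hypothetical helper: wrap the introspection query in a JSON POST request."""
    body = json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def list_type_names(request: urllib.request.Request) -> list:
    """Send the request (the backend must be running) and return the schema's type names."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [t["name"] for t in payload["data"]["__schema"]["types"]]
```

With the backend running, list_type_names(build_introspection_request()) should return the names of every type in the schema, including the built-in scalars it uses.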
Easy to work with nested data
REST offers no canonical way of handling nested data. There are several approaches, but each comes with drawbacks.
The first is to use a separate endpoint for every data type and link the resources together by URLs. This is the simplest approach, but it doesn't scale well. For example, if we want to retrieve a film and all of its actors from the simplified Sakila database we loaded during setup, we need one request to the film endpoint and then a separate request to the actor endpoint for every actor. This is known as the N+1 problem, and it can severely degrade the performance of our application. What's more, because each resource is fetched by a separate request, the backend has no chance to optimize data retrieval at the SQL level (for example, with a JOIN), so we can expect another performance issue between the backend and the database.
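Here is a small Python simulation that makes the request count concrete. The in-memory FAKE_API, its URLs, and the CountingClient class are made up for illustration. The point is only that one film with N actors costs 1 + N round trips.

```python
from typing import Any, Dict, List

# Made-up stand-in for a REST API: URL -> JSON body that endpoint would return.
FAKE_API: Dict[str, Dict[str, Any]] = {
    "/films/1": {"id": 1, "title": "Film 1", "actor_urls": ["/actors/1", "/actors/2", "/actors/3"]},
    "/actors/1": {"id": 1, "name": "Actor 1"},
    "/actors/2": {"id": 2, "name": "Actor 2"},
    "/actors/3": {"id": 3, "name": "Actor 3"},
}


class CountingClient:
    """Pretends to perform HTTP GETs against FAKE_API and counts the round trips."""

    def __init__(self, api: Dict[str, Dict[str, Any]]) -> None:
        self.api = api
        self.requests = 0

    def get(self, url: str) -> Dict[str, Any]:
        self.requests += 1
        return self.api[url]


def film_with_actors(client: CountingClient, film_url: str) -> Dict[str, Any]:
    film = client.get(film_url)  # 1 request for the film itself
    # N more requests, one per linked actor URL
    actors: List[Dict[str, Any]] = [client.get(url) for url in film["actor_urls"]]
    return {**film, "actors": actors}
```

For a film with three actors, film_with_actors(CountingClient(FAKE_API), "/films/1") makes four requests. The count grows linearly with the number of related objects.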
Another approach is to include the related data in a nested structure. The problem here is that a large application probably has many views using the same endpoint, each with slightly different needs. This leads to endpoints dedicated to particular views: for example, a films endpoint and a separate films-with-actors endpoint.
This solution, however, hurts the maintainability of our code. Because the backend is tightly coupled to the frontend's views, even a trivial change requires work from both the backend and the frontend team. You’ll soon find your backlog cluttered with menial tasks about adding fields to views, and your team will waste a lot of hours maintaining this compatibility.
The third idea is to parametrize your endpoints. A request could look like film/1?with-actors=true or film/1?include=actors. This allows much more flexibility than the previous approach. Frontend developers can now pick the data they need for their view without asking their backend colleagues to alter the API every time something changes. A nice solution, right? Still, what you’re really doing here is creating a homegrown query language, which will probably come with its own flaws. Basically, you’re reinventing GraphQL, and probably doing a worse job of it.
Now, let's see how easy it is to add new nested data in GraphQL. First, open /src/components/MovieList/query.ts and uncomment the actors part. Then run yarn codegen in the frontend directory. This rebuilds /src/generated/graphql.tsx, so you can now import the generated types from that file. In /src/components/MovieList/MovieList.tsx, uncomment the import, the getActors method, and the Collapsible in the render method. Now refresh the page. You’ve just made your frontend application load related data without any change to the backend.
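Here is a rough Python sketch of what such a parametrized endpoint ends up doing on the server. The handler, the FILMS/ACTORS data, and the comma-separated include syntax are all assumptions made for illustration. Note that it already needs its own parsing rules, a registry of expandable relations, and error handling for unknown names.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical data for the sketch; a real backend would query the database.
FILMS = {1: {"id": 1, "title": "Example film", "actor_ids": [1, 2]}}
ACTORS = {1: {"id": 1, "name": "Actor one"}, 2: {"id": 2, "name": "Actor two"}}

# Every relation the endpoint can expand has to be wired up by hand.
EXPANDERS = {
    "actors": lambda film: [ACTORS[i] for i in film["actor_ids"]],
}


def handle_film_request(url: str) -> dict:
    """Serve film/<id>?include=a,b by expanding the requested relations."""
    parts = urlsplit(url)
    film_id = int(parts.path.rstrip("/").split("/")[-1])
    film = dict(FILMS[film_id])
    includes = parse_qs(parts.query).get("include", [])
    for name in ",".join(includes).split(","):
        if not name:
            continue  # no include parameter given
        if name not in EXPANDERS:
            raise ValueError(f"unknown include: {name}")
        film[name] = EXPANDERS[name](film)
    return film
```

Adding filtering, nested includes (actors.films), or field selection would each need yet more custom syntax. That is exactly the ground a real query language already covers.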
Enforceable contract
When working with a REST API, what you get from the backend is “some JSON”. It is up to the developers to make sure the data sent by the server is compatible with what the frontend can handle. There’s no way to verify this compatibility automatically and warn the developers when it breaks. As a result, changes to the API can break frontend views, and we’d learn about it only in the QA phase (or in production, if we’re unlucky). A simple example is deciding to make a database column nullable. If the frontend is not prepared to handle a null value, it will crash.
Let's emulate this situation in our app. In our scenario, a happy-go-lucky backend developer is tasked with allowing the film description to be NULL. Open models.py in the backend and change the nullable argument of the description column to True. Then generate and apply a migration by running the following in the backend directory:
$ alembic revision --autogenerate
$ alembic upgrade head
Then rebuild the GraphQL types by running the following in the frontend directory:
$ yarn codegen
Now try to refresh the app in your browser. You’ll see an error. What exactly happened here? If you look at /src/generated/graphql.tsx, you’ll see that the type of the description became Maybe<Scalars['String']>. This notation describes a union of string, null, and undefined. Code that treated the description as a plain string no longer type-checks, so the TypeScript compilation fails. If the compilation runs on our CI, this breaking change will never make its way even to staging.
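The generated TypeScript types catch this on the frontend at compile time. A complementary check can flag the change on the backend side, before anyone rebuilds the frontend, by comparing two versions of the schema. The Python sketch below assumes each schema version is available as a simple "Type.field" to type-string map; a real setup would read the SDL or an introspection result. It only inspects the outermost "!" (so list item nullability is ignored), and breaking_nullability_changes is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical, simplified schema snapshots: "Type.field" -> GraphQL type string.
# A trailing "!" marks a non-null type, as in GraphQL SDL.
OLD_SCHEMA = {"Film.title": "String!", "Film.description": "String!"}
NEW_SCHEMA = {"Film.title": "String!", "Film.description": "String"}


def breaking_nullability_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List output fields that were removed or that used to be non-null but may now be null."""
    changes = []
    for field, old_type in old.items():
        new_type = new.get(field)
        if new_type is None:
            changes.append(f"{field} was removed")
        elif old_type.endswith("!") and not new_type.endswith("!"):
            changes.append(f"{field} changed from {old_type} to {new_type}")
    return changes
```

Run against the two snapshots above, it reports Film.description. The reverse change (nullable to non-null) is harmless for clients reading the field, so it is not reported.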
Disadvantages of GraphQL
N+1 won't just go away
I already mentioned the N+1 problem and how GraphQL gets rid of it at the API level. But what about the database? Let's see. Go to models.py and add echo=True to the create_engine call. Now refresh the application and look at the console where the backend is running. This is just terrible: we've got 21 queries. The N+1 problem has simply moved from the frontend to the backend. And because of how GraphQL APIs are implemented, optimizing the database queries is quite hard. They use so-called resolver functions, each responsible for a particular field.
But solutions to this problem are under way. We’re going to use SQLAlchemy-bulk-lazy-loader. Run:
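The pattern in that log can be reproduced with nothing but Python's built-in sqlite3 module. The sketch below uses a made-up film/actor schema loosely modelled on Sakila. It mimics a GraphQL executor by calling one resolver for the film list and then another resolver once per film for its actors. Connection.set_trace_callback records every statement SQLite executes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a tiny in-memory film/actor database with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE film_actor (film_id INTEGER, actor_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO film VALUES (?, ?)", [(i, f"Film {i}") for i in range(1, 21)])
    conn.executemany("INSERT INTO actor VALUES (?, ?)", [(i, f"Actor {i}") for i in range(1, 6)])
    # Two distinct actors per film.
    conn.executemany(
        "INSERT INTO film_actor VALUES (?, ?)",
        [(f, a) for f in range(1, 21) for a in (f % 5 + 1, (f + 1) % 5 + 1)],
    )
    conn.commit()
    return conn


def resolve_films(conn):
    """Resolver for the film list: one query."""
    return conn.execute("SELECT film_id, title FROM film ORDER BY film_id").fetchall()


def resolve_actors(conn, film_id):
    """Resolver for Film.actors: runs once per film, hence N extra queries."""
    return conn.execute(
        "SELECT a.name FROM actor a "
        "JOIN film_actor fa ON fa.actor_id = a.actor_id "
        "WHERE fa.film_id = ?",
        (film_id,),
    ).fetchall()


def execute_query(conn):
    """Mimic a GraphQL executor: resolve the list, then the nested field for each item."""
    return [
        {"title": title, "actors": [name for (name,) in resolve_actors(conn, film_id)]}
        for film_id, title in resolve_films(conn)
    ]


def run_traced(executor):
    """Run executor against a fresh database and record every SQL statement it issues."""
    conn = make_db()
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = executor(conn)
    finally:
        conn.set_trace_callback(None)
        conn.close()
    return result, statements
```

With 20 films, run_traced(execute_query) records 21 statements: one for the list plus one per film.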
$ pip install SQLAlchemy-bulk-lazy-loader
in the backend directory. Now uncomment the two BulkLoader-related lines in models.py. Lastly, add the lazy='bulk' argument to the actors relationship. If you refresh the application now, you’ll see only two queries to the database. This is much better, but still not perfect. The bulk lazy loader detects when one object in a query result loads a related object. It then loads that relationship for all objects in the result at once. This reduces the number of queries to one per level of nesting.
Unfortunately, at the time of writing, there’s no solution for SQLAlchemy that would fetch all the data with JOINs. There is one for Django, called graphene-django-optimizer. It achieves this by analyzing the query's AST before any resolver is called.
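Below is the same batching idea, written by hand with the standard library rather than SQLAlchemy. It runs one query for the films, then a single IN query for the actors of all of them. This illustrates the technique and is not the library's actual implementation. The film/actor schema and data are made up for the sketch.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    """Create a tiny in-memory film/actor database with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE film_actor (film_id INTEGER, actor_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO film VALUES (?, ?)", [(i, f"Film {i}") for i in range(1, 21)])
    conn.executemany("INSERT INTO actor VALUES (?, ?)", [(i, f"Actor {i}") for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO film_actor VALUES (?, ?)",
        [(f, a) for f in range(1, 21) for a in (f % 5 + 1, (f + 1) % 5 + 1)],
    )
    conn.commit()
    return conn


def resolve_films_bulk(conn):
    """Load the films, then the actors of *all* of them in one IN query."""
    films = conn.execute("SELECT film_id, title FROM film ORDER BY film_id").fetchall()
    film_ids = [film_id for film_id, _ in films]
    # One placeholder per id; very long id lists would need chunking.
    placeholders = ", ".join("?" for _ in film_ids)
    rows = conn.execute(
        "SELECT fa.film_id, a.name FROM film_actor fa "
        "JOIN actor a ON a.actor_id = fa.actor_id "
        f"WHERE fa.film_id IN ({placeholders})",
        film_ids,
    ).fetchall()
    actors_by_film = defaultdict(list)
    for film_id, name in rows:
        actors_by_film[film_id].append(name)
    return [{"title": title, "actors": actors_by_film[film_id]} for film_id, title in films]


def count_statements(executor):
    """Run executor against a fresh database and record every SQL statement it issues."""
    conn = make_db()
    statements = []
    conn.set_trace_callback(statements.append)
    result = executor(conn)
    conn.set_trace_callback(None)
    conn.close()
    return result, statements
```

Whatever the number of films, this issues two statements, one per level of nesting.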
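The look-ahead idea behind such optimizers can be sketched in a few lines. First inspect which fields the client asked for, before resolving anything. Then issue one query containing the JOINs those fields need. In this Python sketch, the nested selection dict is a hypothetical stand-in for the parsed query AST, and the schema is made up. It is not how graphene-django-optimizer is implemented.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a tiny in-memory film/actor database with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE film_actor (film_id INTEGER, actor_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO film VALUES (?, ?)", [(i, f"Film {i}") for i in range(1, 21)])
    conn.executemany("INSERT INTO actor VALUES (?, ?)", [(i, f"Actor {i}") for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO film_actor VALUES (?, ?)",
        [(f, a) for f in range(1, 21) for a in (f % 5 + 1, (f + 1) % 5 + 1)],
    )
    conn.commit()
    return conn


def fetch_films(conn, selection):
    """Look ahead at the requested fields and answer the whole query with one SQL statement.

    selection is a hypothetical stand-in for the parsed query, e.g.
    {"title": None, "actors": {"name": None}}. The title is always fetched for brevity.
    """
    if "actors" not in selection:
        rows = conn.execute("SELECT title FROM film ORDER BY film_id").fetchall()
        return [{"title": title} for (title,) in rows]

    rows = conn.execute(
        "SELECT f.film_id, f.title, a.name FROM film f "
        "LEFT JOIN film_actor fa ON fa.film_id = f.film_id "
        "LEFT JOIN actor a ON a.actor_id = fa.actor_id "
        "ORDER BY f.film_id, a.actor_id"
    ).fetchall()
    films = {}
    for film_id, title, actor_name in rows:
        film = films.setdefault(film_id, {"title": title, "actors": []})
        if actor_name is not None:  # LEFT JOIN yields NULL for films without actors
            film["actors"].append(actor_name)
    return list(films.values())


def count_statements(executor):
    """Run executor against a fresh database and record every SQL statement it issues."""
    conn = make_db()
    statements = []
    conn.set_trace_callback(statements.append)
    result = executor(conn)
    conn.set_trace_callback(None)
    conn.close()
    return result, statements
```

Requesting films with their actors now costs a single statement. The trade-off is that the query builder must understand every relation the schema exposes.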
No leverage from HTTP
Open your browser's developer tools. Note that the GraphQL requests are sent to the server using the POST method. GraphQL clients typically send every operation, reads included, as a POST request, and that comes with some consequences. With REST, every request is marked on the HTTP layer with information about what it does. GET requests are safe and have no side effects, so, for example, their results can be cached. PUTs and DELETEs are idempotent, so they can be safely resent without any additional effect. When every GraphQL operation is a POST, each one looks unsafe and uncacheable at the HTTP level. As a result, external caching services cannot recognize requests without side effects or store their results.
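A toy Python model of a shared HTTP cache shows the consequence. CachingProxy is hypothetical and deliberately ignores headers such as Cache-Control. It applies only one rule: responses to safe methods (GET, HEAD) may be reused, and everything else goes to the origin.

```python
from typing import Callable, Dict, Optional, Tuple

# origin(method, url, body) -> response body
Origin = Callable[[str, str, Optional[bytes]], str]

SAFE_METHODS = {"GET", "HEAD"}  # HTTP "safe" methods: no side effects, so reusable


class CachingProxy:
    """Toy shared cache: reuses responses to safe requests, forwards everything else."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[Tuple[str, str], str] = {}
        self.origin_hits = 0

    def request(self, method: str, url: str, body: Optional[bytes] = None) -> str:
        method = method.upper()
        key = (method, url)
        if method in SAFE_METHODS and key in self.cache:
            return self.cache[key]
        self.origin_hits += 1
        response = self.origin(method, url, body)
        if method in SAFE_METHODS:
            self.cache[key] = response
        return response
```

Two identical GET /films/1 requests reach the origin once. Two identical read-only GraphQL queries sent as POST /graphql reach it twice, because the cache cannot tell from the method that they have no side effects.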
Relay as the only standard for plural data
GraphQL itself doesn't specify how lists of related objects, and their pagination, should be represented. However, there is the Relay specification, which requires the use of structures called “connections”. A connection contains edges, which in turn contain nodes. Because it is the only widely adopted standard, it has become the de facto default for GraphQL, and tools like graphene-sqlalchemy use it.
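For reference, here is roughly the shape a connection takes, built with plain Python dicts. The to_connection helper is hypothetical. A base64-encoded offset is just one common choice of cursor, since clients are meant to treat cursors as opaque strings.

```python
import base64
from typing import Any, Dict, List


def to_cursor(offset: int) -> str:
    """Encode a list offset as an opaque cursor string (one common choice)."""
    return base64.b64encode(f"offset:{offset}".encode("utf-8")).decode("ascii")


def to_connection(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a plain list in a Relay-style connection: edges -> node, plus pageInfo."""
    edges = [{"cursor": to_cursor(i), "node": item} for i, item in enumerate(items)]
    return {
        "edges": edges,
        "pageInfo": {
            # The whole list is returned, so there are no further pages either way.
            "hasNextPage": False,
            "hasPreviousPage": False,
            "startCursor": edges[0]["cursor"] if edges else None,
            "endCursor": edges[-1]["cursor"] if edges else None,
        },
    }
```

Compared with a plain list of films, each film is now two levels deeper (edges, then node), and the frontend has to unwrap those levels again.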
One might argue, however, that it’s overkill for data coming from a simple SQL database. For example, if one of the database queries fails, we’d rather have the entire request fail than receive null edges. So if we use static typing on the frontend (which is a good idea, because it lets us enforce the contract), we’d need to handle situations that would never arise. In TypeScript, this is simple: we can use the exclamation mark as a non-null assertion. For example, const film = filmEdge!.node! tells the compiler that we don’t expect filmEdge or its node to be null, despite their types. In other languages, like Elm, this could require more boilerplate.
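In Python, the closest analog is a small helper that narrows Optional[T] to T. The sketch below flattens a connection into its nodes and fails loudly on any null, matching the "fail the whole request" preference described above. non_null and nodes are hypothetical names. Unlike TypeScript's !, which is erased at compile time, this check actually runs at runtime.

```python
from typing import Any, Dict, List, Optional, TypeVar

T = TypeVar("T")


def non_null(value: Optional[T], what: str) -> T:
    """Narrow Optional[T] to T, raising instead of silently passing None along."""
    if value is None:
        raise ValueError(f"unexpected null {what}")
    return value


def nodes(connection: Optional[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten a Relay-style connection into its nodes, treating any null as a hard error."""
    edges = non_null(non_null(connection, "connection").get("edges"), "edges")
    return [non_null(non_null(edge, "edge").get("node"), "node") for edge in edges]
```

nodes({"edges": [{"node": {"title": "Film 1"}}]}) returns [{"title": "Film 1"}], while a null edge or node raises ValueError instead of leaking None into the view.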
Summary
All in all, GraphQL is definitely a technology worth considering. It offers solid solutions to many problems developers encounter with REST. Still, every technology comes with flaws, and since GraphQL is much younger than REST, its tooling isn’t as mature.
I hope this article helps you decide whether GraphQL is a good solution for your project. Ultimately, your choice depends on the problems you’re dealing with in your application.