Time flies, and new versions of Python keep appearing. This article briefly examines migrating your software from Python 2 to Python 3.
The first question you may ask yourself is - why should I bother?
Let’s assume you own or manage a tech startup, and it’s going well. The user base is growing, and so is MRR. So naturally, you get more feature requests and new ideas on improving your product and growing it even more. The biggest problem is resources and time - how to use your tech team most efficiently. And if you do have some investors, you have another thing to worry about - constant growth. Investors care mostly about MRR, so you do too.
There is one thing that people usually do not care about - upgrades. They take time, and their effects are not visible to users or investors.
So why should YOU care about the upgrades?
There are a few good reasons:
Security comes first. If you’re technical, this won’t shock you - the newer the version, the fewer known vulnerabilities it is likely to have. Naturally, it’s probably a good idea to wait a bit before jumping to the NEWEST version, but it’s still safer to use a current version than an old one. Remember, it’s been two years since the sunset of Python 2!
- Time to market
If you worry about time to market, it’s vital to keep your software up-to-date. Spending a little time on upgrades every sprint takes less effort overall than postponing them and then doing one long, costly upgrade.
Similar to security - there is a much higher chance that new software versions will be more reliable than the old ones. There’s no question about it!
Ultimately, the time you spend continuously upgrading your stack will save you significant money in the long run. Because - let’s face it - you will need to upgrade sooner or later. And it’s better to do it in small increments every month than in one large upgrade every year (or even less often).
It’s easier to find engineers who would be willing to work on up-to-date versions rather than some very old legacy code.
Last but not least - we cannot forget about a significant factor - speed. It’s been proven that Python 3 is around 1.2–1.3x faster than its predecessor. I think the numbers speak for themselves in this case!
Now that we know some more business-oriented reasons for migrating your application from Python 2 to Python 3, let’s take a look at the specifics.
What if I need to use Python 2 alongside Python 3?
Usually, it’s hard to migrate every part of your code to Python 3 at once. Luckily, that’s not a problem: migrating whatever you can while leaving some parts on the older version is better than not upgrading at all. This way, when your dependencies are ready for migration, the rest will take less time (and let’s remember - time is money).
To make this happen, bring the Python 2 parts of your code up to Python 2.7 and worry only about this version. I will not cover older versions like 2.6 or 2.5 - supporting them is possible, but not recommended. In fact, given that support for Python 2 has ended, you should do everything you can to move EVERYTHING to Python 3.
First of all, in your setup.py file, you should clearly specify the versions you support using trove classifiers - not only the major one (Programming Language :: Python :: 2 :: Only) but also the minor one (let’s assume this will be Programming Language :: Python :: 2.7).
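As a rough sketch, the relevant part of a setup.py could look like the snippet below. The package name and version number are placeholders made up for this illustration, and python_requires is an extra safeguard I am assuming you want. The article itself only asks for the classifiers.

```python
# Hypothetical excerpt from setup.py for a project that is still Python 2.7 only.
CLASSIFIERS = [
    "Programming Language :: Python :: 2",
    "Programming Language :: Python :: 2 :: Only",
    "Programming Language :: Python :: 2.7",
]

SETUP_KWARGS = dict(
    name="example-package",       # placeholder name
    version="0.1.0",              # placeholder version
    classifiers=CLASSIFIERS,
    python_requires=">=2.7, <3",  # assumption: also tell pip which interpreters are allowed
)

# In a real setup.py you would finish with:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Once the port is done, the same list is where the Python 3 classifiers go.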
Before migrating any parts of the code to Python 3, you should really care about test coverage. It is a good idea to have it in general, but in this particular case, it is genuinely vital. Otherwise, it’s just guessing and requires a lot of manual testing. If you’re wondering what the proper amount of tests is, I’d say 80-90% coverage will do the job. To check the test coverage, I can recommend using coverage.py.
Before jumping into actual migration, I strongly recommend learning the differences between Python 2 and Python 3.
Now it’s finally time to upgrade your code. Luckily, you do not have to do everything by hand - there are (at least) two valuable tools that will help you do the job. One of them is Futurize, and the other is Modernize. They both do the same job, but in different ways, so it’s essential to choose the right one for you. The choice depends on your approach to migration. Futurize is better if you are more focused on Python 3, as it tries to make Python 3 idioms and practices available in Python 2. Modernize uses the six library to provide compatibility and is rather conservative. If you feel that Python 3 is the best choice but are not yet perfectly familiar with it, I would use Futurize to get accustomed to it quickly.
Remember that neither of those tools will do EVERYTHING for you. After running either of them, there are still things you need to do manually. To get the complete picture, dig into the documentation of the tool you chose to learn which steps are done by default, which are optional, and which need to be handled manually.
I’ll draw your attention to a few things you need to check to avoid problems.
You probably know (but if not, it’s crucial) that the division operator behaves differently in Python 3.
In Python 2, dividing two ints gave an int: 7 / 2 == 3.
In Python 3, 7 / 2 == 3.5 - so every division between int values results in a float.
If you wish to stay compatible, add from __future__ import division to your files and use / or // as needed: // gives floor division, while / returns a float. Simple, yet important to remember.
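Here is a minimal sketch of how the two operators behave once the __future__ import is in place. The variable names are made up for this illustration, and the results are the same on Python 2.7 and Python 3.

```python
from __future__ import division  # no effect on Python 3, enables true division on Python 2.7

true_quotient = 7 / 2        # true division: always a float for two ints
floor_quotient = 7 // 2      # floor division: rounds down, stays an int
negative_floor = -7 // 2     # floors towards negative infinity, not towards zero

print(true_quotient, floor_quotient, negative_floor)
```

The negative case is worth a second look when porting code that relied on the old integer division.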
Feature detection instead of version detection
Sometimes, you will need to choose an action based on the version of Python you’re running. In this case, prefer feature detection: check whether the running interpreter can actually do what you need. If you must check the version, compare against Python 2, not Python 3. Here is a quick example from the official documentation to illustrate why:
Assume that you need functionality from importlib (available in the standard library since Python 3.3, and in Python 2 through the importlib2 package on PyPI).
If you use this code:
    import sys
    if sys.version_info[0] == 3:
        from importlib import abc
    else:
        from importlib2 import abc
then you may have a big problem when Python 4 arrives. It is better to treat Python 2 as the special case:
    import sys
    if sys.version_info[0] > 2:
        from importlib import abc
    else:
        from importlib2 import abc
Still, the best thing you can do is to use feature detection:
    try:
        from importlib import abc
    except ImportError:
        from importlib2 import abc
This way, you will be fully prepared for the future without worrying about versions.
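The same pattern works for any name that exists on only one side of the 2/3 divide. As a small sketch (the alias text_type and the helper is_text are names made up for this illustration), here is feature detection for the text type:

```python
try:
    text_type = unicode  # Python 2: the built-in text type is called unicode
except NameError:
    text_type = str      # Python 3: unicode is gone, str is the text type


def is_text(value):
    """Return True if value is text rather than binary data."""
    return isinstance(value, text_type)


print(is_text(u"hello"), is_text(b"hello"))
```

No version number appears anywhere, so the code keeps working whatever the next major version is called.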
Text and binary data
In Python 2, str could be used for both text and binary data. This was a source of many problems, especially when creating APIs for different languages. If you are 100% sure you will be handling only text or only binary data, it’s ok. But the problem appears when you handle both - sorting this out cannot be easily automated.
Why? Simple: not all methods work with both types of data. The best solution (probably) is to encode and decode at the boundaries of your program. If you receive text as binary data, decode it as soon as possible. And if you need to send text as binary, encode it as late as possible. This way, internally, you work only with the text type and don’t have to worry about the differences.
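To make the "decode early, encode late" idea concrete, here is a small sketch. The function name handle_request and the choice of UTF-8 are assumptions for this illustration. Your real entry points and encodings will differ.

```python
def handle_request(raw_payload, encoding="utf-8"):
    """Hypothetical handler: bytes come in, bytes go out, text in between."""
    # Decode as soon as the binary data arrives.
    text = raw_payload.decode(encoding)

    # All internal work happens on the text type only.
    reply = text.strip().upper()

    # Encode as late as possible, right before the data leaves the program.
    return reply.encode(encoding)


print(handle_request(b"  hello  "))
```

Everything between the decode and the encode can then ignore the bytes type completely.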
Another issue you may face is whether a string literal represents text or binary data. To make the distinction clear, add prefixes: b for binary literals and u for text literals.
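A quick sketch of prefixed literals, assuming Python 2.7 or Python 3.3 and newer (Python 3.0 to 3.2 did not accept the u prefix). The variable names are made up for this illustration.

```python
text_literal = u"price: 10"     # u prefix: always text
binary_literal = b"\x89PNG"     # b prefix: always binary data

# The two literals never share a type, on Python 2.7 or on Python 3.
print(type(text_literal) is type(binary_literal))
```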
Another thing is handling files. Be careful to add b to the mode (for example rb) whenever you open a binary file. Why? Because in Python 3, binary and text files are clearly distinct and not compatible with each other. You should also try using io.open(), as it behaves the same in Python 2.7 and 3.
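Below is a minimal sketch of the difference between the two modes with io.open(). The temporary file and the UTF-8 encoding are assumptions for this illustration.

```python
import io
import os
import tempfile

# A throwaway file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "example.dat")

# Binary mode: write and read raw bytes.
with io.open(path, "wb") as handle:
    handle.write(b"caf\xc3\xa9")

with io.open(path, "rb") as handle:
    raw = handle.read()    # bytes

# Text mode: io.open() decodes for you, so always state the encoding.
with io.open(path, "r", encoding="utf-8") as handle:
    text = handle.read()   # text (unicode on Python 2.7, str on Python 3)

print(len(raw), len(text))
```

The byte count and the character count differ because the accented letter takes two bytes in UTF-8.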
Constructors of str and bytes have different semantics (for the same arguments) in Python 2.7 and Python 3.
Take a look at the code:
In Python 2.7, bytes(2) == '2', whereas in Python 3, bytes(2) == b'\x00\x00'.
The same goes for str:
Python 2 gives you the same string back: str(b'2') == b'2'. In Python 3, str(b'2') == "b'2'" - you get the string representation of the bytes object.
The last thing to remember is indexing binary data:
Python 2: b'123'[1] == b'2'. But in Python 3: b'123'[1] == 50.
The reason is simple - Python 3 returns the integer value of the byte you index. In Python 2, bytes is just an alias for str, so indexing returns a one-character string.
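If you need indexing that behaves the same on both versions, slicing is one way out. The helpers byte_at and int_at below are names made up for this sketch, and they assume a non-negative index.

```python
def byte_at(data, index):
    """Return the byte at a non-negative index as a length-1 bytes object."""
    # A slice gives bytes on Python 3 and str (which is bytes) on Python 2.
    return data[index:index + 1]


def int_at(data, index):
    """Return the integer value of the byte at a non-negative index."""
    # ord() accepts a length-1 bytes object on both versions.
    return ord(data[index:index + 1])


print(byte_at(b"123", 1), int_at(b"123", 1))
```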
To prevent compatibility regressions once your code runs on Python 3, we recommend the following:
- Any new modules should have the following code at the top:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
- Run Pylint with the --py3k flag to receive a warning whenever your code starts to deviate from Python 3 compatibility. This requires that you support only Python 2.7 and Python 3.4 or newer.
Check your dependencies
Once you have migrated your code, it’s time to check which dependencies are blocking you.
There is actually a little project, caniusepython3, that will help you do the job. You can use either the command-line tool or the web interface. The project also provides code that you can add to your test suite to run the check automatically.
In the end, you should update your setup.py to say that you support Python 3 explicitly: Programming Language :: Python :: 3
Additionally, you should add tox to your continuous integration suite to ensure that your code runs under different Python versions.
I recommend running your Python 3 interpreter with the -bb flag to easily find places where you compare bytes to str or bytes to int. Each such comparison raises an exception, so it’s easy to catch - without the flag, these bugs are almost impossible to find.
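To see why such comparisons are hard to spot, here is a small sketch of what happens on Python 3 when the flag is not set. The variable names are made up for this illustration.

```python
expected = "abc"   # text
received = b"abc"  # binary data, e.g. read from a socket or a file

# On Python 3 without -b or -bb, this is quietly False: no error, no warning.
# Run the same line under "python -bb" and it raises BytesWarning instead.
quietly_false = (received == expected)

# The fix is to compare like with like.
compared_as_text = (received.decode("ascii") == expected)

print(quietly_false, compared_as_text)
```

On Python 2 the first comparison is True, which is exactly why this class of bug tends to appear only after the port.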
Seems like a lot, right? As I wrote at the beginning, keeping your code up-to-date is vital, because it makes any transition easier. And if all this is a bit overwhelming, consider hiring us to help you get the job done. We are not afraid to perform such migrations, and what’s more - we have the necessary experience and skill set to do it without any pain.
We also know very well that migrating to Python 3 is NO LONGER OPTIONAL. It’s a MUST-HAVE for anyone who wants to keep their software safe. That is why we always urge our clients to make the transition as soon as possible and only then work on further phases of the projects.