Dependency Management

Packaging and Dependency Management in Python

Before we talk about dependency management: Python is awesome, it simply is. But in my opinion, what truly makes Python a great language is not the syntax or its dynamic nature or any other such feature, but rather the community: the fact that whenever you plan to implement a task, the first thing to do is to check whether there is already a cool package (probably with a couple of thousand stars) on PyPI that does exactly what you want.
It is no accident that package management is a core part of most fast-growing programming languages and frameworks.

If you are new to the task at hand, your plan to spend 10 days implementing a library turns into 2 days of learning another, already robust one.

We all love to import! But with great resources comes great responsibility, and as soon as you start thinking about deploying your code, all of those import statements become dependencies, each with its own version and even its own sub-dependencies. And to top it off, any small change in the version of any of these libraries can break your code.

This brings us to the topic of dependency management in Python: how can we deploy a project to a remote location and make sure it works?

This is our first article on software deployment in Python. We hope you enjoy this series, and make sure to check out our other articles.

The Old Ways

By far the de-facto method to manage dependencies in Python is to use virtual environments and requirements files. The process roughly goes as follows:

  1. At the start of a new project, before you have written even a single line of code, you start by creating a virtual environment. This can be thought of as a sandbox for your project; the exact syntax depends on your virtual environment tool (e.g. venv vs. Conda). In the case of Python’s venv, for instance, it would look something like this:
    python -m venv name_of_environment
    Then you need to activate this environment:
    source ./name_of_environment/bin/activate
  2. At this point your sandbox is in place. This sandbox will contain all of your new packages and will shield the changes you make in your local code from those in other projects. Now all you have to do is use pip to install your favorite packages:
    (name_of_environment) pip install flask
    So whenever you want to work on your code, you just activate the environment and start hacking.
  3. Then, at some point, you will need to share your code, maybe with another programmer through git, or maybe you just need to run your awesome app in the deployment environment. At this point you will need to share all of the packages you have installed, with the exact versions you have, to make sure that the code does not break on the other end. So you create your requirements.txt file by exporting the contents of your venv:
    (name_of_environment) pip freeze > requirements.txt
    This file will contain all of the packages you have, and even their dependencies. Here is an example:
    click==6.7
    Flask==0.12.1
    itsdangerous==0.24
    Jinja2==2.10
    MarkupSafe==1.0
    Werkzeug==0.14.1
  4. Finally, on the receiving end, after creating an empty virtual environment, you use this requirements.txt file to clone the working environment:
    (deployment_environment) pip install -r requirements.txt

Up to this point, this all seems cool, consistent, and simple. That is why, as we said, venv and requirements.txt are the standard way to manage your packages.

However, the moment you start leaving your comfort zone, goblins start to pop up.

The Old Pitfalls

There are three main problems you might face when working with pip and requirements.txt, so let us go through them and learn how we can tackle (many of) them.

Continuous, Manual Updates of Requirements

Using requirements.txt, we have pinned the exact version of each dependency. With these pinned dependencies, we can ensure that the packages installed in your production environment match those in your development environment exactly, so your product doesn’t unexpectedly break. This “solution,” unfortunately, opens up a new set of problems.

Now that you’ve specified the exact versions of every third-party package, you are responsible for keeping these versions up to date, even when they are only sub-dependencies.

Imagine a new version of Jinja2 is released to resolve a security issue, or maybe you have installed a new dependency to build the feature you are working on. Either way, you have changed your virtual environment, and it is no longer in sync with the requirements.txt file. The only solution at this point is to redo

(name_of_envireonment) pip freeze > requirements.txt

That may seem fine if you only consider a single example, but remember that every time you open a pull request or deploy a new piece of code, you must make sure that your requirements are updated. Humans make mistakes, and when you make one in this context, you’ll find yourself pausing because all you can think of is the infamous “but it works on my device”.

The real question is:

“How do you allow for deterministic builds for your Python project without gaining the responsibility of updating versions of sub-dependencies?”

VS

“How do you allow for deterministic builds for your Python project while also maintaining the ability to update the versions of its sub-dependencies?”

Pinning everything is not the goal in itself; the goal is to ensure that the requirements match between the development and deployment environments while only writing down the versions of the main requirements, without explicitly pinning the versions of the sub-dependencies.
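Concretely, with the Flask example above, the file you would ideally maintain by hand lists only the top-level requirement:

    Flask==0.12.1

while some tool derives the fully pinned snapshot (click, Jinja2, MarkupSafe, and friends) from it for you.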

Different Dependencies for Different Deployments

Let’s take a very simple example:

So you always make sure to properly document your code, and since you have a sense for aesthetics and don’t want other people using your code to have to navigate the actual source, you start using a package like Sphinx (you look like a good lad or lady, after all). You install a bunch of themes and sub-dependencies to get your documentation running.
Other developers using the code will truly appreciate your docs, but do you know who won’t? The deployment server!
Sphinx will be installed on a deployment server where it won’t be used at all. It is a documentation package and does not need to take up resources there. Anyone wanting to see your docs can either view the hosted documentation on a site like Read the Docs or your private server, or simply clone the project locally and run Sphinx on it.
Testing packages are another great example. You can test your package anywhere, but it is not okay to do so on the deployment server.

This leads us to the main difference between the development and deployment environments: while deployment dependencies should be the bare minimum needed to make sure your code actually runs, the development environment can include whatever else you think makes for a good development experience.

How can we separate these requirements using the old ways, you say? It is possible, but far from simple. Basically, you need several requirements files, and your project structure will look something like this:

project_root
`-- requirements
    |-- common.txt
    |-- dev.txt
    `-- prod.txt

Where the files represent

  • common.txt
    The requirements common to all environments
  • dev.txt
    Dev-only requirements
  • prod.txt
    Production-only requirements
Now, to install a specific set of dependencies, you do the following:
  1. In each environment-specific file, import the common one using -r common.txt (see the sketch below)
  2. Make the usual call: pip install -r requirements/dev.txt
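For instance, dev.txt might look like this (the specific packages here are just for illustration):

    -r common.txt
    sphinx
    pytest

pip first pulls in everything from common.txt (the path is resolved relative to dev.txt itself) and then adds the dev-only extras.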

Now, before you start celebrating, remember one vital thing: by choosing this route you have lost the power of pip freeze, and these three requirements files will need to be modified by hand when needed. If you do not think a better solution is needed, just imagine what happens when you start changing your virtual environment by installing or updating packages… Yeah, this gets really ugly really fast.

Dependency Resolution

What do I mean by dependency resolution? Let’s say you’ve got a requirements.txt file that looks something like this:

package_a
package_b

Let’s say package_a has a sub-dependency package_c, and package_a requires a specific version of this package: package_c>=1.0. In turn, package_b has the same sub-dependency but needs package_c<=2.0.

Ideally, when you try to install package_a and package_b, the installation tool would look at the requirements for package_c (being >=1.0 and <=2.0) and select a version that fulfils those requirements. You’d hope that the tool resolves the dependencies so that your program works in the end. This is what I mean by “dependency resolution.”

Unfortunately, pip itself doesn’t have real dependency resolution at the moment, but there is an open issue to support it (at the time of writing this blog, there was a promise of a pretty good update with pip >= 10).

The way pip would handle the above scenario is as follows:

  1. It installs package_a and looks for a version of package_c that fulfils the first requirement (package_c>=1.0).
  2. pip then installs the latest version of package_c that fulfils that requirement. Let’s say the latest version of package_c is 3.1.
  3. If the version of package_c selected by pip fits all the requirements, fine. But if it doesn’t (as in our example, where 3.1 violates package_c<=2.0), the installation will fail.

The “solution” to this problem is to specify the range required for the sub-dependency (package_c) in the requirements.txt file. That way, pip can resolve this conflict and install a package that meets those requirements:

package_c>=1.0,<=2.0
package_a
package_b

Just like before, though, you’re now concerning yourself directly with sub-dependencies (package_c). The issue is that if package_a changes its requirement without you knowing, the requirements you specified (package_c>=1.0,<=2.0) may no longer be acceptable, and installation may fail… again. The real problem is that, once again, you’re responsible for staying up to date with the requirements of sub-dependencies.

Ideally, your installation tool would be smart enough to install packages that meet all the requirements without you explicitly specifying sub-dependency versions.

Enter Pipenv

Is it a pip? Is it a venv? No, it is Pipenv!
Pipenv is just like pip, but better. Let’s first see how the dependency management cycle works with it, and then why it is better this way.

Installation

Just like anything in Python, you can install it with pip as follows:

pip install pipenv

And yes, I can see the irony, but hopefully from now on you can replace any pip command with pipenv.

It also introduces two new files, the Pipfile (which is meant to replace requirements.txt) and the Pipfile.lock (which enables deterministic builds).
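To give you a feel for it, a freshly created Pipfile is a TOML file that looks roughly like this (the flask entry assumes you have already installed Flask; details vary by version):

    [[source]]
    url = "https://pypi.org/simple"
    verify_ssl = true
    name = "pypi"

    [packages]
    flask = "*"

    [dev-packages]

    [requires]
    python_version = "3.6"

Note that the version of flask is left loose ("*"); the exact resolved version lives in Pipfile.lock.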

Pipenv uses pip and virtualenv under the hood but simplifies their usage with a single command-line interface. Now let us dig a bit deeper.

Creating a Venv

First, spawn a shell in a virtual environment to isolate the development of this app:

pipenv shell

This will create a virtual environment if one doesn’t already exist.
You can force the creation of a Python 2 or 3 environment with the arguments --two and --three respectively, or request something more specific with --python 3.6. Otherwise, Pipenv will use whatever default virtualenv finds.
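For example, to spawn the shell on a specific interpreter (assuming Python 3.6 is installed on your machine):

    pipenv --python 3.6 shell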

Installing Packages

Just like with pip, you can simply run

pipenv install flask

Or even install from GitHub:

pipenv install -e git+https://github.com/requests/requests.git#egg=requests

Note the -e argument above to make the installation editable. Currently, this is required for Pipenv to do sub-dependency resolution.

But here comes the dependency separation part. Imagine I want to install pytest only for development:

pipenv install pytest --dev

Basically, everything you install with Pipenv is by default installed into the deployment environment. However, if you want something to be installed only for development, you need to say so explicitly; the development environment will have everything in the deployment environment, plus a bit more.
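This separation is recorded directly in the Pipfile: after the two installs above, it would contain sections roughly like these (version specifiers will vary):

    [packages]
    flask = "*"

    [dev-packages]
    pytest = "*"

By default only [packages] is installed; the [dev-packages] section is pulled in only when you ask for it.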

Sharing and Updating Environments

Okay, so let’s say you’ve got everything working in your local development environment and you’re ready to push it to production. To do that, you need to lock your environment so you can ensure you have the same one in production:

pipenv lock

This will create/update your Pipfile.lock, which you’ll never need to (and are never meant to) edit manually. You should always use the generated file.

Once you get your code and Pipfile.lock in your production environment, you should install the last successful environment recorded:

pipenv install --ignore-pipfile

This tells Pipenv to ignore the Pipfile for installation and use what’s in the Pipfile.lock. Given this, Pipenv will create the exact same environment you had when you ran pipenv lock, sub-dependencies and all.

The lock file enables deterministic builds by taking a snapshot of all the versions of packages in an environment (similar to the result of a pip freeze).
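Putting it together, a minimal production deployment might look like this (app.py is just a stand-in for whatever your actual entry point is):

    pipenv install --ignore-pipfile
    pipenv run python app.py

pipenv run executes a command inside the project’s virtual environment without you having to activate it first.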

Now let’s assume that another developer wants to make some additions to your code. In this situation, they would get the code, including the Pipfile, and use this command:

pipenv install --dev

This installs all the dependencies needed for development, which includes both the regular dependencies and those you specified with the --dev argument during install.

Pipenv pitfalls

Any great tool comes with its own restrictions and issues, and it is your job as a developer or a sysadmin to figure out which tool is best for your particular task. Let us go through some scenarios where Pipenv may cause you some headache.

1 – Packaging and Publishing

We have talked in depth about what Pipenv is, but let us see what it isn’t. There are great tools out there that make your life much easier by encapsulating the whole packaging process; Pipenv, however, covers only one aspect of that process, namely handling dependencies. If you want to turn your library into a PyPI package, you will need to do it the old-fashioned way, using a setup.py; Pipenv will not simplify this task for you. If anything, the introduction of the Pipfile might make things even harder. However, packages like flit can simplify the task somewhat.
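For reference, the old-fashioned way means maintaining something like this minimal setup.py yourself (all names and versions here are placeholders):

    from setuptools import setup, find_packages

    setup(
        name="my_package",              # placeholder package name
        version="0.1.0",
        packages=find_packages(),
        install_requires=["flask"],     # runtime dependencies, duplicated from your Pipfile
    )

Pipenv will not generate or synchronize this file for you.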

2 – Raiders of the Lost Virtual Environment

One of the minor inconveniences of Pipenv is that while it will create a virtual environment for your project to isolate it from the global environment, it will create it somewhere in your home directory (or some weird place on your disk if you are using Windows) and not inside your project itself. This is far from optimal; it is quite similar to Conda environments, but somehow even messier.

To make matters worse, the way to activate this environment is through a separate shell. Basically, you need to run the following command from within the project directory:

pipenv shell

But this is not at all optimal. What if I want to activate the virtual environment from another directory, for example to use Jupyter notebooks?
Well, there is a workaround: revert to the old venv activation command, like this:

source $(pipenv --venv)/bin/activate

And while this might seem ugly, you can always declare an alias. Stop nagging.
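For instance, a line like this in your shell profile does the trick (the alias name is up to you):

    alias pipenv-activate='source "$(pipenv --venv)/bin/activate"'

Run it once from the project directory, and the environment stays active wherever you cd afterwards.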

3 – Speed

The old ways are fast; pip and venv are certainly fast. Pipenv and its alternatives, however, build on pip with added dependency resolution, and this extra computation costs time. Here is the closest thing we found to a detailed benchmark of Pipenv and one alternative, Poetry.

If your project needs to reinstall packages periodically and you have strict time constraints, then maybe you should reconsider your choices and go along with good old pip.

4 – Orphan Sub-dependencies

OK, this one is a bit complex, so stay focused.

Let us assume you want to use a certain package A. Using pipenv install, you install the package and use it. When this package is installed, it might bring in some sub-dependencies; let us call them c, d, and e.

Later on, you find out that there is a better option (package B), so you first uninstall package A, then install package B. But here is the catch: you would assume that when you uninstalled A, Pipenv would also uninstall any of A’s sub-packages that are not shared with other packages. Well, guess again, because it won’t; those guys (c, d, and e) will remain.

How can you solve this? Simply call pipenv update to rebuild your venv from the Pipfile, which uninstalls the orphaned sub-dependencies.

So how often should you run the update? Basically, every time you are planning to deploy.
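Putting the whole dance together, a sketch (package names are placeholders):

    pipenv uninstall package_a   # A is removed, but its sub-dependencies c, d, e linger
    pipenv install package_b
    pipenv update                # rebuild the environment from the Pipfile, dropping the orphans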

5 – Problematic Development Process

Well, this issue has discouraged many people out there. Basically, over the past year there have been some incidents: between 2018-03-13 13:21 and 2018-03-14 13:44 (a little over 24 hours), Pipenv had 10 releases, ranging from v11.6.2 to v11.7.3. This can easily break many systems that depend on it once they are redeployed.

And it does not help that the main author has been battling mental health issues.

However, since then the perception has shifted from “OMG, they have 10 releases per day, these people are crazy” to “well, they have a monthly, well-documented release; this package is well maintained.”

Pipenv alternatives

There are currently two main competitors alongside Pipenv, namely Poetry and Hatch, so let us review their pros and cons in comparison with Pipenv.

I really love Poetry

Poetry tries to cover the whole packaging cycle, including dependency management, so if that is what you are looking for, you should give it a look.

Poetry and Pipenv accomplish nearly the same goals. However, here is where they differ:

feature                  | pipenv | poetry
-------------------------|--------|-------
strict project structure | No     | Yes, Poetry imposes a strict structure similar to Maven
speed                    | slow   | a bit faster
packaging and publishing | No     | Yes, but the packages use a special format, making them not installable through simple means such as pip, so they will tie you to the tool

Python’s version of NPM?

Well, this is what I felt when reading about Hatch: this tool is trying to automate nearly everything. However, by doing so it creates a single point of failure. The project is still new, with continuous updates, and while in the future Hatch might become something truly remarkable, it might be a better idea to wait and see.

Conclusion

In this article, we tried to cover the basic structure of dependency management in Python, where this structure fails, and what we can do to resolve such issues using Pipenv. Then we discussed the shortcomings of Pipenv and some of the newer kids on the block. If you are interested in the details of Pipenv, either check the official documentation or read this cheat sheet.

This was a rather long article, I know :). Well done, you have completed it! If you found this piece useful, check out our next article on git submodules and how you can use them in Python. The next article is much shorter, I promise.

Did you know that we use all of this and other AI technologies in our app? See what you’re reading about now applied in action: try our Almeta News app. You can download it from Google Play or Apple’s App Store.

