Git Submodules in the Python World: Why and How

What makes many professional tech companies professional is a simple principle: domain engineering. This means working for a long time on a small set of domains, in the hope that your codebase grows to make developing projects in those domains more efficient and successful. The main component in this formula is code reuse.

Sooner or later you will have a piece of code that you use constantly across all your projects. In NLP, for example, these might be your text normalizers, your feature extractors, or even some utilities.

Why

There is always the option of copying this code into each of your projects. However, most of the time you will use these modules in exactly the same way everywhere, and, more importantly, any modification then has to be repeated in every project that holds a copy.

One way to tackle this problem is to use git submodules. These are independent git repos that you include as part of your project. By converting your most-used code into submodules, you get centralized control over that code and can import it with confidence in other projects.

The concept and usage of git submodules are beyond the scope of this article. If you are not familiar with them, check the official documentation or one of the many tutorials online.

The goal of this article, however, is to see how these submodules can be used in Python projects and how we can work with Python’s import system to add these modules in a clean way.

How

The usual starting case will be something like this:

BigProject 
  __init__.py
  main_files.py
  subLibrary1
  subLibrary2
  requirements.txt

Here, subLibrary1 and subLibrary2 are packages that you believe are independent enough to have their own repos, and there is a good chance that you will incorporate them into other “BigProjects” in the future.

So, like any sensible person, you create a new repo for each of these libraries and then add these repos as submodules in BigProject. Seems simple, no?

Dependency management

However, since these two libraries are Python packages and you want them to be self-contained, each will need its own requirements and dependencies. You can accommodate this by including the submodules’ requirements files directly in the BigProject requirements file, as follows:

---- BigProject/requirements.txt ----
dep1 
dep2 
...
-r subLibrary1/requirements.txt
-r subLibrary2/requirements.txt

While this might seem like a good initial solution, many more issues can arise during the project life cycle, and there are better approaches to dependency management than maintaining several levels of requirements files. If you are interested, check out our piece on dependency management in Python.
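
To make the -r mechanism concrete, here is a minimal sketch of how nested requirements files expand into one flat list. The function name flatten_requirements is our own, hypothetical helper (it is not part of pip), and it only handles the short "-r path" form, resolving each include relative to the file that names it, which is how pip treats such paths.

```python
from pathlib import Path


def flatten_requirements(req_file):
    """Return the requirement lines of req_file with `-r` includes expanded.

    Hypothetical helper for illustration only; pip does this internally.
    """
    req_file = Path(req_file)
    flat = []
    for raw in req_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-r "):
            # Resolve the include relative to the file that contains it.
            included = req_file.parent / line[3:].strip()
            flat.extend(flatten_requirements(included))
        else:
            flat.append(line)
    return flat
```

Calling flatten_requirements("BigProject/requirements.txt") on the layout above would list the BigProject dependencies followed by those of each submodule, which also makes it easy to spot two files pinning the same dependency differently.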

Imports

Regardless of the way you manage your dependencies, there is still a more important issue to figure out.

The main problem you will face is that Python’s import system treats the submodules not as installed packages but as part of your code, which means their location has to be on sys.path. Any change in the structure of the __init__.py files will cause agonizing ModuleNotFoundError exceptions to pop up.
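
As a small illustration of why this is fragile, the sketch below builds a throwaway package in a temporary directory (demo_sublibrary is a hypothetical stand-in for subLibrary1), shows that importing it fails, and then makes it work by editing sys.path by hand. This is the workaround we are trying to avoid, not a recommendation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(parent_dir, module_name):
    """Import module_name after putting parent_dir on sys.path (the fragile way)."""
    parent_dir = str(parent_dir)
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


def demo():
    """Show the import failing, then succeeding once the parent is on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp) / "demo_sublibrary"  # hypothetical package name
        package_dir.mkdir()
        (package_dir / "__init__.py").write_text(
            "GREETING = 'hello from the submodule'\n"
        )

        try:
            importlib.import_module("demo_sublibrary")
            importable_before = True
        except ModuleNotFoundError:
            importable_before = False

        try:
            module = import_from_directory(tmp, "demo_sublibrary")
            greeting = module.GREETING
        finally:
            sys.path.remove(tmp)  # undo the path hack
            sys.modules.pop("demo_sublibrary", None)
    return importable_before, greeting
```

Every script and test runner that touches the submodule needs the same path tweak, and it breaks as soon as a directory moves, which is the headache the next step removes.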

The simplest way to avoid all of this headache is to convert your submodules into full Python packages. You don’t need to share them publicly on PyPI, but you do need a setup.py file in each of the submodules. If all of this talk about packaging made no sense to you, then you should probably take a little dive into Python’s packaging documentation and then come back here.
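
Since every submodule needs its own setup.py, a small scaffold script can help. The sketch below is one possible way to do it, under a few assumptions of ours: the helper write_setup_py and the template are hypothetical, the version string is a placeholder, and the importable package is assumed to live in a directory inside the submodule repo, next to setup.py, so that setuptools’ find_packages() can discover it. The generated file reads the submodule’s own requirements.txt so dependencies are declared in one place.

```python
from pathlib import Path

# Template for a minimal setup.py (illustrative, not the only valid layout).
SETUP_TEMPLATE = '''\
from pathlib import Path

from setuptools import find_packages, setup

HERE = Path(__file__).parent
REQUIREMENTS = HERE / "requirements.txt"

# Reuse the submodule's requirements.txt as its declared dependencies,
# skipping comments and pip options such as "-r other.txt".
install_requires = []
if REQUIREMENTS.exists():
    install_requires = [
        line.strip()
        for line in REQUIREMENTS.read_text().splitlines()
        if line.strip() and not line.strip().startswith(("#", "-"))
    ]

setup(
    name={name!r},
    version={version!r},
    packages=find_packages(),
    install_requires=install_requires,
)
'''


def write_setup_py(submodule_dir, name, version="0.1.0"):
    """Write a minimal setup.py into submodule_dir and return its path."""
    target = Path(submodule_dir) / "setup.py"
    target.write_text(SETUP_TEMPLATE.format(name=name, version=version))
    return target
```

For example, write_setup_py("subLibrary1", "subLibrary1") would drop a setup.py into the first submodule; you would then commit that file in the submodule’s own repo.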

You good? OK, let’s keep going.

So ideally your sublibraries will become full-fledged packages. Once you clone the BigProject repo along with its submodules, all you have to do is use pip or pipenv to install them:

pip install ./subLibrary1/
pip install ./subLibrary2/
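
If you have more than a couple of submodules, typing these commands by hand gets old. Here is a hedged sketch that reads the submodule paths from the project’s .gitmodules file and builds one pip command per submodule that already has a setup.py. The function names are hypothetical helpers of ours; the only external tools involved are pip itself and its -e (editable) flag.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches the "path = some/dir" lines that git writes into .gitmodules.
PATH_LINE = re.compile(r"^\s*path\s*=\s*(.+?)\s*$")


def submodule_paths(gitmodules_text):
    """Extract the `path = ...` entries from the text of a .gitmodules file."""
    paths = []
    for line in gitmodules_text.splitlines():
        match = PATH_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def pip_install_commands(project_root, editable=False):
    """Build one `pip install` command per submodule that has a setup.py."""
    root = Path(project_root)
    commands = []
    for rel_path in submodule_paths((root / ".gitmodules").read_text()):
        target = root / rel_path
        if not (target / "setup.py").exists():
            continue  # not packaged yet, so there is nothing to install
        command = [sys.executable, "-m", "pip", "install"]
        if editable:
            command.append("-e")
        command.append(str(target))
        commands.append(command)
    return commands


def install_submodules(project_root, editable=False):
    """Run the commands; call this inside your virtual environment."""
    for command in pip_install_commands(project_root, editable):
        subprocess.check_call(command)
```

Passing editable=True can be handy while you are still changing a submodule, since edits to its code are then picked up without reinstalling.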

This means that Python’s import system will see these submodules as installed packages, and you will get rid of all these ModuleNotFoundError exceptions.
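
A quick way to confirm this is to ask the import system where it would load each package from, without actually importing it. The sketch below uses importlib.util.find_spec from the standard library; the helper name check_installed is ours, and it is meant for top-level module names only.

```python
import importlib.util


def check_installed(module_names):
    """Map each top-level module name to the location it would be imported from.

    The value is None when the import system cannot find the module at all.
    """
    report = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            report[name] = None
        else:
            # Namespace packages have no single origin file.
            report[name] = spec.origin or "namespace package"
    return report
```

After a regular pip install, check_installed(["subLibrary1", "subLibrary2"]) should report locations inside your virtual environment’s site-packages rather than None.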

And as an added bonus, the packaging will manage the requirements of the submodules out of the box, as long as each setup.py declares them in install_requires, and you can still reap the benefits of git submodules for your version control.

Take home notes

If you ever find yourself using git submodules in a Python project, the advised approach is as follows:

  • convert your independent and reusable sub-packages into git submodules
  • use pipenv to handle dependency management
  • make each of the submodules a Python package with its own setup.py file
  • install the packages in your virtual environment and import them cleanly

