
zipapps for fun

2023-06-25

While trying to document how I have been provisioning my own servers lately, I realized what I really wanted was a minimal application to include in the example. I tend to get container fatigue from the number of moving pieces to maintain with Docker, Podman, Kubernetes, etc., so I went looking for something simpler. What I found was Python's native zipapp.

My motivation is to demonstrate a real-enough application in the context of process management, system security, and network management. That alone feels broad enough that I don't want to bog things down with the complexities of an example application or any one particular application format. I could have reused a container or static binary, but instead I settled on a WSGI application running under a real WSGI server (as opposed to something like the Flask or Django development server). This is pretty close to a real workload and fits on a single screen.

I've written before about my order of preferences for deploying applications, and I ranked copying Python virtual environments around pretty low on the list of ideal candidates. Even worse, though, is the idea of making the deploy target do things like pip install my_application. It isn't really difficult to make a container image for this case, but it is everything that follows that becomes annoying: unless I'm going to recreate my issues with pip-installing a package by requiring that the deployment machine build container images, I have to configure an account on a registry or work out how to copy images between the build and target machines myself.

My own experience makes me think whatever I write would quickly become out of date and I wouldn't learn much doing it again. Instead I'd like to learn something new about potential alternatives for deploying Python. What I will be deploying is the most basic WSGI server you can imagine: a 200 OK server in a file named __main__.py:

import waitress

def app(environ, start_response):
    content_length = environ.get('CONTENT_LENGTH', None)
    if content_length is not None:
        content_length = int(content_length)
    body = environ['wsgi.input'].read(content_length)
    content_length = str(len(body))
    start_response(
        '200 OK',
        [('Content-Length', content_length), ('Content-Type', 'text/plain')]
    )
    return [body]

if __name__ == '__main__':
    waitress.serve(app)
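
Because the callable follows plain PEP 3333, it can be sanity-checked by calling it directly with a hand-built environ, no server required. The environ below is a minimal sketch, nowhere near a complete WSGI environment:

```python
import io

def app(environ, start_response):
    content_length = environ.get('CONTENT_LENGTH', None)
    if content_length is not None:
        content_length = int(content_length)
    body = environ['wsgi.input'].read(content_length)
    content_length = str(len(body))
    start_response(
        '200 OK',
        [('Content-Length', content_length), ('Content-Type', 'text/plain')]
    )
    return [body]

# Just enough environ for this app: a body and its declared length.
environ = {'CONTENT_LENGTH': '5', 'wsgi.input': io.BytesIO(b'hello')}
captured = {}

def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

result = app(environ, start_response)
assert result == [b'hello']            # the body is echoed back
assert captured['status'] == '200 OK'
```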

Here I am going to use waitress because I like how small the entire library is; it has no dependencies outside of the standard library, and that kind of thing remains exciting to me. That file is the only file in a directory named demo. In order to package the requirements (here just waitress) alongside it I can use the --target flag to pip:

$ python -m pip install waitress --target demo

The result is a directory tree that looks like this:

demo/
├── bin
│   └── waitress-serve
├── __main__.py
├── waitress
│   ├── adjustments.py
│   ├── buffers.py
│   ├── channel.py
│   ├── compat.py
│   ├── __init__.py
│   ├── __main__.py
│   ├── parser.py
│   ├── proxy_headers.py
│   ├── receiver.py
│   ├── rfc7230.py
│   ├── runner.py
│   ├── server.py
│   ├── task.py
│   ├── trigger.py
│   ├── utilities.py
│   └── wasyncore.py
└── waitress-2.1.2.dist-info
    ├── entry_points.txt
    ├── INSTALLER
    ├── LICENSE.txt
    ├── METADATA
    ├── RECORD
    ├── REQUESTED
    ├── top_level.txt
    └── WHEEL

With a __main__.py entrypoint for the zipapp module and all of my (one) requirements prepared it is possible to bundle everything into a single zip file like this:

$ python -m zipapp --output wsgi-demo.pyz demo

The result is a zip file that the Python interpreter can execute, which includes the bundled dependencies in a format that requires no additional work to configure paths or environments. Of course, because the interpreter isn't bundled with the zip file, there is the potential to write or bundle code that is incompatible with the deployment system's version of Python. Along with my preference toward the standard library, I like using stable releases and simply testing for backwards compatibility. In this case I packaged the above on my laptop using Python 3.11 but tested it all the way back[1] to Python 3.7, which is going end-of-life in two days.
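
Incidentally, the same bundling is available through zipapp's Python API, and the result is ordinary zip data that the standard zipfile module can read back. That property is easy to confirm; this sketch builds a throwaway archive rather than the demo app:

```python
import pathlib
import tempfile
import zipapp
import zipfile

# Build a toy zipapp from a scratch directory.
src = pathlib.Path(tempfile.mkdtemp())
(src / '__main__.py').write_text("print('hello from a zipapp')\n")
target = src.with_suffix('.pyz')
zipapp.create_archive(src, target)

# The archive is plain zip data with the entrypoint inside it.
assert zipfile.is_zipfile(target)
assert '__main__.py' in zipfile.ZipFile(target).namelist()
```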

For my own example I will just use the deploy server's version of Python rather than a container, for the sake of simplicity. I'm pleased with the flexibility here though; whether the goal is to provision a known-good target environment and rely on having an LTS release of the runtime, or to separate the application from the runtime environment in a container, the zipapp seems to mostly stay out of the way.

Better Host Integration

One of the motivations for investigating alternative deployment options has been how stilted it can feel to drop containers into my existing workflow. Many container technologies don't really play nicely with the host operating system capabilities and instead want to dictate how things are done in the (probably justified) name of uniformity. For example, I like running services under dedicated user accounts with limited permissions. This is a capability built into most Linux distributions via systemd and I am comfortable with the work involved. Container runtimes provide similar functionality but it does not integrate in the same way and often instead relies on the container daemon having root level access to achieve the same results. It is possible, especially with Podman, to use "rootless" containers but they do not integrate with the init system cleanly and I find them harder to manage as a result.

Here then is an opportunity for trying out my zipapp in a workflow that I prefer. I have written previously about how systemd-socket-proxy can be used as a shim to socket activate services that were not built with socket activation in mind. With this blindingly simple demo though it is possible to build in socket activation - I'm trying to demonstrate how I would like to do things after all.

First, I'll change my WSGI application so that rather than binding ports it receives a file descriptor from the process management system (systemd). This is possible in a number of different servers but I like how simple it is under waitress:

import socket
import waitress

def app(environ, start_response):
    content_length = environ.get('CONTENT_LENGTH', None)
    if content_length is not None:
        content_length = int(content_length)
    body = environ['wsgi.input'].read(content_length)
    content_length = str(len(body))
    start_response(
        '200 OK',
        [('Content-Length', content_length), ('Content-Type', 'text/plain')]
    )
    return [body]


if __name__ == '__main__':
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)]
    waitress.serve(app, sockets=sockets)
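
The magic number 3 is systemd's convention: inherited sockets start at file descriptor 3 (SD_LISTEN_FDS_START), and systemd also exports LISTEN_FDS and LISTEN_PID so a process can tell whether anything was actually passed to it. A slightly more defensive entrypoint might check those variables first; this is a sketch of the convention, not a reimplementation of sd_listen_fds:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited fd, by systemd convention

def inherited_sockets():
    # LISTEN_PID guards against acting on variables meant for a parent
    # process; systemd sets it to the pid it passed the sockets to.
    if os.environ.get('LISTEN_PID') != str(os.getpid()):
        return []
    count = int(os.environ.get('LISTEN_FDS', 0))
    return [
        socket.fromfd(SD_LISTEN_FDS_START + i,
                      socket.AF_INET, socket.SOCK_STREAM)
        for i in range(count)
    ]
```

When nothing was socket-activated this returns an empty list, which is a natural place to fall back to a plain waitress.serve(app) for local development.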

Where before it was possible to execute the server directly, it now requires sockets to be prepared before starting. This is a boon for deployment but might be annoying for development. Not to worry though: the systemd developers planned for this case too, and just like systemd-socket-proxy allows for simulating socket activation, systemd-socket-activate does basically the reverse. It will let you bind a socket and then invoke your program as though it were called by the init system. It sounds more complicated than it is; previously the zipapp was run like this:

$ python wsgi-demo.pyz

It is now run like this for socket activation:

$ systemd-socket-activate -l '127.0.0.1:8080' python wsgi-demo.pyz

Of course that is only for development and testing. The deploy process will be to define a systemd socket (wsgi-demo.socket):

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

Which will have an associated service (wsgi-demo.service):

[Unit]
Requires=wsgi-demo.socket
After=wsgi-demo.socket

[Service]
DynamicUser=true
PrivateNetwork=yes
ExecStart=/usr/bin/python /opt/wsgi-demo.pyz

Now my WSGI server has exceedingly tight restrictions on how and what it can access. It has no network access, cannot write to the host system, etc. Additional capabilities can be added slowly and as required, which is a much better feeling than the one I get simply exposing ports out of a Docker container.

Perhaps more exciting though is how this lends itself to things like zero-downtime upgrades, or how much easier it becomes to scale up WSGI servers now that they are no longer managing their own ports. Some servers allow for a configurable number of workers, but I have not yet seen one that allows the workers to be scaled up without a restart. With systemd units it is possible to template the service file and launch new services on the same port as needed. This generally requires the application to be stateless, but it can better distribute work across multiple CPUs without requiring changes to the application. The necessary configuration looks like this:

socket (wsgi-demo@.socket):

[Unit]
Description=socket for wsgi-demo %i

[Socket]
ListenStream=8080
ReusePort=true
Service=wsgi-demo@%i.service

[Install]
WantedBy=sockets.target

service (wsgi-demo@.service):

[Unit]
Description=wsgi-demo server %i
Requires=wsgi-demo@%i.socket

[Service]
DynamicUser=true
PrivateNetwork=yes
ExecStart=/usr/bin/python /opt/wsgi-demo.pyz

With that much done it is possible to start 4 distinct sockets, which will start 4 Python processes:

$ systemctl start wsgi-demo@{1..4}.socket
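
The ReusePort=true line is what makes this work: it sets SO_REUSEPORT on each listening socket, which allows several sockets to bind the same address and has the kernel distribute incoming connections between them. The effect is easy to see directly in Python (Linux-specific; here the kernel picks a free port rather than hard-coding 8080):

```python
import socket

def reuseport_listener(port):
    # SO_REUSEPORT must be set before bind on every participating socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(('127.0.0.1', port))
    s.listen()
    return s

first = reuseport_listener(0)        # kernel assigns a free port
port = first.getsockname()[1]
second = reuseport_listener(port)    # a second listener on the same port
assert second.getsockname()[1] == port

first.close()
second.close()
```

Without SO_REUSEPORT the second bind would fail with EADDRINUSE; with it, each templated unit gets its own listener on port 8080 and the kernel load-balances between them.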

Thoughts

The way I see it there are a few potential negatives with zipapps, one already discussed. The interpreter is not bundled, which invites the chance that the interpreter running the zipapp does not support some feature of the language used. The second is that zipapps cannot package C extensions. For most of my uses this does not matter; the linked page describes how to work around it, but the level of effort is probably close to that of just using containers. The last hurdle I see is that the workflow is sufficiently different that it would probably be annoying to introduce in a team setting. I can easily imagine a less experienced person accidentally checking the entirety of the installed requirements into source control because of the way you can --target the local directory so easily. It seems the correct way to build would be with a temporary build directory that copies the application source and then installs requirements into it, rather than installing into your source directory directly. It is easy enough for me to imagine how to do this, but it is one more thing to explain in a team setting.

For my own uses I'm pleasantly surprised with how simple zipapps have made experimenting with different architecture choices, like comparing HAProxy round-robin load balancing against reused ports on a single systemd socket. I will probably use them in the future simply to avoid pulling in ever more dependencies.


  1. I said I would be avoiding containers for this post but they do have value in cases like this. Rather than futz with installing old versions of python to run my zipapp it is possible to use podman to pull old versions and point them at the zip file in a volume mount:

    podman run -it --rm -p 8080:8080 -v $PWD:/demo:z python:3.7-slim python /demo/wsgi-demo.pyz