Some incomplete notes on provisioning web stuff in a way that allows for fast feedback without leaving me hamstrung with technical debt at the level of project infrastructure.
I've written in bits and pieces about configuring a networking swiss army knife for mostly abstract, separable, scalable components, and I've documented how I like using a few pieces of the operating system that require nothing more than knowing they exist to get isolation, security, and flexibility. What I haven't yet done is tie it all together into a single document. This isn't complete, but it should be a good reminder to myself going forward and save me from searching around the archives.
Analysis paralysis is real; there is a never-ending fountain of new things that might make any aspect of development easier, faster, or more attractive to nerds on the internet. Doing nothing is easier than simply picking something and getting on with the rest of the work. With the goal of reducing this kind of decision making, I'm drawing up some minimal configuration that I can (or already do) copy-paste when starting new projects. My priorities are usually low maintenance, low overhead, and easily replaceable, in about that order of importance.
Rather than chasing any particular platform or hosting provider and stitching together web-scale technologies into (theoretically) infinitely scalable labyrinths of YAML, I aim to do nearly the simplest possible thing that can work. More than any one application, language, or architecture, I think it might be nice to nail down some of the things that have proven important but are often ignored in the early stages: things for which there are a huge number of options and which provide the kind of siren song that programmers and operations-types alike are so susceptible to sinking months into.
I have lost count of the variety of "deployment strategies" I have had to deal with: GitOps, artisanal PowerShell modules stitching together MSI files, uberjars scped to VPNed targets, Debian packages unpacked from tar files uploaded through a web app via the browser. They all have problems and, shockingly, the problems don't even tend to be so different. Configuration drift happens and components continue to be tested in isolation, leading to bugs at the fault lines of packages that were never intended to be combined.
Burdened with the knowledge that nothing ever really gets better, I have resigned myself to a few minimal better (if not best) practices. Applications should probably be packaged with their dependencies in a way that ensures whatever is used in testing is representative of what is used in production. Whether this is a single binary executable, a zipapp, or a container image doesn't seem to matter much. What is important is scripting the provisioning of the lower layers of the operating system. Often neglected, it is a place that can easily drift over time. Prescribing that all interactions setting up the host system be tracked in version control means that things are mostly rediscoverable when questions inevitably arise five years into what was supposed to be a proof of concept.
Ansible isn't great, but it isn't markedly worse than any other option I've dealt with. I spoke at a local meetup about 10 years ago to advocate for using Ansible (or Chef, Salt, etc.) over doing everything by hand, poorly. It has been funny to watch the rise of Kubernetes et al. because the number of problems hasn't really changed so much as the kinds of problems. My own professional experience has tracked much of the shift from "do it by hand" to "scripted deployments" to "configuration as code" to the latest "infrastructure as code". I don't really experience fewer issues than I did 10 years ago, and after long enough they all seem the same. As a result I wrote a tiny Ansible playbook, figuring it is mostly self-documenting and doesn't present more issues than doing it any other way.
I wrote previously about using zipapps to package Python applications. Sharing much of the spirit of that, I've amended the application to record basic information from every HTTP request and store it in a database.
a database-backed web application
import socket
import sqlite3
import time
import threading

import waitress

thread_local = threading.local()


def get_db():
    # One connection per waitress worker thread; SQLite connections
    # should not be shared across threads.
    if not hasattr(thread_local, 'conn'):
        thread_local.conn = sqlite3.connect(
            'file:/var/lib/wsgi-demo/requests.db?mode=rwc',
            uri=True,
            isolation_level=None
        )
        thread_local.conn.execute('pragma journal_mode=WAL')
        thread_local.conn.execute('pragma busy_timeout=10000')
        thread_local.conn.execute('pragma synchronous=normal')
        thread_local.conn.execute('pragma temp_store=memory')
    return thread_local.conn


def init_db():
    conn = sqlite3.connect('/var/lib/wsgi-demo/requests.db')
    conn.execute('''
        create table if not exists requests (
            id integer primary key autoincrement,
            timestamp real not null,
            method text not null,
            path text not null,
            content_length integer not null,
            remote_addr text
        )
    ''')
    conn.commit()
    conn.close()


init_db()


def app(environ, start_response):
    # CONTENT_LENGTH may be absent or empty per the WSGI spec
    content_length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(content_length)
    db_conn = get_db()
    # begin immediate takes the write lock up front so concurrent
    # writers queue on busy_timeout instead of failing mid-transaction
    db_conn.execute('begin immediate')
    try:
        db_conn.execute(
            'insert into requests (timestamp, method, path, content_length, remote_addr) values (?, ?, ?, ?, ?)',
            (time.time(), environ['REQUEST_METHOD'], environ['PATH_INFO'], len(body), environ.get('REMOTE_ADDR'))
        )
        db_conn.commit()
    except:
        db_conn.rollback()
        raise
    content_length_str = str(len(body))
    start_response(
        '200 OK',
        [('Content-Length', content_length_str), ('Content-Type', 'text/plain')]
    )
    return [body]


def main():
    # systemd socket activation passes the listening socket as fd 3
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)]
    waitress.serve(app, sockets=sockets)
The idea isn't that this is a particularly interesting application; instead it serves to characterize performance at an architectural level. The application uses the waitress WSGI server (minimal but production-quality), which uses 4 threads by default to respond to HTTP requests. Due to the multithreaded nature of the application I'm using a thread-local variable to ensure the threads do not share a connection to the SQLite database. The database is configured for write-ahead logging (WAL) mode and the write transaction is declared BEGIN IMMEDIATE in order to guard against busy errors during concurrent HTTP requests. It uses synchronous Python and SQLite, and every single HTTP request results in a database write, which is a nearly pathological workload for many kinds of applications.
If the above application is app.py in a hierarchy like this:
.
├── build/
└── src/
    ├── app.py
    └── __main__.py
And __main__.py is only this:
import app

if __name__ == '__main__':
    app.main()
The "build" step (which is a generous description) looks like this:
$ python -m pip install waitress --target build
$ cp -r src/* build/
$ python -m zipapp build -o wsgi-demo.pyz
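Because the application expects to inherit its listening socket as file descriptor 3, a quick local smoke test can lean on systemd's activation helper rather than a full unit file. This is a sketch and assumes /var/lib/wsgi-demo exists and is writable locally, since the app creates its database there at import time:
$ systemd-socket-activate -l 8080 python wsgi-demo.pyz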
With the packaging story being a .pyz file, the deployment is very simple. I used the playbook below to provision two machines identically, in order to run the application on one and perform a load test from the other:
ansible playbook.yml
- name: Prepare demo machines
  hosts: demo
  tasks:
    - name: Install demo dependencies
      ansible.builtin.dnf:
        name:
          - firewalld
          - haproxy
          - python
        state: latest
    - name: Download hey load testing tool
      ansible.builtin.get_url:
        url: https://storage.googleapis.com/hey-releases/hey_linux_amd64
        dest: /usr/local/bin/hey
        mode: '0755'
    - name: Copy application
      ansible.builtin.copy:
        src: application/wsgi-demo.pyz
        dest: /opt/wsgi-demo.pyz
        mode: '0755'
    - name: Copy wsgi-demo socket file
      ansible.builtin.copy:
        src: configuration/wsgi-demo@.socket
        dest: /etc/systemd/system/wsgi-demo@.socket
    - name: Copy wsgi-demo service file
      ansible.builtin.copy:
        src: configuration/wsgi-demo@.service
        dest: /etc/systemd/system/wsgi-demo@.service
    - name: daemon reload
      ansible.builtin.systemd:
        daemon_reload: true
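The playbook's hosts: demo refers to an inventory group; an inventory along these lines is all it takes, with the addresses below being placeholders for wherever your machines are reachable:
$ cat inventory.ini
[demo]
demo-1 ansible_host=203.0.113.10
demo-2 ansible_host=203.0.113.11

$ ansible-playbook -i inventory.ini playbook.yml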
For this case I don't need any fancy routing. I'm confident something like HAProxy is wildly more capable than my little web application, so I'm only interested in a load test of the Python and SQLite side of things. In this case I just use the port the systemd socket is listening on, since it is within a private network between the two machines:
wsgi-demo@.socket
[Unit]
Description=socket for wsgi-demo %i
[Socket]
ListenStream=8080
ReusePort=true
Service=wsgi-demo@%i.service
[Install]
WantedBy=sockets.target
The ReusePort option means multiple sockets/services can be spun up and will automatically share incoming connections between them. This is what I'm interested in testing out. It references the service to start:
wsgi-demo@.service
[Unit]
Description=wsgi-demo server %i
Requires=wsgi-demo@%i.socket
[Service]
DynamicUser=true
PrivateNetwork=yes
StateDirectory=wsgi-demo
ExecStart=/usr/bin/python /opt/wsgi-demo.pyz %i
The service declares a StateDirectory in order to persist and share the SQLite database across the instances of the service that will be launched.
That's enough to do my load test. I'll be using hey because it is simple and seems to use available resources better than apache-bench. Each machine is configured identically, so I will pick one to be the load generator and the other will be the server. This means sshing into the first and starting the number of sockets/services I intend to test:
# systemctl start wsgi-demo@1.socket
Then sshing to the load generator and starting the test (in my case the first machine was 10.0.0.2 and the second was 10.0.0.3):
The first test was a relatively high concurrent "user" setting using only 1 WSGI socket and service running:
# hey -z 120s -c 100 http://10.0.0.2:8080/
Summary:
Total: 120.0622 secs
Slowest: 2.5570 secs
Fastest: 0.0230 secs
Average: 0.1739 secs
Requests/sec: 563.2331
Latency distribution:
10% in 0.0920 secs
25% in 0.1227 secs
50% in 0.1626 secs
75% in 0.2112 secs
90% in 0.2654 secs
95% in 0.3063 secs
99% in 0.3939 secs
Status code distribution:
[200] 67611 responses
Error distribution:
[12] Get "http://10.0.0.2:8080/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
It seems a concurrency level of 100 is too much for this little application with just one instance running: 12 of the requests failed due to connection limits within waitress. Next up is scaling out more instances of the application, all still sharing the same port and SQLite database. Here are 4 instances on a machine with 2 vCPUs, launched with # systemctl start wsgi-demo@{2..4}.socket:
# hey -z 120s -c 100 http://10.0.0.2:8080/
Summary:
Total: 120.1134 secs
Slowest: 1.3533 secs
Fastest: 0.0017 secs
Average: 0.0317 secs
Requests/sec: 3154.1115
Latency distribution:
10% in 0.0134 secs
25% in 0.0186 secs
50% in 0.0263 secs
75% in 0.0379 secs
90% in 0.0538 secs
95% in 0.0688 secs
99% in 0.1134 secs
Status code distribution:
[200] 378851 responses
All of the previous errors have been addressed; the connection limits within waitress are no longer being exceeded, and throughput has reached more than 3,000 requests per second.
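As a sanity check that all of those 200s actually turned into rows, a few lines of Python run as root on the server (assuming the same database path the service uses) will line the table up against hey's status code distribution across the runs:

import sqlite3

# Read-only connection; under WAL this doesn't block the writer threads.
conn = sqlite3.connect('file:/var/lib/wsgi-demo/requests.db?mode=ro', uri=True)
count, = conn.execute('select count(*) from requests').fetchone()
print(f'{count} requests recorded')
conn.close()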
For this test I used Hetzner, having some limited but positive experience with them as a hosting provider. I was able to buy two servers, create a private network between them, provision both machines with Ansible, and then run my two load tests, all in about 10 minutes. I'm reasonably impressed with Hetzner, and I'm really pleased with how little complexity is involved in making this whole process reproducible. I was able to verify that "adding" sockets correctly scaled requests between new WSGI server instances, and I found a likely upper bound on SQLite writes with this application design. While the server with 4 instances of the application, each running 4 threads behind the scenes, was able to serve more than 3,000 requests per second, it was probably pushing it on the Hetzner instance size I started on. To be fair, I was using the smallest, cheapest server they will rent you, with 2 vCPU and 4GB of RAM (around $4 a month). I was able to confirm that scaling the machine up to 16 vCPU and 32GB of RAM allowed somewhat better performance: with the load testing tool's concurrency set to 200, I observed 4,600 requests per second across 8 instances of the WSGI application.
The performance seems to level off around 4,600 requests per second; the machine wasn't taxed by the load of 8 instances of the application, but adding more instances did not improve performance further. I think this points to a limit in the performance available from these architecture and technology choices: Python and a single-writer SQLite have to cap out somewhere. I am reasonably comfortable saying, though, that 4,600 requests per second on a machine that costs about $20 USD a month is totally sufficient for plenty of use cases. Even more exciting, these requests are also a measure of write transactions, since every one of them includes a database write. If some requests are read-only or don't involve a database query at all (static assets or cached results), then performance can be expected to go up. I tested the server producing only "200 OK" results with no database and observed more than 20,000 requests per second on a machine with only 2 vCPU, and I'd expect further scaling on a larger machine in the absence of write contention on the database.
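For reference, the no-database variant was nothing clever. A handler along these lines (a sketch, not necessarily byte-for-byte what I ran) is enough to take SQLite out of the picture entirely:

def app(environ, start_response):
    # Same WSGI interface as before, minus reading the body and the insert.
    body = b'ok'
    start_response(
        '200 OK',
        [('Content-Length', str(len(body))), ('Content-Type', 'text/plain')]
    )
    return [body]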
Load testing in general is best suited to actual testing scenarios: forming a question and then measuring a specific outcome. This kind of benchmark is admittedly artificial, like so many of the other benchmarks for frameworks, but in my defense what I'm testing is whether these specific technology choices prevent what I'll vaguely refer to as reasonable performance. If the write path to the database can be pushed down to hundreds of microseconds, I think I know well enough to recognize when that level of performance is adequate and when the focus is better spent on application design, or even alternate frameworks on the same database.
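If you want to put a rough number on that write path in isolation, it only takes a few lines; this sketch uses a throwaway database path but the same pragmas and BEGIN IMMEDIATE pattern as the application:

import sqlite3
import time

conn = sqlite3.connect('/tmp/write-path-demo.db', isolation_level=None)
conn.execute('pragma journal_mode=WAL')
conn.execute('pragma synchronous=normal')
conn.execute('create table if not exists requests (ts real, path text)')

n = 10_000
start = time.perf_counter()
for i in range(n):
    conn.execute('begin immediate')
    conn.execute('insert into requests (ts, path) values (?, ?)', (time.time(), '/'))
    conn.execute('commit')
elapsed = time.perf_counter() - start
print(f'{elapsed / n * 1_000_000:.0f} microseconds per committed write')
conn.close()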
Just use HAProxy. Having the ability to selectively route traffic is certainly nice, and HAProxy makes that easy. The variety of knobs for tweaking in production gives me a level of confidence I haven't really experienced with other systems outside of "the cloud", which brings in a boatload of infrastructure baggage that I can't maintain on my own. I've written before about a variety of different HAProxy configurations and their approximate level of complexity; for getting started I think this is a perfectly adequate configuration:
haproxy.cfg
global
    maxconn 2000
    log /dev/log local0
    user haproxy
    group haproxy
    stats socket /run/admin.sock user haproxy group haproxy mode 660 level admin

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    log global
    mode http
    option httplog

frontend ingress
    bind *:80
    use_backend some_default

backend some_default
    server app1 127.0.0.1:8080
One thing I might consider as an alternative is to drop the fixed port on the systemd socket and instead use a Unix domain socket, having HAProxy route to that in the backend configuration block. That is one way to further mitigate unintended network access and keeps the machine quite locked down. Doing so would look like this:
sandbox@.socket
[Socket]
ListenStream=/run/sandbox/my-app.sock
[Install]
WantedBy=sockets.target
alternate haproxy.cfg
global
    maxconn 2000
    log /dev/log local0
    user haproxy
    group haproxy
    stats socket /run/admin.sock user haproxy group haproxy mode 660 level admin

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    log global
    mode http
    option httplog

frontend ingress
    bind *:80
    use_backend some_default

backend some_default
    server some_default unix@/run/sandbox/my-app.sock
In this way HAProxy listens on port 80, drops privileges on launch to the haproxy user, and that user directs traffic to a socket that activates a service to do the actual work.
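One detail to watch with this alternative: the demo's main() adopts the inherited file descriptor as an AF_INET socket, so a unix-socket variant needs the address family changed to match. In app.py, main() would become something like this (a sketch, assuming waitress accepts the pre-bound unix socket the same way it accepted the TCP one):

def main():
    # systemd still passes the listening socket as fd 3, but for a
    # ListenStream= path it is an AF_UNIX socket rather than AF_INET
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_UNIX, socket.SOCK_STREAM)]
    waitress.serve(app, sockets=sockets)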
Guarding against catastrophic failures means having a strategy for backing up data, and in the case of SQLite that can be accomplished with Litestream. Since I've looked into it before, I'll just copy-paste the working configuration (provided you have either an S3-like object store or an SFTP server):
$ cat /etc/litestream.yml
dbs:
  - path: /var/lib/wsgi-demo/requests.db
    replicas:
      - url: sftp://replicator@alpha/db
        key-path: /etc/replicator/replicator.key
That configuration can be used in combination with a systemd service that you can crib from the Litestream project repository; you'll want to give it access to the directory with the SQLite file though (here I'm just declaring that the two services share a StateDirectory):
[Unit]
Description=Litestream
[Service]
Restart=always
StateDirectory=wsgi-demo
ExecStart=/usr/bin/litestream replicate
[Install]
WantedBy=multi-user.target
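The other half of a backup strategy is proving you can get the data back. With the configuration above a restore is roughly one command; the exact flags are worth double-checking against the Litestream version you install, but it looks something like this:
$ litestream restore -o /tmp/restored.db /var/lib/wsgi-demo/requests.db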
Where does all this leave us? Well, for less than $20 a month you can get a web application with persistent storage, database backups, flexible networking schemes, and features like rate limiting. This is all demonstrated on "slow" technologies (Python, SQLite) that I have a history with and know I can personally deliver useful software on. Moving to faster technologies (compiled languages, database servers, although there's no guarantee you won't see similar performance using PostgreSQL) doesn't present too much of a challenge and, with the architecture described here, is achievable incrementally. If you imagine switching languages, you might make an "API version 2" addressable through HTTP headers, subdomains, or path-based routing; you can stand that up alongside this and handle the routing from HAProxy, and it would look nearly identical except for an additional backend and an ACL definition. Doing it this way means you can cross that bridge when you get to it.
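To make that concrete, the HAProxy side of a hypothetical path-routed "version 2" is only a few extra lines. The backend name, path prefix, and socket path here are made up for illustration:

frontend ingress
    bind *:80
    acl is_v2 path_beg /v2
    use_backend api_v2 if is_v2
    use_backend some_default

backend api_v2
    server app2 unix@/run/sandbox/my-app-v2.sock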
You might imagine moving from zipapps to containers, and all that really has to change is the Ansible copy step. Instead of a copy operation you'll do something akin to a "docker pull" and "docker run" (or maybe you'll move to using a container registry and auto-update). You might find you need a message queue or pub/sub mechanism, and all that should require is a new service definition and some choices around networking. I once described configuring NATS; it can do both. If you decide you need to arbitrarily segment the available CPU between your web application and a message broker, you can do that too; you don't need a second machine just to divide up resources.
This is all personal opinion, but it is based on practical experience. I haven't always followed all of these recommendations, but I haven't yet regretted these designs. While there are countless ways to overcomplicate things, I think this provides a good foundation to start from and hopefully curbs the worst aspects of complexity so prevalent in modern architectures. Scaling problems are bound to arise in the realm of application design, which requires its own set of design principles, but these choices shouldn't preclude any of those designs.