Some incomplete notes on provisioning web stuff in a way that allows for fast feedback without leaving me hamstrung with technical debt at the level of project infrastructure.
I've written in bits and pieces about configuring a networking swiss army knife for mostly abstract, separable, scalable components, and I've documented how I like using a few pieces of the operating system that require nothing more than knowing they exist to get isolation, security, and flexibility. What I haven't yet done is tie it all together into a single document. This isn't complete, but it should be a good reminder to myself going forward and save me from searching around the archives.
Analysis paralysis is real; there is a never-ending fountain of new things that might make any aspect of development easier, faster, or more attractive to nerds on the internet. Doing nothing is easier than simply picking something and getting on with the rest of the work. With the goal of reducing this kind of decision making, I'm drawing up some minimal configuration that I can (or already do) copy-paste when starting new projects. My priorities are usually low maintenance, low overhead, and easily replaceable, in about that order of importance.
Rather than chasing any particular platform or hosting provider and stitching together web-scale technologies into (theoretically) infinitely scalable labyrinths of YAML, I aim to do nearly the simplest possible thing that can work. More than any one application, language, or architecture, I think it might be nice to nail down some of the things that have proven important but are often ignored in the early stages: things for which there are a huge number of options and which provide the kind of siren song that programmers and operations-types alike are so susceptible to sinking months into.
I have lost count of the variety of "deployment strategies" I have had to deal with: GitOps, artisanal PowerShell modules stitching together MSI files, uberjars scped to VPNed targets, Debian packages unpacked from tar files uploaded through a web app via the browser. They all have problems and, shockingly, the problems don't even tend to be so different. Configuration drift happens and components continue to be tested in isolation, leading to bugs at the fault lines of packages that were never intended to be combined.
Burdened with the knowledge that nothing ever really gets better, I have resigned myself to a few minimal better (if not best) practices. Applications should probably be packaged with their dependencies in a way that ensures whatever is used in testing is representative of what is used in production. Whether this is a single binary executable, a zipapp, or a container image doesn't seem to matter much. What is important is scripting the provisioning of the lower layers of the operating system. Often neglected, it is a place that can easily drift over time. Prescribing that all interactions setting up the host system be tracked in version control means that things are mostly rediscoverable when questions inevitably arise five years into what was supposed to be a proof of concept.
Ansible isn't great, but it isn't markedly worse than any other option I've dealt with. I spoke at a local meetup about 10 years ago to advocate for using Ansible (or Chef, Salt, etc.) over doing everything by hand, poorly. It has been funny to watch the rise of Kubernetes et al. because the number of problems hasn't really changed so much as the kinds of problems. My own professional experience has tracked much of the shift from "do it by hand" to "scripted deployments" to "configuration as code" to the latest "infrastructure as code". I don't really experience fewer issues than I did 10 years ago, and after long enough they all seem the same. As a result I wrote a tiny Ansible playbook, figuring it is mostly self-documenting and doesn't present more issues than doing it any other way.
I wrote previously about using zipapps to package Python applications. Sharing much of the spirit of that, I've amended the application to record basic information from every HTTP request and store it in a database.
a database-backed web application
import socket
import sqlite3
import time
import threading

import waitress

thread_local = threading.local()


def get_db():
    # One connection per waitress worker thread; SQLite connections
    # should not be shared across threads.
    if not hasattr(thread_local, 'conn'):
        thread_local.conn = sqlite3.connect(
            'file:/var/lib/wsgi-demo/requests.db?mode=rwc',
            uri=True,
            isolation_level=None
        )
        thread_local.conn.execute('pragma journal_mode=WAL')
        thread_local.conn.execute('pragma busy_timeout=10000')
        thread_local.conn.execute('pragma synchronous=normal')
        thread_local.conn.execute('pragma temp_store=memory')
    return thread_local.conn


def init_db():
    conn = sqlite3.connect('/var/lib/wsgi-demo/requests.db')
    conn.execute('''
        create table if not exists requests (
            id integer primary key autoincrement,
            timestamp real not null,
            method text not null,
            path text not null,
            content_length integer not null,
            remote_addr text
        )
    ''')
    conn.commit()
    conn.close()


init_db()


def app(environ, start_response):
    # CONTENT_LENGTH may be absent or empty per the WSGI spec
    content_length = int(environ.get('CONTENT_LENGTH') or 0)
    body = environ['wsgi.input'].read(content_length)
    db_conn = get_db()
    # begin immediate takes the write lock up front so concurrent
    # writers queue on busy_timeout instead of failing mid-transaction
    db_conn.execute('begin immediate')
    try:
        db_conn.execute(
            'insert into requests (timestamp, method, path, content_length, remote_addr) values (?, ?, ?, ?, ?)',
            (time.time(), environ['REQUEST_METHOD'], environ['PATH_INFO'], len(body), environ.get('REMOTE_ADDR'))
        )
        db_conn.commit()
    except:
        db_conn.rollback()
        raise
    content_length_str = str(len(body))
    start_response(
        '200 OK',
        [('Content-Length', content_length_str), ('Content-Type', 'text/plain')]
    )
    return [body]


def main():
    # systemd socket activation passes the listening socket as fd 3
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)]
    waitress.serve(app, sockets=sockets)
The idea isn't that this is a particularly interesting application; instead it serves to characterize performance at an architectural level. The application uses the waitress WSGI server (minimal but production-quality), which uses 4 threads by default to respond to HTTP requests. Due to the multithreaded nature of the application I'm using a thread-local variable to ensure the threads do not share a connection to the SQLite database. The database is configured for write-ahead logging (WAL) mode and the write transaction is declared BEGIN IMMEDIATE in order to guard against busy errors during concurrent HTTP requests. It uses synchronous Python and SQLite, and every single HTTP request results in a database write, which is a nearly pathological workload for many kinds of applications.
If the above application is app.py in a hierarchy like this:
.
├── build/
└── src/
    ├── app.py
    └── __main__.py
And __main__.py is only this:
import app

if __name__ == '__main__':
    app.main()
The "build" step (which is a generous description) looks like this:
$ python -m pip install waitress --target build
$ cp -r src/* build/
$ python -m zipapp build -o wsgi-demo.pyz
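Because the application expects to inherit its listening socket as file descriptor 3, a quick local smoke test can lean on systemd's activation helper rather than a full unit file. This is a sketch and assumes /var/lib/wsgi-demo exists and is writable locally, since the app creates its database there at import time:
$ systemd-socket-activate -l 8080 python wsgi-demo.pyz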
With the packaging story being a .pyz file, the deployment is very simple. I used the playbook below to provision two machines identically, in order to run the application on one and perform a load test from the other:
ansible playbook.yml
- name: Prepare demo machines
  hosts: demo
  tasks:
    - name: Install demo dependencies
      ansible.builtin.dnf:
        name:
          - firewalld
          - haproxy
          - python
        state: latest
    - name: Download hey load testing tool
      ansible.builtin.get_url:
        url: https://storage.googleapis.com/hey-releases/hey_linux_amd64
        dest: /usr/local/bin/hey
        mode: '0755'
    - name: Copy application
      ansible.builtin.copy:
        src: application/wsgi-demo.pyz
        dest: /opt/wsgi-demo.pyz
        mode: '0755'
    - name: Copy wsgi-demo socket file
      ansible.builtin.copy:
        src: configuration/wsgi-demo@.socket
        dest: /etc/systemd/system/wsgi-demo@.socket
    - name: Copy wsgi-demo service file
      ansible.builtin.copy:
        src: configuration/wsgi-demo@.service
        dest: /etc/systemd/system/wsgi-demo@.service
    - name: daemon reload
      ansible.builtin.systemd:
        daemon_reload: true
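The playbook's hosts: demo refers to an inventory group; an inventory along these lines is all it takes, with the addresses below being placeholders for wherever your machines are reachable:
$ cat inventory.ini
[demo]
demo-1 ansible_host=203.0.113.10
demo-2 ansible_host=203.0.113.11

$ ansible-playbook -i inventory.ini playbook.yml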
For this case I don't need any fancy routing. I'm confident something like HAProxy is wildly more capable than my little web application, so I'm only interested in a load test of the Python and SQLite side of things. In this case I just use the port the systemd socket is listening on, since it is within a private network between the two machines:
wsgi-demo@.socket
[Unit]
Description=socket for wsgi-demo %i
[Socket]
ListenStream=8080
ReusePort=true
Service=wsgi-demo@%i.service
[Install]
WantedBy=sockets.target
The ReusePort option means multiple sockets/services can be spun up and will automatically share incoming connections between them. This is what I'm interested in testing out. It references the service to start:
wsgi-demo@.service
[Unit]
Description=wsgi-demo server %i
Requires=wsgi-demo@%i.socket
[Service]
DynamicUser=true
PrivateNetwork=yes
StateDirectory=wsgi-demo
ExecStart=/usr/bin/python /opt/wsgi-demo.pyz %i
The service declares a StateDirectory in order to persist and share the SQLite database across the instances of the service that will be launched.
That's enough to do my load test. I'll be using hey because it is simple and seems to use available resources better than apache-bench. Each machine is configured identically, so I will pick one to be the load generator and the other will be the server. This means sshing into the first and starting the number of sockets/services I intend to test:
# systemctl start wsgi-demo@1.socket
Then sshing to the load generator and starting the test (in my case the first machine was 10.0.0.2 and the second was 10.0.0.3):
The first test was a relatively high concurrent "user" setting using only 1 WSGI socket and service running:
# hey -z 120s -c 100 http://10.0.0.2:8080/
Summary:
Total: 120.0622 secs
Slowest: 2.5570 secs
Fastest: 0.0230 secs
Average: 0.1739 secs
Requests/sec: 563.2331
Latency distribution:
10% in 0.0920 secs
25% in 0.1227 secs
50% in 0.1626 secs
75% in 0.2112 secs
90% in 0.2654 secs
95% in 0.3063 secs
99% in 0.3939 secs
Status code distribution:
[200] 67611 responses
Error distribution:
[12] Get "http://10.0.0.2:8080/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
It seems a concurrency level of 100 is too much for this little application with just one instance running: 12 of the requests failed due to connection limits within waitress. Next up is scaling out more instances of the application, all still sharing the same port and SQLite database. Here are 4 instances on a machine with 2 vCPUs, launched with # systemctl start wsgi-demo@{2..4}.socket:
# hey -z 120s -c 100 http://10.0.0.2:8080/
Summary:
Total: 120.1134 secs
Slowest: 1.3533 secs
Fastest: 0.0017 secs
Average: 0.0317 secs
Requests/sec: 3154.1115
Latency distribution:
10% in 0.0134 secs
25% in 0.0186 secs
50% in 0.0263 secs
75% in 0.0379 secs
90% in 0.0538 secs
95% in 0.0688 secs
99% in 0.1134 secs
Status code distribution:
[200] 378851 responses
All of the previous errors have been addressed; the connection limits within waitress are no longer being exceeded, and throughput has reached more than 3,000 requests per second.
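As a sanity check that all of those 200s actually turned into rows, a few lines of Python run as root on the server (assuming the same database path the service uses) will line the table up against hey's status code distribution across the runs:

import sqlite3

# Read-only connection; under WAL this doesn't block the writer threads.
conn = sqlite3.connect('file:/var/lib/wsgi-demo/requests.db?mode=ro', uri=True)
count, = conn.execute('select count(*) from requests').fetchone()
print(f'{count} requests recorded')
conn.close()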
For this test I used Hetzner, having some limited but positive experience with them as a hosting provider. I was able to buy two servers, create a private network between them, provision both machines with Ansible, and then run my two load tests, all in about 10 minutes. I'm reasonably impressed with Hetzner, and I'm really pleased with how little complexity is involved in making this whole process reproducible. I was able to verify that "adding" sockets correctly scaled requests between new WSGI server instances, and I found a likely upper bound on SQLite writes with this application design. While the server with 4 instances of the application, each running 4 threads behind the scenes, was able to serve more than 3,000 requests per second, it was probably pushing it on the Hetzner instance size I started on. To be fair, I was using the smallest, cheapest server they will rent you, with 2 vCPU and 4GB of RAM (around $4 a month). I was able to confirm that scaling the machine up to 16 vCPU and 32GB of RAM allowed somewhat better performance: with the load testing tool's concurrency set to 200, I observed 4,600 requests per second across 8 instances of the WSGI application.
The performance seems to level off around 4,600 requests per second; the machine wasn't taxed by the load of 8 instances of the application, but adding more instances did not improve performance further. I think this points to a limit in the performance available from these architecture and technology choices: Python and a single-writer SQLite have to cap out somewhere. I am reasonably comfortable saying, though, that 4,600 requests per second on a machine that costs about $20 USD a month is totally sufficient for plenty of use cases. Even more exciting, these requests are also a measure of write transactions, since every one of them includes a database write. If some requests are read-only or don't involve a database query at all (static assets or cached results), then performance can be expected to go up. I tested the server producing only "200 OK" results with no database and observed more than 20,000 requests per second on a machine with only 2 vCPU, and I'd expect further scaling on a larger machine in the absence of write contention on the database.
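For reference, the no-database variant was nothing clever. A handler along these lines (a sketch, not necessarily byte-for-byte what I ran) is enough to take SQLite out of the picture entirely:

def app(environ, start_response):
    # Same WSGI interface as before, minus reading the body and the insert.
    body = b'ok'
    start_response(
        '200 OK',
        [('Content-Length', str(len(body))), ('Content-Type', 'text/plain')]
    )
    return [body]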
Load testing in general is best suited to actual testing scenarios: forming a question and then measuring a specific outcome. This kind of benchmark is admittedly artificial, like so many of the other benchmarks for frameworks, but in my defense what I'm testing is whether these specific technology choices prevent what I'll vaguely refer to as reasonable performance. If the write path to the database can be pushed down to hundreds of microseconds, I think I know well enough to recognize when that level of performance is adequate and when the focus is better spent on application design, or even alternate frameworks on the same database.
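If you want to put a rough number on that write path in isolation, it only takes a few lines; this sketch uses a throwaway database path but the same pragmas and BEGIN IMMEDIATE pattern as the application:

import sqlite3
import time

conn = sqlite3.connect('/tmp/write-path-demo.db', isolation_level=None)
conn.execute('pragma journal_mode=WAL')
conn.execute('pragma synchronous=normal')
conn.execute('create table if not exists requests (ts real, path text)')

n = 10_000
start = time.perf_counter()
for i in range(n):
    conn.execute('begin immediate')
    conn.execute('insert into requests (ts, path) values (?, ?)', (time.time(), '/'))
    conn.execute('commit')
elapsed = time.perf_counter() - start
print(f'{elapsed / n * 1_000_000:.0f} microseconds per committed write')
conn.close()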
Just use HAProxy. Having the ability to selectively route traffic is certainly nice, and HAProxy makes that easy. The variety of knobs for tweaking in production gives me a level of confidence I haven't really experienced with other systems outside of "the cloud", which brings in a boatload of infrastructure baggage that I can't maintain on my own. I've written before about a variety of different HAProxy configurations and their approximate level of complexity; for getting started I think this is a perfectly adequate configuration:
haproxy.cfg
global
    maxconn 2000
    log /dev/log local0
    user haproxy
    group haproxy
    stats socket /run/admin.sock user haproxy group haproxy mode 660 level admin

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    log global
    mode http
    option httplog

frontend ingress
    bind *:80
    use_backend some_default

backend some_default
    server app1 127.0.0.1:8080
One thing I might consider as an alternative is to drop the fixed port on the systemd socket and instead use a Unix domain socket, having HAProxy route to that in the backend configuration block. That is one way to further mitigate unintended network access and keeps the machine quite locked down. Doing so would look like this:
sandbox@.socket
[Socket]
ListenStream=/run/sandbox/my-app.sock
[Install]
WantedBy=sockets.target
alternate haproxy.cfg
global
    maxconn 2000
    log /dev/log local0
    user haproxy
    group haproxy
    stats socket /run/admin.sock user haproxy group haproxy mode 660 level admin

defaults
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    log global
    mode http
    option httplog

frontend ingress
    bind *:80
    use_backend some_default

backend some_default
    server some_default unix@/run/sandbox/my-app.sock
In this way HAProxy listens on port 80, drops privileges on launch to the haproxy user, and that user directs traffic to a socket that activates a service to do the actual work.
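One detail to watch with this alternative: the demo's main() adopts the inherited file descriptor as an AF_INET socket, so a unix-socket variant needs the address family changed to match. In app.py, main() would become something like this (a sketch, assuming waitress accepts the pre-bound unix socket the same way it accepted the TCP one):

def main():
    # systemd still passes the listening socket as fd 3, but for a
    # ListenStream= path it is an AF_UNIX socket rather than AF_INET
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_UNIX, socket.SOCK_STREAM)]
    waitress.serve(app, sockets=sockets)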
Guarding against catastrophic failures means having a strategy for backing up data, and in the case of SQLite that can be accomplished with Litestream. Since I've looked into it before, I'll just copy-paste the working configuration (provided you have either an S3-like object store or an SFTP server):
$ cat /etc/litestream.yml
dbs:
  - path: /var/lib/wsgi-demo/requests.db
    replicas:
      - url: sftp://replicator@alpha/db
        key-path: /etc/replicator/replicator.key
That configuration can be used in combination with a systemd service that you can crib from the Litestream project repository; you'll want to give it access to the directory with the SQLite file though (here I'm just declaring that the two services share a StateDirectory):
[Unit]
Description=Litestream
[Service]
Restart=always
StateDirectory=wsgi-demo
ExecStart=/usr/bin/litestream replicate
[Install]
WantedBy=multi-user.target
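The other half of a backup strategy is proving you can get the data back. With the configuration above a restore is roughly one command; the exact flags are worth double-checking against the Litestream version you install, but it looks something like this:
$ litestream restore -o /tmp/restored.db /var/lib/wsgi-demo/requests.db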
Where does all this leave us? Well, for less than $20 a month you can get a web application with persistent storage, database backups, flexible networking schemes, and features like rate limiting. This is all demonstrated on "slow" technologies (Python, SQLite) that I have a history with and know I can personally deliver useful software on. Moving to faster technologies (compiled languages, database servers, although there's no guarantee you won't see similar performance using PostgreSQL) doesn't present too much of a challenge and, with the architecture described here, is achievable incrementally. If you imagine switching languages, you might make an "API version 2" addressable through HTTP headers, subdomains, or path-based routing; you can stand that up alongside this and handle the routing from HAProxy, and it would look nearly identical except for an additional backend and an ACL definition. Doing it this way means you can cross that bridge when you get to it.
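To make that concrete, the HAProxy side of a hypothetical path-routed "version 2" is only a few extra lines. The backend name, path prefix, and socket path here are made up for illustration:

frontend ingress
    bind *:80
    acl is_v2 path_beg /v2
    use_backend api_v2 if is_v2
    use_backend some_default

backend api_v2
    server app2 unix@/run/sandbox/my-app-v2.sock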
You might imagine moving from zipapps to containers, and all that really has to change is the Ansible copy step. Instead of a copy operation you'll do something akin to a "docker pull" and "docker run" (or maybe you'll move to using a container registry and auto-update). You might find you need a message queue or pub/sub mechanism, and all that should require is a new service definition and some choices around networking. I once described configuring NATS; it can do both. If you decide you need to arbitrarily segment the available CPU between your web application and a message broker, you can do that too; you don't need a second machine just to divide up resources.
This is all personal opinion, but it is based on practical experience. I haven't always followed all of these recommendations, but I haven't yet regretted these designs. While there are countless ways to overcomplicate things, I think this provides a good foundation to start from and hopefully curbs the worst aspects of complexity so prevalent in modern architectures. Scaling problems are bound to arise in the realm of application design, which requires its own set of design principles, but these choices shouldn't preclude any of those designs.