Which is faster - SQLite or the file system? Performance testing Flask with ApacheBench in a (minimal) real-world scenario.
As per usual, I'm using glue-boy as a test bed for these kinds of experiments. I've written about it before, but in short: it's a simple pastebin designed to use the file system. From storage to access records, it's super simple and leverages a lot of little UNIX tools for reporting and cleanup.
I got to thinking that I should try porting it to SQLite and compare and contrast any differences. I decided to combine that effort with a recent interest in performance testing and a great fondness for charts.
This is strictly a comparison of relative performance between the two implementations. The cited numbers come from realistic loads under the production WSGI server, but everything runs locally. This is intentional: I do not want to test the network or the computer I'm running it all on. The response times should not be taken as an indicator of real-world performance.
You can check out the full changes made to port the application to sqlite3 on GitHub. A few things worth noting:
I thought of several ways to optimize both the current operations and the SQLite implementation, but resisted the urge to change too much. The goal was a feature-for-feature port of the application with no perceivable changes, which I think I accomplished.
I ran each test instance individually and tried to monitor background activity, to give a clear picture of actual app performance rather than network lag or background load. I ran the application with gunicorn configured for 4 workers, since that is how I run it in the wild, proxied behind Nginx.
ab Does and Doesn't Do
I used ApacheBench for my tests; it was already installed and I didn't see any deal-breakers relative to some of the alternatives. There are, however, things worth mentioning:
- ab can't do localhost, the network address needs to be 127.0.0.1
- POST data must be base64/url-encoded¹
To issue 1,000 GET requests, with 10 simulated concurrent users:
ab -n 1000 -c 10 http://127.0.0.1:8000/content/a8137a
To issue 1,000 POST
requests, with 10 simulated concurrent users:
ab -n 1000 -c 10 -T "application/x-www-form-urlencoded" \
-p post-body.txt http://127.0.0.1:8000/
It is probably worth mentioning that glue-boy has never seen anything near these levels, in total requests or concurrent users. The current system pretty much runs itself, and has for over a year at this point.
ab is pretty limited in what and how it tests; for the data below I was repeatedly re-requesting the same URL/paste content. I inserted about 200 lines of junk text from a log file, totaling approximately 9 kB.
I will admit some surprise at the results in favor of SQLite here. If you checked the code, you will have noticed there are two separate SQL execution calls on the read path. And yet, SQLite is about twice as fast as the original implementation, which simply reads a text file containing the record from disk.
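For reference, here is a minimal sketch of what a two-query read path looks like. The schema and names (pastes, key, content, last_access) are my own guesses for illustration, not the actual glue-boy code, which is on GitHub:

```python
import sqlite3
import time

def get_content(db_path, key):
    # Hypothetical sketch: one query to fetch the paste, a second to
    # record the access time -- two round trips to SQLite per GET.
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT content FROM pastes WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        con.execute(
            "UPDATE pastes SET last_access = ? WHERE key = ?",
            (int(time.time()), key),
        )
        con.commit()
        return row[0]
    finally:
        con.close()
```

Even with the extra round trip for the access-time update, the read as a whole beat the file-system version.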
I borrowed heavily² in creating the charts below, trying to pull the meaningful information out of the benchmark results.
I found the suggested scatterplot to be more informative, but non-intuitive at first. The thing to keep in mind is that the x-axis "wall clock" measure is really only relevant within each individual "mode" (indicated by color).
 | File System (ms) | SQLite (ms) |
---|---|---|
min | 7 | 5 |
mean | 66 | 29 |
median | 65 | 28 |
max | 84 | 106 |
I base64-encoded the same junk text file from the GET request, resulting in a file size of about 12.5 kB. The trick to matching ab's file format was to specify the form "key" after encoding. In this case the HTML form contains a textarea called "content", so the file looks like:
content=LyoqDQogKiAgR2VuZXJpYyBIVFRQIERldmljZSB2MS4wLjIwMTYwNDAyDQogKg0KICogIFNv\dXJjZSBjb2RlIGNhbiBiZSBmb3VuZCBoZXJlOiBodHRwczovL2dpdGh1Yi5jb20vSlotU21hcnRUaGlu===
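Generating that body file is straightforward. A sketch, assuming only what's stated above (base64-encode the raw text, then url-encode it under the form's "content" key); the file name is just whatever you pass to ab -p:

```python
import base64
from urllib.parse import urlencode

def make_post_body(raw_bytes):
    # base64-encode the paste, then url-encode it under the form's
    # "content" key, matching the x-www-form-urlencoded body ab sends.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return urlencode({"content": encoded})

# Write the body file referenced by: ab ... -p post-body.txt
with open("post-body.txt", "w") as f:
    f.write(make_post_body(b"some junk log text\n" * 200))
```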
I have to mention my surprise again here: after the GET requests turned out in favor of SQLite, I had thought for sure the POST requests would do the same. Instead, the file system did better than SQLite on writes; narrowly, but still better.
Percentile | File System (ms) | SQLite (ms) |
---|---|---|
50% | 28 | 30 |
66% | 29 | 31 |
75% | 30 | 32 |
80% | 31 | 32 |
90% | 33 | 34 |
95% | 34 | 36 |
98% | 36 | 39 |
99% | 38 | 46 |
100% | 42 | 65 |
 | File System (ms) | SQLite (ms) |
---|---|---|
min | 3 | 14 |
mean | 28 | 31 |
median | 28 | 30 |
max | 42 | 65 |
I haven't yet decided how to proceed with this information. The SQLite version is a drop-in replacement for the existing system: it is (marginally) faster on reads and nearly matches write speeds. And yet I can't help but feel some reluctance to toss out what has been working without issue for all this time.
The SQLite implementation is certainly more extensible than the plaintext system, with the ability to add database fields as necessary. SQLite also exhibits some curious spiking in the performance tests, though: while it routinely performs better, there are instances where it is dramatically worse. I haven't identified the cause, but write locks are a likely suspect.
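If write locks are indeed the culprit, one thing worth trying (untested here, so treat it as a guess rather than a fix) is SQLite's write-ahead-log journal mode, which lets readers proceed while a write is in flight. The database file name below is hypothetical:

```python
import sqlite3

con = sqlite3.connect("glueboy.db")  # hypothetical database path
# WAL mode lets readers proceed concurrently with a writer, which can
# smooth out the latency spikes that the default rollback journal causes
# when a write briefly locks out everyone else.
con.execute("PRAGMA journal_mode=WAL")
mode = con.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # "wal"
con.close()
```

The setting is persistent: once a database is switched to WAL, later connections open it in WAL mode as well.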