Snippets and Stack Traces

2015-01-30

In an effort to get more exposure to the basics of web development, I've created glue-boy, a recreation of a web community staple: the pastebin.

A What?

For those unfamiliar, a pastebin is an application that takes text input from a user and makes it available online. They are frequently used on IRC to refer to bits of code or share error messages without flooding the channel. I think pastebins are such an appealingly small project that everyone decides to make their own at one point or another (myself now included). In the interest of research, I've at least token experience with the following:

Each provides a variety of functionality that I decided to ignore for my own implementation, including:

syntax highlighting
line numbering
multi-file uploads (admittedly I've only seen this in GitHub's gists)
replying to pastes
configurable expiration
deleting a paste

Wait, so what does it do then?

I realize that's a pretty long list of things that I won't be doing, but I'm not looking to make waves. In the interest of exploring web application development, most of those features aren't particularly interesting (or require a much more involved system than I want to dive right into). I settled on a much more limited feature set, one that I think defines the project:

takes textual user input
makes it available online

Standing on the Shoulders of Giants

When I decided on recreating an easy project, I hadn't quite realized how easy I had made things. I used the Python micro-framework Flask for its familiarity and ease of use. For whatever complaints I might have about Flask in the abstract, hacking together a project like this is charmingly easy. I had a working prototype in an evening and in under 50 lines of code. While I didn't know it at the time, Flask's WSGI compliance makes for effortless production deployment.

I began the project with the assumption that I would store pastes in flat files, rather than any kind of database, until it became an issue or I felt that particular itch. The more I've thought about it, though, the more I find myself thinking flat files are the more correct solution. My application machine, while not under-powered, has limited resources:

    $ cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 62
    model name      : Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
    cpu MHz         : 2400.032
    cache size      : 15360 KB
    fpu             : yes
    bogomips        : 4800.06
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual

    $ free -h
                 total       used       free     shared    buffers     cached
    Mem:          497M       436M        60M         0B        74M       220M

which would make any kind of always-on/daemonized system a pain. I'll readily admit I don't know what level of resources a database server requires, but anything greater than zero is probably too much, especially considering I am expecting zero traffic on the application. The pastes are non-relational and the file system provides a few useful indexes itself. Even though flat files are difficult to query, I can't think of a case in which making the pastes searchable would be desirable.
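As a rough illustration of the flat-file approach (a minimal sketch, not the actual glue-boy code), each paste can live in its own file, with the file name doubling as the paste's ID. The function names, the ID length, and the use of the standard library's secrets module are all assumptions made for the example.

```python
import secrets
from pathlib import Path
from typing import Optional


def save_paste(text: str, directory: Path) -> str:
    """Write one paste to its own file and return the generated ID."""
    directory.mkdir(parents=True, exist_ok=True)
    while True:
        # Hypothetical ID scheme: 8 random bytes, URL-safe encoded.
        paste_id = secrets.token_urlsafe(8)
        try:
            # Mode "x" refuses to overwrite, so an existing ID is never reused.
            with open(directory / paste_id, "x", encoding="utf-8") as handle:
                handle.write(text)
            return paste_id
        except FileExistsError:
            continue  # vanishingly unlikely collision: draw a new ID


def load_paste(paste_id: str, directory: Path) -> Optional[str]:
    """Return the paste text, or None if no such paste exists."""
    # Reject anything that is not a bare file name (for example "../secret").
    if Path(paste_id).name != paste_id:
        return None
    path = directory / paste_id
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")
```

Nothing has to stay running between requests here; the file system is the only "index", and looking up a paste is a single path lookup.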

A few of the previously mentioned services provide an option to anonymize pasted information; mine does so by default. The ID of each paste is random and unguessable, which provides a kind of weak security through obscurity (which is to say none at all). More than anything, I settled on this scheme because I've been led to believe you can never trust user input. In that same vein, I opted for the easy way out on serving user input:

    Content-Type: "text/plain; charset=UTF-8;"

This avoids the need to sanitize <script> tags and the like out of pastes, but it makes styling a pain.
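To make the plain-text idea concrete, here is a minimal sketch using only the standard library's wsgiref rather than Flask, so it is a stand-in for the real application, not a copy of it. The in-memory PASTES dictionary, the paste ID, and the fetch helper are hypothetical; the point is only that a paste containing markup is sent back byte for byte, labeled as text rather than HTML.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory stand-in for stored pastes.
PASTES = {"abc123": "<script>alert('hi')</script>"}


def app(environ, start_response):
    """Minimal WSGI callable that serves a paste as plain text."""
    paste_id = environ.get("PATH_INFO", "/").lstrip("/")
    text = PASTES.get(paste_id)
    if text is None:
        status, body = "404 Not Found", b"not found\n"
    else:
        # No escaping or sanitizing: the content type tells the browser
        # to display the bytes as text instead of interpreting them.
        status, body = "200 OK", text.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def fetch(path):
    """Call the app directly, without a server: (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling fetch("/abc123") exercises the application the same way a WSGI server would, which is also why any WSGI-compliant application can be handed to a production server unchanged.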

The Hard Part

As I mentioned, the application itself is about 50 lines of code and I had it working in an evening. The hard part of the whole process, and the one which probably taught me the most, was getting the application deployed to the web. The common wisdom for Python web applications is:

don't use a development server
reverse proxy the application server

So while Flask ships with a development server for local testing, it is unsuitable for production use. Luckily, Flask is "100% WSGI compliant", so plugging the application into Gunicorn was about as easy as it gets: gunicorn --bind=localhost:8000 glue-boy:app --daemon. Even setting up Nginx as a reverse proxy was a breeze once I conceded defeat on my initial improper configuration and properly set up my sites-available and sites-enabled:

    server {
      listen 80;
      server_name glue.nprescott.com;
      root [directory]/glue-boy;

      access_log [directory]/logs/access.log;
      error_log  [directory]/logs/error.log;

      location / {
        proxy_pass http://127.0.0.1:8000;
      }
    }
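The port Gunicorn binds to has to match the proxy_pass target in the Nginx block above. One way to keep the Gunicorn side in a single place is a configuration file, which Gunicorn reads as plain Python. A minimal sketch, assuming a file named gunicorn.conf.py next to the application (the file name and location are assumptions, not part of the original setup):

```python
# gunicorn.conf.py (hypothetical file name and location)

# Same address as --bind=localhost:8000 on the command line;
# the port must match the one in Nginx's proxy_pass.
bind = "localhost:8000"

# Same effect as --daemon: detach and run in the background.
daemon = True
```

With that file in place, the invocation would shrink to something like gunicorn --config gunicorn.conf.py glue-boy:app.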

The Wrong Level of Abstraction

Despite the ease with which those two things were accomplished, I had to face facts: the application did not work. I spent hours poring over the configurations and StackOverflow answers to try and divine where I had gone wrong before realizing I was looking at the wrong thing. The application was fine, the application server was fine, the reverse proxy was fine. What I had forgotten was to properly set up the DNS with my VPS to account for the subdomain. This is neither difficult nor that big a leap to make, but my issue was that I was scrutinizing an entirely wrong level of the web application. My biggest complaint with developing for the web is the number of layers of complexity that I need to keep in my head at one time. It's great to whip up a project in hours and have great libraries and tools freely available to publish online; what's not great is getting lost in layer upon layer of abstraction from the root of the problem. In my case it was a very fundamental issue with how the internet resolves names to addresses, and I simply wasn't in the right frame of mind when I hit the problem.
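One way to avoid staring at the wrong layer is to test from the bottom up: does the name resolve, and if so, does anything accept a connection? The following is a minimal sketch of that idea using only the standard library. The function name, the return values, and the injectable lookup and connect parameters (there so the check can be tried without a network) are all assumptions made for the example.

```python
import socket


def first_broken_layer(hostname, port=80, lookup=socket.getaddrinfo,
                       connect=socket.create_connection):
    """Check the lowest layers first; return the first that fails, else None."""
    try:
        lookup(hostname, port)
    except socket.gaierror:
        return "dns"  # the name does not resolve at all
    try:
        connect((hostname, port), timeout=5).close()
    except OSError:
        return "tcp"  # the name resolves, but nothing accepts connections
    # Both layers are fine: look at the proxy or the application next.
    return None
```

Running first_broken_layer("glue.nprescott.com") before touching any Nginx or Gunicorn configuration would separate a missing DNS record from a misconfigured proxy.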

For Posterity

The actual resolution to my problem was to configure the following on my server:

    Record Type:       Name:       Hostname:

    CNAME              *           nprescott.com.

While I'm getting over the annoyance of it, the sting will take a while to wear off. Guess I know better now.