In an effort to get more exposure to the basics of web development I've created glue-boy, a recreation of a web community staple: the pastebin.
For those unfamiliar, a pastebin is an application that takes text input from a user and makes it available online. Pastebins are frequently used on IRC to refer to bits of code or share error messages without flooding the channel. I think pastebins are such an appealingly small project that everyone decides to make their own at one point or another (myself now included). In the interest of research, I have at least token experience with the following:
Each provides a variety of functionality that I decided to ignore for my own implementation, including:
I realize that's a pretty long list of things that I won't be doing, but I'm not looking to make waves. In the interest of exploring web application development, most of those features aren't particularly interesting (or require a much more involved system than I want to dive right into). I settled on a much more limited feature set, one that I think defines the project:
When I decided on recreating an easy project, I hadn't quite realized how easy I had made things. I used the Python micro-framework Flask for its familiarity and ease of use. For whatever complaints I might have about Flask in the abstract, hacking together a project like this is charmingly easy. I had a working prototype in an evening and in under 50 lines of code. While I didn't know it at the time, Flask's WSGI compliance makes for effortless production deployment.
I began the project assuming I would store pastes in flat files rather than in any kind of database, at least until that became an issue or I felt that particular itch. The more I've thought about it, though, the more convinced I am that flat files are the correct solution. My application machine, while not underpowered, has limited resources:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
cpu MHz : 2400.032
cache size : 15360 KB
fpu : yes
bogomips : 4800.06
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
$ free -h
total used free shared buffers cached
Mem: 497M 436M 60M 0B 74M 220M
This would make any kind of always-on, daemonized database a pain. I'll readily admit I don't know how much memory or CPU a database server would require, but anything greater than zero is probably too much, especially since I expect zero traffic on the application. The pastes are non-relational, and the file system itself provides a few useful indexes (lookup by file name, for one). Flat files are difficult to query, but I can't think of a case where making the pastes searchable would be desirable.
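The following minimal sketch makes the flat-file idea concrete by storing one file per paste. It is not glue-boy's actual code. `save_paste`, `load_paste`, the 11-character ID format and the directory layout are all hypothetical, and the sketch uses Python's standard library rather than Flask. Each random ID comes from `secrets.token_urlsafe`, and the file name serves as the only index.

```python
import os
import re
import secrets
import tempfile

# Assumed ID format: secrets.token_urlsafe(8) yields 11 URL-safe characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")


def save_paste(text, directory):
    """Store a paste in its own file, named by a fresh random ID, and return the ID."""
    os.makedirs(directory, exist_ok=True)
    while True:
        paste_id = secrets.token_urlsafe(8)  # 8 random bytes -> 11 characters
        path = os.path.join(directory, paste_id)
        try:
            # "x" mode refuses to open an existing file, so an (unlikely)
            # ID collision can never overwrite an older paste.
            # newline="" keeps the paste's line endings exactly as submitted.
            with open(path, "x", encoding="utf-8", newline="") as f:
                f.write(text)
            return paste_id
        except FileExistsError:
            continue  # collision: draw another ID


def load_paste(paste_id, directory):
    """Return a paste's text, or None for a malformed or unknown ID."""
    # Never trust user input: anything that isn't a plausible ID
    # (e.g. "../../etc/passwd") is rejected before touching the disk.
    if not ID_PATTERN.fullmatch(paste_id):
        return None
    try:
        with open(os.path.join(directory, paste_id), encoding="utf-8", newline="") as f:
            return f.read()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pastes:
        pid = save_paste("print('hello, glue')\n", pastes)
        print(pid, repr(load_paste(pid, pastes)))
```

Validating the ID before building a path keeps a crafted request from escaping the paste directory. Opening in `"x"` mode trades a retry loop for never clobbering an existing paste.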
A few of the previously mentioned services provide an option to anonymize pasted information. Mine does so by default: the ID of each paste is random and unguessable, which provides a kind of weak security through obscurity (which is to say none at all). More than anything, I settled on this scheme because I've been led to believe you can never trust user input. In that same vein, I took the easy way out when serving user input, sending every paste with the header:
Content-Type: text/plain; charset=UTF-8
This avoids the need to sanitize pastes of <script> tags and such, but it makes styling a pain.
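The trade-off is easiest to see side by side. In this sketch the function names are hypothetical and not taken from glue-boy. A paste served as `text/plain` goes out byte for byte. The same paste embedded in an HTML page has to pass through `html.escape` first, or a `<script>` tag inside it would run in the reader's browser. The `X-Content-Type-Options: nosniff` header is an extra assumption rather than something glue-boy is known to send; it asks browsers not to second-guess the declared type.

```python
import html

# The header described above, plus one assumed hardening header.
PLAIN_HEADERS = [
    ("Content-Type", "text/plain; charset=UTF-8"),
    # Assumption: ask browsers not to sniff the body as HTML anyway.
    ("X-Content-Type-Options", "nosniff"),
]


def plain_response(paste):
    """Serve a paste verbatim: as text/plain, any markup in it is displayed, not run."""
    return PLAIN_HEADERS, paste.encode("utf-8")


def html_response(paste):
    """The alternative: embedding the paste in HTML means escaping it first."""
    # Without html.escape, a pasted <script> tag would execute in the reader's browser.
    page = "<!DOCTYPE html><title>paste</title><pre>" + html.escape(paste) + "</pre>"
    return [("Content-Type", "text/html; charset=UTF-8")], page.encode("utf-8")


if __name__ == "__main__":
    evil = "<script>alert('hi')</script>"
    print(plain_response(evil)[1])
    print(html_response(evil)[1])
```

The HTML route is what any styled or syntax-highlighted view would need. The plain-text route skips escaping entirely.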
As I mentioned, the application itself is about 50 lines of code, and I had it working in an evening. The hard part of the whole process, and the one that probably taught me the most, was getting the application deployed to the web. The common wisdom for Python web applications is:
So while Flask ships with a development server for local testing, that server is unsuitable for production use. Luckily, Flask is "100% WSGI compliant", so plugging the application into Gunicorn was about as easy as it gets:
gunicorn --bind=localhost:8000 glue-boy:app --daemon
Even setting up Nginx as a reverse proxy was a breeze, once I conceded defeat on my initial, improper configuration and properly set up my sites-available and sites-enabled directories:
server {
    listen 80;
    server_name glue.nprescott.com;
    root [directory]/glue-boy;
    access_log [directory]/logs/access.log;
    error_log [directory]/logs/error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
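Gunicorn doesn't care that glue-boy is a Flask app. It only needs `glue-boy:app` to name a WSGI callable: a function that takes `environ` and `start_response` and returns an iterable of bytes. For illustration, here is a hypothetical framework-free equivalent for serving pastes. It uses the standard library and assumes a one-file-per-paste layout with 11-character IDs. `make_app` and `call` are illustrative names and are not part of glue-boy or Gunicorn.

```python
import os
import re
import tempfile
from wsgiref.util import setup_testing_defaults

ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{11}")  # assumed paste ID format


def make_app(paste_dir):
    """Build a WSGI callable that serves flat-file pastes as plain text."""

    def app(environ, start_response):
        # PATH_INFO is the request path, e.g. "/AbCdEfGhIjK".
        paste_id = environ.get("PATH_INFO", "").lstrip("/")
        path = os.path.join(paste_dir, paste_id)
        if (environ.get("REQUEST_METHOD") == "GET"
                and ID_PATTERN.fullmatch(paste_id)
                and os.path.isfile(path)):
            with open(path, "rb") as f:
                body = f.read()
            status = "200 OK"
        else:
            body = b"not found\n"
            status = "404 Not Found"
        start_response(status, [
            ("Content-Type", "text/plain; charset=UTF-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def call(app, path):
    """Invoke a WSGI app in-process, roughly as a server like Gunicorn would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD="GET", wsgi.* keys, etc.
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, headers

    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "AbCdEfGhIjK"), "wb") as f:
            f.write(b"hello, glue\n")
        demo = make_app(d)
        print(call(demo, "/AbCdEfGhIjK"))
        print(call(demo, "/../secret"))
```

You could hand a module-level `app = make_app("/path/to/pastes")` to Gunicorn the same way as `glue-boy:app`. The `call` helper only mimics what the server does on each request.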
Despite how easily those two things came together, I had to face the fact that the application did not work. I spent hours poring over the configurations and StackOverflow answers, trying to divine where I had gone wrong, before realizing I was looking at the wrong thing. The application was fine, the application server was fine, and the reverse proxy was fine. What I had forgotten was to set up DNS with my VPS so that it accounted for the subdomain. That step is neither difficult nor a big leap to make, but I had been scrutinizing an entirely wrong layer of the web application.
My biggest complaint about developing for the web is the number of layers of complexity I need to keep in my head at one time. It's great to whip up a project in hours and have great libraries and tools freely available to publish it online. What's not great is getting lost in layer upon layer of abstraction, far from the root of the problem. In my case it was a very fundamental issue with how the internet resolves names to addresses, and I simply wasn't in the right frame of mind when I hit the problem.
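The actual fix was to add a wildcard CNAME record in my VPS's DNS settings, so that subdomains of nprescott.com (glue.nprescott.com included) resolve the same way as nprescott.com itself: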
Record Type   Name   Hostname
CNAME         *      nprescott.com.
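The lesson generalizes: when a deployment doesn't respond, check the layers from the bottom up before rereading configuration files. Below is a rough sketch of that order using only Python's standard library. `check_layers` is a hypothetical helper, and it assumes the site is plain HTTP on port 80, as in the Nginx configuration above.

```python
import http.client
import socket


def check_layers(hostname, port=80, path="/", timeout=5):
    """Probe DNS, then TCP, then HTTP, and report the first layer that fails."""
    # 1. DNS: does the name resolve at all? (The layer I had forgotten.)
    try:
        address = socket.gethostbyname(hostname)
    except OSError as exc:  # usually socket.gaierror
        return f"DNS: {hostname} does not resolve ({exc})"

    # 2. TCP: is anything (e.g. Nginx) listening on that address and port?
    try:
        with socket.create_connection((address, port), timeout=timeout):
            pass
    except OSError as exc:
        return f"TCP: cannot connect to {address}:{port} ({exc})"

    # 3. HTTP: does the proxy/application chain answer?
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        # Send the original name as Host so Nginx can match server_name.
        conn.request("GET", path, headers={"Host": hostname})
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return f"HTTP: request failed ({exc})"
    finally:
        conn.close()

    if status >= 500:
        # e.g. 502 Bad Gateway when Nginx can't reach the upstream application server
        return f"HTTP: {address}:{port} answered {status}; check the application server"
    return f"reachable: {hostname} -> {address}:{port} answered {status}"
```

Calling `check_layers("glue.nprescott.com")` before the CNAME record existed would presumably have stopped at the DNS step. That would have saved the hours spent on the layers above it.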
While I'm getting over the annoyance of it, the sting will take a while to wear off. Guess I know better now.