[nolan@nprescott.com] $>  cat blog archive feed

Poor Man's System Monitoring

2017-07-28

I've been trying my hand at designing a stupidly simple system for monitoring the state of a server. This has turned into an exploration of the design of existing systems, and stealing shamelessly from those designs.

Defining "System Monitoring"

For several boring reasons, I've narrowly defined system monitoring as the periodic collection of (at least) three key metrics: load average, disk usage, and memory usage. The intent isn't to make things too easy, but these are three useful, real-world values that give the project some relevance without burdening it with a lot of hyper-specific scenarios.

Inevitably, you may notice this is re-implementing some portion of the popular open-source project Munin. This is true. My first exposure to Munin was in a project attempting to integrate the rudimentary reporting API with an external system. Say what you will about Munin, it is widely successful because it deploys easily, and the approach taken with the plugin architecture ensures that nearly anybody can write something and get it working. The downside is that the project seems to have been designed without a real use case in mind for reporting information outside of Munin. I can only imagine this is why there have been so many "modern" alternatives in recent years.

I won't apologize for this level of not-invented-here, but I also won't claim my solution to be an improvement on the functionality that exists in other projects. What doing it myself affords me is two-fold: an interesting project to work on, and a chance to explore a few technologies that I've been working with lately.

How It's Made

I've been trying to get a grip on Python's Twisted, so I'm availing myself of the opportunity here to utilize some of the batteries-included nature of both it and Python. Taking a page from the design of Munin, I am aiming for a daemon to "wake up" every n-minutes and collect load average, disk and memory usage. Unlike Munin, which uses RRDTool behind the scenes, I am intending to use SQLite for storage. Round robin databases have a few nice properties that I may miss, but for such a small prototype, I think they'd be overkill.

The Business End of Things

So far as actually collecting the desired measures of system performance, Python's standard library makes things pretty easy. I have the luxury of focusing only on Linux systems, specifically Debian-alike systems, which is reflected in a few specific paths to proc-files and their format.

Load Average

The easiest of the three, this is a one-liner with Python.

    def load_average(self):
        try:
            # os.getloadavg() returns the 1, 5 and 15 minute load averages.
            one_minute, five_minute, fifteen_minute = os.getloadavg()
            return ('cpu', five_minute)
        except Exception as e:
            logging.warning(f'An error occurred collecting load average: {e}')

Disk Usage

This one will obviously be tailored to whichever partitions I'm interested in monitoring; I'm using only the root partition for the sake of example.

    def partition_usage(self, partition):
        # f_frsize is the fragment size; f_blocks and f_bfree count fragments.
        statvfs = os.statvfs(partition)
        total_bytes = float(statvfs.f_frsize * statvfs.f_blocks)
        free_bytes = float(statvfs.f_frsize * statvfs.f_bfree)
        # Percentage of the partition currently in use.
        return ((total_bytes - free_bytes) / total_bytes) * 100

    def disk_usage(self):
        try:
            # One ('disk_<partition>', percent_used) pair per partition.
            return [('disk_' + partition, self.partition_usage(partition))
                    for partition in self.monitored_partitions]
        except Exception as e:
            logging.warning(f'An error occurred collecting disk usage: {e}')
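
For comparison, the standard library's shutil.disk_usage wraps the same statvfs call and hands back total, used and free byte counts. This is only a sketch of that alternative, not the code the monitor uses, and percent_used is a name made up for the example:

```python
import shutil


def percent_used(partition):
    """Return the percentage of the partition's space that is in use."""
    usage = shutil.disk_usage(partition)  # named tuple: total, used, free
    return usage.used / usage.total * 100


if __name__ == "__main__":
    print(f"/ is {percent_used('/'):.1f}% full")
```

The result should land in the same range as partition_usage, since both are derived from the block counts the filesystem reports.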

Memory Usage

    def memory_usage(self):
        try:
            with open('/proc/meminfo', 'r') as meminfo:
                for line in meminfo:
                    # Values are reported in kB, e.g. "MemTotal: 504228 kB".
                    if line.startswith('MemTotal:'):
                        total = float(line.split()[1])
                    if line.startswith('MemAvailable:'):
                        available = float(line.split()[1])
            # Fraction of total memory still available (not the fraction in use).
            free_memory = available / total
            return ('memory', free_memory)
        except Exception as e:
            logging.warning(f'An error occurred collecting memory usage: {e}')

The most system-specific of the lot, this one is reading and parsing meminfo, which looks like this:

$ cat /proc/meminfo
MemTotal:         504228 kB
MemFree:          383232 kB
MemAvailable:     453024 kB
Buffers:           11304 kB
Cached:            62556 kB
SwapCached:            0 kB
Active:            50100 kB
...

This is pretty much exactly how free works, which is easily checked with strace (if you look past some of the noise):

$ strace free
...
open("/proc/meminfo", O_RDONLY)         = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "MemTotal:         504228 kB\nMemF"..., 8191) = 1251
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(4, "# Locale name alias data base.\n#"..., 4096) = 2995
read(4, "", 4096)                       = 0
close(4)                                = 0
...
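
The parsing step can be tried in isolation against the sample meminfo output shown earlier. A minimal sketch, assuming the text has already been read into a string; parse_meminfo and SAMPLE_MEMINFO are hypothetical names for this example only:

```python
SAMPLE_MEMINFO = """\
MemTotal:         504228 kB
MemFree:          383232 kB
MemAvailable:     453024 kB
Buffers:           11304 kB
Cached:            62556 kB
"""


def parse_meminfo(text):
    """Map each field name to its integer value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        if rest.strip():
            # The first token after the colon is the number; a unit may follow.
            fields[name] = int(rest.split()[0])
    return fields


meminfo = parse_meminfo(SAMPLE_MEMINFO)
available_fraction = meminfo['MemAvailable'] / meminfo['MemTotal']
print(f'{available_fraction:.3f}')  # prints 0.898
```

Parsing every field into a dictionary is more than the monitor needs, but it makes it easy to poke at other values such as MemFree or Cached.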

Tying It Together

I haven't yet given too much thought to the best database schema. There seem to be two potential solutions:

Key-Value, single table

Definitely the simpler of the two schemas, I think this one may suffer a bit in storage because each entry necessarily duplicates its key. This, to me, feels more like the statsd approach of fire-and-forget, allowing for more flexibility at the cost of some overhead.

CREATE TABLE metrics (
       key       TEXT,
       value     FLOAT,
       timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL
);
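
To get a feel for this schema, here is a small sketch using the standard library's sqlite3 module against an in-memory database. The sample values are made up, and the in-memory connection is an assumption for the sake of a self-contained example:

```python
import sqlite3

SCHEMA = """
CREATE TABLE metrics (
       key       TEXT,
       value     FLOAT,
       timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL
);
"""

conn = sqlite3.connect(':memory:')  # throwaway database for the example
conn.executescript(SCHEMA)

# Every row repeats its key; the timestamp is filled in by the default.
samples = [('cpu', 0.0), ('memory', 0.86), ('disk_/', 18.6)]
conn.executemany('INSERT INTO metrics (key, value) VALUES (?, ?)', samples)
conn.commit()

rows = conn.execute('SELECT key, value, timestamp FROM metrics').fetchall()
for row in rows:
    print(row)
```

Inserts need nothing but the key and the value, which is the fire-and-forget appeal of this layout.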

More Properly Relational

This one is more in line with standard relational practices, but (I think) would require look-ups on each insert to get the relevant key for each metric. I think this may eventually be the way to go.

CREATE TABLE systems (
       _id       INTEGER PRIMARY KEY AUTOINCREMENT,
       system    TEXT
);

CREATE TABLE metrics (
       system_id INTEGER,
       value     FLOAT,
       timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
       FOREIGN KEY(system_id) REFERENCES systems(_id)
);
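
The look-up-on-insert cost might look something like the following sketch, again using sqlite3 with an in-memory database. The helper names system_id_for and save_metric are hypothetical, and treating the system column as the name being looked up is my assumption about how the schema would be used:

```python
import sqlite3

SCHEMA = """
CREATE TABLE systems (
       _id       INTEGER PRIMARY KEY AUTOINCREMENT,
       system    TEXT
);

CREATE TABLE metrics (
       system_id INTEGER,
       value     FLOAT,
       timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
       FOREIGN KEY(system_id) REFERENCES systems(_id)
);
"""


def system_id_for(conn, name):
    """Return the _id for name, inserting a systems row on first use."""
    row = conn.execute('SELECT _id FROM systems WHERE system = ?',
                       (name,)).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute('INSERT INTO systems (system) VALUES (?)', (name,))
    return cursor.lastrowid


def save_metric(conn, name, value):
    """Look up the id first, then insert the measurement itself."""
    conn.execute('INSERT INTO metrics (system_id, value) VALUES (?, ?)',
                 (system_id_for(conn, name), value))


conn = sqlite3.connect(':memory:')  # throwaway database for the example
conn.executescript(SCHEMA)
for name, value in [('memory', 0.86), ('cpu', 0.0), ('memory', 0.85)]:
    save_metric(conn, name, value)
conn.commit()
```

Each measurement costs a SELECT before its INSERT, though the name itself is only stored once.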

Implementing It

The final missing pieces are the until-now elided SystemMonitor and the MetricsDatabase, which encapsulates the database writes:

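from twisted.enterprise import adbapi
from twisted.internet.defer import inlineCallbacks


class MetricsDatabase:
    # check_same_thread=False lets sqlite3 connections be used from threads
    # other than the one that created them, as the pool's workers do.
    db_pool = adbapi.ConnectionPool('sqlite3', 'metrics.sqlite',
                                    check_same_thread=False)
    table = 'metrics'

    @inlineCallbacks
    def save_metric(self, key, value):
        # runOperation runs the statement in a thread and returns a Deferred.
        yield self.db_pool.runOperation("INSERT INTO metrics (key, value) "
                                        "VALUES (?, ?)", (key, value))


class SystemMonitor:
    def __init__(self):
        self.db = MetricsDatabase()
        self.monitored_partitions = ['/']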

Twisted's adbapi is a pleasantly unsurprising wrapper that feels a lot like the standard library's sqlite3 module. I'm sure I'll want to refactor the __init__ method of SystemMonitor in the future, for easier dependency injection/testing, but for now this is a working implementation. The remaining thing is to configure a system daemon using Twisted.
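
The daemon calls a write_system_metrics method on the monitor, which isn't shown here. One wrinkle any such method has to deal with is that the collectors return different shapes: a single pair, a list of pairs, or None when an error was logged. The following is a sketch of just that flattening step, under the assumption that it is handled in one place; flatten_metrics is a hypothetical helper, not the actual implementation:

```python
def flatten_metrics(results):
    """Yield (key, value) pairs from a mix of tuples, lists and None."""
    for result in results:
        if result is None:
            # The collector logged an error and returned nothing.
            continue
        if isinstance(result, list):
            # disk_usage returns one pair per monitored partition.
            yield from result
        else:
            # load_average and memory_usage return a single pair.
            yield result


# Stand-in values shaped like the collectors' return values.
collected = [('cpu', 0.0), [('disk_/', 18.6)], None]
pairs = list(flatten_metrics(collected))
print(pairs)  # [('cpu', 0.0), ('disk_/', 18.6)]
```

Each resulting pair could then be handed to save_metric in turn.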

Daemon-izing Things

Probably the most surprising part of the process for me was finding how easy Twisted makes it to implement daemons. While Munin uses cron to manage the periodic collection of system values, I've opted for a more self-contained system using Twisted's TimerService. Immediately, the home-grown version offers greater flexibility in handling special cases than Munin does; in particular I'm referring to the ability to modify the data collection rate. One unpleasant discovery with Munin is how deeply the 5 minute cron interval is ingrained in everything working as intended.

from twisted.application import service
from twisted.application.internet import TimerService

from system_monitor import SystemMonitor

INTERVAL_MINUTES = 5

application = service.Application("SystemMonitor")
services = service.IServiceCollection(application)
monitor = SystemMonitor()
timer = TimerService((INTERVAL_MINUTES * 60), monitor.write_system_metrics)
timer.setServiceParent(services)

Running the Daemon

Finally, to set up automatic restarts at the operating system level, I've written a basic systemd service file:

[Unit]
Description=System Monitor

[Service]
ExecStart=/path/to/venv/twistd --nodaemon --python=/path/to/MonitoringService.py

Restart=always

[Install]
WantedBy=multi-user.target

The --nodaemon flag is an idiosyncrasy of systemd's handling of processes: instead of the process backgrounding itself (which is what --nodaemon prevents), systemd manages foreground processes behind the scenes and can track logging more easily with journalctl (useful for debugging the service file).

Space Requirements

For my own curiosity, I've attempted to quantify my estimated storage requirements as follows:

60 minutes/hour / 5 minutes/sample = 12 samples/hour
12 samples/hour * 24 hours/day * 365 days/year = 105120 samples/year
105120 samples/year * 45 bytes/row = 4730400 bytes/year
4730400 bytes/year / 1024 bytes/kilobyte = 4619.5313 kilobytes/year
4619.5313 kilobytes/year / 1024 kilobytes/megabyte = 4.5112610 megabytes/year

I've based the field sizes on this document, which claims SQLite stores floats as "reals", which are 8 bytes, and on this post, which explains how SQLite manages text fields. It seems safe to assume all of the keys will likely fall within 1-byte lengths. In the methods above, the largest single key is 6 characters, so calculating with 10 bytes seems conservative enough. Dates are stored as ISO8601 strings (up to 23 characters with fractional seconds; the CURRENT_TIMESTAMP default produces 19).
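
The same arithmetic as a few lines of Python, which makes it easier to try other intervals or row sizes. The constant names are made up for this sketch, and the 45 bytes per row is simply the estimate used above:

```python
SAMPLE_INTERVAL_MINUTES = 5
BYTES_PER_ROW = 45  # the per-row estimate used above

samples_per_year = (60 // SAMPLE_INTERVAL_MINUTES) * 24 * 365
bytes_per_year = samples_per_year * BYTES_PER_ROW
megabytes_per_year = bytes_per_year / 1024 / 1024

print(samples_per_year)             # 105120
print(bytes_per_year)               # 4730400
print(f'{megabytes_per_year:.2f}')  # 4.51
```

Note that this counts one row per sample. With three metrics written on every tick, the file would likely grow about three times as fast, roughly 13.5 megabytes per year, which is still small.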

Thoughts

As is usually the case, now that I've gone through the trouble of writing it all up I am left with a real desire to change things pretty drastically. I think the entire design is a little too strongly influenced by Munin, which makes sense, in that it is where I started. But ultimately, I think the design would be better if it took an approach closer to statsd and decoupled the sender from the receiver a bit further. Right now, the whole system relies on a kind of single worker that is managed on a timer; I think it may be more interesting if the timer approach could be supplemented with a system that may (or may not!) send values in without being invoked by the daemon directly.

That will have to be a project for a later time. I think the potential here will be in the consumption of this data, now that it exists. If we crib a bit further from Munin, this will mean some barebones visualizations, but I wonder if I can't do a bit better. Right now the only interface to the data is through sqlite:

sqlite> select * from metrics;
key         value              timestamp
----------  -----------------  -------------------
memory      0.862006869908057  2017-07-31 04:09:27
cpu         0.0                2017-07-31 04:09:27
disk_/      18.6195640114525   2017-07-31 04:09:27
memory      0.860142633887844  2017-07-31 04:09:32
cpu         0.0                2017-07-31 04:09:32
disk_/      18.6195640114525   2017-07-31 04:09:32
memory      0.860206097241724  2017-07-31 04:09:37
cpu         0.0                2017-07-31 04:09:37
disk_/      18.6195640114525   2017-07-31 04:09:37
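
As a first step toward consuming the data, a per-key summary is one query away. This is a sketch only: it uses an in-memory database with made-up rows as a stand-in for metrics.sqlite, and the choice of min, average and max is just an assumption about what might be useful:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for metrics.sqlite
conn.execute('CREATE TABLE metrics (key TEXT, value FLOAT, '
             'timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL)')
conn.executemany('INSERT INTO metrics (key, value) VALUES (?, ?)',
                 [('memory', 0.86), ('memory', 0.88), ('cpu', 0.0),
                  ('cpu', 0.5), ('disk_/', 18.5), ('disk_/', 18.5)])

# One summary row per metric key.
summary = conn.execute(
    'SELECT key, MIN(value), AVG(value), MAX(value), COUNT(*) '
    'FROM metrics GROUP BY key ORDER BY key').fetchall()
for key, low, mean, high, count in summary:
    print(f'{key:<8} min={low:.2f} avg={mean:.2f} max={high:.2f} n={count}')
```

Pointing the connection at the real database file instead of ':memory:' (and dropping the CREATE and INSERT statements) would run the same summary over the collected data.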