
Revamping Local System Monitoring

2020-11-27

I noticed something wrong with my local system monitoring scripts and spent some time rewriting a few pieces. I've stripped out some of the extraneous pieces after a few years of no updates to the thing.

I have written before about the kind of monitoring I do around here, but in short, I keep track of:

- free memory
- load average
- disk usage

I originally wrote a Python "service" that polled a few system files and wrote to a SQLite database, but I'd had enough of the always-on nature of the thing. The service might have made sense if there were any real evented tasks; as it was, it amounted to running another clock on the server.

In an effort to reclaim the ~30MB of RAM, I decided to re-implement it as a timer service, since there is already one running to produce the SVG image.

[image: series of three charts showing free memory, free disk space, and load average; updates every 5 minutes]
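For reference, a timer of this sort looks something like the following (a minimal sketch, not my actual configuration; the unit name and schedule details are the only parts the post implies):

```ini
# metrics.timer -- fires every 5 minutes
[Unit]
Description=Collect system metrics

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
```

The matching metrics.service would be a Type=oneshot unit whose ExecStart points at the collection script.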

Free Memory

This is a direct translation from the Python code, just written in AWK:

awk '
BEGIN {
    total=-1;
    available=-1
}

/^MemTotal:/ { total = $2 }

/^MemAvailable:/ { available = $2 }

END {
    print available/total
}' /proc/meminfo

Load Average

This ended up being even easier than the Python:

awk '{ print $2 }' /proc/loadavg

Disk Usage

Finally, in rewriting the disk usage calculation I realized my own monitoring has been wrong for several years. In my original Python code I wrote:

    def partition_usage(self, partition):
        statvfs = os.statvfs(partition)
        total_bytes = float(statvfs.f_frsize * statvfs.f_blocks)
        free_bytes = float(statvfs.f_frsize * statvfs.f_bfree)
        return ((total_bytes - free_bytes) / total_bytes) * 100

Which looked correct, but fails to account for a pretty vital bit of behavior. I first noticed the problem when comparing with the output of df:

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda1       20614932 3861012  15689096  20% /

If you actually add things up, you'll notice that used + available doesn't equal the total number of blocks. The discrepancy showed up when I compared my new implementation against the old values:

df -P / | awk '/\/$/ { sub(/%/, "", $5); print $5 }'

It turns out I had failed to account for blocks reserved by the filesystem. There is a hint in man 5 ext4, but it is phrased in a way that is a little confusing:

The ext2 filesystem reserves a certain percentage of the available space (by default 5%, see mke2fs(8) and tune2fs(8)). These options determine who can use the reserved blocks. (Roughly: whoever has the specified uid, or belongs to the specified group.)

I don't really know why it is phrased as though this only applies to ext2; checking man 8 tune2fs gives a clearer answer:

Set the percentage of the filesystem which may only be allocated by privileged processes. Reserving some number of filesystem blocks for use by privileged processes is done to avoid filesystem fragmentation, and to allow system daemons, such as syslogd(8), to continue to function correctly after non-privileged processes are prevented from writing to the filesystem. Normally, the default percentage of reserved blocks is 5%.

The result is that my new disk usage measurements are about 2% higher than the old ones. Not critical, but still a gaffe.
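For completeness, a corrected version of the Python (a sketch, not necessarily the code I now run) mirrors df by using f_bavail, the count of blocks available to unprivileged processes, and computing used / (used + available):

```python
import os

def partition_usage(partition):
    # statvfs reports three block counts: f_blocks (total),
    # f_bfree (free, including the reserved blocks), and
    # f_bavail (free and usable by unprivileged processes)
    statvfs = os.statvfs(partition)
    used = statvfs.f_blocks - statvfs.f_bfree
    available = statvfs.f_bavail
    # df computes Use% as used / (used + available), which leaves
    # the reserved blocks out of the denominator
    return (used / (used + available)) * 100
```

Because the denominator shrinks by the reserved blocks, this always reads at least as high as the original calculation.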

A Better Schema

I previously made a painfully simple key-value table holding repeated keys with timestamped values, so I thought it might be time to clean up the schema while I was in here moving things around. After my recent tests developing inside a single transaction and my older work in manual schema migrations, I felt well-equipped to make quick work of things.

begin transaction;
    
create table keys (
    id integer primary key autoincrement,
    value text not null unique
);

insert into keys (value) select distinct(key) from metrics;

create table newMetrics (
    key_id    integer,
    value     float,
    timestamp datetime default current_timestamp not null,
    foreign key(key_id) references keys(id)
);

-- populate new table
insert into newMetrics (key_id, value, timestamp)
select keys.id, m.value, m.timestamp from metrics m
join keys on keys.value = key;

-- rename table
drop table metrics;
alter table newMetrics rename to metrics;

-- rename field
update keys set value = 'disk' where value = 'disk_/';
    
commit transaction;

My first step was to create a single table of "keys", built from the distinct entries in the existing data. Then came a new table with a foreign key reference to the keys table, populated from the old table joined with the new keys. As I've mentioned before, the drop-rename dance is necessary because SQLite cannot alter a table to add a foreign key.

Finally, with a single place to modify in order to rename a "key", I updated the label for disk usage to strip off the partition (since I'm only recording one and have been for years).

An Index

While I was pleasantly surprised to see I had shaved off over 10% of the database size with the move to foreign keys, I did start thinking about the potential for better performance. The database has a very simple pattern of access:

writes

insert into metrics (key_id, value) values
    ((select id from keys where value = 'memory'), $FREEMEMORY),
    ((select id from keys where value = 'cpu'),    $CPUUSAGE),
    ((select id from keys where value = 'disk'),   $DISKUSAGE);

reads

.once /tmp/memory.dat

  select metrics.value*100, strftime('%s', timestamp)
    from metrics
    join keys on key_id=keys.id
   where timestamp > date('now', '-2 day')
     and keys.value = 'memory'
order by timestamp asc;

.once /tmp/cpu.dat

  select metrics.value, strftime('%s', timestamp)
    from metrics
    join keys on key_id=keys.id
   where timestamp > date('now', '-2 day')
     and keys.value = 'cpu'
order by timestamp asc;

.once /tmp/disk.dat

  select metrics.value, strftime('%s', timestamp)
    from metrics
    join keys on key_id=keys.id
   where timestamp > date('now', '-2 day')
     and keys.value = 'disk'
order by timestamp asc;

In the case of one of the select statements above, the query plan ends up looking like this:

id    parent  notused  detail
----  ------  -------  ------
4     0       0        SEARCH TABLE keys USING COVERING INDEX sqlite_autoindex_keys_1 (value=?)
8     0       0        SCAN TABLE metrics
23    0       0        USE TEMP B-TREE FOR ORDER BY

Run Time: real 0.224 user 0.150418 sys 0.063802

I believe both the scan and order by get progressively slower as the table grows (it currently has over 890,000 records). The most obvious thing I could think of was adding an index on the timestamp column:

create index ts on metrics (timestamp);

This provided a dramatic increase in speed for the same query:

id    parent  notused  detail
----  ------  -------  ------
5     0       0        SEARCH TABLE keys USING COVERING INDEX sqlite_autoindex_keys_1 (value=?)
9     0       0        SEARCH TABLE metrics USING INDEX ts (timestamp>?)

Run Time: real 0.003 user 0.003233 sys 0.000000

The real trade-off here is space: indexing all the timestamps nearly doubled the size of the database (from about 31MB to 54MB). I decided it was worth it, and I really like having the option to trade space for time.
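The before-and-after is easy to reproduce in miniature (a sketch using Python's sqlite3 module; the table here is a stripped-down stand-in for the real one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table metrics (key_id integer, value float, "
             "timestamp datetime default current_timestamp not null)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail column describes each step of the plan
    return [row[3] for row in conn.execute("explain query plan " + sql)]

query = "select value from metrics where timestamp > date('now', '-2 day')"

before = plan(query)  # a full table scan
conn.execute("create index ts on metrics (timestamp)")
after = plan(query)   # a search on the new index
```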

I'd like to think of a way to better arrange the three queries needed for the monitoring charts, but I haven't had any good ideas that don't over-complicate things. I previously assumed that the value for each distinct key was created at the same time, due to the way the Python was written, but that isn't necessarily true with the insert statement above. I might think of something, but I'm pretty happy with how simple the entire thing has become.
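If the shared-timestamp assumption ever matters, one easy fix is to stamp the whole batch with an explicit timestamp instead of relying on the column default. A sketch using Python's sqlite3 module (the schema matches the one above; the metric values are made up):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table keys (
        id integer primary key autoincrement,
        value text not null unique
    );
    create table metrics (
        key_id    integer,
        value     float,
        timestamp datetime default current_timestamp not null,
        foreign key(key_id) references keys(id)
    );
    insert into keys (value) values ('memory'), ('cpu'), ('disk');
""")

# one timestamp computed up front, shared by all three rows
now = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
conn.executemany(
    """insert into metrics (key_id, value, timestamp)
       values ((select id from keys where value = ?), ?, ?)""",
    [("memory", 0.42, now), ("cpu", 0.08, now), ("disk", 21.3, now)],
)
conn.commit()

# every row in the batch carries the same timestamp
stamps = conn.execute("select distinct timestamp from metrics").fetchall()
```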