Spooky Web Application Idea

2023-07-08

Just a silly idea for a web application that will put a chill down your spine.

A Button That Restarts System Services

This one is actually motivated by a project where I originally implemented something worse. The idea is to have a button in a web application that, when clicked, restarts a systemd service unit, in the same vein as running sudo systemctl restart example.service.

Why Is It Spooky?

Typically, restarting services is a privileged operation. In the case of user services you might get by restarting things that you yourself started or control, but what I am describing is separating the Linux user account performing the restart from the web application user initiating the action. This might be seen as analogous to hooking up sudo to the internet.

Is it possible to do this dangerous sounding thing with some margin of safety? How do you guard against the most obvious problem cases? Let's give it a shot!

Things that would concern me:

anything internet-facing should have limited permissions on the host
commanding service restarts (in my case) is limited to a single kind of service; generalizing to restarting any service or, worse, performing any privileged operation is much too broad
is it possible to layer web application access controls onto the operations performed?

Separating the web server from the "restart services" application seems like the biggest thing to tackle first. With it done it is possible to continue through with more precautions but they aren't worth much without this bare minimum satisfied. Having spent some time recently digging through the Python documentation and standard library I've been looking for a reason to try out xmlrpc.server. What if the web server only communicated via remote procedure call to a more locked down service capable of invoking restarts? Here is a tiny web application that has a single page presenting a form with one field. I'm using Bottle because it is comically small and seems to work well with waitress for socket activation:

import os, socket, xmlrpc.client

import bottle
import waitress

@bottle.route('/')
def main():
    return '''
        <form action="/restart" method="post">
            Service ID: <input name="service_id" type="text" />
            <input value="restart" type="submit" />
        </form>
    '''

def invoke_restart(identifier, rpc_server_address):
    proxy = xmlrpc.client.ServerProxy(rpc_server_address)
    return proxy.restart_unit(identifier)

@bottle.post('/restart')
def restart_handler():
    service_id = bottle.request.forms.get('service_id')
    return invoke_restart(service_id, bottle.request.app.config['RPC_SERVER'])

if __name__ == '__main__':
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)]
    application = bottle.default_app()
    rpc_server = os.getenv('RPC_SERVER')
    print(f'RPC server is: {rpc_server}')
    application.config['RPC_SERVER'] = rpc_server
    waitress.serve(application, sockets=sockets)

I'm using the same approach I laid out in a recent post about zipapps. This application and dependencies can be bundled into a single zip file that can be executed by Python directly. I plan to run it as a systemd service so I've built in socket activation and rely on an environment variable to direct RPC requests to another server. The systemd socket definition is unremarkable:

[Socket]
ListenStream=8080

[Install]
WantedBy=sockets.target

The service definition is a little more interesting:

[Unit]
Description=web app and xml-rpc client
Requires=client.socket

[Service]
DynamicUser=true
ProtectHome=true
PrivateUsers=true
Environment="RPC_SERVER=http://10.0.0.2:8081"
ExecStart=/usr/bin/python /opt/client.pyz

I am running the web application under a non-privileged user with very locked-down access to the host system (via the DynamicUser and "protect" directives).

The RPC server in this case will only have one method registered.

import dataclasses, os, re, socket, socketserver, xmlrpc.server

import dbus


class ThreadedSimpleXMLRPCServer(socketserver.ThreadingMixIn,
                                 xmlrpc.server.SimpleXMLRPCServer):
    """A threaded, socket-activated SimpleXMLRPCServer"""
    def __init__(self):
        xmlrpc.server.SimpleXMLRPCServer.__init__(self, (None, None), bind_and_activate=False)
        SYSTEMD_FIRST_SOCKET_FD = 3
        self.socket = socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)


def valid_id(ident):
    # a more robust validation would check existence instead of format
    # fullmatch is used because match with '$' also accepts a trailing newline
    return bool(re.fullmatch(r'[0-9]{4}', str(ident)))


@dataclasses.dataclass
class Result:
    success: bool
    message: str


class Service:
    def restart_unit(self, identifier):
        if not valid_id(identifier):
            print(f'invalid identifier: {identifier}')
            return Result(success=False, message='invalid identifier')
        try:
            sysbus = dbus.SystemBus()
            systemd1 = sysbus.get_object("org.freedesktop.systemd1", "/org/freedesktop/systemd1")
            manager = dbus.Interface(systemd1, "org.freedesktop.systemd1.Manager")
            job = manager.RestartUnit(f"example@{identifier}.service", "replace")
            return Result(success=True, message='success')
        except Exception as e:
            print(f'Exception occurred: {e}')
            return Result(success=False, message=f'{e}')


if __name__ == '__main__':
    # TODO this doesn't error out with a bad FD when not
    # socket-activated, should fix that
    with ThreadedSimpleXMLRPCServer() as server:
        server.register_instance(Service())
        server.serve_forever()
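
One way the TODO above might be handled is to check the environment variables systemd sets for socket-activated processes (LISTEN_PID and LISTEN_FDS) before touching file descriptor 3. This is a minimal sketch, not part of the server as written; the helper names activated_fd_count and first_activated_socket are my own invention for illustration.

```python
import os
import socket

SYSTEMD_FIRST_SOCKET_FD = 3  # systemd passes sockets starting at fd 3


def activated_fd_count(environ=None, pid=None):
    """Return how many sockets systemd handed to this process (0 if none)."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # LISTEN_PID must name this process, otherwise the fds are not ours
        if int(environ.get('LISTEN_PID', '')) != pid:
            return 0
        return max(int(environ.get('LISTEN_FDS', '')), 0)
    except ValueError:
        # missing or malformed variables mean no socket activation
        return 0


def first_activated_socket(environ=None, pid=None):
    """Return the first inherited socket, or fail loudly when there is none."""
    if activated_fd_count(environ, pid) < 1:
        raise RuntimeError('not socket-activated: no usable LISTEN_PID/LISTEN_FDS')
    return socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)
```

Calling first_activated_socket() from the server's __init__ would turn a plain "python server.py" run into an immediate, readable error instead of a confusing failure later on.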

For my use case the services being restarted are limited to a single kind of service, with multiple instances running via the templating capability of systemd services (e.g. example@0001.service, example@4321.service). The validation is a little light but should at least convey my idea: inputs should be checked and only then plugged into a fixed format before being executed. Below are the socket and service definitions for this second server:

[Socket]
ListenStream=8081

[Install]
WantedBy=sockets.target

[Unit]
Description=xml-rpc server (and service restarter)
Requires=server.socket

[Service]
User=restarter
DynamicUser=true
ProtectHome=true
PrivateUsers=true
ExecStart=/usr/bin/python /opt/server.py

The dbus module is not technically part of the standard library but has been present on all the Linux distros I cared to check. I think it is baked into enough operating system packages that it can be relied upon not to disappear from the distribution mid-release. I have looked at alternatives to using dbus directly in the past but convenience wins out here: because it is available from the host OS there are no dependencies to bundle and the "server" is a single Python file. Otherwise this has much the same configuration as the previous service: limited network access, locked down system access with DynamicUser, no socket binding performed directly. One curiosity here is the addition of the User directive. This does not lose the guardrails of the dynamic user but does run the process under a fixed name, which enables the next piece of making this restart-service work safely.

With the above configuration any request made from the RPC client (web application) to the RPC server (restart service) will complete with the following¹:

{
  "success": false,
  "message": "org.freedesktop.DBus.Error.InteractiveAuthorizationRequired: Interactive authentication required."
}

Even invoking the restart via dbus doesn't eliminate the need to authorize the restart command. The user running the RPC server ("restarter") has no permissions on the system to perform such an operation. To grant a very narrow slice of permissions to this user I wrote a polkit rule:

polkit.addRule(function(action, subject) {
  if (action.id == "org.freedesktop.systemd1.manage-units" &&
      /^example@(\d{4})\.service$/.test(action.lookup("unit")) &&
      action.lookup("verb") == "restart" &&
      subject.user == "restarter") {
    return polkit.Result.YES;
  }
});

This rule tests that the unit being acted on matches a fixed format such as example@1234.service (the same format enforced by the "validation" in the RPC server), that the verb is restart, and that the command is being invoked by the user "restarter". Even if the validation in the RPC server is circumvented, requests will be rejected unless they match the pattern tested. If another user gains access to the RPC server, they will have no ability (short of getting root) to restart these services.
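
To reason about the two layers of checking side by side, here is a small Python sketch. The first function validates an identifier and only then plugs it into the fixed unit format; the second is my own Python rendering of the unit, verb and user conditions in the polkit rule, meant for experimenting with inputs rather than as a description of how polkit evaluates rules. The names unit_name_for and rule_would_allow are hypothetical.

```python
import re

ID_FORMAT = re.compile(r'[0-9]{4}')
# Python rendering of the unit pattern used in the polkit rule
UNIT_FORMAT = re.compile(r'example@[0-9]{4}\.service')


def unit_name_for(identifier):
    """First layer: validate the identifier, then build the unit name."""
    text = str(identifier)
    # fullmatch rejects trailing newlines and extra characters
    if not ID_FORMAT.fullmatch(text):
        return None
    return f'example@{text}.service'


def rule_would_allow(unit, verb, user):
    """Second layer: the unit, verb and user conditions from the rule."""
    return (UNIT_FORMAT.fullmatch(unit) is not None
            and verb == 'restart'
            and user == 'restarter')
```

With this, unit_name_for('1234\n') and unit_name_for('12345') both return None, and rule_would_allow('sshd.service', 'restart', 'restarter') is False, which is the behavior I would want from both layers.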

Not Implemented Here

As for the final concern I raised, integrating with web application access controls: I didn't do that at all here! I don't have a minimal representative sample to plug into this example. What I'm imagining though should, I think, slot cleanly into everything up to this point. If the web application required a user login then you could invoke the RPC call based on some attribute of the user object rather than from a form field. This could obviously further limit the range of potential inputs and is another logical place for validation. Similarly, validation at the RPC server could verify that the requested service exists before commanding a restart.
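
A rough sketch of what "based on some attribute of the user object" might look like. Everything here is hypothetical: the Account class, its service_id attribute and the service_id_for helper stand in for whatever a real login system would provide.

```python
import dataclasses
import re


@dataclasses.dataclass(frozen=True)
class Account:
    """Hypothetical logged-in user; a real app would load this from a session."""
    username: str
    service_id: str  # the one service instance this user may restart


def service_id_for(account):
    """Choose the restart target from the account, never from form input."""
    if account is None:
        raise PermissionError('login required')
    # still validate: stored data deserves the same suspicion as form data
    if not re.fullmatch(r'[0-9]{4}', account.service_id):
        raise ValueError('account has no valid service id')
    return account.service_id
```

The restart handler would then pass service_id_for(current_account) to invoke_restart instead of reading service_id from the posted form, so a user can only ever ask for their own instance.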

Thoughts

Is this a weird, niche architecture that presents some horrible flaw in the long term? I don't know! If I look at how this works and compare it to more "modern" software designs it doesn't seem to have any more problems. The real argument in favor of this, from my perspective, is how little overhead it takes to accomplish. In this example I've developed the web application and RPC server on two distinct machines sharing a private network (the 10.0.0.0/8 address space). Partly I did this to test my thinking but it is a cinch to move the pieces around the network without big changes. If the RPC server were moved to share a host with the web application all that would change would be the RPC_SERVER environment variable. It seems like there is an easy path forward if I were to use a name-based scheme like DNS or even just HAProxy with path or subdomain based routing.

I think there is some more work that could be done to scrub or validate RPC request inputs but reading the documentation it doesn't sound very onerous. Additionally the xmlrpc library has some level of support on the client for HTTP basic authentication. I think it wouldn't be too much trouble to extend the SimpleXMLRPCRequestHandler to support basic auth as a way to further limit access to the RPC server. Configuring the secrets on both the client and server sounds like the exact case for LoadCredential, which is built into systemd services.
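
Here is a sketch of how that basic auth extension might look. It is untested against the real deployment: the credentials are placeholders passed in as arguments (loading them via LoadCredential is left out), and make_auth_handler and demo are names I made up. On the client side it relies on ServerProxy accepting credentials in the URL.

```python
import base64
import hmac
import threading
import xmlrpc.client
import xmlrpc.server


def make_auth_handler(username, password):
    """Return a request handler class that insists on HTTP basic auth."""
    token = base64.b64encode(f'{username}:{password}'.encode()).decode()
    expected = f'Basic {token}'.encode()

    class BasicAuthHandler(xmlrpc.server.SimpleXMLRPCRequestHandler):
        def do_POST(self):
            supplied = self.headers.get('Authorization', '').encode()
            # compare_digest avoids leaking how much of the value matched
            if hmac.compare_digest(supplied, expected):
                return super().do_POST()
            # drain the request body so the connection closes cleanly
            self.rfile.read(int(self.headers.get('Content-Length', 0)))
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="rpc"')
            self.send_header('Content-Length', '0')
            self.end_headers()

    return BasicAuthHandler


def demo():
    """Call a throwaway localhost server with and without credentials."""
    handler = make_auth_handler('restarter', 'secret')  # placeholder secrets
    server = xmlrpc.server.SimpleXMLRPCServer(
        ('127.0.0.1', 0), requestHandler=handler, logRequests=False)
    server.register_function(lambda: 'pong', 'ping')
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        authed = xmlrpc.client.ServerProxy(
            f'http://restarter:secret@127.0.0.1:{port}/')
        reply = authed.ping()
        try:
            xmlrpc.client.ServerProxy(f'http://127.0.0.1:{port}/').ping()
            status = None
        except xmlrpc.client.ProtocolError as err:
            status = err.errcode
        return reply, status
    finally:
        server.shutdown()
        server.server_close()
```

In the web application the only change would be the RPC_SERVER value gaining a user:password@ prefix. Basic auth over plain HTTP sends the secret in the clear, so this only makes sense on a network I already trust or behind TLS.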

Rummaging around in the Python standard library and trying to leverage the capabilities of a modern Linux operating system continues to impress me. I think there is an argument to be made that I am basically building my own version of all sorts of other tools provided by the frameworks of the moment but to that I say "so what". I like discovering the motivations for some of the architectures of today and I think the best way to do that is to try inventing things on my own. It also leaves me feeling much more comfortable with the idea that I could work my way out of the kinds of situations that I'm more likely to face day-to-day, untangling some mess of legacy architecture bolted onto a Kubernetes deployment.

¹ A few neat things about this return value: SimpleXMLRPCServer deals in basic types, which include dictionaries. Dataclass instances have a __dict__ attribute holding their fields, so the dataclass ends up serialized directly into the RPC response as a struct. Bottle automatically detects when a dictionary is returned from a request handler and produces a JSON response.
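
The marshalling part of this can be checked without running either server. This sketch uses xmlrpc.client.dumps and loads to push a Result through the XML-RPC wire format and back; the round_trip helper is just a name for the illustration.

```python
import dataclasses
import xmlrpc.client


@dataclasses.dataclass
class Result:
    success: bool
    message: str


def round_trip(value):
    """Marshal a value as an XML-RPC response, then decode it again."""
    payload = xmlrpc.client.dumps((value,), methodresponse=True)
    (decoded,), _method = xmlrpc.client.loads(payload)
    return decoded


# the dataclass goes in, a plain dictionary comes out
print(round_trip(Result(success=False, message='invalid identifier')))
```

The client never sees a Result, only the dictionary, which is exactly what Bottle then turns into JSON.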