Just a silly idea for a web application that will put a chill down your spine.
This one is actually motivated by a project where I originally
implemented something worse. The idea is to have a button
in a web application that, when clicked, restarts a systemd service
unit in the same vein as doing sudo systemctl restart example.service.
Typically restarting services is a privileged operation. In the case
of user services you might get by restarting things that you
yourself started or control, but what I am describing is separating
the Linux user account performing the restart from the web
application user initiating the action. This might be seen as
analogous to hooking up sudo to the internet.
Is it possible to do this dangerous sounding thing with some margin of safety? How do you guard against the most obvious problem cases? Let's give it a shot!
Things that would concern me:
Separating the web server from the "restart services" application
seems like the biggest thing to tackle first. With it done it is
possible to continue through with more precautions but they aren't
worth much without this bare minimum satisfied. Having spent some
time recently digging through the Python documentation and standard
library, I've been looking for a reason to try out xmlrpc.server.
What if the web server only communicated via remote procedure call
to a more locked down service capable of invoking restarts? Here is
a tiny web application that has a single page presenting a form with
one field. I'm using Bottle because it is comically small and seems
to work well with waitress for socket activation:
import os, socket, xmlrpc.client

import bottle
import waitress


@bottle.route('/')
def main():
    return '''
      <form action="/restart" method="post">
        Service ID: <input name="service_id" type="text" />
        <input value="restart" type="submit" />
      </form>
    '''


def invoke_restart(identifier, rpc_server_address):
    proxy = xmlrpc.client.ServerProxy(rpc_server_address)
    return proxy.restart_unit(identifier)


@bottle.post('/restart')
def restart_handler():
    service_id = bottle.request.forms.get('service_id')
    return invoke_restart(service_id, bottle.request.app.config['RPC_SERVER'])


if __name__ == '__main__':
    SYSTEMD_FIRST_SOCKET_FD = 3
    sockets = [socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)]
    application = bottle.default_app()
    rpc_server = os.getenv('RPC_SERVER')
    print(f'RPC server is: {rpc_server}')
    application.config['RPC_SERVER'] = rpc_server
    waitress.serve(application, sockets=sockets)
I'm using the same approach I laid out in a recent post about zipapps. This application and dependencies can be bundled into a single zip file that can be executed by Python directly. I plan to run it as a systemd service so I've built in socket activation and rely on an environment variable to direct RPC requests to another server. The systemd socket definition is unremarkable:
[Socket]
ListenStream=8080
[Install]
WantedBy=sockets.target
The service definition is a little more interesting:
[Unit]
Description=web app and xml-rpc client
Requires=client.socket
[Service]
DynamicUser=true
ProtectHome=true
PrivateUsers=true
Environment="RPC_SERVER=http://10.0.0.2:8081"
ExecStart=/usr/bin/python /opt/client.pyz
I am running the web application under a non-privileged user with
very locked-down access to the host system (via the DynamicUser
and "protect" directives).
The RPC server in this case will only have one method registered.
import dataclasses, os, re, socket, socketserver, xmlrpc.server

import dbus


class ThreadedSimpleXMLRPCServer(socketserver.ThreadingMixIn,
                                 xmlrpc.server.SimpleXMLRPCServer):
    """A threaded, socket-activated SimpleXMLRPCServer"""
    def __init__(self):
        xmlrpc.server.SimpleXMLRPCServer.__init__(self, (None, None), bind_and_activate=False)
        SYSTEMD_FIRST_SOCKET_FD = 3
        self.socket = socket.fromfd(SYSTEMD_FIRST_SOCKET_FD, socket.AF_INET, socket.SOCK_STREAM)


def valid_id(ident):
    # a more robust validation would check existence instead of format
    # fullmatch is used because '$' with re.match also accepts a trailing newline
    return bool(re.fullmatch(r'[0-9]{4}', str(ident)))


@dataclasses.dataclass
class Result:
    success: bool
    message: str


class Service:
    def restart_unit(self, identifier):
        if not valid_id(identifier):
            print(f'invalid identifier: {identifier}')
            return Result(success=False, message='invalid identifier')
        try:
            sysbus = dbus.SystemBus()
            systemd1 = sysbus.get_object("org.freedesktop.systemd1", "/org/freedesktop/systemd1")
            manager = dbus.Interface(systemd1, "org.freedesktop.systemd1.Manager")
            job = manager.RestartUnit(f"example@{identifier}.service", "replace")
            return Result(success=True, message='success')
        except Exception as e:
            print(f'Exception occurred: {e}')
            return Result(success=False, message=f'{e}')


if __name__ == '__main__':
    # TODO this doesn't error out with a bad FD when not
    # socket-activated, should fix that
    with ThreadedSimpleXMLRPCServer() as server:
        server.register_instance(Service())
        server.serve_forever()
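The TODO about failing fast when the process was not socket-activated could be handled along these lines. This is only a sketch: it relies on the LISTEN_PID and LISTEN_FDS environment variables that systemd sets for socket-activated services, and the function name listen_fds is made up for this example.

```python
import os

SD_LISTEN_FDS_START = 3  # systemd hands over descriptors starting at fd 3


def listen_fds(environ=None, pid=None):
    """Return the file descriptor numbers passed by systemd, or raise."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID names the process the sockets were intended for.
    if environ.get('LISTEN_PID') != str(pid):
        raise RuntimeError('not socket-activated: LISTEN_PID does not match this process')
    # LISTEN_FDS is the number of descriptors that were passed.
    count = int(environ.get('LISTEN_FDS', '0'))
    if count < 1:
        raise RuntimeError('not socket-activated: LISTEN_FDS is unset or zero')
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

Each returned number could then be handed to socket.fromfd in place of the hard-coded 3, so that starting the server by hand produces a clear error instead of a confusing one later.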
For my use case the services being restarted are limited to a single
kind of service, with multiple instances running via the templating
capability of systemd services (e.g. example@0001.service,
example@4321.service). The validation is a little light but should at
least convey my idea: inputs should be checked and only then plugged
into a fixed format before being executed. Below are the socket and
service definitions for this second server:
# server.socket
[Socket]
ListenStream=8081

[Install]
WantedBy=sockets.target

# server.service
[Unit]
Description=xml-rpc server (and service restarter)
Requires=server.socket

[Service]
User=restarter
DynamicUser=true
ProtectHome=true
PrivateUsers=true
ExecStart=/usr/bin/python /opt/server.py
dbus is not technically part of the standard library
but has been present on all the Linux distros I cared to check. I
think it is baked into enough operating system packages that it can
be relied upon not to disappear from the distribution mid-release. I
have looked at alternatives to using dbus directly in the past but
convenience wins out here: because it is available from the host OS
there are no dependencies to bundle and the "server" is a single
Python file. Otherwise this has much the same configuration as the
previous service: limited network access, locked down system access
with DynamicUser, no socket binding performed directly. One
curiosity here is the addition of the User directive. This does not
lose the guardrails of the dynamic user but does run the process
under a fixed name, which enables the next piece of making this
restart-service work safely.
With the above configuration any request made from the RPC client (web application) to the RPC server (restart service) will complete with the following [1]:
{
"success": false,
"message": "org.freedesktop.DBus.Error.InteractiveAuthorizationRequired: Interactive authentication required."
}
Even invoking the restart via dbus doesn't eliminate the need to authorize the restart command. The user running the RPC server ("restarter") has no permissions on the system to perform such an operation. To grant a very narrow slice of permissions to this user I wrote a polkit rule:
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        /^example@(\d{4})\.service$/.test(action.lookup("unit")) &&
        action.lookup("verb") == "restart" &&
        subject.user == "restarter") {
        return polkit.Result.YES;
    }
});
This rule tests that the unit being acted on matches a fixed format
(also qualified by the "validation" in the RPC server) like
example@1234.service, that the verb is "restart", and that the command
is being invoked by the user "restarter". Even in the event the
validation of the RPC server is circumvented the requests will be
rejected for not matching the pattern tested. If another user gains
access to the RPC server they will have no ability (short of getting
root) to restart these services.
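The check-then-format step on the Python side can be sketched in isolation as below. The helper name unit_name_for is invented for this illustration. It uses re.fullmatch because a pattern ending in '$' passed to re.match also accepts an identifier followed by a trailing newline, which is easy to overlook.

```python
import re

# Exactly four ASCII digits, mirroring the four-digit pattern in the polkit rule.
ID_PATTERN = re.compile(r'[0-9]{4}')


def unit_name_for(identifier):
    """Return the templated unit name for identifier, or None if it is invalid."""
    text = str(identifier)
    # fullmatch requires the whole string to match, trailing newline included.
    if ID_PATTERN.fullmatch(text) is None:
        return None
    # Only a validated identifier is ever placed into the fixed template.
    return f'example@{text}.service'
```

The caller never builds a unit name from raw input; it either receives a name in the one permitted shape or None.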
As for the final concern I raised, integrating with web application access controls: I didn't do that at all here! I don't have a minimal representative sample to plug into this example. What I'm imagining though should, I think, slot cleanly into everything up to this point. If the web application required a user login then you could invoke the RPC call based on some attribute of the user object rather than from a form field. This could obviously further limit the range of potential inputs and is another logical place for validation. Similarly, validation at the RPC server could verify that the requested service exists before commanding a restart.
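The existence check could look roughly like the sketch below. It assumes manager is the org.freedesktop.systemd1.Manager D-Bus interface, whose GetUnit method fails when no unit of that name is loaded. The function name unit_is_loaded is hypothetical, and the broad except stands in for dbus.DBusException so the sketch does not need the dbus module to be importable.

```python
def unit_is_loaded(manager, unit_name):
    """Return True if systemd currently has unit_name loaded.

    manager is expected to expose GetUnit(name), as the systemd Manager
    interface does; any failure is treated as "no such unit".
    """
    try:
        manager.GetUnit(unit_name)
    except Exception:  # dbus.DBusException in practice
        return False
    return True
```

One caveat: GetUnit only knows about loaded units, so a template instance that has never been started would be reported as absent. That may or may not be the behavior wanted here.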
Is this a weird, niche architecture that presents some horrible flaw
in the long term? I don't know! If I look at how this works and
compare it to more "modern" software designs it doesn't seem to have
any more problems. The real argument in favor of this from
my perspective is how little overhead it takes to accomplish. In
this example I've developed the web application and RPC server on
two distinct machines sharing a private network (the 10.0.0.0/8
address space). Partly I did this to test my thinking but it is a
cinch to move the pieces around the network without big changes. If
the RPC server were moved to share a host with the web application
all that would change would be the RPC_SERVER environment variable.
It seems like there is an easy path forward if I were to use a
name-based scheme like DNS or even just HAProxy with path or
subdomain based routing.
I think there is some more work that could be done to scrub or
validate RPC request inputs but, reading the documentation, it doesn't
sound very onerous. Additionally the xmlrpc library has some level
of support on the client for HTTP basic authentication. I think it
wouldn't be too much trouble to extend the SimpleXMLRPCRequestHandler
to support basic auth as a way to further limit access to the RPC
server. Configuring the secrets on both the client and server sounds
like the exact case for LoadCredential, which is built into systemd
services.
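A rough sketch of what that extension might look like follows. The names make_basic_auth_handler and read_credential, and the realm string, are invented for this example. It assumes the secret is exposed through LoadCredential, which makes it readable as a file under the directory named by the CREDENTIALS_DIRECTORY environment variable.

```python
import base64
import binascii
import hmac
import os
import pathlib
from xmlrpc.server import SimpleXMLRPCRequestHandler


def read_credential(name):
    """Read a secret that systemd exposed to this service via LoadCredential."""
    directory = os.environ['CREDENTIALS_DIRECTORY']  # set by systemd
    return pathlib.Path(directory, name).read_text().strip()


def make_basic_auth_handler(username, password):
    """Build a handler class that only serves requests carrying these credentials."""
    expected = f'{username}:{password}'.encode()

    class BasicAuthHandler(SimpleXMLRPCRequestHandler):
        def authorized(self):
            scheme, _, encoded = self.headers.get('Authorization', '').partition(' ')
            if scheme.lower() != 'basic':
                return False
            try:
                supplied = base64.b64decode(encoded, validate=True)
            except (binascii.Error, ValueError):
                return False
            # Constant-time comparison avoids leaking how much of the secret matched.
            return hmac.compare_digest(supplied, expected)

        def do_POST(self):
            if self.authorized():
                return super().do_POST()
            # Drain the request body so the client can read the reply cleanly.
            self.rfile.read(int(self.headers.get('Content-Length', 0)))
            self.send_response(401)
            self.send_header('WWW-Authenticate', 'Basic realm="restart"')
            self.send_header('Content-Length', '0')
            self.end_headers()

    return BasicAuthHandler
```

The resulting class would be passed as the requestHandler argument when constructing the server. On the client side, xmlrpc.client.ServerProxy accepts credentials embedded in the URL, in the form http://user:pass@host:port. Basic auth over plain HTTP sends the password essentially in the clear, so this only makes sense on a trusted network or behind TLS.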
Rummaging around in the Python standard library and trying to leverage the capabilities of a modern Linux operating system continues to impress me. I think there is an argument to be made that I am basically building my own version of all sorts of other tools provided by the frameworks of the moment but to that I say "so what". I like discovering the motivations for some of the architectures of today and I think the best way to do that is to try inventing things on my own. It also leaves me feeling much more comfortable with the idea that I could work my way out of the kinds of situations that I'm more likely to face day-to-day, untangling some mess of legacy architecture bolted onto a Kubernetes deployment.
[1] A few neat things about this return value: SimpleXMLRPCServer
deals in basic types, which include dictionaries. Dataclass instances
have a __dict__ attribute that holds their fields, so the dataclass
ends up serialized directly into the RPC response. Bottle
automatically detects when a dictionary is returned from a request
and produces a JSON response type.
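The serialization behavior can be seen without running either server. The snippet below is a small illustration using the marshalling functions in xmlrpc.client; the Result class is copied from the server code, and the variable names are arbitrary.

```python
import dataclasses
import xmlrpc.client


@dataclasses.dataclass
class Result:
    success: bool
    message: str


# Marshal a Result as a method response, as the RPC server would.
wire = xmlrpc.client.dumps((Result(False, 'invalid identifier'),), methodresponse=True)

# Decode it as the client would; the struct comes back as a plain dict.
(decoded,), _method = xmlrpc.client.loads(wire)
print(type(decoded).__name__, decoded)
```

The marshaller falls back to an object's __dict__ when it does not recognize the type, which is why the dataclass travels as an XML-RPC struct and arrives at the web application as an ordinary dictionary.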