
An Interesting Architecture

2025-07-27

A few notes on an interesting software design I've been noodling on. I'm still trying to untangle how to foster organic growth in software systems.

If you wish to make an apple pie from scratch, you must first invent the universe.

—Carl Sagan

I've been trying to think how best to onboard people into software projects. In doing so I've been re-reading Peter Naur's Programming as Theory Building and trying to crystallize concepts and models for working in systems which are quite involved. Programmers tend to use jargon to elide minutiae, but to the uninitiated the concepts can be lost in a sea of meaningless babble. In an effort to give more concrete models to a familiar problem I have tried drawing up a mostly working example of a design that could be described as:

A network of physical devices are represented by digital twins with event handlers responding to device data dynamically.

It sounds like something I might put on a resume but does little to help new programmers grapple with modifying the system. I have tried doing code walkthroughs before with mixed success. I think it takes a particular kind of person and a significant overlap in experiences between the presenter and reader to be effective. Instead I think it might be more productive to produce representative models for each part of the system. When the entire model is small enough to fit on one screen it tends to be easier to ask specific questions. In that spirit then I've tried to define or model all the relevant pieces of that explanation in a way that is almost as brief as possible while remaining faithful to the concepts undergirding them.

Digital Twin

This one isn't too hard to explain because it is simply an industry term for any number of design patterns for smart devices. The important concept then is that there is a physical device in the real world and the software process has a corresponding representation for interfacing with the real object. The intent is to mirror the real-world state of the thing and encapsulate methods of interacting through software. In my case the real world device produces data and can be commanded by sending it messages. The data exchange is bidirectional but because the thing which we are sending messages to is entirely separate it is sufficient to sketch out the idea that we send something without overspecifying how the interaction is handled.

[Figure: a real device and its device object, connected by "receive" and "send".]

Devices

In any real onboarding an explanation of devices would usually entail some hands-on experience using the thing. It isn't essential to explain the devices here though, so I'll focus on the software-relevant aspects in a slightly backwards way:

@dataclasses.dataclass(kw_only=True)
class Device(EventHandler):
    device_id: str
    network: 'Network'

    def send(self, *args, **kwargs):
        ...

Hopefully obvious, there isn't much complexity to a Device itself. It is an EventHandler (not yet explained), it has identifying information and some associated network (not yet explained). Really key to this model is the understanding that the software object for a device is intrinsically uncomplicated. To make that clear though we have to understand what an EventHandler is because the use of inheritance complects the two together. The total omission of how "send" is achieved is intended to convey that the system can send but has few guarantees for what that means because the responsibility lies with the receiving device.

Event Handlers

An event handler is a pattern for registering callbacks on an object. In the case being described the Device is an event handler in the object oriented inheritance meaning of the word. Callbacks are just function/method invocations which we are associating to discrete data types (MessageTypes) by name. I've invented only 1 message type here but I figure it is obvious how extending known message types works.

class MessageType(enum.Enum):
    INFORMATION = enum.auto()


@dataclasses.dataclass(kw_only=True)
class EventHandler:
    _handlers: dict[MessageType, typing.Callable] = dataclasses.field(default_factory=dict)

    def on(self, message_type: MessageType):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            self._handlers[message_type] = wrapper
            return wrapper
        return decorator

    # 'Message' is quoted because the class is defined further down.
    def call(self, message: 'Message', *args, **kwargs) -> typing.Any:
        if handler := self._handlers.get(message.type):
            return handler(*args, device=self, message=message, **kwargs)

Network

Networking is a bottomless well of potential complexity. Thankfully the very concept of networking can be distilled to basically two ideas for the purpose of explanation: devices are registered into the network and messages are routed to devices by some identifying information they carry.

@dataclasses.dataclass(kw_only=True)
class Network:
    devices: dict[str, Device] = dataclasses.field(default_factory=dict)

    def register_device(self, device: Device):
        self.devices[device.device_id] = device

    # 'Message' is quoted because the class is defined just below.
    def route_message(self, target: Device, message: 'Message'):
        if target.device_id in self.devices:
            self.devices[target.device_id].call(message)

That covers most of the important concepts so far, save a definition for Messages (they are a plain old bag of data):

@dataclasses.dataclass(kw_only=True, frozen=True)
class Message:
    type: MessageType
    payload: str

What We Have So Far

Now, using only the standard library[1] and a few dozen lines of code we have a runnable model for a minimal example of our software correspondence to a physical device on a (hand-wavy) network:

network = Network()
device_1 = Device(device_id="brave little toaster", network=network)
device_2 = Device(device_id="kirby", network=network)
network.register_device(device_1)

# handle specific message type per-device
@device_1.on(MessageType.INFORMATION)
def handle_info(*, device: Device, message: Message):
    print(f"From: {device.device_id}\n{message.type}: {message.payload}")

# Simulate incoming message
network.route_message(
    device_1,
    message=Message(
        type=MessageType.INFORMATION,
        payload=json.dumps(
            {
                "vendor": "Westinghouse",
                "model": "one of a kind",
                "version": "0.1-RC3"
            }, indent=2
        )
    )
)

Hopefully unsurprising, the above message causes the following to be printed in the console:

$ python test.py
From: brave little toaster
MessageType.INFORMATION: {
  "vendor": "Westinghouse",
  "model": "one of a kind",
  "version": "0.1-RC3"
}

There tend to be two primary reactions to laying out such a simplified model and getting it running, either a lightbulb moment of realization or incredulity that I've waved away the real complexity of the system. I like this level of detail because it serves to start asking really interesting questions which are not yet answered by the model. Good questions might be "What happens if a device with the same identifier is registered to the network?" or "Can messages to or from the device be lost in the network?". I think I like those kinds of questions because they give rise to explanations for all the added complexity that does exist in the real system. The real device object might be significantly more complicated to answer for all of these edge cases but you might not see it at first if you don't understand how simplified the big picture can get.

A Concrete Example

Using the example above for an INFORMATION message we can suggest that a device knows what it is and it can tell you but knowing when it will tell you is a bit of a mystery. It is required that the physical device emit a message before the system can be aware of the device's information - the device being registered to the network does not guarantee full knowledge of the device specifics. I hope the model given makes this obvious.

Knowing what sort of device it is determines how it should operate and that is driven through configuration (a new concept). There's endless variety in what sort of configuration is possible but in practice there's a handful of generally useful starting points to operating the device. Configuration comes in combinations of readable-writable, write-only, read-only (a specialization of the new concept). The determining factors to a device's identity are the vendor, model, and version it reports in its INFORMATION message.

I am a big fan of trying to do the simplest thing that could possibly work so my initial suggestion might be something like this:

config_map = {
    ('vendorA', 'modelB', '1.0'):   (('some_key', 'some_value'),),
    ('vendorX', 'modelY', '24.15'):   (('some_key', 'some_other_value'), ('another_key', 'another_value')),
    ('vendorZ', 'modelZZZ', '0.1-RC3'): (('novel_key', 'novel_value'),)
}

config_map.get((input_vendor, input_model, input_version), tuple())

The above is almost comically simple but it hints at a few nice properties. I've omitted any class, function, or method definitions but I hope you can see how the inputs to the system are fixed and the return types are regular. You put in an immutable thing (a tuple of strings) and get out an iterable of pairs (tuples). The callers do not have to perform checks for None if an unknown device arrives because a for-loop over an empty tuple is just as good.
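
To make the "no checks for None" point concrete, here is one way the lookup might be wrapped. This is a sketch: the names ConfigPairs, CONFIG_MAP, lookup_config, and describe are invented for illustration and are not part of the real system.

```python
ConfigPairs = tuple[tuple[str, str], ...]

# Hypothetical table in the same shape as config_map above.
CONFIG_MAP: dict[tuple[str, str, str], ConfigPairs] = {
    ("vendorA", "modelB", "1.0"): (("some_key", "some_value"),),
}


def lookup_config(vendor: str, model: str, version: str) -> ConfigPairs:
    # Unknown devices get an empty tuple, so the return type never varies.
    return CONFIG_MAP.get((vendor, model, version), ())


def describe(vendor: str, model: str, version: str) -> list[str]:
    # The caller loops unconditionally; an empty tuple simply loops zero times.
    return [f"{key}={value}" for key, value in lookup_config(vendor, model, version)]


known = describe("vendorA", "modelB", "1.0")
unknown = describe("nobody", "nothing", "0.0")
```

The caller in describe is identical for known and unknown devices, which is the property worth preserving as the design grows.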

It is also a good starting place for design discussions. Assuming the above was wrapped up in a nice interface, is it sufficient to configure devices? Are all the combinations of (vendor, model, version) known in advance? If so it might be good enough to stop here.

Of course it is rarely the case that things are quite so easy. For example, what if you want the same set of configuration values regardless of version? What if you want the same set of configuration for either of two different model types? We mentioned readable configuration values, how do they factor into this scheme?

More Modeling

Instead of guessing too much let's start with some examples of how things could work and then work down to how they should work. If we assume (or implement) a way to get and change configuration exists under some logical name we might define ReadWrite and WriteOnly configuration like this:

@dataclasses.dataclass(frozen=True)
class Configuration:
    key: str
    value: str


class ReadWrite(Configuration):
    def apply(self, device):
        if device.get_configuration(self.key) != self.value:
            device.change_configuration(key=self.key, value=self.value)


class WriteOnly(Configuration):
    def apply(self, device):
        device.change_configuration(key=self.key, value=self.value)

While this is subject to the real device supporting these get and set operations through our digital twin I am going to assure you, dear reader, that things really do work that way. Not only is there a definite model for the real thing, there is an entire industry specification! The signatures are fundamentally:

get_configuration(key: str) -> str
change_configuration(key: str, value: str) -> void
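
Those two signatures are enough to exercise the configuration classes without a real device. The sketch below uses a hypothetical InMemoryDevice that keeps settings in a dict and records every write. It is a stand-in of my own for experimentation, not how the real twin talks to hardware, and returning an empty string for an unknown key is an assumption.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Configuration:
    key: str
    value: str


class ReadWrite(Configuration):
    def apply(self, device):
        # Only write when the device reports a different value.
        if device.get_configuration(self.key) != self.value:
            device.change_configuration(key=self.key, value=self.value)


class WriteOnly(Configuration):
    def apply(self, device):
        # Nothing can be read back, so always write.
        device.change_configuration(key=self.key, value=self.value)


class InMemoryDevice:
    """Hypothetical stand-in: stores settings in a dict and logs each write."""

    def __init__(self, **settings: str):
        self.settings = dict(settings)
        self.writes: list[tuple[str, str]] = []

    def get_configuration(self, key: str) -> str:
        # Assumption: an unknown key reads back as an empty string.
        return self.settings.get(key, "")

    def change_configuration(self, key: str, value: str) -> None:
        self.settings[key] = value
        self.writes.append((key, value))


device = InMemoryDevice(interval="30")
ReadWrite(key="interval", value="30").apply(device)     # already set: no write
ReadWrite(key="interval", value="60").apply(device)     # differs: one write
WriteOnly(key="secret", value="hunter2").apply(device)  # always written
print(device.writes)
```

The write log makes the difference between the two kinds of configuration visible: ReadWrite avoids redundant writes, WriteOnly cannot.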

As described, a device is subject to more than one configuration value. Rather than subject all the callers to every nuance of these details I would prefer to group multiple configurations together:

@dataclasses.dataclass(frozen=True)
class Group:
    configurations: tuple[Configuration, ...] = ()

    def apply(self, device):
        for config in self.configurations:
            config.apply(device)

Let us challenge ourselves to support more than exact literal matching like the dictionary-keyed-by-tuples example above. How can we map a group of configuration to flexible criteria for matching devices? Determining "is a match?" sounds an awful lot like a predicate so I am inclined to start with a generalization like this:

class Predicate(typing.Protocol):
    def matches(self, value: str) -> bool: ...

Of course, that doesn't answer for how it is used so we'll match the functionality of an "exact match" while also supporting a new functionality "matches any of ...":

@dataclasses.dataclass
class OneOf:
    valid_values: typing.Container[str]

    def matches(self, value: str) -> bool:
        return value in self.valid_values
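
The protocol is not limited to membership tests. As a sketch, with names invented for illustration (Anything, NewerThan, accepted), a wildcard would answer the earlier "regardless of version" question, and a comparison could handle ordered versions. The comparison assumes versions are dotted integers and rejects anything else, which may well be too strict for real devices.

```python
import dataclasses
import typing


class Predicate(typing.Protocol):
    def matches(self, value: str) -> bool: ...


@dataclasses.dataclass(frozen=True)
class Anything:
    """Hypothetical wildcard: every value matches."""

    def matches(self, value: str) -> bool:
        return True


@dataclasses.dataclass(frozen=True)
class NewerThan:
    """Hypothetical: matches dotted-integer versions above a minimum."""
    minimum: tuple[int, ...]

    def matches(self, value: str) -> bool:
        try:
            parsed = tuple(int(part) for part in value.split("."))
        except ValueError:
            # Anything that is not purely dotted integers is rejected.
            return False
        return parsed > self.minimum


def accepted(predicate: Predicate, values: list[str]) -> list[str]:
    # Keep only the values the predicate matches.
    return [value for value in values if predicate.matches(value)]


versions = ["1.0", "24.15", "0.1-RC3"]
any_version = accepted(Anything(), versions)
newer = accepted(NewerThan((1, 0)), versions)
```

Neither class inherits from Predicate; having a matches method with the right shape is all the protocol asks for.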

The Predicate described above though only checks a single attribute (string) presumably sent by the device. To match the demonstrated INFORMATION message from a device we would need an aggregate matcher. An aggregate for these Predicates should be both easy to describe and implement:

@dataclasses.dataclass(frozen=True)
class DeviceMatcher:
    vendor_matcher: Predicate
    model_matcher: Predicate
    version_matcher: Predicate

    def matches(self, vendor: str, model: str, version: str) -> bool:
        return (
            self.vendor_matcher.matches(vendor) and
            self.model_matcher.matches(model) and
            self.version_matcher.matches(version)
        )

Now then, we have a means to group configuration values together and a defined way by which they are applied to devices. We have a means of identifying whether a device matches a set of identifying criteria and a pattern for defining new match criteria. What we need is a way to tie these two things together. In my mind this is a rule for when to configure a device, subject to its information:

@dataclasses.dataclass(frozen=True)
class Rule:
    matcher: DeviceMatcher
    config: Group
    priority: int = 0

    def matches(self, vendor: str, model: str, version: str) -> bool:
        return self.matcher.matches(vendor, model, version)

I have snuck a "priority" attribute into that class because I am aware of an additional wrinkle that adds some fun to this whole system. Devices may be subject to multiple rules. Imagine, for example, two rules:

  1. device vendor is: Foo Corporation, model is: ModelBar
  2. device vendor is: Foo Corporation, model is: ModelBar, version is: greater than 1.0

The second rule obviously needs to be tested first otherwise the first rule will always answer for it. That means the second rule needs a higher priority within the system and devices should be checked against it first. This raises the question of how rules are aggregated and how this priority resolution might work.

import bisect

class Registry:
    def __init__(self):
        self._rules = []

    def register(self, rule: Rule):
        # The key argument to insort needs Python 3.10 or newer.
        bisect.insort(self._rules, rule, key=lambda r: -r.priority)

    def get_configuration(
            self, vendor: str, model: str, version: str
    ) -> Group:
        for rule in self._rules:
            if rule.matches(vendor, model, version):
                return rule.config
        return Group()

In general this is unsurprising but I was delighted to discover the bisect module in the standard library. It isn't exactly hard to do something like:

some_list.append(some_value)
some_list.sort(key=...)

But it doesn't exactly encapsulate what I am trying to do, which is a descending ordered insert.

Where Is This All Going?

We've built up two approximately independent models for how a digital twin of a real-world device might communicate in a network to dynamically respond to rules-based definitions for device configuration. The big reveal, as it were, is that they are not really independent. I qualified the configuration system with the idea that we can get and set configuration for a device. That requires extending the Device model like this:

@dataclasses.dataclass(kw_only=True)
class Device(EventHandler):
    device_id: str
    network: 'Network'

    def send(self, *args, **kwargs):
        ...

    def get_configuration(self, key: str):
        ...

    def change_configuration(self, key: str, value: str):
        ...

In practice though, get and change configuration are derivative of Device.send. The rules system can be modeled like this:

registry = Registry()
registry.register(
    Rule(
        matcher=DeviceMatcher(
            vendor_matcher=OneOf({"Westinghouse"}),
            model_matcher=OneOf({"one of a kind"}),
            version_matcher=OneOf({"0.1-RC3"})
        ),
        config=Group(
            configurations=(ReadWrite(key="foo", value="bar"),),
        )
    )
)

The existing device EventHandler then has to callback into this registry on receipt of an INFORMATION message, supplying the data to see which configuration applies to the device-associated (vendor, model, version) information. Waving away some detail over message ordering but still working within our model of devices that looks something like this:

@device_1.on(MessageType.INFORMATION)
def handle_info(*, device: Device, message: Message):
    information = json.loads(message.payload)
    desired_config = registry.get_configuration(
        information['vendor'],
        information['model'],
        information['version']
    )
    desired_config.apply(device)

Thoughts

Zero Defect Programming and Cleanroom Software Engineering propose a development process by which every line of code is reviewed in a formal process to prevent bugs from making it into the program. I have grappled with how to achieve that kind of formalism given the volume of code that is usually required to achieve anything in most systems. Keeping those principles in mind though, this system of designing models with well defined interfaces isn't such a wild idea. It is required at some point to subordinate detail for the implementation of a particular protocol or subsystem but I am not sure it is necessary to lose the essential detail of the thing. While I have omitted some detail of the device model and network interactions the system of configuration has actually proven itself in practice. Reviewing every line of code usually sounds daunting for having to keep all of the possible permutations and decision points in mind; here I've outlined a mostly working system with 4 IF statements and 2 FOR loops. It isn't inconceivable that those might be individually reviewed in the context of a system that is otherwise intended to be correct by construction. I think that is essentially the message of making illegal states un-representable.

Perhaps obvious is the overlap in some of my other ideas or annoyances with Python and integration testing. Many real systems don't really require full fidelity for integration and testing; useful software projects can be simultaneously abstract and specific. Imagine what it would take to substitute the model for a network with something like a Unix socket or a message broker's publish-subscribe architecture. The places where those abstractions break down tend to be logical places for encapsulation or further abstraction.


  1. There is of course the necessary import prelude to run the thing but I figure you know well enough to: import dataclasses, enum, functools, json, typing