indexpost archiveatom feed syndication feed icon

SQLite and JSON Documents

2024-12-28

I've been munging configuration data for a bunch of hardware devices and it has me thinking on ways to improve some of that workflow. These are just a few notes on what has proven useful or interesting as I continue to explore my options.

Generating Data

In this particular case I'm working with some hardware that uses an industry protocol that allows introspecting equipment to gather configuration data. The configuration is given as a JSON array of objects of the form:

key
while the protocol requires some keys it is possible for vendors to create their own
value
technically optional and limited to 500 characters
readonly
largely unimportant for my uses but it is a boolean intended to signal whether the field is modifiable

A representative example of a single device configuration might be:

[{"key":"AllowOfflineTxForUnknownId","readonly":false,"value":"true"},{"key":"AuthorizationCacheEnabled","readonly":false,"value":"false"},{"key":"AuthorizationRequired","readonly":false,"value":"false"},{"key":"AuthorizeRemoteTxRequests","readonly":false,"value":"false"},{"key":"ChargeProfileMaxStackLevel","readonly":true,"value":"10"},{"key":"ChargingScheduleMaxPeriods","readonly":true,"value":"100"},{"key":"ConnectionTimeOut","readonly":false,"value":"120"},{"key":"ConnectorSwitch3to1PhaseSupported","readonly":true,"value":"false"},{"key":"EnableOfflineCharging","readonly":false,"value":"true"},{"key":"GetConfigurationMaxKeys","readonly":true,"value":"50"},{"key":"HeartbeatInterval","readonly":false,"value":"360"},{"key":"LocalAuthListEnabled","readonly":false,"value":"false"},{"key":"LocalAuthListMaxLength","readonly":true,"value":"10000"},{"key":"LocalAuthorizeOffline","readonly":false,"value":"true"},{"key":"LocalPreAuthorize","readonly":false,"value":"true"},{"key":"MaxChargingProfilesInstalled","readonly":true,"value":"100"},{"key":"MaxEnergyOnInvalidId","readonly":false,"value":"0"},{"key":"MeterValuesAlignedData","readonly":false,"value":"Energy.Active.Import.Register,SoC,Power.Active.Import,Current.Offered,Current.Import,Voltage,Power.Offered"},{"key":"MeterValuesSampledData","readonly":false,"value":"Energy.Active.Import.Register,SoC,Power.Active.Import,Current.Offered,Voltage,Current.Import,Power.Offered"},{"key":"MinimumStatusDuration","readonly":false,"value":"0"},{"key":"NonAuthorizedTag","readonly":false,"value":"SomeString"},{"key":"PeakPhaseVoltage","readonly":false,"value":"180.0"},{"key":"ReserveConnectorZeroSupported","readonly":true,"value":"true"},{"key":"ResetRetries","readonly":false,"value":"3"},{"key":"SecurityProfile","readonly":true,"value":"3"},{"key":"SendLocalListMaxLength","readonly":true,"value":"250"},{"key":"StopTransactionMaxMeterValues","readonly":true,"value":"0"},{"key":"StopTransactionOnEVSideDisconnect","readonly":false,"value":"true"},{"key":"StopTransactionOnInvalidId","readonly":false,"value":"true"},{"key":"StopTxnAlignedData","readonly":false,"value":""},{"key":"SupportedFileTransferProtocols","readonly":true,"value":"FTP,FTPS,HTTP,HTTPS"},{"key":"TransactionMessageAttempts","readonly":false,"value":"3"},{"key":"TransactionMessageRetryInterval","readonly":false,"value":"120"},{"key":"UnlockConnectorOnEVSideDisconnect","readonly":false,"value":"true"},{"key":"WebSocketPingInterval","readonly":false,"value":"300"},{"key":"ChargingScheduleAllowedChargingRateUnit","readonly":true,"value":"Current,Power"},{"key":"ClockAlignedDataInterval","readonly":false,"value":"60"},{"key":"ConnectorPhaseRotation","readonly":false,"value":"0.Unknown,1.Unknown"},{"key":"ConnectorPhaseRotationMaxLength","readonly":true,"value":"100"},{"key":"MeterValuesAlignedDataMaxLength","readonly":true,"value":"100"},{"key":"MeterValueSampleInterval","readonly":false,"value":"900"},{"key":"MeterValuesSampledDataMaxLength","readonly":true,"value":"123"},{"key":"NumberOfConnectors","readonly":true,"value":"1"},{"key":"StopTxnAlignedDataMaxLength","readonly":true,"value":"100"},{"key":"StopTxnSampledData","readonly":false,"value":"Energy.Active.Import.Register"},{"key":"StopTxnSampledDataMaxLength","readonly":true,"value":"100"},{"key":"SupportedFeatureProfiles","readonly":true,"value":"Core,LocalAuthListManagement,Reservation,SmartCharging,RemoteTrigger"},{"key":"SupportedFeatureProfilesMaxLength","readonly":true,"value":"10"}]

The specifics of the protocol and the meaning of any particular value isn't too important and instead I am focusing on the structure of the data and how I need to access it. In my case I am once again interested in the kinds of groups formed by the differences in configuration. In practice, collecting this information requires sending a request to the hardware device and receiving a response. If you'll trust that it is as uncomplicated as that I will spare some trouble and give myself some data to work with by instead duplicating the above record as many times as I figure is needed to get a sense of query performance. Rather than do anything interesting to key each record I'll be using an auto incrementing primary key. You'll have to use your imagination for how this might otherwise integrate into a broader database of hardware information.

create table configuration(id integer primary key, data json);

insert into configuration(data)
select '[{"key":"AllowOfflineTxForUnknownId","readonly":false,"value":"true"},{"key":"AuthorizationCacheEnabled","readonly":false,"value":"false"},{"key":"AuthorizationRequired","readonly":false,"value":"false"},{"key":"AuthorizeRemoteTxRequests","readonly":false,"value":"false"},{"key":"ChargeProfileMaxStackLevel","readonly":true,"value":"10"},{"key":"ChargingScheduleMaxPeriods","readonly":true,"value":"100"},{"key":"ConnectionTimeOut","readonly":false,"value":"120"},{"key":"ConnectorSwitch3to1PhaseSupported","readonly":true,"value":"false"},{"key":"EnableOfflineCharging","readonly":false,"value":"true"},{"key":"GetConfigurationMaxKeys","readonly":true,"value":"50"},{"key":"HeartbeatInterval","readonly":false,"value":"360"},{"key":"LocalAuthListEnabled","readonly":false,"value":"false"},{"key":"LocalAuthListMaxLength","readonly":true,"value":"10000"},{"key":"LocalAuthorizeOffline","readonly":false,"value":"true"},{"key":"LocalPreAuthorize","readonly":false,"value":"true"},{"key":"MaxChargingProfilesInstalled","readonly":true,"value":"100"},{"key":"MaxEnergyOnInvalidId","readonly":false,"value":"0"},{"key":"MeterValuesAlignedData","readonly":false,"value":"Energy.Active.Import.Register,SoC,Power.Active.Import,Current.Offered,Current.Import,Voltage,Power.Offered"},{"key":"MeterValuesSampledData","readonly":false,"value":"Energy.Active.Import.Register,SoC,Power.Active.Import,Current.Offered,Voltage,Current.Import,Power.Offered"},{"key":"MinimumStatusDuration","readonly":false,"value":"0"},{"key":"NonAuthorizedTag","readonly":false,"value":"SomeString"},{"key":"PeakPhaseVoltage","readonly":false,"value":"180.0"},{"key":"ReserveConnectorZeroSupported","readonly":true,"value":"true"},{"key":"ResetRetries","readonly":false,"value":"3"},{"key":"SecurityProfile","readonly":true,"value":"3"},{"key":"SendLocalListMaxLength","readonly":true,"value":"250"},{"key":"StopTransactionMaxMeterValues","readonly":true,"value":"0"},{"key":"StopTransactionOnEVSideDisconnect","readonly":false,"value":"true"},{"key":"StopTransactionOnInvalidId","readonly":false,"value":"true"},{"key":"StopTxnAlignedData","readonly":false,"value":""},{"key":"SupportedFileTransferProtocols","readonly":true,"value":"FTP,FTPS,HTTP,HTTPS"},{"key":"TransactionMessageAttempts","readonly":false,"value":"3"},{"key":"TransactionMessageRetryInterval","readonly":false,"value":"120"},{"key":"UnlockConnectorOnEVSideDisconnect","readonly":false,"value":"true"},{"key":"WebSocketPingInterval","readonly":false,"value":"300"},{"key":"ChargingScheduleAllowedChargingRateUnit","readonly":true,"value":"Current,Power"},{"key":"ClockAlignedDataInterval","readonly":false,"value":"60"},{"key":"ConnectorPhaseRotation","readonly":false,"value":"0.Unknown,1.Unknown"},{"key":"ConnectorPhaseRotationMaxLength","readonly":true,"value":"100"},{"key":"MeterValuesAlignedDataMaxLength","readonly":true,"value":"100"},{"key":"MeterValueSampleInterval","readonly":false,"value":"900"},{"key":"MeterValuesSampledDataMaxLength","readonly":true,"value":"123"},{"key":"NumberOfConnectors","readonly":true,"value":"1"},{"key":"StopTxnAlignedDataMaxLength","readonly":true,"value":"100"},{"key":"StopTxnSampledData","readonly":false,"value":"Energy.Active.Import.Register"},{"key":"StopTxnSampledDataMaxLength","readonly":true,"value":"100"},{"key":"SupportedFeatureProfiles","readonly":true,"value":"Core,LocalAuthListManagement,Reservation,SmartCharging,RemoteTrigger"},{"key":"SupportedFeatureProfilesMaxLength","readonly":true,"value":"10"}]'
  from generate_series(0)
 limit 200000;

One of the motivating reasons for any of this is to answer a question like "how many devices have some key set to some value?". Doing that is more easily accomplished if I can break out each JSON object into a row. SQLite has support for JSON and it makes building out such a table easy:

create table disaggregated_config(id, config_key, config_value);

insert into disaggregated_config(id, config_key, config_value)
select configuration.id,
       value->>'$.key',
       value->>'$.value'
  from configuration, json_each(configuration.data);

The result is an unassuming table like:

id key value
12345 AllowOfflineTxForUnknownId true
12345 AuthorizationCacheEnabled false
... ... ...

That is enough to start slicing and dicing my data! In one case I wanted a breakdown of the number of devices per key-value pair for a given subset of configurations. Viewed from the perspective of a single device or a request per-device this might be tedious but with a database like this it becomes easy:

  select config_key, config_value, count(*)
    from disaggregated_config
   where config_key in ('ConnectorPhaseRotation','BlinkRepeat')
group by config_key, config_value;

Triggering Updates

Trying to think on how I might prefer to keep this table of key-values updated as new documents are written into the source table I tried writing out a trigger to disaggregate all of the key value pairs:

create trigger disaggregate_json_array after insert on configuration
  begin
      insert into disaggregated_config(id, config_key, config_value)
      select NEW.id,
             value->>'$.key',
             value->>'$.value'
        from configuration, json_each(configuration.data)
       where configuration.id = NEW.id;
  end;

I tripped myself up thinking the possibility of updating a device's configuration and the corresponding source row of JSON data meant the above was going to be insufficient for the resulting build up of disaggregated key-value rows over time. It only just occurred to me that I could remove all the disaggregated rows each time an insert was made to the JSON table.

create trigger disaggregate_json_array after insert on configuration
  begin
      delete from disaggregated_config
            where disaggregated_config.id = NEW.id;

      insert into disaggregated_config(id, config_key, config_value)
      select NEW.id,
             value->>'$.key',
             value->>'$.value'
        from configuration, json_each(configuration.data)
       where configuration.id = NEW.id;
  end;

Indexing

Believe it or not I haven't found much of interest in trying to optimize my queries so far. Simply dropping the following in makes most of my queries sufficiently fast and all it takes is a few seconds and some disk space:

create index id_idx on disaggregated_config (id);
create index config_key_idx on disaggregated_config (config_key);

I had hoped to think of something clever to index the JSON directly with a generated column or index expression but the need to map an index 1:1 with a row proved too much to argue against the table of row-per-object configurations.

Thoughts

I'm really pleased to have 80% of a solution in approximately 10 lines of a single SQL trigger. I know that triggers can make some people uneasy so I'm left to consider naming schemes that might hint appropriately that the disaggregated_config table should not be modified or alternative approaches that might help me achieve the same results without the implicit behavior of a trigger. Personally though I am inclined to argue for the use of a trigger in this sort of scenario. The real downside is in how slow the trigger is during bulk inserts. I think I should spend some time figuring that out but it is inessential for the moment.