I had an interesting problem to work on this afternoon that led to a few features of Postgres I haven't previously used. It was an interesting exercise in some quick prototyping in pure SQL.
I'm working on a way to visualize groups of key-value pairs that are both dynamic in the number of groups possible, as well as the possible keys and values. It is a bit of an interesting exercise in developing queries for patterns rather than specific information.
Generally speaking, I'm thinking of two tables, a names table:
id | name |
---|---|
1 | alpha |
2 | bravo |
3 | charlie |
4 | delta |
create table names (
id serial primary key,
name text not null
);
and a table of key-value pairs, containing a reference to the names table:
key | value | name-id |
---|---|---|
a | foo | 1 |
a | foo | 2 |
b | bar | 2 |
c | qux | 3 |
c | qux | 4 |
create table keyValues (
key text not null,
value text not null,
name_id integer references names(id)
);
What I'd like is to collect discrete "groups" of keys and values, so in the above table the groups would be:
group one | group two | group three |
---|---|---|
alpha | bravo | charlie, delta |
So while a: foo
is referenced by alpha
and bravo
, bravo
has an
additional key-value (b: bar
) that makes it a separate group to
itself. The only group with more than one element is the third, where
charlie
and delta
share a set of key-value pairs (in this case the
single pair c: qux
).
Because there aren't any real constraints in the data on what is or isn't a valid key or value, I'd like to combine the two pieces of information into one composite piece.
select name_id, (key || ': ' || value) kv_string
from example
order by key, value
Here I'm just concatenating the two fields together to create:
name-id | kv-string |
---|---|
1 | a: foo |
2 | a: foo |
2 | b: bar |
3 | c: qux |
4 | c: qux |
That becomes useful when used in conjunction with the function
string_agg
which is an aggregation into a string for multiple rows
(where concatenating was useful to "aggregate" multiple
columns). Below I'm using a placeholder "temp_table" that will be
clarified below, but assume it has the information in the table above:
select name_id, string_agg(kv_string, ', ') key_values
from temp_table
group by name_id
name-id | key-values |
---|---|
3 | c: qux |
4 | c: qux |
2 | a: foo, b: bar |
1 | a: foo |
Each name_id
now has a single column of all it's key-value
pairs. All that remains is grouping those names with a shared
key_value
field. For this, I found array_agg
, which is like
string_agg
, but it aggregates into an array (which Postgres can
handle in a variety of ways).
select array_agg(id) groups
from temp_table_2
group by key_values
groups |
---|
{1} |
{2} |
{3,4} |
All of those references to "temp_tables" might be annoying if we had to actually create those tables ourselves, but luckily, Postgres supports common table expressions (which I've mentioned before). The separate queries can be combined like this:
with concatenated_key_values as (
select id, (key || ' ' || value) kv_string
from example
order by key, value
), combined_columns as (
select id, string_agg(kv_string, ', ') key_values
from concatenated_key_values
group by id
) select array_agg(id) groups
from combined_columns
group by string_agg;
As a final touch, to bring the end result in-line with what I outlined
as the goal, it'd be nice to enumerate the groups and use the name,
rather than the name_id
.
with concatenated_key_values as (
select id, (key || ' ' || value) kv_string
from example
order by key, value
), combined_rows as (
select id, string_agg(kv_string, ', ') key_values
from concatenated_key_values
group by id
) select row_number() over () group_number,
array_agg(name) group_members
from combined_rows
join names using(id)
group by key_values;
group number | group members |
---|---|
1 | {alpha} |
2 | {bravo} |
3 | {charlie,delta} |
This ended up being an interesting exercise is some of PostgreSQL's
aggregate
functions,
it took a while for things to click with me that I was doing
aggregations in two "directions", first across columns and then across
rows. Having the ability to omit an order by
clause from the
row_number
function was an interesting twist, but because I don't
actually care which number is assigned to which "group", it does make
some sense. Because I am developing the tables at the same time as the
queries, and before the application it is a pleasant surprise how
easy it is to refine these sorts of prototypes directly in Postgres.