More SQL Prototypes

2020-10-17

Noodling on a different software I've wanted to write at one point or another I've been considering about how I'd design one specific kind of application; the manual test case management system.

This Seems Familiar

Years ago I made a similar speculative design, but frankly I lacked experience and got bogged down in useless details. Thinking about it today I gave myself a kind of time-boxed writing prompt as a personal exercise. I settled on sketching out just the data model and the sorts of queries necessary to test out my design.

I like giving myself these sorts of small exercises because it provides a chance to try out new tools or processes. For example, today I wanted to try out an approach described in The Art of PostgreSQL, where the workflow is to build the application inside a SQL file inside of a single transaction. The idea is that while the design is in flux you just run the entire script, including data loading and queries.

Part of what appeals to me about this workflow is how well it integrates with my development environment. I'm using emacs, which (of course) has a host of SQL functionality, including sql-send-buffer which shuffles an entire buffer to a connected SQL process. This means I can evaluate an entire file and view the results without leaving the editor. Because of the transaction scope of the file, at the end of each evaluation I'm returned back to a clean slate and don't have to worry about cleaning things up or guarding table creation with "drop if exists" or similar.

To use this kind of workflow the following skeleton is necessary:


begin transaction;
pragma foreign_keys = on;

-- everything else is written within the transaction

rollback;

The foreign keys pragma is necessary to enable foreign keys within SQLite.

A Design

I've used a few different kinds of software in the space of manual test case management. None of them have ever been very good but the core features tend to be very similar. A "test case" usually has the following:

name
author
series of steps
suite of related functionality
version information
record of execution with a fixed number of statuses (pass, fail, skip)

It occurs to me that some of these are more free-form than the others and make little sense to "bake into" the schema. In the case of a "suite" for example, it is often the case that a single test might belong to several suites of tests. Consider a test that might be run as a kind of smoke test, while it is run frequently it is also probably a part of a larger integration or release test process. Similarly, a single test often relates to multiple release versions of a product. It occurred to me to abstract these into a general "tag", with no restrictions on tag content or relation to a testcase.

create table tag (
       id integer primary key autoincrement,
       value text not null unique
);

create table testcase (
       id integer primary key autoincrement,
       name text not null,
       author text not null
);

create table tagmap (
       tag_id integer,
       testcase_id integer,
       foreign key(tag_id) references tag(id),
       foreign key(testcase_id) references testcase(id)
);

In this way, the general many-to-many relation "tagmap" supports any array of information you might imagine. Consider a testcase being tagged "version-1.0", "version-1.1", "smoke test", and "candidate for automation". Adding or removing tags has no real bearing on the content of the testcase and this schema should cleanly separate the two.

insert into tag (value) values
       ('version-1.0'),
       ('version-1.1'),
       ('smoke test'),
       ('candidate for automation');

insert into testcase (name, author) values
       ('login as admin', 'alyssa p. hacker'),
       ('modify existing entry', 'ben bitdiddle'),
       ('delete existing entry', 'optimus prime');

insert into tagmap (tag_id, testcase_id) values
       (1, 2),
       (2, 2),
       (4, 1);

I am interested to learn more about SQLite's JSON support, so I've decided to write the queries as if I were writing a web API response directly. A likely query for the tagged format above would be to return the list of tags associated with each testcase:

with allTests as (
     select json_object('name', name, 'tags', json_group_array(value)) item
       from testcase
  left join tagmap on tagmap.testcase_id = testcase.id
  left join tag on tagmap.tag_id = tag.id
   group by testcase.id
) select json_group_array(item)
    from allTests;

The result is an array of objects, each made up of a "name" and an array of tags:

[
    {
        "name": "login as admin",
        "tags": [
            "candidate for automation"
        ]
    },
    {
        "name": "modify existing entry",
        "tags": [
            "version-1.0",
            "version-1.1"
        ]
    },
    {
        "name": "delete linked entry",
        "tags": [
            null
        ]
    }
]

The unfortunate part of this particular query is the handling of untagged testcases. In order to include those cases without tags I've used a LEFT JOIN, but as a result the NULL is used as a value when aggregating into an array. The result is untagged tests have a tags array with a single null in it, where my preference would have been to produce an empty array instead.

Edit: I've actually thought of a way to avoid the array of a single null, but I'll warn you, it is a little ugly. The solution is to check with a case statement and fallback to a default empty array rather than aggregating:

with allTests as (
     select json_object('name', name, 
                        'tags', case
                                when value is null then
                                     json_array()
                                else json_group_array(value)
                                end) item
       from testcase
  left join tagmap on tagmap.testcase_id = testcase.id
  left join tag on tagmap.tag_id = tag.id
   group by testcase.id
) select json_group_array(item)
    from allTests;


[
    {
        "name": "login as admin",
        "tags": [
            "candidate for automation"
        ]
    },
    {
        "name": "modify existing entry",
        "tags": [
            "version-1.0",
            "version-1.1"
        ]
    },
    {
        "name": "delete linked entry",
        "tags": []
    }
]

The next piece is the actual instruction information that would constitute a testcase. I've seen systems that lay out an action-result pair per-step, but I've found that awkward enough in the past that I prefer a simpler ordered series of steps.

create table step (
       testcase_id integer,
       description text,
       ordinal integer,
       foreign key(testcase_id) references testcase(id),
       primary key(testcase_id, ordinal)
);

The above is a basic many-to-one design that imposes only a key constraint on the ordinal (step number) and testcase identifier. My thinking here is that the ordinal would be implied during the writing and not explicitly written (imagine an array of steps or an in-order table on screen).

insert into step (testcase_id, description, ordinal) values
       (1, 'prepare the missiles', 1),
       (1, 'launch the missiles', 2),
       (1, 'await victory', 3),
       (1, 'celebrate', 4),
       (2, 'plan', 1),
       (2, 'win', 2);

with orderedSteps as (
     select testcase_id as id,
            description
       from step
   order by ordinal
) select json_object('name', testcase.name,
                     'steps', json_group_array(orderedSteps.description))
    from orderedSteps
    join testcase using(id)
   where id = 1
group by name;

The result of this query is a single JSON object with the test name and the related steps, in-order:

{
    "name": "login as admin",
    "steps": [
        "prepare the missiles",
        "launch the missiles",
        "await victory",
        "celebrate"
    ]
}

Finally, I think the last really necessary piece is a means to record testcase executions. Rather than make a record on the testcase itself, I've laid out another many-to-one table with some supplementary information linking testcases along with a new table "status". While I had considered making the status information another tag, I couldn't come to a design that I liked as it seemed to pollute the idea of a tag being metadata on a testcase. I think what I have isn't too bad, because of how rarely the number of execution outcomes will change.

create table status (
       id integer primary key autoincrement,
       name text unique
);

create table execution (
       execution_id integer primary key autoincrement,
       testcase_id integer,
       status_id integer,
       note text not null default '',
       tester text,
       ts timestamp default current_timestamp,
       foreign key(testcase_id) references testcase(id),
       foreign key(status_id) references status(id)
);

insert into status (name) values
       ('PASS'),
       ('FAIL'),
       ('SKIP');

insert into execution (testcase_id, status_id, tester) values
       (1, 2, 'A. Hacker'),
       (1, 1, 'B. Bitdiddle'),
       (2, 3, 'M. Manager');

I don't actually have too many ideas for how executions would be queried. In the past my interest has been limited to something like "I want to know all the tests that failed or were skipped for a given release". Answering that question is totally possible, but not too interesting, just a little gnarly for the number of joins and aggregations:

with executionRecords as (
    select testcase.id, testcase.name testname, execution.tester, status.name statusname
      from execution
      join status on execution.status_id = status.id
      join testcase on execution.testcase_id = testcase.id
 left join tagmap on tagmap.testcase_id = testcase.id
 left join tag on tagmap.tag_id = tag.id
     where (status.name = 'FAIL' or status.name = 'SKIP')
     and tag.value = 'version-1.0'
) select json_group_array(json_object('name', testname,
                                      'tester', tester,
                                      'status', statusname))
    from executionRecords;

[
    {
        "name": "modify existing entry",
        "tester": "M. Manager",
        "status": "SKIP"
    }
]

User Interface Considerations

I got as far as mocking up an HTML interface for what a "new testcase" form might look like before running up against my self-imposed time limit. There are a few things that I haven't entirely worked out, like how to handle a large number of tags. I think the information density isn't too much of a problem but it seems likely there would need to be a means to group and order tags beyond a basic lexographic ordering.

Missing

I haven't thought of a way to build compound tag queries that I really like. It seems inevitable that a search for things like (version-1.0 OR version-1.1) AND candidate for automation would arise in short order. While it isn't very hard to write this as a SQL query against the given schema I don't know how to make an interface less like writing SQL and more intuitive.

Thoughts

This has been a fun exercise, albeit one I probably won't continue with. I've satisfied my own curiousity about designing a workable system but because I don't work with anything like this much any more I can't see much point in building it up further. I'm happy with having learned a bit more about the SQLite support for JSON and while it is pretty nifty, it is a little raw for writing an entire application I think. A more traditional controller layer to take the tuple results and build data objects still seems easier for a full application.

In terms of workflow, this is a nice way to develop within SQL and I would recommend it. The whole thing is about 100 lines of code and came together very fast, so much so that I probably spent more brain power trying to think of different user interfaces that I might use and then discarded. As I make my way through my list of software to try and write I'll need to keep this in mind, I think building a core set of technologies with which ideas come together readily is a big part of being an experienced programmer.

One last thing that occurred to me was how similar this design would be to a recipe database. There are a few obvious differences, for example, recipes inevitably have large static assets (food photos) associated with them and there probably isn't much need to record recipe "executions". I'm now thinking about modifying this design into something like a personal recipe repository, but not today!