indexpost archiveatom feed syndication feed icon

A New SQLite Function

2021-01-24

I was browsing through public bike-share data for the city of Pittsburgh and thinking about interesting ways of slicing and dicing through it. As a part of that I fell into a rabbit hole of previously unexplored SQLite novelties.

Prelude

I haven't got too much to say about the data itself, I got so distracted with all of this that I haven't done much beyond prepare to look at it. I pulled down some Excel spreadsheets (.xlsx files) from healthyridepgh.com. It happens that I didn't have anything to view the files with on my computer so I downloaded the gnumeric CLI tools, including ssconvert, which will output CSV files.

$ ssconvert 'Healthy Ride Station Locations 2019 Q1.xlsx' stations.csv

The station information will be imported into a table stations:

$ sed 1d stations.csv > raw-stations.csv
$ sqlite3 bikeshare.db
>> create table stations (
  id integer,
  name string,
  racks integer,
  latitude float,
  longitude float,
  primary key (id)
);
>> .separator ","
>> .import raw-stations stations

This all worked and is generally uninteresting. What occurred to me while ingesting the data was how it might be interesting to include distance information between stations. Calculating such a distance is possible through the use of the Haversine formula and some interesting historical background is provided in the article 10 Secret Trig Functions Your Math Teachers Never Taught You. With that done it began to look like calculating distances given two latitude, longitude pairs was both feasible and well-trodden. All that was required was access to some basic trigonometry functions which it so happens are included in SQLite.

The First Digression

So it turns out that isn't exactly true. If you read that last link more closely than I did you'll notice there's a big ** DRAFT ** notice on the page. Many of the math functions were added last month, in this commit. They haven't made it into a release yet and are hidden behind a compile-time flag. I know, I'll just compile SQLite myself with the flag enabled! How bad could it be?

Not Bad At All

It turns out this was pretty painless, there are a number of pages in the SQLite wiki about compiling , all I needed was the flag name and the existing compiler flags. Rather than do something smart, I just downloaded the source code and ran make, which printed the gcc arguments. After it finished I re-ran the same commands with my flag (-DSQLITE_ENABLE_MATH_FUNCTIONS).

With this done, I was ready. Fire up my custom build of the SQLite interpreter and see what we can do. Here I'm using a simplified form based on the spherical law of cosines:

>> select acos(
    sin(radians(40.441326)) *
    sin(radians(40.440877)) +
    cos(radians(40.441326)) *
    cos(radians(40.440877)) *
    cos(radians(-80.00308) -
        radians(-80.004679))
    ) * 6371000;
144.235870242059

I ended up verifying that number was plausible or at least mostly correct via Google Earth. I won't say much beyond the fact that Google Earth was harder to use than a custom build of SQLite.

The Second Digression

While I was stoked to get all of that working, I realized pretty shortly that I didn't have a good way to encapsulate such an ungainly formula. SQLite doesn't have user defined functions like Postgres, opting instead for integration with applications and loadable modules. What I really wanted was something akin to the following in Postgres:

CREATE FUNCTION sdist(float, float, float, float) RETURNS float
    AS 'select acos(sin($1) * sin($3) +
                    cos($1) * cos($3) * cos($4 - $2)) * 6371000;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

More reading and I had to conclude that there wasn't going to be a direct equivalent for SQLite.

I Would Have To Write A Loadable Extension

Now, hear me out, this isn't actually so bad. I had put off digging into the C API because I figured it would be terrible and error-prone. This was entirely unfair to SQLite given how well everything else works. In fact, after some reading on the wiki and some serious inspiration from their rot13 example I was able to come up with something.

What I really realized was how direct the mapping was from those functions recently added to SQLite and those available in math.h, which I ended up using. Specifically, the translation from the above SQL into C was basically one-to-one:

double sdist(double lat1, double lon1, double lat2, double lon2) {
  return acos(sin(lat1) * sin(lat2) +
              cos(lat1) * cos(lat2) * cos(lon2 - lon1)
              ) * 6371000;    // approximate radius of earth
}

The Full Extension

#include "sqlite3ext.h"
SQLITE_EXTENSION_INIT1
#include <assert.h>
#include <string.h>
#include <math.h>

/* distance via spherical law of cosines, result is meters */
double sdist(double lat1, double lon1, double lat2, double lon2) {
  return acos(sin(lat1) * sin(lat2) +
              cos(lat1) * cos(lat2) * cos(lon2 - lon1)
              ) * 6371000;
}

static void sdistfunc(
  sqlite3_context *context,
  int argc,
  sqlite3_value **argv
){
  int i;

  assert( argc==4 );
  for(i=0; i<argc; i++) {
    if( sqlite3_value_type(argv[i])==SQLITE_NULL ) {
      return;
    }
  }

  double lat1 = sqlite3_value_double(argv[0]);
  double lon1 = sqlite3_value_double(argv[1]);
  double lat2 = sqlite3_value_double(argv[2]);
  double lon2 = sqlite3_value_double(argv[3]);
  double distance = sdist(lat1, lon1, lat2, lon2);
  sqlite3_result_double(context, distance);
}

int sqlite3_sdist_init(
  sqlite3 *db, 
  char **pzErrMsg, 
  const sqlite3_api_routines *pApi
){
  int rc = SQLITE_OK;
  SQLITE_EXTENSION_INIT2(pApi);

  (void)pzErrMsg;  /* Unused parameter */

  rc = sqlite3_create_function(db, "sdist", 4,
                   SQLITE_UTF8|SQLITE_INNOCUOUS|SQLITE_DETERMINISTIC,
                   0, sdistfunc, 0, 0);
  return rc;
}

Compiling the extension was easy, especially since I already had a local copy of the source code:

$ gcc -g -fPIC -shared sdist.c -o sdist.so -lm

With all of that done, it was a cinch to actually use the thing:

$ ./sqlite3 bikeshare.db
SQLite version 3.35.0 2021-01-18 12:35:16
Enter ".help" for usage hints.

sqlite> .load ./sdist.so

sqlite> select sdist(radians(40.441326), radians(-80.004679),
   ...>              radians(40.440877), radians(-80.00308));
144.235870242059

Thoughts

I am sure I will acclimate to it eventually but for now I remain amazed at how well executed the entire SQLite project is. The documentation is thorough and full of useful examples, not just of use but extending and building. Despite my inexperience with C and in extending SQLite I was able to build a working extension to implement a spherical distance function in an afternoon. This was the most fun I've had hacking in a while.

Having played around with it I think it probably makes sense to bake the conversion to radians into the distance function, if only to ease the amount of typing involved. Also, the radians function is one of those math utilities that hasn't yet made it into a release of SQLite.

One final demo because I can't resist playing with it, before I realized how fast it was going to be in practice I had thought I would pre-compute the distances for each station pair. I was imagining something like:

sqlite> create table pairdistance (
  from_id integer,
  to_id integer,
  distance float,
  foreign key(from_id) references stations(id),
  foreign key(to_id) references stations(id),
  primary key(from_id, to_id)
);

sqlite> insert into pairdistance (from_id, to_id, distance)
select s1.id, s2.id, 
       sdist(radians(s1.latitude), radians(s1.longitude), 
             radians(s2.latitude), radians(s2.longitude))
  from stations s1 join stations s2;

sqlite> select A.name start, B.name end, distance
          from pairdistance
     left join stations A on pairdistance.from_id=A.id
     left join stations B on pairdistance.to_id=B.id;
start                     end                                          distance        
------------------------  -------------------------------------------  ----------------
Liberty Ave & Stanwix St  Liberty Ave & Stanwix St                     0.0             
Liberty Ave & Stanwix St  Forbes Ave & Market Square                   144.235870242059
Liberty Ave & Stanwix St  Third Ave & Wood St                          349.420452282805
Liberty Ave & Stanwix St  First Ave & Smithfield St (Art Institute)    585.802108399069
Liberty Ave & Stanwix St  First Ave & B St (T Station)                 881.457115966137
Liberty Ave & Stanwix St  Forbes Ave & Grant St                        658.460591846865
...

This results in a single lookup table of 11,664 rows with the Cartesian product of each possible pairing and the corresponding distance.