# A New SQLite Function

2021-01-24I was browsing through public bike-share data for the city of Pittsburgh and thinking about interesting ways of slicing and dicing through it. As a part of that I fell into a rabbit hole of previously unexplored SQLite novelties.

## Prelude

I haven't got too much to say about the data itself, I got so
distracted with all of this that I haven't done much beyond
prepare to look at it. I pulled down some Excel spreadsheets
(`.xlsx`

files)
from healthyridepgh.com. It
happens that I didn't have anything to view the files with on my
computer so I downloaded the gnumeric CLI tools,
including `ssconvert`

, which will output CSV files.

`$ ssconvert 'Healthy Ride Station Locations 2019 Q1.xlsx' stations.csv`

The station information will be imported into a table `stations`

:

```
$ sed 1d stations.csv > raw-stations.csv
$ sqlite3 bikeshare.db
>> create table stations (
id integer,
name string,
racks integer,
latitude float,
longitude float,
primary key (id)
);
>> .separator ","
>> .import raw-stations stations
```

This all worked and is generally uninteresting. What occurred to me
while ingesting the data was how it might be interesting to include
distance information between stations. Calculating such a distance
is possible through the use of
the Haversine
formula and some interesting historical background is provided
in the
article *10
Secret Trig Functions Your Math Teachers Never Taught You*.
With
that done it began to look like calculating distances given two
latitude, longitude pairs was both feasible and well-trodden. All
that was required was access to some basic trigonometry functions
which it so happens
are included
in SQLite.

## The First Digression

So it turns out that isn't exactly true. If you read that last link more closely than I did you'll notice there's a big ** DRAFT ** notice on the page. Many of the math functions were added last month, in this commit. They haven't made it into a release yet and are hidden behind a compile-time flag. I know, I'll just compile SQLite myself with the flag enabled! How bad could it be?

### Not Bad At All

It turns out this was pretty painless, there are a number of pages
in the SQLite wiki
about compiling
, all I needed was the flag name and the existing compiler
flags. Rather than do something smart, I just downloaded the source
code and ran `make`

, which printed the `gcc`

arguments. After it finished I re-ran the same commands with my flag
(`-DSQLITE_ENABLE_MATH_FUNCTIONS`

).

With this done, I was ready. Fire up my custom build of the SQLite interpreter and see what we can do. Here I'm using a simplified form based on the spherical law of cosines:

```
>> select acos(
sin(radians(40.441326)) *
sin(radians(40.440877)) +
cos(radians(40.441326)) *
cos(radians(40.440877)) *
cos(radians(-80.00308) -
radians(-80.004679))
) * 6371000;
144.235870242059
```

I ended up verifying that number was plausible or at least mostly correct via Google Earth. I won't say much beyond the fact that Google Earth was harder to use than a custom build of SQLite.

## The Second Digression

While I was stoked to get all of that working, I realized pretty shortly that I didn't have a good way to encapsulate such an ungainly formula. SQLite doesn't have user defined functions like Postgres, opting instead for integration with applications and loadable modules. What I really wanted was something akin to the following in Postgres:

```
CREATE FUNCTION sdist(float, float, float, float) RETURNS float
AS 'select acos(sin($1) * sin($3) +
cos($1) * cos($3) * cos($4 - $2)) * 6371000;'
LANGUAGE SQL
IMMUTABLE
RETURNS NULL ON NULL INPUT;
```

More reading and I had to conclude that there wasn't going to be a direct equivalent for SQLite.

### I Would Have To Write A Loadable Extension

Now, hear me out, this isn't actually so bad. I had put off digging into the C API because I figured it would be terrible and error-prone. This was entirely unfair to SQLite given how well everything else works. In fact, after some reading on the wiki and some serious inspiration from their rot13 example I was able to come up with something.

What I really realized was how direct the mapping was from those
functions recently added to SQLite and those available
in `math.h`

, which I ended up using. Specifically, the
translation from the above SQL into C was basically one-to-one:

```
double sdist(double lat1, double lon1, double lat2, double lon2) {
return acos(sin(lat1) * sin(lat2) +
cos(lat1) * cos(lat2) * cos(lon2 - lon1)
) * 6371000; // approximate radius of earth
}
```

### The Full Extension

```
#include "sqlite3ext.h"
SQLITE_EXTENSION_INIT1
#include <assert.h>
#include <string.h>
#include <math.h>
/* distance via spherical law of cosines, result is meters */
double sdist(double lat1, double lon1, double lat2, double lon2) {
return acos(sin(lat1) * sin(lat2) +
cos(lat1) * cos(lat2) * cos(lon2 - lon1)
) * 6371000;
}
static void sdistfunc(
sqlite3_context *context,
int argc,
sqlite3_value **argv
){
int i;
assert( argc==4 );
for(i=0; i<argc; i++) {
if( sqlite3_value_type(argv[i])==SQLITE_NULL ) {
return;
}
}
double lat1 = sqlite3_value_double(argv[0]);
double lon1 = sqlite3_value_double(argv[1]);
double lat2 = sqlite3_value_double(argv[2]);
double lon2 = sqlite3_value_double(argv[3]);
double distance = sdist(lat1, lon1, lat2, lon2);
sqlite3_result_double(context, distance);
}
int sqlite3_sdist_init(
sqlite3 *db,
char **pzErrMsg,
const sqlite3_api_routines *pApi
){
int rc = SQLITE_OK;
SQLITE_EXTENSION_INIT2(pApi);
(void)pzErrMsg; /* Unused parameter */
rc = sqlite3_create_function(db, "sdist", 4,
SQLITE_UTF8|SQLITE_INNOCUOUS|SQLITE_DETERMINISTIC,
0, sdistfunc, 0, 0);
return rc;
}
```

Compiling the extension was easy, especially since I already had a local copy of the source code:

`$ gcc -g -fPIC -shared sdist.c -o sdist.so -lm`

With all of that done, it was a cinch to actually use the thing:

```
$ ./sqlite3 bikeshare.db
SQLite version 3.35.0 2021-01-18 12:35:16
Enter ".help" for usage hints.
sqlite> .load ./sdist.so
sqlite> select sdist(radians(40.441326), radians(-80.004679),
...> radians(40.440877), radians(-80.00308));
144.235870242059
```

## Thoughts

I am sure I will acclimate to it eventually but for now I remain amazed at how well executed the entire SQLite project is. The documentation is thorough and full of useful examples, not just of use but extending and building. Despite my inexperience with C and in extending SQLite I was able to build a working extension to implement a spherical distance function in an afternoon. This was the most fun I've had hacking in a while.

Having played around with it I think it probably makes sense to bake the conversion to radians into the distance function, if only to ease the amount of typing involved. Also, the radians function is one of those math utilities that hasn't yet made it into a release of SQLite.

One final demo because I can't resist playing with it, before I realized how fast it was going to be in practice I had thought I would pre-compute the distances for each station pair. I was imagining something like:

```
sqlite> create table pairdistance (
from_id integer,
to_id integer,
distance float,
foreign key(from_id) references stations(id),
foreign key(to_id) references stations(id),
primary key(from_id, to_id)
);
sqlite> insert into pairdistance (from_id, to_id, distance)
select s1.id, s2.id,
sdist(radians(s1.latitude), radians(s1.longitude),
radians(s2.latitude), radians(s2.longitude))
from stations s1 join stations s2;
sqlite> select A.name start, B.name end, distance
from pairdistance
left join stations A on pairdistance.from_id=A.id
left join stations B on pairdistance.to_id=B.id;
start end distance
------------------------ ------------------------------------------- ----------------
Liberty Ave & Stanwix St Liberty Ave & Stanwix St 0.0
Liberty Ave & Stanwix St Forbes Ave & Market Square 144.235870242059
Liberty Ave & Stanwix St Third Ave & Wood St 349.420452282805
Liberty Ave & Stanwix St First Ave & Smithfield St (Art Institute) 585.802108399069
Liberty Ave & Stanwix St First Ave & B St (T Station) 881.457115966137
Liberty Ave & Stanwix St Forbes Ave & Grant St 658.460591846865
...
```

This results in a single lookup table of 11,664 rows with the Cartesian product of each possible pairing and the corresponding distance.