indexpost archiveatom feed

# A Minor SQL Epiphany

2021-07-15

I previously delved into using my own loadable module to circumvent the lack of user defined functions in SQLite. A more typical, relational solution occurred to me today and I thought I should write it down before I forget.

## The Problem

I started with the Haversine formula, populated here with sample data:

``````select acos(
sin(radians(40.441326)) * -- latitude 1
sin(radians(40.440877)) + -- latitude 2
cos(radians(40.441326)) * -- latitude 1
cos(radians(40.440877)) * -- latitude 2
cos(radians(-80.00308) -  -- longitude 2
radians(-80.004679))  -- longitude 1
) * 6371000;``````

I went in search of a user-defined function in order to package up the slightly complicated query that I did not want to use directly (and repeatedly) in the analysis I was interested in.

## Previous Solution

I wrote a function `sdist` for "spherical distance" (admittedly not a very good name) inside of a loadable module (shared C libary) that took as parameters the two latitude and longitude pairs.

## New Solution

I have no excuse for why it didn't occur to me the first time I wrote all this but there is an obvious relational database construct for this scenario: a view.

``````create view distances (from_id, to_id, distance) as

select s1.id, s2.id, acos(
) * 6371000
from stations s1, stations s2;``````

I think this is a much cleaner solution. Not only does it maintain the entire solution within plain SQL and the database directly; it still provides the ability to precompute a look-up table for each possible pairing:

``````create table precomputed_distance as
select * from distances;``````

## Nice and Simple

Doing this got me thinking more about a book I read, A Curious Moon. In it the author describes the kind of "good hygiene" to be practiced with owning and operating a database. I figured I could better document the entire process that led up to the actual analysis by producing a single SQL (well, SQLite) document that manages the ingestion and munging of data:

``````.import --csv 2019Q1_stations.csv raw_stations_2019Q1
.import --csv 2019Q1_rentals.csv raw_rentals_2019Q1

create table stations (
id integer,
name string,
racks integer,
latitude float,
longitude float,
primary key (id)
);

insert into stations select * from raw_stations_2019Q1;

-- there is one record where a latitude is incorrectly negative, let's
-- fix it:
update stations
set latitude = abs(latitude)
where latitude < 0;

create table rides (
id integer,
starttime datetime,
endtime datetime,
bikeid integer,
duration integer,
from_id integer,
to_id integer,
usertype string,
foreign key(from_id) references station(id),
foreign key(to_id) references station(id),
primary key(id)
);

insert into rides
select "Trip id",
"Starttime",
"Stoptime",
"Bikeid",
"Tripduration",
"From station id",
"To station id",
"Usertype"
from raw_rentals_2019Q1;

create view distances (from_id, to_id, distance) as

select s1.id, s2.id, acos(