Setting up and mastering radius search in PostgreSQL (9.1)

Radius search in PostgreSQL may come in employing a light and/or a much more sophisticated version. This article discusses the light one, namely the cube and the earth distance extensions, most probably sufficient for the web user’s getting here and there requirements. earth distance, depending on cube, assumes the earth to be perfectly spherical, anyone demanding a higher accuracy level, especially for the mountainous parts, may take a look at the PostGIS project.

Although radius search, the light variety, will be up fast and performing well, there may be some mantrap around, for the ones who prefer to read documentation the easy way too. First of all, PostgreSQL: Documentation: 9.1: earthdistance indicates that the point-based earth distance calculation is hard-wired to statute miles in units. You may use this circumstance to your advantage, like datachomp did in Radius Queries in Postgres, as long as you know what you’re doing. Second to that, taking on the alternate cube-based earth distance calculation, the earth_box function, accepting a lat/long and a radius on input, may return locations farther than the actual radius given (documented alike). This is because earth_box, as the name implies, still handles a box geometry on the idealized sphere (and not some higher order circle surface). But more on that below.

Installation

Unless you already did, catch the PostgreSQL contrib packages for your linux distribution (thanks to rodush, Postgres and earthdistance extension):

# check up
apt-cache search postgres | grep contrib
postgresql-contrib-9.1 - additional facilities for PostgreSQL
postgresql-contrib - additional facilities for PostgreSQL (supported version)

# install
apt-get install postgresql-contrib postgresql-contrib-9.1

Register the extensions with your database (here comes Johann du Toit, Searching in a Radius using Postgres, watch out for the code correction for earth_box(ll_to_earth(lat,lng), radius) being in the comments only):

su - postgres
psql -d <database>
CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

Set up index support in precalculating the result of ll_to_earth(), where xlalo is a point datatype carrying lat and long respectively:

CREATE INDEX xort_ll_to_earth_idx on xort
  USING gist(ll_to_earth(xlalo[0], xlalo[1]));

Try and check

Ready to launch now, I’m just taking some point lat/long from a table of locations as a seed:

select xort_id, xtown, xlalo
from xort
  where xlalo is not null
limit 1
;
 xort_id |          xtown          |         xlalo         
---------+-------------------------+-----------------------
     520 | Erlau                   | (51.007124,12.931024) 

Next on, I’m trying earth_box() to find any other table location within a radius of 6km:

select xort_id, xtown, xlalo,
  earth_distance(ll_to_earth(51.007124, 12.931024),
    ll_to_earth(xlalo[0], xlalo[1])) as dist
from xort
  where xlalo is not null
    and earth_box(ll_to_earth(51.007124, 12.931024), 6000) @>
      ll_to_earth(xlalo[0], xlalo[1])
order by dist
limit 20
;
 xort_id |          xtown          |         xlalo         |       dist       
---------+-------------------------+-----------------------+------------------
     520 | Erlau                   | (51.007124,12.931024) |                0
     529 | Mittweida               | (50.983414,12.981557) | 4416.05637628265
     535 | Schweikershain          | (51.046584,12.950253) | 4594.37765890373
     534 | Ringethal bei Mittweida | (51.005618,12.999024) | 4766.10077622375
     517 | Altmittweida            | (50.964298,12.948988) | 4930.79953490496
    1787 | Mittweida               | (51.028585,12.994964) | 5075.12517745445
     526 | Königshain              | (50.973175,12.878008) | 5299.30926697392
     539 | Topfseifersdorf         | (51.03454,12.85753)   | 5983.27938106796
     541 | Wiederau                | (50.97385,12.840986)  | 7315.97246135706
     524 | Geringswalde            | (51.076348,12.905077) | 7917.13030883707
     523 | Geringswalde            | (51.07474,12.873538)  | 8534.98538185683

Uuhm, surprising, right? There are three locations showing up a distance greater then the 6km on input!!? Well, RTFM dude, includes me, the earth_box() documentation obviously reads:

earth_box()… Returns a box suitable for an indexed search using the cube operator for points within a given great circle distance of a location. Some points in this box are further than the specified great circle distance from the location, so a second check using earth_distance should be included in the query.

In a visual sketch, using a circle and a box shape in 2D for simplicity (and the incredible hamstermap site, Quickmap), the correlation may become much more apparent. Some locations still fit the box but not the circle, the good old quadrature of the circle affair.

Ok, following the advice and just supersetting the above query with a select wrapper, applying the suggested where accordingly delivers the expected result:

select t1.* from (
  select ...
) t1 where dist <= 6000
order by dist
limit 20
;
 xort_id |          xtown          |         xlalo         |       dist       
---------+-------------------------+-----------------------+------------------
     520 | Erlau                   | (51.007124,12.931024) |                0
     529 | Mittweida               | (50.983414,12.981557) | 4416.05637628265
     535 | Schweikershain          | (51.046584,12.950253) | 4594.37765890373
     534 | Ringethal bei Mittweida | (51.005618,12.999024) | 4766.10077622375
     517 | Altmittweida            | (50.964298,12.948988) | 4930.79953490496
    1787 | Mittweida               | (51.028585,12.994964) | 5075.12517745445
     526 | Königshain              | (50.973175,12.878008) | 5299.30926697392
     539 | Topfseifersdorf         | (51.03454,12.85753)   | 5983.27938106796

Seems double wired, huh? Why not safe oneself from using earth_box() at all since earth_distance() needs to be used anyway? Let’s have a look at the plans for a statement without / with earth_box():

explain
  select xort_id, xtown, xlalo,
    earth_distance(ll_to_earth(51.007124, 12.931024),
      ll_to_earth(xlalo[0], xlalo[1])) as dist
  from xort
    where xlalo is not null
     and earth_distance(ll_to_earth(51.007124, 12.931024),
       ll_to_earth(xlalo[0], xlalo[1])) < 6000
  order by dist
  limit 20
;
Limit  (cost=1127.25..1127.30 rows=20 width=30)
 ->  Sort  (cost=1127.25..1128.50 rows=501 width=30)
       Sort Key: (sec_to_gc(cube_distance('(3911518.35458477, 898086.643280137, 4957266.54307919)'::cube, (ll_to_earth(xlalo[0], xlalo[1]))::cube)))
       ->  Seq Scan on xort  (cost=0.00..1113.92 rows=501 width=30)
             Filter: ((xlalo IS NOT NULL) AND (sec_to_gc(cube_distance('(3911518.35458477, 898086.643280137, 4957266.54307919)'::cube, (ll_to_earth(xlalo[0], xlalo[1]))::cube)) < 6000::double precision))
select t1.* from (
  select ...
) t1 where dist <= 6000 order by dist limit 20
;
Limit  (cost=13.64..13.64 rows=1 width=30)
 ->  Sort  (cost=13.64..13.64 rows=1 width=30)
       Sort Key: (sec_to_gc(cube_distance('(3911518.35458477, 898086.643280137, 4957266.54307919)'::cube, (ll_to_earth(xort.xlalo[0], xort.xlalo[1]))::cube)))
       ->  Bitmap Heap Scan on xort  (cost=4.52..13.63 rows=1 width=30)
             Recheck Cond: ('(3905518.354806, 892086.64350137, 4951266.54330042),(3917518.35436354, 904086.643058903, 4963266.54285795)'::cube @> (ll_to_earth(xlalo[0], xlalo[1]))::cube)
             Filter: ((xlalo IS NOT NULL) AND (sec_to_gc(cube_distance('(3911518.35458477, 898086.643280137, 4957266.54307919)'::cube, (ll_to_earth(xlalo[0], xlalo[1]))::cube)) <= 6000::double precision))
             ->  Bitmap Index Scan on xort_ll_to_earth_idx  (cost=0.00..4.52 rows=2 width=0)
                   Index Cond: ('(3905518.354806, 892086.64350137, 4951266.54330042),(3917518.35436354, 904086.643058903, 4963266.54285795)'::cube @> (ll_to_earth(xlalo[0], xlalo[1]))::cube)

Got it? Cutting down on earth_box() also cuts you off from a potential index usage. Not a good perspective, right?

Have fun (with PostgreSQL)

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.