Implementing bottom-up path traversal for hierarchical tables

Update: There is a newer and imho better approach available with Bottom-up hierarchical root path query recipe in oracle .

oracle supports hierarchical queries on tables by the start with and connect by clauses. this usually preassumes that the rows of some table are chained by a value in column dad_id of some row pointing to a value in column son_id in a parent row, where son_id also denotes the primary key in the table. the root rows of the tree being set up that way, furthermore have a dedicated value in column dad_id. for positive, evolving numbers (id’s ;-), one may just choose 0 here. so far, a typical hierarchical query for a (male) family tree may look like this:

select son_id, dad_id, son_name from family
start with dad_id = 0 connect by prior son_id = dad_id

there is tons of virtual columns around hierarchical queries because this type of database exploration is actually no longer some plain two-dimensional, column-and-row sql. it is nothing less than graph theory, albeit some very simple subset of it. kind of great stuff is level, sys_connect_by_path, siblings and connect_by_root as well as connect_by_isleaf. with the following query we therefore get the level of this son in the tree, find the most top ancestor id for this son (makes sense for 1+n root rows), find out whether this son has another son already, get a path of names of the family tree and are able to order all siblings according to some self-defined scheme.

select son_id, dad_id, son_name, level, connect_by_root, connect_by_isleaf,
  sys_connect_by_path(son_name, '/') as family_path
from family
start with dad_id = 0 connect by prior son_id = dad_id
order siblings by son_name

this is more than sufficient for standard requirements and a (relatively) small tree-table. however, as the tree-table grows in data/row-size and the queries become more dedicated to a couple of rows, another problem arises when employing sys_connect_by_path. because oracle needs to construct a tree before any other where-clause application, selecting a path for, say, 5 out of 100.000 rows does no longer perform efficiently.

select son_id, dad_id, son_name, family_path
from (
  select son_id, dad_id, son_name, sys_connect_by_path(son_name, '/')
    as family_path
  from family
  start with dad_id = 0 connect by prior son_id = dad_id
) where son_id in (1,2,3,4,5)

remember, the where-clause with the hierarchical query itself is applied before tree-construction and some start with restriction may not make sense for a single or a few root rows.

ok, what we do above is just a simple top-down traversal of the tree. for the 5 out of 100.000 rows requirement, however, it might be more clever to try a bottom-up traversal because this is much more selective having those five son id’s at hand already. this is nothing difficult also, we change direction just by adapting the start with and connect by expressions accordingly.

select son_id, dad_id, son_name, sys_connect_by_path(son_name, '/')
  as family_path
from family
start with son_id in (1,2,3,4,5) connect by prior dad_id = son_id

hhm, now it is fast again but the paths are reversed now, both in order and in the amount of path components (try that out on your data, takes me too much time to prepare a full showcase here). also, we get more than the expected number of rows in the result set, that is, every number of levels for any son id. will that approach take one further?

sort of, iff you just regard this select as a subselection of the actual tree-table and apply another top-down traversal with the connect_by_isleaf restriction … got it? let’s note down the sql first.

select son_id, dad_id, son_name, sys_connect_by_path(son_name, '/')
  as family_path
from (
  select son_id, dad_id, son_name, connect_by_isleaf as cbleaf
  from family
  start with son_id in (1,2,3,4,5) connect by prior dad_id = son_id
) where connect_by_isleaf = 1
start with cbleaf = 1 connect by prior son_id = dad_id

the inner statement does a bottom-up traversal first, revealing the root rows as leafs of the tree by cbleaf = 1. confused about roots being leafs? they are, because we changed traversal direction! the outer statements than does a usual top-down traversal starting at the depicted root rows (cbleaf = 1). it furthermore imposes a where clause that is not applied before tree construction because of the virtual nature of the connect_by_isleaf column.

you are advised to put this code into pl/sql for easy application because one cannot propagate those son id’s into the inner statement from a sql join or whatever. in doing so, iff you attempt to pass the path separator to sys_connect_by_path as a variable, like me, you will probably raise ORA-30003 illegal parameter in SYS_CONNECT_BY_PATH function. oracle simply does not allow any expression as a second parameter to sys_connect_by_path. even a pl/sql constant was not applicable. also, ORA-30004 when using SYS_CONNECT_BY_PATH may hit you when the path separator character already exists as a substring of the son’s names.

have fun!