Categories and Subcategories

The adjacency model

The fundamental structure of the adjacency model is a one-to-many relationship between a parent entry and its child entries. As with any one-to-many relationship, the child entries carry a foreign key to their parent. What makes the adjacency model different is that the parent and child entries are both stored in the same table.

create table categories
( id       integer     not null  primary key
, name     varchar(37) not null
, parentid integer     null
, foreign key parentid_fk (parentid)
      references categories (id)
);

Here’s some sample data that might populate this table. We should be able to get an idea of the parent-child relationships (if not grasp the entire hierarchy) just by looking at the data:

id  name       parentid
 1  animal     NULL
 2  vegetable  NULL
 3  mineral    NULL
 4  doggie     1
 5  kittie     1
 6  horsie     1
 7  gerbil     1
 8  birdie     1
 9  carrot     2
10  tomato     2
11  potato     2
12  celery     2
13  rutabaga   2
14  quartz     3
15  feldspar   3
16  silica     3
17  gypsum     3
18  hunting    4
19  companion  4
20  herding    4
21  setter     18
22  pointer    18
23  terrier    18
24  poodle     19
25  chihuahua  19
26  shepherd   20
27  collie     20
27 collie 20

Terms commonly used with the adjacency model include tree, root, node, subtree, leaf, path, depth and level. There can be one or more trees in the table, and the parent foreign key is NULL for each tree’s root node. A root node is therefore at the “top” of its tree. A node is any entry, while a leaf is any node that has no children, i.e. for which there exists no other node having that node as its parent. A subtree is the portion of the tree “under” any node. A path is the chain of parent-child links connecting two nodes, typically a node and its root, and a node’s level is how many links it sits below the root. The depth of a subtree is the maximum number of levels beneath its top node. These may not be official terminology definitions, but they work for me.
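The root and leaf definitions translate almost word for word into SQL. Here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for whatever database you actually use. The foreign key is written inline on the column because that is the form SQLite accepts, and the variable names are mine.

```python
import sqlite3

# Sample rows from the table above: (id, name, parentid)
ROWS = [
    (1, "animal", None), (2, "vegetable", None), (3, "mineral", None),
    (4, "doggie", 1), (5, "kittie", 1), (6, "horsie", 1), (7, "gerbil", 1),
    (8, "birdie", 1), (9, "carrot", 2), (10, "tomato", 2), (11, "potato", 2),
    (12, "celery", 2), (13, "rutabaga", 2), (14, "quartz", 3),
    (15, "feldspar", 3), (16, "silica", 3), (17, "gypsum", 3),
    (18, "hunting", 4), (19, "companion", 4), (20, "herding", 4),
    (21, "setter", 18), (22, "pointer", 18), (23, "terrier", 18),
    (24, "poodle", 19), (25, "chihuahua", 19), (26, "shepherd", 20),
    (27, "collie", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table categories ("
    " id integer not null primary key,"
    " name varchar(37) not null,"
    " parentid integer null references categories (id))"
)
conn.executemany("insert into categories values (?, ?, ?)", ROWS)

# Root nodes: the parent foreign key is NULL.
roots = [name for (name,) in conn.execute(
    "select name from categories where parentid is null order by name")]

# Leaf nodes: no other row names this row as its parent.
leaves = [name for (name,) in conn.execute(
    "select node.name from categories as node"
    " where not exists (select 1 from categories as child"
    "                    where child.parentid = node.id)"
    " order by node.name")]

print(roots)        # ['animal', 'mineral', 'vegetable']
print(len(leaves))  # 20
```

Every node that is not a leaf (the three roots, doggie, and the three kinds of doggie) has at least one row pointing at it, which is why 20 of the 27 rows come back as leaves.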

Why is it called a tree when it grows down from the “root” which is at the top? Mere convention.

Now let’s see how a tree or hierarchy can be used to implement a category/subcategory structure.

Working with categories and subcategories

Using the adjacency model to implement categories and subcategories can be reduced to two simple steps:

  1. manage the hierarchical data
  2. display the hierarchical data

Managing the hierarchy is nothing special. Just look again at the table layout. There’s a primary key column (id) and a foreign key referencing it (parentid). Other than that, it’s a dead simple table. Use INSERT, UPDATE, and DELETE as with any other table. Whether we actually declare the foreign key on parentid, which is necessary for referential integrity, is secondary to the basic design. (Referential integrity means that the parent row should exist before the child row referencing it is inserted, and so on. See the article Relational Integrity in the Resources below.)
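To make the referential integrity point concrete, here is a small sketch, again assuming Python's sqlite3 module purely for illustration. The `pragma foreign_keys = on` line is SQLite-specific (SQLite ignores foreign keys unless asked), and the ids 99 and 42 are made-up values chosen so that the parent does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only on request
conn.execute(
    "create table categories ("
    " id integer not null primary key,"
    " name varchar(37) not null,"
    " parentid integer null references categories (id))"
)

# INSERT: parents first, then the children that reference them.
conn.execute("insert into categories values (1, 'animal', null)")
conn.execute("insert into categories values (4, 'doggie', 1)")
conn.execute("insert into categories values (5, 'kittie', 1)")

# UPDATE: rename a node like any other row.
conn.execute("update categories set name = 'cat' where id = 5")

# A child whose parent does not exist is rejected.
try:
    conn.execute("insert into categories values (99, 'orphan', 42)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# DELETE: a parent cannot be removed while children still reference it.
try:
    conn.execute("delete from categories where id = 1")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True

# Deleting a leaf is fine.
conn.execute("delete from categories where id = 5")
remaining = [name for (name,) in
             conn.execute("select name from categories order by id")]
print(orphan_rejected, parent_delete_rejected, remaining)
```

Without the declared foreign key, both rejected statements would succeed and leave rows pointing at parents that are not there.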

Displaying the hierarchy is challenging, but not difficult. Categories and subcategories can be handled in HTML in many ways. Current best practice is to use nested unordered lists. For further information, see Listamatic: one list, many options in the Resources below.

Displaying all categories and subcategories: site maps and navigation bars

To display the hierarchy, we must first retrieve it. The following method involves using as many LEFT OUTER JOINs as necessary to cover the depth of the deepest tree. For our sample data, the deepest tree has four levels, so the query requires four copies of the table and three self-joins between them. Each join goes “down” a level from the node above it. The query begins at the root nodes.

select root.name  as root_name
     , down1.name as down1_name
     , down2.name as down2_name
     , down3.name as down3_name
  from categories as root
left outer
  join categories as down1
    on down1.parentid = root.id
left outer
  join categories as down2
    on down2.parentid = down1.id
left outer
  join categories as down3
    on down3.parentid = down2.id
 where root.parentid is null
order
    by root_name
     , down1_name
     , down2_name
     , down3_name

Notice how the WHERE clause ensures that only paths from the root nodes are followed. This query produces the following result set:

root_name  down1_name  down2_name  down3_name
animal     birdie      NULL        NULL
animal     doggie      companion   chihuahua
animal     doggie      companion   poodle
animal     doggie      herding     collie
animal     doggie      herding     shepherd
animal     doggie      hunting     pointer
animal     doggie      hunting     setter
animal     doggie      hunting     terrier
animal     gerbil      NULL        NULL
animal     horsie      NULL        NULL
animal     kittie      NULL        NULL
mineral    feldspar    NULL        NULL
mineral    gypsum      NULL        NULL
mineral    quartz      NULL        NULL
mineral    silica      NULL        NULL
vegetable  carrot      NULL        NULL
vegetable  celery      NULL        NULL
vegetable  potato      NULL        NULL
vegetable  rutabaga    NULL        NULL
vegetable  tomato      NULL        NULL

Each row in the result set represents a distinct path from a root node to a leaf node. Notice how the LEFT OUTER JOIN, when extended “below” the leaf node in any given path, returns NULL (representing the fact that there was no node below that node, i.e. satisfying that join condition).

As we can see, this result set contains all our original categories and subcategories. If the categories and subcategories are being displayed on a web site, this query can therefore be used to generate the complete site map. An abbreviated query, one that goes down only a certain number of levels from the roots regardless of whether there are nodes at deeper levels, can be used for the site’s navigation bar.
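The abbreviated navigation-bar query might look something like the sketch below, which stops one level under the roots. It assumes Python's sqlite3 module and a small subset of the sample data; `NAV_QUERY` and `nav_rows` are names I made up for the example.

```python
import sqlite3

# A subset of the sample data: (id, name, parentid)
ROWS = [
    (1, "animal", None), (2, "vegetable", None),
    (4, "doggie", 1), (5, "kittie", 1), (9, "carrot", 2),
    (18, "hunting", 4), (21, "setter", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table categories ("
    " id integer not null primary key,"
    " name varchar(37) not null,"
    " parentid integer null references categories (id))"
)
conn.executemany("insert into categories values (?, ?, ?)", ROWS)

# Same shape as the site map query, but with a single self-join,
# so anything below the first level under a root is never visited.
NAV_QUERY = """
select root.name  as root_name
     , down1.name as down1_name
  from categories as root
left outer
  join categories as down1
    on down1.parentid = root.id
 where root.parentid is null
order
    by root_name
     , down1_name
"""

nav_rows = conn.execute(NAV_QUERY).fetchall()
for row in nav_rows:
    print(row)
# ('animal', 'doggie')
# ('animal', 'kittie')
# ('vegetable', 'carrot')
```

Notice that hunting and setter exist in the table but never appear, because no join reaches their level.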

We can display this sample data using nested unordered lists like this:

  • animal
    • birdie
    • doggie
      • companion
        • chihuahua
        • poodle
      • herding
        • collie
        • shepherd
      • hunting
        • pointer
        • setter
        • terrier
    • gerbil
    • horsie
    • kittie
  • mineral
    • feldspar
    • gypsum
    • quartz
    • silica
  • vegetable
    • carrot
    • celery
    • potato
    • rutabaga
    • tomato

What’s the easiest way to transform the result set into the nested ULs? In ColdFusion, we use nested CFOUTPUT tags, with the GROUP= parameter on all but the innermost list. Very straightforward indeed. In other scripting languages, as the saying goes, your mileage may vary. Take comfort in the fact that once you’ve coded it, you will never have to change your site map page again.
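For a language without a GROUP= equivalent, one possible approach (a sketch, not the only way) is to fold the path rows into a nested structure first and then render that recursively. The example below uses Python; `build_tree` and `render_ul` are hypothetical helper names, and the rows are a few lines copied from the result set above, with None standing in for NULL.

```python
from html import escape


def build_tree(rows):
    """Fold root-to-leaf path rows into nested dicts, skipping the NULL padding."""
    tree = {}
    for row in rows:
        level = tree
        for name in row:
            if name is None:  # the LEFT OUTER JOIN ran past the leaf
                break
            level = level.setdefault(name, {})
    return tree


def render_ul(tree, indent=0):
    """Render nested dicts as nested <ul> lists, one <li> per node."""
    if not tree:
        return ""
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    for name, children in tree.items():
        if children:
            lines.append(f"{pad}  <li>{escape(name)}")
            lines.append(render_ul(children, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f"{pad}  <li>{escape(name)}</li>")
    lines.append(f"{pad}</ul>")
    return "\n".join(lines)


# A few rows from the site map result set, in the order the query returned them.
rows = [
    ("animal", "birdie", None, None),
    ("animal", "doggie", "companion", "chihuahua"),
    ("animal", "doggie", "companion", "poodle"),
    ("mineral", "quartz", None, None),
]

tree = build_tree(rows)
markup = render_ul(tree)
print(markup)
```

Because Python dicts keep insertion order, the ORDER BY in the query carries straight through to the order of the list items.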

What if the hierarchy is more than, say, three or four levels deep? What if it’s fifteen levels deep? My response to this question is threefold.

First, a query with fifteen self-joins may be a little more tedious to code but most assuredly will not present any difficulty to your database engine.

Second, in certain databases such as Oracle and DB2, recursion is built in, so you can go as many levels deep as you wish. Don’t fool yourself, though: the coding required to display an arbitrary number of levels is no picnic either. Do not make the mistake of simulating recursion by coding a script module that calls itself, because from the database perspective, this is a series of calls (a query in a loop) and the performance will reflect this.
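As an illustration of what built-in recursion can look like, here is a sketch using a recursive common table expression. It runs against SQLite through Python's sqlite3 module only because that makes it easy to try; the exact syntax differs between database products, so treat the query as an assumption to check against your own database's documentation. The `tree`, `depth` and `path` names are mine.

```python
import sqlite3

# A subset of the sample data: (id, name, parentid)
ROWS = [
    (1, "animal", None), (2, "vegetable", None),
    (4, "doggie", 1), (9, "carrot", 2),
    (18, "hunting", 4), (21, "setter", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table categories ("
    " id integer not null primary key,"
    " name varchar(37) not null,"
    " parentid integer null references categories (id))"
)
conn.executemany("insert into categories values (?, ?, ?)", ROWS)

# Start at the roots, then repeatedly join children onto the rows found so far.
# One query, one round trip, however deep the tree goes.
RECURSIVE_QUERY = """
with recursive tree (id, name, depth, path) as (
    select id, name, 0, name
      from categories
     where parentid is null
    union all
    select child.id, child.name, tree.depth + 1
         , tree.path || ' > ' || child.name
      from categories as child
      join tree
        on child.parentid = tree.id
)
select name, depth, path
  from tree
order
    by path
"""

rows = conn.execute(RECURSIVE_QUERY).fetchall()
for name, depth, path in rows:
    print("  " * depth + name)
```

The depth column is what a display script would use to decide when to open or close a nested list.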

Third, if you have a tree that goes more than three or four levels deep, you may have difficulty conveying this structure satisfactorily in a visual way. You may want to go back and re-think how you expect your users to actually navigate through the hierarchy. Sometimes the best solution is simply to show no more than three levels, with some sort of visual cue that there are further levels below the nodes shown.

The path to the root: the breadcrumb trail

Retrieving the path from any given node, whether it is a leaf node or not, to the root at the top of its path, is very similar to the site map query. Again, we use LEFT OUTER JOINs, but this time we go “up” the tree from the node, rather than “down.”

select node.name as node_name
     , up1.name  as up1_name
     , up2.name  as up2_name
     , up3.name  as up3_name
  from categories as node
left outer
  join categories as up1
    on up1.id = node.parentid
left outer
  join categories as up2
    on up2.id = up1.parentid
left outer
  join categories as up3
    on up3.id = up2.parentid
order
    by node_name

Here’s the result set from this query:

node_name  up1_name   up2_name  up3_name
animal     NULL       NULL      NULL
birdie     animal     NULL      NULL
carrot     vegetable  NULL      NULL
celery     vegetable  NULL      NULL
chihuahua  companion  doggie    animal
collie     herding    doggie    animal
companion  doggie     animal    NULL
doggie     animal     NULL      NULL
feldspar   mineral    NULL      NULL
gerbil     animal     NULL      NULL
gypsum     mineral    NULL      NULL
herding    doggie     animal    NULL
horsie     animal     NULL      NULL
hunting    doggie     animal    NULL
kittie     animal     NULL      NULL
mineral    NULL       NULL      NULL
pointer    hunting    doggie    animal
poodle     companion  doggie    animal
potato     vegetable  NULL      NULL
quartz     mineral    NULL      NULL
rutabaga   vegetable  NULL      NULL
setter     hunting    doggie    animal
shepherd   herding    doggie    animal
silica     mineral    NULL      NULL
terrier    hunting    doggie    animal
tomato     vegetable  NULL      NULL
vegetable  NULL       NULL      NULL

Here each row in the result set is a single path, one for every node in the table. On a web site, such a path is often called a breadcrumb trail. (This name is somewhat misleading, because it suggests that it might represent how the visitor arrived at the page, which is not always the case. The accepted meaning of breadcrumb is simply the path from the root.)

In practice, we’d have a WHERE clause that would specify a single node, so in effect, the results above are all of the breadcrumbs in the table.
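A single-node version might look like the following sketch, which filters on the node's id with a bound parameter. It assumes Python's sqlite3 module and just one branch of the sample data; `PATH_QUERY` and `path_to_root` are hypothetical names.

```python
import sqlite3

# One branch of the sample data: (id, name, parentid)
ROWS = [
    (1, "animal", None), (4, "doggie", 1),
    (19, "companion", 4), (24, "poodle", 19),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table categories ("
    " id integer not null primary key,"
    " name varchar(37) not null,"
    " parentid integer null references categories (id))"
)
conn.executemany("insert into categories values (?, ?, ?)", ROWS)

# The "up the tree" query, restricted to a single starting node.
PATH_QUERY = """
select node.name as node_name
     , up1.name  as up1_name
     , up2.name  as up2_name
     , up3.name  as up3_name
  from categories as node
left outer
  join categories as up1
    on up1.id = node.parentid
left outer
  join categories as up2
    on up2.id = up1.parentid
left outer
  join categories as up3
    on up3.id = up2.parentid
 where node.id = ?
"""


def path_to_root(conn, node_id):
    """Return the (node, up1, up2, up3) row for one node, or None if no such id."""
    return conn.execute(PATH_QUERY, (node_id,)).fetchone()


print(path_to_root(conn, 19))  # ('companion', 'doggie', 'animal', None)
print(path_to_root(conn, 24))  # ('poodle', 'companion', 'doggie', 'animal')
```

Filtering on the primary key rather than the name avoids surprises if two categories in different trees happen to share a name.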

To display a breadcrumb trail in the normal fashion, from root to node, just display the result set columns in reverse order, and ignore the nulls. For example, let’s say we run the above query for the category “companion” and get this:

node_name  up1_name   up2_name  up3_name
companion  doggie     animal    NULL

The breadcrumb would look like this:

animal > doggie > companion

Simple, eh?
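In script terms, "reverse the columns and ignore the nulls" is only a couple of lines. Here is a sketch in Python, where `breadcrumb` is a hypothetical helper and the " > " separator is just one common choice; None stands in for NULL.

```python
def breadcrumb(row, separator=" > "):
    """Join a (node, up1, up2, ...) row into a root-to-node trail."""
    # Walk the columns from the last one back to the node, dropping the NULLs.
    names = [name for name in reversed(row) if name is not None]
    return separator.join(names)


trail = breadcrumb(("companion", "doggie", "animal", None))
print(trail)  # animal > doggie > companion
```

On a real page each name except the last would normally be a link to that category, but the ordering logic stays the same.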


Published by


I am Lê Thanh Tuấn, and I share things that I find interesting or that may be helpful to you!
