When relationships are predictable and the paths to traverse them have a known depth, it is pretty simple to create a normalized structure to represent them. Unfortunately, the real world is often more flexible and my data model needs to be just as flexible to accurately represent it. The following are a few approaches for handling flexible relational data that have helped me over the years.

Approaches

Adjacency List

A simple tree structure where a child can only have one parent.

idmanager_idname
1nullErik
2nullKeegan
31Alice
41Bob
52Chuck
62Dave
76Eve

When to use: Your relationships follow a strict hierarchy where each child can only have one parent.

How to query: You could join the table on itself up to N times where N is the max depth of your hierarchy, but I’d rather just do that mapping in code after I pull back all the data that I care about.

Edge List

Supports a graph structure where each nodes are stored in one table and edges (relationships) in another. The edges can be directional or not, you can have multiple between the same node, and you can assign metadata to them.

cities (nodes)

idname
1Los Angeles
2Miami
3Chicago
4New York

distance (edges)

city_id_1city_id_2distance
122,732 mi
132,015 mi
142,789 mi
231,383 mi
241,281 mi
34794 mi

When to use:

  • Children can have more than one parent
  • Relationships aren’t strictly hierarchical
  • You don’t need to care about directionality (you still can)
  • You need multiple relationships between the same nodes
  • Relationships can have different meanings

How to query: Basically the same approach as the adjacency list.

Path Enumeration

A more queryable tree structure where you get full ancestry information without having to traverse the whole branch. Each node is stored with the full path from the root of the tree. The ids in the path are concatenated in sequence with a delimiter.

idpathname
11Erik
21Keegan
31.3Alice
41.4Bob
52.5Chuck
62.6Dave
72.6.7Eve

When to use: You want to filter based on ancestry and you have fairly static data (moving a node changes the path of all descendants).

How to query: In its most primitive form you are doing a string begins-with query. Thankfully many databases support this structure and allow more efficient and standardized queries (see Postgres ltree).

Nested Set

An elegant technique for storing hierarchical relationships that is super fast to query. Each node has a left and right value. All of the descendants for a node will have left and right values that are inside that of their ancestors.

Here is a diagram to help you visualize it.

Nested Set

idleftrightname
116Erik
2714Keegan
323Alice
445Bob
589Chuck
61013Dave
71112Eve

Works well when: You need to impress someone… or if you have very static data and need a quick way to query ancestry and descendants. It is more powerful than Path Enumeration from a query perspective but much more costly to update. Modifying the structure updates every node with a right value greater than the node you are updating.

How to query: To find the descendants of some node you find those with a left (or right) value is between the left and right of the target node. To find the ancestors of some node you find those whose left is less than the target left and right is greater than the target right.

Practical Usage

If I can safely assume a strict hierarchy then I use an Adjacency list paired with Path Enumeration for queryability. The Adjacency list maintains the referential integrity and the Path Enumeration is generated off of it via a trigger. Outside of that, I use an Edge List. Its structure is barely different but you gain a massive amount of flexibility. Sadly there aren’t great tricks for easy querying so I’d end up doing more in code. One day I would love to use Nested Set but I have yet to find a data set large enough and static enough to merit its obscure beauty.

One specific application of these concepts that I have appreciated is using Path Enumeration for authorization. I tie a user’s role to a path and use the path to query the paths on entities they are trying to access. I’ll write more about that in a future post. Happy Hierarching!

Tags:
  1. Database Design

Erik Murphy

Erik is an agile software developer in Charlotte, NC. He enjoys working full-stack (CSS, JS, C#, SQL) as each layer presents new challenges. His experience involves a variety of applications ranging from developing brochure sites to high-performance streaming applications. He has worked in many domains including military, healthcare, finance, and energy.

Copyright © 2023 Induro, LLC All Rights Reserved.