When relationships are predictable and the paths to traverse them have a known depth, it is pretty simple to create a normalized structure to represent them. Unfortunately, the real world is often more flexible and my data model needs to be just as flexible to accurately represent it. The following are a few approaches for handling flexible relational data that have helped me over the years.
A simple tree structure where a child can only have one parent.
When to use: Your relationships follow a strict hierarchy where each child can only have one parent.
How to query: You could join the table on itself up to N times where N is the max depth of your hierarchy, but I’d rather just do that mapping in code after I pull back all the data that I care about.
Supports a graph structure where each nodes are stored in one table and edges (relationships) in another. The edges can be directional or not, you can have multiple between the same node, and you can assign metadata to them.
When to use:
- Children can have more than one parent
- Relationships aren’t strictly hierarchical
- You don’t need to care about directionality (you still can)
- You need multiple relationships between the same nodes
- Relationships can have different meanings
How to query: Basically the same approach as the adjacency list.
A more queryable tree structure where you get full ancestry information without having to traverse the whole branch. Each node is stored with the full path from the root of the tree. The ids in the path are concatenated in sequence with a delimiter.
When to use: You want to filter based on ancestry and you have fairly static data (moving a node changes the path of all descendants).
How to query: In its most primitive form you are doing a string begins-with query. Thankfully many databases support this structure and allow more efficient and standardized queries (see Postgres ltree).
An elegant technique for storing hierarchical relationships that is super fast to query. Each node has a left and right value. All of the descendants for a node will have left and right values that are inside that of their ancestors.
Here is a diagram to help you visualize it.
Works well when: You need to impress someone… or if you have very static data and need a quick way to query ancestry and descendants. It is more powerful than Path Enumeration from a query perspective but much more costly to update. Modifying the structure updates every node with a right value greater than the node you are updating.
How to query: To find the descendants of some node you find those with a left (or right) value is between the left and right of the target node. To find the ancestors of some node you find those whose left is less than the target left and right is greater than the target right.
If I can safely assume a strict hierarchy then I use an Adjacency list paired with Path Enumeration for queryability. The Adjacency list maintains the referential integrity and the Path Enumeration is generated off of it via a trigger. Outside of that, I use an Edge List. Its structure is barely different but you gain a massive amount of flexibility. Sadly there aren’t great tricks for easy querying so I’d end up doing more in code. One day I would love to use Nested Set but I have yet to find a data set large enough and static enough to merit its obscure beauty.
One specific application of these concepts that I have appreciated is using Path Enumeration for authorization. I tie a user’s role to a path and use the path to query the paths on entities they are trying to access. I’ll write more about that in a future post. Happy Hierarching!
Erik is an agile software developer in Charlotte, NC. He enjoys working full-stack (CSS, JS, C#, SQL) as each layer presents new challenges. His experience involves a variety of applications ranging from developing brochure sites to high-performance streaming applications. He has worked in many domains including military, healthcare, finance, and energy.