The Cusp of Helix

Fertile Forest Model
[C.1: Hierarchical Data in a RDB]

Roles of Model

The role of the model to store hierarchical data in RDB are summarized in two paragraphs as:

  1. To store the hierarchical data in RDB table.
  2. To write the query to find the relevant nodes efficiently for any nodes.

(1) To store the hierarchical data in RDB table.

We can define the criteria whether the model has function to store hierarchical data, as:

Can reproduce the original hierarchical structure by the saved all nodes.

If can not reproduce hierarchical data by all records in RDB, we can not say that the model has the function. It is too basic thing that needless to say. Therefore, I think that nobody can disagree it.

(2) To write the query to find the relevant nodes efficiently.

In general, we save data into a database for searching them. We think naturally as: If to save hierarchical data in a database, various relationship node of any nodes (such as "subtree", "parent node", "child nodes", "ancestor nodes", "descendant nodes" and etc.) will be searchable. It is the important role of a model to find related nodes of any nodes efficiently.

In the figure below, [A], [B], [C], ... represents a node.

              [A]
               |
           +---+---+
           |       |
          [B]     [C]
           |       |
         +-+-+   +-+-+
         |   |   |   |
        [D] [E] [F] [G]
                     |
                   +-+-+
                   |   |
                  [H] [I]

When a human search the sub-tree of [C] visually, pick up all the nodes by tracing the branches extending downward from [C]. In this example, set of partial tree of the [C] is as:

          [C]
           |
         +-+-+
         |   |
        [F] [G]
             |
           +-+-+
           |   |
          [H] [I]

To trace a branches is regularity routine. Therefore, a lot of people think as "If to store hierarchical data in a database, we can find related nodes of any nodes easily".

In the hierarchical data, use frequently SQL to find a related nodes sets. Describe with the example in previous figure as:

Basic Node Relation Set of Nodes
[B]Parent[A]
[B]Children[D] [E]
[H]Ancestors[A] [C] [G]
[C]Descendants[F] [G] [H] [I]
[C]Subtree[C] [F] [G] [H] [I]

The word of "Efficient" is most important phrase in this items. If efficiency is unnecessary, to devise search query is also unnecessary. Firstly get all nodes, and them examining whether or not to match with conditions by programming language.

However, it is not realistic to get all nodes everytime in a table that has record of millions. When use the way to get all nodes against huge table, need execution time and a large amount of memory. If the table has serveral hundreds of million records and match nodes are a few, it is natural to design the system to be acquired at a high speed by specifying conditions.