The Cusp of Helix

Fertile Forest Model
[C.1: Hierarchical Data in a RDB]

Criteria of the Model

Superiority of the model to handle hierarchical data can be determined by the two criteria as:

  1. Has the execution speed of the search query in a range that can withstand practical use.
  2. Number of total column to save tree structure data is more fewer.

(1) Has the execution speed of the search query in a range that can withstand practical use.

When there is a website that takes two minutes until to see the next page, nobody thinks that the website has enough speed to withstand practical use. In the current Japanese Internet environment, even if five seconds will be considered as slow. Because of a database has applications to be used in the dynamic display of web page, we can think the criteria whether human feel slowness by turning over a web page.

When the search query can refer to the index, it keeps the execution speed that can withstand practical use, even if contains several hundreds million records in the table. When the query can not be referenced index, even if the records are only the several hundreds of million records, can not get enough execution speed depending on the conditions. The first criterion is able to be said to be the same as "whether can write the search query to be refered the index".

When find relative nodes in table that contains hierarchical data, the most often used purpose is for sub-tree and child nodes. Therefore, "write a practical search query in a tree structure data model" is said as:

Can write a search query to find a subtree and child nodes of an arbitrary node with refering the index.

When the search query for finding sub-tree and child nodes can refer index, even if a table has a billion number of nodes and a hundred hierarchy of depth, it can keep search speed of withstand practical use.

(2) Number of total column to save tree structure data is more fewer.

Number of total column is the words that represented in consideration of the number of records to store hierarchical data in RDB. We consider about some models. The model has n-number of hierarchical structure column, and a rule that each node must be saved by m-number of records. In a case of that model, the number of total column is (n * m). Since the number of records to save is m = 1 in most of the model. Therefore, we can consider that total number of columns equals hierarchical structure the number of columns.

A model has the total number of records for saving nodes as small as possible, we can say that it is an excelent and economical model design. Because a few amount of data reduces capacity of DB server, and impact of finding query on the search speed.

The second criterion is no longer so importance now. Small capacity of the hard disk was an important issue in past, but now, the evolution of the hardware makes to reduce the problems related to capacity.

However, nobody says to disagree about the terms "it is an excellent model that contains columns for storing hierarchical data in RDB as few as possible". I think so.