Table D (File architecture): Difference between revisions

From m204wiki
Jump to navigation Jump to search
No edit summary
Line 1: Line 1:
== Overview ==
== Overview ==


== <div id="Table D (File Architecture) List and Bit Map pages">List and Bit Map Pages</div>==
== <div id="Table D (File Architecture) List and Bit Map pages">List and [[Bit Maps (File Architecture)|Bit Map]] Pages</div>==


== <div id="Table D (File Architecture) Existence Bit Map">The Existence Bit Map</div>==
== <div id="Table D (File Architecture) Existence Bit Map">The Existence Bit Map</div>==
<p>The Existence Bit Map tracks which record numbers in Table B hold base records.</p>
<p>The execution of any [[FIND statement]] includes an implied 'and exists' in its processing, and an unqualified find statement simply returns the existence bit map as its found set.</p> 
<p>The existence bit map also supports the concept of logically deleted records, where the bit is turned off, but no space cleanup is done. See the [[DELETE ALL RECORDS statement]] for further information on this technique.</p> 





Revision as of 04:48, 17 April 2013

Overview

List and Bit Map Pages

The Existence Bit Map

The Existence Bit Map tracks which record numbers in Table B hold base records.

The execution of any FIND statement includes an implied 'and exists' in its processing, and an unqualified find statement simply returns the existence bit map as its found set.

The existence bit map also supports the concept of logically deleted records, where the bit is turned off, but no space cleanup is done. See the DELETE ALL RECORDS statement for further information on this technique.


Ordered Index

Understanding the Ordered Index

Types of nodes

The Ordered Index can have the following types of tree nodes, which are shown in the diagram below and described in the following table:

Node type Description
Root Highest level node, which normally contains on one page all the pointers to the next level down-either an intermediate node or leaf node, if no intermediate nodes exist. Ordered Index searches begin at the root node.
Intermediate(s) Level(s) below the root and above the leaf, which act as directory nodes and contain only pointers to leaf nodes or to lower intermediate nodes.
Leaf Lowest level node, which contains a key field-name-equals-value pair, plus a record access pointer to each Table B record containing the pair, or to a Table D record list, or bit pattern that points to the Table B record. (Record lists and bit patterns are described in .) A small Ordered Index tree can fit on a single node, in which case the root is also a leaf node.

In general, each node that has pointers to a lower level node has a large number of such pointers, which are referred to as a large fanout in searching down the tree.

Ordered Index B-Tree diagram

This diagram represents a portion of a typical Ordered Index B-Tree, depicting the relationship of the tree nodes. The figure does not show leaf-node pointers to Table B records or to Table D record lists or bit patterns. It also does not represent the likely case in which the tree nodes contain a few hundred pointers per page.

Logical Structure of Ordered Index B-Tree Processing

The Ordered Index B-Tree is processed as follows:

  • When new Ordered Index entries are inserted in a full leaf node, a split of the node into two new nodes occurs, and an additional pointer is added to the node at the next higher level.
  • If this node is also full, it is split into two new nodes, and a new pointer is added to an intermediate node at the next higher level.
  • If this process reaches back to a full root node, the root is split into two intermediate nodes that are pointed to by a new root, one level higher.

This newly created level represents an increase by one of the tree's depth. Only root node splits increase the tree's depth. All leaf node entries are at the same depth.

An Ordered Index tree with a depth of three levels, with reasonable assumptions about entry sizes, can hold about 500 million entries. Model 204 automatically maintains the tree structure.

Ordered Index spacing parameters

Certain characteristics of the Ordered Index tree structure are controlled by the attributes set when the ORDERED field is defined.

Ordered Index spacing parameters
Parameter Default Range Specifies...
LRESERVE 15 0 - 99 Percentage of left-hand leaf node space to leave free upon a node split during deferred update mode processing, including file loading or Ordered Index tree restructuring with the REORGANIZE OI command. Too low a setting can cause excessive splitting during subsequent minor modifications.
NRESERVE 15 0 - 99 Percentage of left-hand intermediate node space to leave free upon a split during deferred update mode processing, including file loading or Ordered Index tree restructuring with the REORGANIZE OI command. Too low a setting can cause excessive splitting during subsequent minor modifications.
SPLITPCT (or SPLT) 50 1 - 100

Percentage of node data placed on the left-hand node when a split occurs during non-deferred User Language or Host Language Interface requests.

Note: Unlike the two previous parameters, SPLITPCT specifies data to go on the left node, rather than space.

Too high or too low a setting can cause excessive splitting.

Consider the characteristics of the data insetting this parameter. If, for example, you are indexing a processing date (or similar) it is likely that the data is always going to be increasing, and so a high SPLITPCT would be a good choice.

IMMED 1 0 and 255 Number of direct (immediate) pointers to Table B records per leaf page entry per segment in the Ordered Index tree. If the number of records to be pointed to in an entry is greater than the value of IMMED, all the record pointers are moved to a Table D list or bit map containing pointers to Table B. Storing information in Table D lists slows range retrieval but reduces Ordered Index tree depth.

Spacing parameter guidelines

The following guidelines, while not precise, might help you reduce the number of splits and increase update performance at your site. How these parameters function is closely dependent on the data and the application. Model 204 customer support strongly suggests that you try the default settings, as well as experiment with these parameter settings to see what works for your site.

Because updates that cause page splits consume more resources than updates to less-than-full pages, set the spacing parameters to allow sufficient space for subsequent updating. Allowing too much space, however, might increase the B-Tree's depth and thereby impede performance. Knowing the likely updating characteristics of the ORDERED fields is the best guide for setting these parameters.

The settings of LRESERVE, NRESERVE, SPLITPCT, and IMMED are recommended settings only. The actual movement of data during an update might not conform exactly to the value of the appropriate parameter. The spacing parameters can be changed by specifying the parameter and a new value with the REDEFINE command:

  • Use the default setting for SPLITPCT (50), unless the field is updated Online and most values are added in strict ascending or descending order, such as dates or employee ID numbers. If field values are added in:
  • Ascending order, set SPLITPCT higher than 50.
  • Descending order, set SPLITPCT lower than 50.
  • If you load records either via the File Load Utility (FLOD) or as deferred updates applied with the Z command, and the ordered field is heavily updated in Online mode, set LRESERVE and NRESERVE higher: up to 50. This causes more splits in the FLOD Z step (see : ), but, by not packing the tree as tightly during the Z step, it should decrease the number of Online splits. Of these two parameters, LRESERVE is more important for tuning. If the field is not heavily updated Online, use the defaults for LRESERVE and NRESERVE.
  • If the field is updated Online with information that is generally not in order, Model 204 customer support suggests that you experiment with the IMMED parameter. This parameter is less important if most updates are entered during the FLOD Z step. With the IMMED parameter, there might be a trade-off between Table D LIST I/O and B-Tree I/O.

Record Map of Preallocated Fields

Also in Table D (and no more than one page) is a map of the preallocated fields defined within the file. This is used when ever a new record is added to the file, as space is reserved for these fields just after the extension pointer at the beginning of the record.

The DISPLAY RECORD command against the file shows the map. The fields appear in the map in the same order they are defined to the file, so, if you execute the following DEFINE FIELD commands:

DEFINE FIELD PRE.FIELD.2 (OCC 1 LEN 9) DEFINE FIELD PRE.FIELD.1 (OCC 1 BIN) DEFINE FIELD PRE.FIELD.3 (OCC 1 LEN 20)

The DISPLAY RECORD command produces:

LENGTH OCCURS PAD NAME 9 1 X'00' PRE.FIELD.2 BIN 1 X'00' PRE.FIELD.1 20 1 X'00' PRE.FIELD.3


Procedures

There are two parts to the storage of procedures in a Model 204 file:

The actual procedure 'text' (stored sets of commands and program statements run against Model 204 files) are stored, with procedures taking a minimum of 1 page, regardless of how small the procedure is.

And, so as not to have to scan the entire procedure store whenever you need to access one, a procedure dictionary. This dictionary associates a procedure name or alias with information about the location of the procedure's text, and with a class, if the procedure is secured.

Most sites keep their procedures in different files than their data, but, regardless, the files have identical structures whether they contain only procedures, only data, or both.

The procedure dictionary is defined using the PDSIZE and PDSTRPPG parameters.

The Procedure Dictionary

The procedure dictionary is allocated in blocks of one or more contiguous pages (defined by the PDSIZE parameter). When Model 204 searches for a procedure name, it begins searching on a random page in the first block. If the name is not found on that page, the remaining pages in the same block are searched. If the name is still not found, Model 204 searches the pages in the second block, and so on.

When a new procedure is being stored, and Model 204 (as expected) does not find the name, it stores the new name in the first block in which it can find space. Model 204 allocates a new block of PDSIZE parameter pages when it cannot find space for a new name in any of the preceding blocks. Space used by deleted names is reused.

Approaches to setting PDSIZE

There are two possible paths you can take in choosing a PDSIZE:

  • Have one large block containing many pages. Because name searches always begin with the first block, this increases the likelihood of finding a name on the first page read. However, as the pages fill up, Model 204 might allocate a new block when space still exists on the old block.
  • Have a number of smaller blocks with fewer pages. Although it might take Model 204 longer to find the procedure name, there is less impact on Table D when a new block is allocated.

Rocket Software generally favors the first approach, particularly as the number of procedures you have grows.


PDSTRPPG parameter

PPDSTRPPG specifies the maximum number of procedure entries per procedure dictionary page.

It is, of course, tied very tightly to the length of procedure names and alias. As such, knowledge of your installations naming standards is critical in setting this parameter (see File Size Calculation for details.