Table D (File architecture)

From m204wiki
Jump to navigation Jump to search

Overview

Table D contains a variety of structures:

  • Record lists and bitmaps used for indexing
  • The existence bitmap (a special case of the above)
  • The Ordered Index B-Tree
  • The Record Map of preallocated fields
  • The Procedures (if any)

List and bitmap pages

The list and bitmap pages in Table D are the pointers to sets of records with particular field value pairs. Whether you navigate to these pages via the B-tree or hash indices, these lists and bitmaps exist when there are more than IMMEDiate records (B-Tree) or one (hash index) such record in a segment.

List pages

List pages may contain lists for more than one field value pair.

The chart below illustrates the process that adds values to a list. The notable points are:

  • When a list exceeds 30% of the usable page size (6144 bytes), it is converted to a bitmap on a new page.
  • When a page fills, the list being added to is moved to a new list page.



Bitmaps used in indexing

While bitmaps are used in Model 204 in a number of ways, one of the most critical is their use in rapid identification and further processing of sets of records.

Whether using the hash or B-tree indices, a search returns some combination of direct pointers, lists, and bitmaps.

Regardless of the mixture, the Model 204 evaluation always processes the resultant set of records as a bitmap (ANDing and ORing bitmaps being a strength of any IT system), and the result returned in the evaluation (associated with a label or list (see PLACE RECORD and REMOVE RECORD statements or the AddRecordset method) internally is also held as a bitmap.

Existence bitmap

The existence bitmap tracks which record numbers in Table B hold base records.

The execution of any FIND statement includes an implied "and exists" in its processing, and an unqualified Find statement simply returns the existence bitmap as its found set.

The existence bitmap also supports the concept of logically deleted records, where the bit is turned off, but no space cleanup is done. See the DELETE ALL RECORDS statement for further information on this technique.

Ordered Index

B-tree index structure

Fields defined with the ORDERED attribute are stored with their values in the Ordered Index, a B-tree based index structure that maintains field values in an ordered sequence to facilitate range retrievals. A B-tree is a tree structure residing in secondary storage (disk) whose elements of access and navigation are disk page size tables, or nodes (See Ordered Index B-tree diagram). It provides a method of accessing record keys in order for a given file.

There is a single B-tree structure in the file that contains all of the index entries for all of the ORDered fields in the file.

Ordered Index processing

Records are first identified in the Ordered Index, then retrieved from the stored data records. The caller can retrieve keys and associated access information with an exact match specification or with a range or pattern specification. Only one Ordered Index exists per Model 204 file, containing all field-name-equals-value pairs (in sorted sequence) and record access information (pointers to Tables B or D) for all ORDERED fields defined to the file. For more information, see Understanding the Ordered Index, below.

Note: You can use ORDERED fields in value loop processing. See the discussion Introducing value loops.

Manipulating the Ordered Index

The Ordered Index is created the first time an ORDERED field is defined and physically resides in Table D. Some of its characteristics can be modified through the REDEFINE command. If multiple deletions and updates reduce the page density and access efficiency, you can rebuild the Ordered Index using the REORGANIZE OI command.

Benefits of the ORDERED attribute

Choosing the ORDERED attribute provides the following benefits:

  • Alphabetic, alphanumeric, and numeric range retrievals are typically resolved from the index and do not require a direct search of data records. The following examples show such range retrievals. An alphabetic range retrieval is followed by a numeric range retrieval:

    NAME: FIND ALL RECORDS FOR WHICH NAME IS ALPHA LESS THAN SMITH END FIND SAL: FIND ALL RECORDS FOR WHICH SALARY IS GREATER THAN 30000 END FIND

  • Range searches through the index are supported for fields containing character strings as well as numeric values, for INVISIBLE fields, and for multiply occurring fields.
  • Record-by-record processing in sequential order of a field value is simplified and made very efficient, because the Ordered Index maintains key values in order and highly clustered. Such processing is invoked by the SOUL FOR EACH RECORD (FR) and FOR EACH VALUE (FRV) statements, and by the Host Language Interface IFFDV, IFGET, and IFGETV calls.

    IFGET supports ORDER SPECIFICATION only in a multicursor program. You can change your program to a multicursor program, if necessary.

  • Retrieval of record keys whose values match a given character pattern is optimized. This capability is called pattern matching, and it is described in Pattern matching.

Understanding the Ordered Index

Types of nodes

The Ordered Index can have the following types of tree nodes, which are shown in the diagram below and described in the following table:

Node type Description
Root Highest level node, which normally contains on one page all the pointers to the next level down-either an intermediate node or leaf node, if no intermediate nodes exist. Ordered Index searches begin at the root node.
Intermediate(s) Level(s) below the root and above the leaf, which act as directory nodes and contain only pointers to leaf nodes or to lower intermediate nodes.
Leaf Lowest level node, which contains a key field-name-equals-value pair, plus a record access pointer to each Table B record containing the pair, or to a Table D record list, or bit pattern that points to the Table B record. (Record lists and bit patterns are described in Sizing Table D.) A small Ordered Index tree can fit on a single node, in which case the root is also a leaf node.

In general, each node that has pointers to a lower level node has a large number of such pointers, which are referred to as a large fan-out in searching down the tree.

Node structure

Each of the node pages contain:

Ordered Index tree node content
Item Description
Flag Indicates node type (root, intermediate, leaf)
Parent Node Pointer Non-root node: pointer to parent node
Root node General information about the tree
Node Version Number Used during FRV and FR IN ORDER retrievals.
Determines if the revisited node is the same as the original
Prior Sibling Node Pointer Pointer to the left sibling node (at leaf level only)
Next Sibling Node Pointer Pointer to the right sibling node (at leaf level only)
Offset Last Entry Pointer to free space on the page
Offset End of Page X'1800' for 6184 byte page (only valid page size)
Prefix Key Pointer Pointer to the prefix key (longest value string, including the field code, that every field value held on a particular node has in common. At worst case, this will just be the field code, as values for different ordered fields are not held on the same leaf node)
Entry pointers Pointers to entries (data items) on the page

Ordered Index B-tree diagram

This diagram represents a portion of a typical Ordered Index B-tree, depicting the relationship of the tree nodes. The figure does not show leaf-node pointers to Table B records or to Table D record lists or bit patterns. It also does not represent the likely case in which the tree nodes contain a few hundred pointers per page.

Logical structure of Ordered Index B-tree



Processing

The Ordered Index B-tree is processed as follows:

  • When new Ordered Index entries are inserted in a full leaf node, a split of the node into two new nodes occurs, and an additional pointer is added to the node at the next higher level.
  • If this node is also full, it is split into two new nodes, and a new pointer is added to an intermediate node at the next higher level.
  • If this process reaches back to a full root node, the root is split into two intermediate nodes that are pointed to by a new root, one level higher.

This newly created level represents an increase by one of the tree's depth. Only root node splits increase the tree's depth. All leaf node entries are at the same depth.

An Ordered Index tree with a depth of three levels, with reasonable assumptions about entry sizes, can hold about 500 million entries. Model 204 automatically maintains the tree structure.

Ordered Index spacing parameters

Certain characteristics of the Ordered Index tree structure are controlled by the attributes set when the ORDERED field is defined.

Ordered Index spacing parameters
Parameter Default Range Specifies...
LRESERVE 15 0 - 99 Percentage of left-hand leaf node space to leave free upon a node split during deferred update mode processing, including file loading or Ordered Index tree restructuring with the REORGANIZE OI command. Too low a setting can cause excessive splitting during subsequent minor modifications.
NRESERVE 15 0 - 99 Percentage of left-hand intermediate node space to leave free upon a split during deferred update mode processing, including file loading or Ordered Index tree restructuring with the REORGANIZE OI command. Too low a setting can cause excessive splitting during subsequent minor modifications.
SPLITPCT (or SPLT) 50 1 - 100 Percentage of node data placed on the left-hand node when a split occurs during non-deferred User Language or Host Language Interface requests.

Note: Unlike the two previous parameters, SPLITPCT specifies data to go on the left node, rather than space.

Too high or too low a setting can cause excessive splitting.

Consider the characteristics of the data insetting this parameter. If, for example, you are indexing a processing date (or similar) it is likely that the data is always going to be increasing, and so a high SPLITPCT would be a good choice.

IMMED 1 0 and 255 Number of direct (immediate) pointers to Table B records per leaf page entry per segment in the Ordered Index tree. If the number of records to be pointed to in an entry is greater than the value of IMMED, all the record pointers are moved to a Table D list or bitmap containing pointers to Table B. Storing information in Table D lists slows range retrieval but reduces Ordered Index tree depth.
Spacing parameter guidelines

The following guidelines, while not precise, might help you reduce the number of splits and increase update performance at your site. How these parameters function is closely dependent on the data and the application. Model 204 customer support strongly suggests that you try the default settings, as well as experiment with these parameter settings to see what works for your site.

Because updates that cause page splits consume more resources than updates to less-than-full pages, set the spacing parameters to allow sufficient space for subsequent updating. Allowing too much space, however, might increase the B-Tree's depth and thereby impede performance. Knowing the likely updating characteristics of the ORDERED fields is the best guide for setting these parameters.

The settings of LRESERVE, NRESERVE, SPLITPCT, and IMMED are recommended settings only. The actual movement of data during an update might not conform exactly to the value of the appropriate parameter. The spacing parameters can be changed by specifying the parameter and a new value with the REDEFINE command:

  • Use the default setting for SPLITPCT (50), unless the field is updated Online and most values are added in strict ascending or descending order, such as dates or employee ID numbers. If field values are added in:
  • Ascending order, set SPLITPCT higher than 50.
  • Descending order, set SPLITPCT lower than 50.
  • If you load records either via the File Load utility (FLOD) or as deferred updates applied with the Z command, and the ordered field is heavily updated in Online mode, set LRESERVE and NRESERVE higher: up to 50. This causes more splits in the FLOD Z step (see : ), but, by not packing the tree as tightly during the Z step, it should decrease the number of Online splits. Of these two parameters, LRESERVE is more important for tuning. If the field is not heavily updated Online, use the defaults for LRESERVE and NRESERVE.
  • If the field is updated Online with information that is generally not in order, Model 204 customer support suggests that you experiment with the IMMED parameter. This parameter is less important if most updates are entered during the FLOD Z step. With the IMMED parameter, there might be a trade-off between Table D LIST I/O and B-Tree I/O.

Record map of preallocated fields

Also in Table D (and no more than one page) is a map of the preallocated fields defined within the file. This is used when ever a new record is added to the file, as space is reserved for these fields just after the extension pointer at the beginning of the record.

The DISPLAY RECORD command against the file shows the map. The fields appear in the map in the same order they are defined to the file, so, if you execute the following DEFINE FIELD commands:

DEFINE FIELD PRE.FIELD.2 (OCC 1 LEN 9) DEFINE FIELD PRE.FIELD.1 (OCC 1 BIN) DEFINE FIELD PRE.FIELD.3 (OCC 1 LEN 20)

The DISPLAY RECORD command produces:

LENGTH OCCURS PAD NAME 9 1 X'00' PRE.FIELD.2 BIN 1 X'00' PRE.FIELD.1 20 1 X'00' PRE.FIELD.3

Procedures

There are two parts to the storage of procedures in a Model 204 file:

The actual procedure 'text' (stored sets of commands and program statements run against Model 204 files) are stored, with procedures taking a minimum of 1 page, regardless of how small the procedure is.

And, so as not to have to scan the entire procedure store whenever you need to access one, a procedure dictionary. This dictionary associates a procedure name or alias with information about the location of the procedure's text, and with a class, if the procedure is secured.

Most sites keep their procedures in different files than their data, but, regardless, the files have identical structures whether they contain only procedures, only data, or both.

The procedure dictionary is defined using the PDSIZE and PDSTRPPG parameters, as described next.

The procedure dictionary

The procedure dictionary is allocated in blocks of one or more contiguous pages (defined by the PDSIZE parameter). When Model 204 searches for a procedure name, it begins searching on a random page in the first block. If the name is not found on that page, the remaining pages in the same block are searched. If the name is still not found, Model 204 searches the pages in the second block, and so on.

When a new procedure is being stored, and Model 204 (as expected) does not find the name, it stores the new name in the first block in which it can find space. Model 204 allocates a new block of PDSIZE parameter pages when it cannot find space for a new name in any of the preceding blocks. Space used by deleted names is reused.

Approaches to setting PDSIZE

There are two possible paths you can take in choosing a PDSIZE:

  • Have one large block containing many pages. Because name searches always begin with the first block, this increases the likelihood of finding a name on the first page read. However, as the pages fill up, Model 204 might allocate a new block when space still exists on the old block.
  • Have a number of smaller blocks with fewer pages. Although it might take Model 204 longer to find the procedure name, there is less impact on Table D when a new block is allocated.

Rocket Software generally favors the first approach, particularly as the number of procedures you have grows.

PDSTRPPG parameter

PDSTRPPG specifies the maximum number of procedure entries per procedure dictionary page.

It is, of course, tied very tightly to the length of procedure names and alias. As such, knowledge of your installations naming standards is critical in setting this parameter (see File sizing introduction for details).