Field group (File architecture)



Revision as of 15:13, 24 May 2013

Overview

Data items do not normally exist on their own. Take addresses, for example: each one has a series of fields (house number, city, state, zip code) that clearly operate interdependently.

The idea of a Repeating Field Group (RFG) is to take these related fields and package them as a single entity so they can be read and updated with a single operation.

Such packaging not only improves the readability and maintainability of Soul programs (see Soul programming for the full detail of the statement changes) but also makes the Model 204 processing far more efficient.

Performance

Code accessing RFGs performs better in three main areas:

  • improved record scanning
  • performing adds/inserts or deletes as a single operation
  • retrieval of all field groups on a record will only require one I/O if the entire record is stored on a single page

Improved record scanning

In general, much of the time spent in Model 204 processing goes to scanning across the record: the scan reads one field at a time and, if it is not the field that is needed, moves on to the next. As records grow in length, this scan takes longer.

With RFGs, the scan skips a group, rather than a field at a time, and so it can find a specified data item faster.

In reality, the improvement is better than this. If you are looking for several related (but non-RFG) fields, each field requires a separate scan to find it (insert rules mean that the data cannot be presumed to be contiguous), so if you are looking for six individual fields, the record must be scanned six times, once for each.

If the related fields are packaged as an RFG, a single scan of the record will find all the fields; in addition, all of the related data items are in a single location.
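The difference in scanning effort can be illustrated with a small Python sketch. This is a toy model only, not a description of Model 204 internals: it assumes a record is an ordered list of (name, value) entries, and the function names, the filler fields, and the sample address values are all hypothetical.

```python
# Toy model only: a record is an ordered list of (name, value) entries.
# This is NOT how Model 204 stores data; it only illustrates scan counts.
ADDRESS_FIELDS = ["ADDRESS_TYPE", "HOUSE_NUMBER", "STREET", "CITY", "STATE", "ZIP"]


def scan_individual_fields(record, wanted):
    """Scan the record once per wanted field; count entries examined."""
    examined = 0
    found = {}
    for name in wanted:
        for entry_name, value in record:
            examined += 1
            if entry_name == name:
                found[name] = value
                break
    return found, examined


def scan_field_group(record, group_name):
    """Scan the record once; the whole group is a single entry."""
    examined = 0
    for entry_name, value in record:
        examined += 1
        if entry_name == group_name:
            return dict(value), examined
    return {}, examined


# Hypothetical sample data: 20 unrelated fields, then one address.
address = {
    "ADDRESS_TYPE": "HOME", "HOUSE_NUMBER": "12", "STREET": "MAIN ST",
    "CITY": "BOSTON", "STATE": "MA", "ZIP": "02110",
}
filler = [("FILLER_%d" % i, i) for i in range(20)]
flat_record = filler + [(name, address[name]) for name in ADDRESS_FIELDS]
grouped_record = filler + [("ADDRESS", address)]

flat_values, flat_examined = scan_individual_fields(flat_record, ADDRESS_FIELDS)
group_values, group_examined = scan_field_group(grouped_record, "ADDRESS")
print(flat_examined, group_examined)  # prints "141 21"
```

In this model both approaches return the same six values, but the six separate scans examine 141 entries while the single group scan examines 21.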

Adds/inserts and deletes are performed as single operations

Take the following block of code:

INSERT ADDRESS_TYPE(%LOC) = %ADDR:TYPE
INSERT HOUSE_NUMBER(%LOC) = %ADDR:NUM
INSERT STREET(%LOC) = %ADDR:STREET
INSERT CITY(%LOC) = %ADDR:CITY
INSERT STATE(%LOC) = %ADDR:STATE
INSERT ZIP(%LOC) = %ADDR:ZIP

Each of those statements is a separate operation. For each one, the correct position in the record must be found, and then the field is inserted. This is repeated six times.

Instead, with an RFG:

INSERT FIELDGROUP ADDRESS(%LOC)
   ADDRESS_TYPE = %ADDR:TYPE
   HOUSE_NUMBER = %ADDR:NUM
   STREET = %ADDR:STREET
   CITY = %ADDR:CITY
   STATE = %ADDR:STATE
   ZIP = %ADDR:ZIP
END INSERT

This syntax does the entire insert as a single operation, so only one insert position needs to be found, and then the entire set of fields is inserted.
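The same point can be sketched in Python by counting how many times an insert position has to be located. This is a simplified illustration under stated assumptions: the ToyRecord class, its methods, and the sample values are hypothetical and do not reflect how Model 204 actually stores records.

```python
# Toy model only: counts how often an insert position must be located.
FIELDS = ["ADDRESS_TYPE", "HOUSE_NUMBER", "STREET", "CITY", "STATE", "ZIP"]


class ToyRecord:
    def __init__(self, entries):
        self.entries = list(entries)
        self.position_searches = 0

    def _find_position(self, name, occurrence):
        """Locate where the given occurrence of `name` should go."""
        self.position_searches += 1
        seen = 0
        for index, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                seen += 1
                if seen == occurrence:
                    return index
        return len(self.entries)  # fewer occurrences exist: append

    def insert_field(self, name, occurrence, value):
        position = self._find_position(name, occurrence)
        self.entries.insert(position, (name, value))

    def insert_group(self, group, occurrence, fields):
        position = self._find_position(group, occurrence)
        self.entries.insert(position, (group, dict(fields)))


old = dict(zip(FIELDS, ["HOME", "12", "MAIN ST", "BOSTON", "MA", "02110"]))
new = dict(zip(FIELDS, ["WORK", "7", "ELM ST", "SALEM", "MA", "01970"]))

# Six individual inserts: one position search per field.
flat = ToyRecord([(name, old[name]) for name in FIELDS])
for name in FIELDS:
    flat.insert_field(name, 1, new[name])

# One field group insert: a single position search for the whole set.
grouped = ToyRecord([("ADDRESS", old)])
grouped.insert_group("ADDRESS", 1, new)

print(flat.position_searches, grouped.position_searches)  # prints "6 1"
```

Note that in the flat model the newly inserted fields end up interleaved with the existing ones, which is why related non-RFG fields cannot be presumed to be contiguous.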

Field groups and Table B storage considerations

At first glance, the performance improvement comes at a cost of space (but this can usually be offset).

The costs:

Every record in a FILEORG=X'100' file contains the 4-byte highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to roughly four billion field group IDs.

As is true of all FILEORG=X'100' files, the field name representations (as held in Table A and thus in the record) are three bytes in length. Depending on the definitions of the fields within the group, this might be offset by the physical absence of fields defined as part of the group.

Space saving and default and null values

In fact, depending on the characteristics of the data you are storing, the use of RFGs might actually save space.

Take as an example a financial application where you are tracking a number of types of income over time. So you have an RFG containing a date and a number of income types: salary, interest, dividends, and so forth. It is likely that the majority of people for whom you are keeping data have only a few of the many types of income, so your record might have mostly zeros.

If so, by adding a default value of 0 and a store default of NONE (the default for EXACTLY-ONE fields in RFGs), none of the 0s would be physically stored (but they would be treated as if they were).
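The effect of not storing default values can be sketched as follows. This is a conceptual Python model, not Model 204 behavior in detail: the DEFAULTS table and both function names are hypothetical, and only the income types named above are used.

```python
# Toy model only: values equal to the field's default are not stored,
# but reads still return the default as if the value were present.
DEFAULTS = {"SALARY": 0, "INTEREST": 0, "DIVIDENDS": 0}


def store_occurrence(values, defaults=DEFAULTS):
    """Keep only the values that differ from the field default."""
    return {name: v for name, v in values.items() if v != defaults[name]}


def read_field(stored, name, defaults=DEFAULTS):
    """Return the stored value, or the default when nothing was stored."""
    return stored.get(name, defaults[name])


stored = store_occurrence({"SALARY": 52000, "INTEREST": 0, "DIVIDENDS": 0})
print(stored)                          # prints "{'SALARY': 52000}"
print(read_field(stored, "INTEREST"))  # prints "0"
```

Only one of the three values occupies space in this model, yet a read of any field still returns a usable value.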

For many files, the implementation of RFGs will improve both performance and space utilization.

Field Group Identifiers

As mentioned above, every field group has a unique ID.

This ID can be read with the $FIELDGROUPID function and is also displayed when a PRINT ALL INFORMATION is done.

While, by itself, the ID exists only for uniqueness, it has an interesting use: you can tell the order in which things happened to a file, which is invaluable in debugging.
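As a rough sketch of that debugging use, the following Python example orders field group occurrences by their IDs. It assumes that IDs are allocated in ascending order, so that a lower ID means an earlier creation; the occurrence labels and ID values are hypothetical, standing in for values you might read with $FIELDGROUPID.

```python
# Toy example: (label, field group ID) pairs in the order they appear
# in a record. Assumption: lower IDs were allocated earlier.
occurrences = [("WORK", 7), ("HOME", 3), ("BILLING", 12)]

# Sorting by ID recovers the order in which the occurrences were created.
by_creation = [label for label, group_id in sorted(occurrences, key=lambda o: o[1])]
print(by_creation)  # prints "['HOME', 'WORK', 'BILLING']"
```

Here the physical order in the record (WORK first) differs from the creation order (HOME first), which is the kind of discrepancy the ID makes visible.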