Field group (File architecture): Difference between revisions

Revision as of 23:02, 6 May 2013

Overview

Data items do not normally exist on their own. Take addresses, for example: each address is made up of a series of fields (house number, city, state, zip code) that clearly operate interdependently.

The idea of a Repeating Field Group (RFG) is to take these related fields and package them as a single entity so they can be read and updated with a single operation.

Such packaging not only improves the readability and maintainability of Soul programs (see Soul programming for the full detail of the statement changes) but also makes Model 204 processing far more efficient.


Performance

Code accessing RFGs performs better in two main areas:


Improved record scan:

In general, much of the time spent in Model 204 processing goes to scanning across the record: the scan reads one field at a time and, if it is not the one that is needed, goes on to the next, and so forth. As records grow in length, this scan takes longer.

With RFGs, the scan skips a group, rather than a field at a time, and so can find a specified data item faster.

But, in reality, the improvement is better than this. If you are looking for a number of related (but non-RFG) fields, each field requires a separate scan to find it (insert rules mean that the data cannot be presumed to be contiguous). So, if you are looking for 6 individual fields, the record must be scanned 6 times, once for each field.

If the related fields are packaged as an RFG, a single scan of the record will find all the fields; in addition, all of the related data items are in a single location.
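
The difference in scan effort can be illustrated with a small conceptual model. The sketch below is written in Python rather than Soul, and everything in it is an assumption made for illustration: the record layout, the function names, and the OTHER_n and GROUP_n filler names are hypothetical and do not describe how Model 204 physically stores or scans a record. It only counts how many stored items a scan has to step over in each case.

```python
# Conceptual model only: not Model 204's actual record layout or scan logic.
ADDRESS_FIELDS = ["ADDRESS_TYPE", "HOUSE_NUMBER", "STREET", "CITY", "STATE", "ZIP"]


def scan_for_field(record, wanted):
    """Scan a flat record one field at a time; return (value, items examined)."""
    examined = 0
    for name, value in record:
        examined += 1
        if name == wanted:
            return value, examined
    return None, examined


def scan_for_group(record, wanted_group):
    """Scan a grouped record one group at a time; return (fields, items examined)."""
    examined = 0
    for group_name, fields in record:
        examined += 1  # a whole group is stepped over at once
        if group_name == wanted_group:
            return fields, examined
    return None, examined


# Flat record: 30 unrelated fields followed by the six address fields.
flat_record = [(f"OTHER_{i}", i) for i in range(30)]
flat_record += [(name, f"value of {name}") for name in ADDRESS_FIELDS]

# Grouped record: five groups of six unrelated fields, then one ADDRESS group.
grouped_record = [
    (f"GROUP_{g}", {f"OTHER_{g * 6 + i}": i for i in range(6)}) for g in range(5)
]
grouped_record.append(("ADDRESS", {name: f"value of {name}" for name in ADDRESS_FIELDS}))

# Six separate scans, one per field, each starting from the front of the record.
flat_cost = sum(scan_for_field(flat_record, name)[1] for name in ADDRESS_FIELDS)

# One scan that returns every address field together.
address, grouped_cost = scan_for_group(grouped_record, "ADDRESS")

print(flat_cost, grouped_cost)
```

In this model the flat record is walked six times, while the grouped record is walked once and each step covers a whole group, which is the effect the two paragraphs above describe.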


Adds / inserts and deletes are a single operation:

Take the following blocks of code:

    INSERT ADDRESS_TYPE(%LOC) = %ADDR:TYPE
    INSERT HOUSE_NUMBER(%LOC) = %ADDR:NUM
    INSERT STREET(%LOC) = %ADDR:STREET
    INSERT CITY(%LOC) = %ADDR:CITY
    INSERT STATE(%LOC) = %ADDR:STATE
    INSERT ZIP(%LOC) = %ADDR:ZIP

Each of those statements is a separate operation. For each one, the correct position in the record must be found and then the field inserted. This is repeated six times.

Instead, with an RFG:

    INSERT FIELDGROUP ADDRESS(%LOC)
       ADDRESS_TYPE = %ADDR:TYPE
       HOUSE_NUMBER = %ADDR:NUM
       STREET = %ADDR:STREET
       CITY = %ADDR:CITY
       STATE = %ADDR:STATE
       ZIP = %ADDR:ZIP
    END INSERT

This syntax does the entire insert as a single operation, so only one insert position needs to be found, and then the entire set of fields is inserted.

Field groups and Table B storage considerations

At first glance, the performance improvement comes at a cost of space (but this can usually be offset).

The costs:

Every record in a FILEORG=X'100' file contains a 4-byte field holding the highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to roughly four billion field group IDs.

As is true of all FILEORG=X'100' files, the field name representation (as held in Table A and thus in the record) is three bytes in length. Depending on the definitions of the fields within the group, this cost may be offset by the physical absence of fields defined as part of the group.

Space saving and default and null values

In fact, depending on the characteristics of the data you are storing, the use of RFGs may actually save space.

Take, as an example, a financial application where you are tracking a number of types of income over time. You have an RFG containing a date and a number of income types: salary, interest, dividends, and so forth. It is likely that the majority of people for whom you are keeping data have only a few of the many types of income, so your record may hold mostly zeros.

If so, by adding a default value of 0 and a store default of NONE (the default for EXACTLY-ONE fields in RFGs), none of the 0s would be physically stored (but they would be treated as if they were).
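
The effect of not storing default values can be modelled in a few lines. This is a conceptual Python sketch, not Model 204 behavior copied from any reference: the dictionary layout, the function names, and the RENT and ROYALTIES income types are hypothetical, and the date field of the group is left out for brevity. It shows only the idea that a value equal to the default need not be held physically yet still reads back as that default.

```python
# Conceptual model only; names and layout are hypothetical.
DEFAULT_INCOME = 0
INCOME_TYPES = ["SALARY", "INTEREST", "DIVIDENDS", "RENT", "ROYALTIES"]


def store_occurrence(values):
    """Keep only the values that differ from the default value."""
    return {name: v for name, v in values.items() if v != DEFAULT_INCOME}


def read_field(stored, name):
    """A field that is physically absent reads back as the default value."""
    return stored.get(name, DEFAULT_INCOME)


# One logical occurrence of the income group: mostly zeros.
logical = {"SALARY": 52000, "INTEREST": 0, "DIVIDENDS": 0, "RENT": 0, "ROYALTIES": 0}
stored = store_occurrence(logical)

print(len(logical), "logical values,", len(stored), "physically stored")
print({name: read_field(stored, name) for name in INCOME_TYPES})
```

The more occurrences consist mostly of default values, the larger the saving, which is how the ID and field name overhead described earlier can be offset.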

For many files, the implementation of RFGs will improve both performance and space utilization.




Field Group Identifiers

As mentioned above, every occurrence of a field group has a unique ID.

This ID can be read with the $FIELDGROUPID function and is also displayed by a PRINT ALL INFORMATION statement.

While the ID exists primarily to guarantee uniqueness, it has another interesting use: it lets you tell the order in which things happened to a file, which is invaluable in debugging.
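
As a rough illustration of that debugging use, the Python sketch below sorts field group occurrences by their IDs. It rests on an assumption the text implies but does not state outright, namely that IDs are allocated in increasing order. The ID values and descriptions are made-up sample data, such as might be copied by hand from PRINT ALL INFORMATION output.

```python
# Hypothetical sample data: (field group ID, description) pairs.
# Assumes IDs are allocated in increasing order as occurrences are created.
occurrences = [
    (7, "ADDRESS work"),
    (3, "ADDRESS home"),
    (12, "ADDRESS holiday"),
]


def creation_order(pairs):
    """Return the descriptions ordered by ascending field group ID."""
    return [description for _, description in sorted(pairs)]


print(creation_order(occurrences))
```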