Field group (File architecture): Difference between revisions

From m204wiki
No edit summary
 
(29 intermediate revisions by 6 users not shown)
Line 1: Line 1:
==Overview==
==Overview==
<p>Data items do not normally exist on their own. If we take the case of addresses, for example, there are a series of fields that exist for each (house number, city, state , zip code) which clearly operate interdependently. </p>
<p>
Data items do not normally exist on their own. In the case of addresses, for example, a series of fields exist for each address (house number, city, state, zip code) which clearly operate interdependently. </p>


<p>The idea of a Repeating Field Group (RFG) is to take these related fields and package them as a single entity so they can be read and updated with a single operation.</p>
<p>The idea of a <b>physical field group</b> is to take these related fields and package them as a single entity so they can be read and updated with a single operation. (See [[Field group design]] for details.)</p>


<p>Such packaging improves the readability and maintainability of Soul programs (see [[:Category:SOUL|Soul programming]] for the full detail of the statement changes) but also makes the Model 204 processing far more efficient.</p>
<p>Such packaging not only improves the readability and maintainability of SOUL programs (see [[Processing multiply occurring fields and field groups]] for the details of the SOUL statements involved), but it also makes the Model 204 processing far more efficient.</p>


==Performance==
==Performance==
<p>Code accessing RFGs perform better in two main areas:</p>
<p>
Code that accesses field groups performs better in three main areas:</p>
<ul>
<li>Improved record scanning</li>


<li>Performing adds/inserts or deletes as a single operation</li>


===Improved record scan:===
<li>Retrieval of all field groups on a record (requires only one I/O if the entire record is stored on a single page)</li>
</ul>
<p>In general, much of the time spent in Model 204 processing is due to the need to scan across the record, and the fact that the scan reads a field at a time and, if it is not the one that is needed, goes on to the next, and so forth. As records grow in length, this scan, obviously, takes longer.<p>


<p>With RFGs, the scan skips a group, rather than a field at a time, and so can find a specified data item faster.</p>
===Improved record scanning===
<p>
In general, much of the time spent in Model 204 processing is used to scan across a record, and this scan reads a field at a time. If a field is not the field that is needed, scanning moves to the next field, and so forth. As records grow in length, such a scan obviously takes longer.</p>


<p>But, in reality, the improvement is better than this. If you are looking for a number of related (but non-RFG) fields, each field requires a separate scan to find it (insert rules mean that the data can not be presumed to be contiguous), so, if you are looking for 6 individual fields, the record must be scanned, once each.</p>
<p>With field groups, the scan skips a group rather than a field at a time, so it can find a specified data item faster.</p>


<p>If the related fields are packaged as an RFG, a single scan of the record will find all the fields; in addition, all of the related data items are in a single location.</p>
<p>In reality, however, the improvement is better than this. If you are looking for a number of related (but non-field-group) fields, a separate scan is required to find each field (insert rules mean that the data cannot be presumed to be contiguous). So if you are looking for six individual fields, the record must be scanned once for each.</p>


<p>If the related fields are packaged as a field group, a single scan of the record finds all the fields. In addition, all of the related data items are in a single location.</p>


===Adds / inserts and deletes are a single operation:===
===Adds/inserts and deletes are performed as single operations===
<p>
Consider the following blocks of code:</p>


<p>Take the following blocks of code:</p>
<p class="code">INSERT  ADDRESS_TYPE(%LOC)    = %ADDR:TYPE  
 
<p class="code">
INSERT  ADDRESS_TYPE(%LOC)    = %ADDR:TYPE  
INSERT  HOUSE_NUMBER(%LOC)    = %ADDR:NUM
INSERT  HOUSE_NUMBER(%LOC)    = %ADDR:NUM
INSERT  STREET(%LOC)          = %ADDR:STREET  
INSERT  STREET(%LOC)          = %ADDR:STREET  
Line 33: Line 39:
INSERT  ZIP(%LOC)              = %ADDR:ZIP
INSERT  ZIP(%LOC)              = %ADDR:ZIP
</p>  
</p>  
<p>each of those statements are separate operations. For each one the correct position in the record must be found, and then the field inserted. This is repeated six times. </p>  
<p>Each of these statements is a separate operation. For each one, the correct position in the record must be found, then the field is inserted. This sequence is repeated six times. </p>  
<p>instead, with an RFG:</p>
<p>
Instead, with a field group, the following syntax does the entire insert as a single operation. Only one insert position needs to be found, and then the entire set of fields is inserted:</p>
<p class="code">INSERT FIELDGROUP ADDRESS(%LOC)
<p class="code">INSERT FIELDGROUP ADDRESS(%LOC)
     ADDRESS_TYPE          = %ADDR:TYPE  
     ADDRESS_TYPE          = %ADDR:TYPE  
Line 43: Line 50:
     ZIP                  = %ADDR:ZIP
     ZIP                  = %ADDR:ZIP
END INSERT</p>
END INSERT</p>
<p>This syntax does the entire insert as a single operation, so, only one insert position only needs to be found, and then the entire set of fields is inserted.</p>


==Field groups and Table B storage considerations==
==Field groups and Table B storage considerations==
 
<p>
<p>At first glance, the performance improvement comes at a cost of space (but this can usually be offset).</p>
At first glance, the performance improvement of field groups comes at a cost of space (but this can usually be offset).</p>


<p>The costs:</p>
<p>The costs:</p>
<ul>
<li>Every record in a <var>[[FILEORG parameter|FILEORG]]</var> X'100' file contains the 4-byte highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to four gigabytes of field group IDs.</li>


<p>Every record in a FILEORG=X'100' file contains the 4-byte highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to four gigabytes of field group IDs.</p>
<li>As is true of all <var>FILEORG</var> X'100' files, the field name representations (as held in [[Table A (File architecture)|Table A]] and thus in the record), are three bytes in length. Depending on the definitions of the fields within the group, this might be offset by the physical absence of fields defined as part of the group.</li>
 
</ul>
<p>As is true of all FILEORG=X'100' files, the field name representations (as held in [[Table A (File Architecture)|Table A]] and thus in the record), are three bytes in length. Depending on the definitions of the fields within the group, this may be offset by the physical absence of fields defined as part of the group.</p>


===Space saving and default and null values===
===Space saving and default and null values===
<p>
Depending on the characteristics of the data you are storing, using field groups might actually save space.</p>
<p>
Take as an example a financial application where you are tracking a number of types of income over time. You have a field group containing a date and a number of income types: salary, interest, dividends, and so forth. It is likely that the majority of people whose data you are keeping have only a few of the many types of income, so your record might have mostly zeros.</p>
<p>
If so, by adding a [[Field design#DEFAULT-VALUE (DV) attribute|default value]] of 0 and a [[Field design#STORE-DEFAULT (SD) and STORE-NULL (SN) attributes|store default]] of NONE (the default for [[Field design#AT-MOST-ONE, REPEATABLE and EXACTLY-ONE attributes|exactly one]] fields in field groups), none of the 0s would be physically stored (but would be treated as if they were).</p>


<p>In fact, depending on the characteristics of the data you are storing, the use of RFGs may actually save space.</p>
<p>For many files, implementing field groups will improve both performance and space utilization.</p>
 
<p>Take as an example, a financial application where you are tracking a number of types of income over time. So you have an RFG containing a date and a number of income types: salary; interest; dividends; and so forth. It is likely that the majority of people for which you are keeping data have only a few of the many types of interest, so your record may have mostly zeros.</p>
 
<p>If so, by adding a [[#Field Design (File Management)#DEFAULT-VALUE (DV) attribute|default value]] of 0 and a [[#Field Design (File Management)#STORE-DEFAULT (SD) and STORE-NULL (SN) attributes|store default]] of NONE (the default for [[#Field Design (File Management)#AT-MOST-ONE, REPEATABLE and EXACTLY-ONE attributes|exactly one]] fields in RFGs) none of the 0s would be physically stored (but would be treated as if they were).</p>
 
<p>For many files, the implementation of RFGs will improve both perfromance and space utilization.</p>
 
==Field Group Identifiers==
<p>As mentioned above, every field group has unique ID.</p>
 
<p>This ID can be read with the [[$FIELDGROUPID function]] and is also displayed when a [[PRINT ALL INFORMATION statement|PRINT ALL INFORMATION]] is done.</p>
 
<p>While, by itself, the ID is just for uniqueness, there is an interesting use for it, in that you can tell the order that things happened to a file, which is invaluable in debugging.</p>


   
==Field group identifiers==
<p>As mentioned above, every field group has a unique ID.</p>


<p>This ID can be read with the <var>[[$FieldgroupId]]</var> function, and it is also displayed by the <var>[[Basic SOUL statements and commands#PRINT ALL INFORMATION (or PAI) statement|Print All Information]]</var> statement.</p>


<p>By itself, the ID is just for uniqueness, but it has an interesting use that is invaluable in debugging: you can determine the order in which things happened to a file.</p> 




[[Category:File architecture]]
[[Category:File architecture]]

Latest revision as of 23:04, 1 April 2015

Overview

Data items do not normally exist on their own. In the case of addresses, for example, a series of fields exist for each address (house number, city, state, zip code) which clearly operate interdependently.

The idea of a physical field group is to take these related fields and package them as a single entity so they can be read and updated with a single operation. (See Field group design for details.)

Such packaging not only improves the readability and maintainability of SOUL programs (see Processing multiply occurring fields and field groups for the details of the SOUL statements involved), but it also makes the Model 204 processing far more efficient.

Performance

Code that accesses field groups performs better in three main areas:

  • Improved record scanning
  • Performing adds/inserts or deletes as a single operation
  • Retrieval of all field groups on a record (requires only one I/O if the entire record is stored on a single page)

Improved record scanning

In general, much of the time spent in Model 204 processing is used to scan across a record, and this scan reads a field at a time. If a field is not the field that is needed, scanning moves to the next field, and so forth. As records grow in length, such a scan obviously takes longer.

With field groups, the scan skips a group rather than a field at a time, so it can find a specified data item faster.

In reality, however, the improvement is better than this. If you are looking for a number of related (but non-field-group) fields, a separate scan is required to find each field (insert rules mean that the data cannot be presumed to be contiguous). So if you are looking for six individual fields, the record must be scanned once for each.

If the related fields are packaged as a field group, a single scan of the record finds all the fields. In addition, all of the related data items are in a single location.
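
The difference in scan cost can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not Model 204's actual record layout: a record is modeled as a flat Python list of (name, value) pairs, every lookup walks the list from the start, and the helper names (scan_for_field, fetch_individually, fetch_from_group) are invented for this example.

```python
# Toy model only: a record is a flat list of (field_name, value) pairs and
# every lookup starts at the front of the record. Model 204's real storage
# format is not reproduced here.

def scan_for_field(record, name):
    """Return (value, fields_examined) for the first occurrence of name."""
    for examined, (field, value) in enumerate(record, start=1):
        if field == name:
            return value, examined
    return None, len(record)

def fetch_individually(record, names):
    """Run one scan from the start of the record for each requested field."""
    values, total = {}, 0
    for name in names:
        values[name], examined = scan_for_field(record, name)
        total += examined
    return values, total

def fetch_from_group(record, group_name):
    """Run one scan to reach the group; its members are stored together."""
    for examined, (field, value) in enumerate(record, start=1):
        if field == group_name:
            return dict(value), examined
    return {}, len(record)

ADDRESS = ["ADDRESS_TYPE", "HOUSE_NUMBER", "STREET", "CITY", "STATE", "ZIP"]
filler = [(f"OTHER_{i}", i) for i in range(20)]

# Flat layout: the six address fields are scattered among unrelated fields.
flat = list(filler)
for offset, name in enumerate(ADDRESS):
    flat.insert(3 * offset + 2, (name, name.lower()))

# Grouped layout: one entry in the record holds all six members.
grouped = filler + [("ADDRESS", [(n, n.lower()) for n in ADDRESS])]

flat_values, flat_cost = fetch_individually(flat, ADDRESS)
group_values, group_cost = fetch_from_group(grouped, "ADDRESS")
print(flat_cost, group_cost)
```

Both layouts return the same six values, but the flat layout pays for six separate scans while the grouped layout pays for one, even though the group sits at the very end of the record in this example.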

Adds/inserts and deletes are performed as single operations

Consider the following blocks of code:

INSERT  ADDRESS_TYPE(%LOC)    = %ADDR:TYPE
INSERT  HOUSE_NUMBER(%LOC)    = %ADDR:NUM
INSERT  STREET(%LOC)          = %ADDR:STREET
INSERT  CITY(%LOC)            = %ADDR:CITY
INSERT  STATE(%LOC)           = %ADDR:STATE
INSERT  ZIP(%LOC)             = %ADDR:ZIP

Each of these statements is a separate operation. For each one, the correct position in the record must be found, then the field is inserted. This sequence is repeated six times.

Instead, with a field group, the following syntax does the entire insert as a single operation. Only one insert position needs to be found, and then the entire set of fields is inserted:

INSERT FIELDGROUP ADDRESS(%LOC)
     ADDRESS_TYPE          = %ADDR:TYPE
     HOUSE_NUMBER          = %ADDR:NUM
     STREET                = %ADDR:STREET
     CITY                  = %ADDR:CITY
     STATE                 = %ADDR:STATE
     ZIP                   = %ADDR:ZIP
END INSERT

Field groups and Table B storage considerations

At first glance, the performance improvement of field groups comes at a cost of space (but this can usually be offset).

The costs:

  • Every record in a FILEORG X'100' file contains the 4-byte highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to roughly four billion (2^32) field group IDs.
  • As is true of all FILEORG X'100' files, the field name representations (as held in Table A and thus in the record), are three bytes in length. Depending on the definitions of the fields within the group, this might be offset by the physical absence of fields defined as part of the group.

Space saving and default and null values

Depending on the characteristics of the data you are storing, using field groups might actually save space.

Take as an example a financial application where you are tracking a number of types of income over time. You have a field group containing a date and a number of income types: salary, interest, dividends, and so forth. It is likely that the majority of people whose data you are keeping have only a few of the many types of income, so your record might have mostly zeros.

If so, by adding a default value of 0 and a store default of NONE (the default for exactly one fields in field groups), none of the 0s would be physically stored (but would be treated as if they were).
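
The effect of a default value that is never physically stored can be sketched as follows. This is a conceptual model only: the dictionary-based storage, the RENTAL field, and the helpers store_occurrence and read_field are assumptions made for illustration and do not reflect how Model 204 encodes Table B data.

```python
# Conceptual model: values equal to a field's default are simply not stored,
# and reads fall back to the default when a field is physically absent.

INCOME_DEFAULTS = {"SALARY": 0, "INTEREST": 0, "DIVIDENDS": 0, "RENTAL": 0}

def store_occurrence(values, defaults):
    """Keep only the values that differ from the field's default.

    Fields with no default (such as DATE here) are always stored.
    """
    return {name: v for name, v in values.items()
            if name not in defaults or v != defaults[name]}

def read_field(stored, name, defaults):
    """Return the stored value, or the default if nothing was stored."""
    return stored.get(name, defaults[name])

occurrence = {"DATE": "2015-04-01", "SALARY": 5000,
              "INTEREST": 0, "DIVIDENDS": 0, "RENTAL": 0}

stored = store_occurrence(occurrence, INCOME_DEFAULTS)
print(stored)
print(read_field(stored, "INTEREST", INCOME_DEFAULTS))
```

Only the date and the one nonzero income type occupy space in this model, yet a read of any of the omitted income types still yields 0.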

For many files, implementing field groups will improve both performance and space utilization.

Field group identifiers

As mentioned above, every field group has a unique ID.

This ID can be read with the $FieldgroupId function, and it is also displayed by the Print All Information statement.

By itself, the ID is just for uniqueness, but it has an interesting use that is invaluable in debugging: you can determine the order in which things happened to a file.
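
As a sketch of that debugging use, the example below sorts field group occurrences by ID to recover the order in which they were created. It assumes that IDs are handed out in ascending order within a record, which the "highest allocated field group ID" wording suggests but this page does not state outright. The Occurrence type and its sample values are hypothetical, and in a real application the IDs would come from $FieldgroupId or Print All Information output.

```python
from typing import NamedTuple

class Occurrence(NamedTuple):
    """Hypothetical snapshot of one field group occurrence."""
    group_id: int   # the occurrence's field group ID
    position: int   # where the occurrence currently sits in the record
    city: str

def creation_order(occurrences):
    """Order occurrences by ID, assuming IDs are allocated in ascending order."""
    return sorted(occurrences, key=lambda occ: occ.group_id)

# Physical order in the record need not match creation order, for example
# when an occurrence was inserted ahead of existing ones.
in_record = [
    Occurrence(group_id=3, position=1, city="Boston"),
    Occurrence(group_id=1, position=2, city="Chicago"),
    Occurrence(group_id=2, position=3, city="Denver"),
]

for occ in creation_order(in_record):
    print(occ.group_id, occ.city)
```

Under that assumption, comparing the ID order with the physical order shows which occurrences were inserted out of sequence.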