Field group (File architecture)
Latest revision as of 23:04, 1 April 2015
Overview
Data items do not normally exist on their own. In the case of addresses, for example, a series of fields exists for each address (house number, city, state, zip code), and these fields clearly operate interdependently.
The idea of a physical field group is to take these related fields and package them as a single entity so they can be read and updated with a single operation. (See Field group design for details.)
Such packaging not only improves the readability and maintainability of SOUL programs (see Processing multiply occurring fields and field groups for the details of the SOUL statements involved), but also makes Model 204 processing far more efficient.
Performance
Code that accesses field groups performs better in three main areas:
- Scanning a record for the data you need
- Performing adds/inserts or deletes as a single operation
- Retrieving all field groups on a record (requires only one I/O if the entire record is stored on a single page)
Improved record scanning
In general, much of the time spent in Model 204 processing goes to scanning across a record, and this scan reads one field at a time. If a field is not the one that is needed, scanning moves to the next field, and so forth. As records grow in length, such a scan takes longer.
With field groups, the scan skips an entire group at a time rather than a single field, so it can find a specified data item faster.
In practice, the improvement is greater than this. If you are looking for a number of related (but non-field-group) fields, a separate scan is required to find each field (insert rules mean that the data cannot be presumed to be contiguous). So if you are looking for six individual fields, the record must be scanned once for each.
If the related fields are packaged as a field group, a single scan of the record finds all the fields. In addition, all of the related data items are in a single location.
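The difference between six scans and one can be sketched with a toy model. The Python below is only an illustration under stated assumptions: a record is modelled as a simple list of entries, the 20 filler fields and the helper names (`scan_flat`, `scan_grouped`) are hypothetical, and nothing here reflects Model 204's actual storage format or scan algorithm.

```python
# Toy model only: a record is a list of (name, value) entries.
# In the grouped layout, one entry holds the whole ADDRESS group.
# This is NOT Model 204's on-disk format.

ADDRESS_FIELDS = ["ADDRESS_TYPE", "HOUSE_NUMBER", "STREET", "CITY", "STATE", "ZIP"]


def scan_flat(record, wanted):
    """Find each wanted field with its own scan from the start of the record.

    Returns (values, number_of_entries_examined).
    """
    examined = 0
    values = {}
    for name in wanted:
        for field_name, value in record:
            examined += 1
            if field_name == name:
                values[name] = value
                break
    return values, examined


def scan_grouped(record, group, wanted):
    """Find the group in a single scan, then read the wanted fields from it."""
    examined = 0
    for entry_name, fields in record:
        examined += 1
        if entry_name == group:
            return {name: fields[name] for name in wanted}, examined
    return {}, examined


# Hypothetical record: 20 unrelated fields precede the address data.
filler = [(f"OTHER_{i}", i) for i in range(20)]
address = {name: f"value of {name}" for name in ADDRESS_FIELDS}

flat_record = filler + list(address.items())
grouped_record = filler + [("ADDRESS", address)]

flat_values, flat_cost = scan_flat(flat_record, ADDRESS_FIELDS)
grouped_values, grouped_cost = scan_grouped(grouped_record, "ADDRESS", ADDRESS_FIELDS)

print(flat_cost, grouped_cost)  # prints: 141 21
```

In this model the six separate scans examine 141 entries in total, while the grouped layout examines 21 and returns the same values. The exact numbers depend entirely on the made-up record layout; only the shape of the comparison carries over.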
Adds/inserts and deletes are performed as single operations
Consider the following block of code, which inserts an address one field at a time:
    INSERT ADDRESS_TYPE(%LOC) = %ADDR:TYPE
    INSERT HOUSE_NUMBER(%LOC) = %ADDR:NUM
    INSERT STREET(%LOC) = %ADDR:STREET
    INSERT CITY(%LOC) = %ADDR:CITY
    INSERT STATE(%LOC) = %ADDR:STATE
    INSERT ZIP(%LOC) = %ADDR:ZIP
Each of these statements is a separate operation. For each one, the correct position in the record must be found, then the field is inserted. This sequence is repeated six times.
Instead, with a field group, the following syntax does the entire insert as a single operation. Only one insert position needs to be found, and then the entire set of fields is inserted:
    INSERT FIELDGROUP ADDRESS(%LOC)
       ADDRESS_TYPE = %ADDR:TYPE
       HOUSE_NUMBER = %ADDR:NUM
       STREET = %ADDR:STREET
       CITY = %ADDR:CITY
       STATE = %ADDR:STATE
       ZIP = %ADDR:ZIP
    END INSERT
Field groups and Table B storage considerations
At first glance, the performance improvement of field groups comes at a cost of space (but this can usually be offset).
The costs:
- Every record in a FILEORG X'100' file contains the 4-byte highest allocated field group ID. Every occurrence of a field group has a unique binary ID that occupies from two to five bytes, thus supporting up to about four billion (2^32) field group IDs.
- As is true of all FILEORG X'100' files, the field name representations (as held in Table A and thus in the record) are three bytes in length. Depending on the definitions of the fields within the group, this cost might be offset by the physical absence of fields defined as part of the group.
Space saving and default and null values
Depending on the characteristics of the data you are storing, using field groups might actually save space.
Take as an example a financial application where you are tracking a number of types of income over time. You have a field group containing a date and a number of income types: salary, interest, dividends, and so forth. It is likely that the majority of people whose data you are keeping have only a few of the many types of income, so your record might have mostly zeros.
If so, by adding a DEFAULT-VALUE of 0 and a STORE-DEFAULT of NONE (the default for EXACTLY-ONE fields in field groups), none of the 0s would be physically stored (but they would be treated as if they were).
For many files, implementing field groups will improve both performance and space utilization.
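The effect of a default value of 0 combined with a store default of NONE can be illustrated with a small Python sketch. It is a toy model, not Model 204 behavior: "size" is counted in stored values rather than bytes, the income types RENT and ROYALTIES are hypothetical additions to the three named above, and the helper names are invented for the example.

```python
# Toy model only: counts stored values, not bytes, and ignores
# field group IDs and field name codes.

INCOME_TYPES = ["SALARY", "INTEREST", "DIVIDENDS", "RENT", "ROYALTIES"]
DEFAULT_VALUE = 0  # stands in for a DEFAULT-VALUE of 0


def store_occurrence(values):
    """Keep only values that differ from the default (store default NONE)."""
    return {name: v for name, v in values.items() if v != DEFAULT_VALUE}


def read_field(stored, name):
    """A field that was not physically stored reads back as the default."""
    return stored.get(name, DEFAULT_VALUE)


# One occurrence of the income field group: only salary is non-zero.
occurrence = {"SALARY": 52000, "INTEREST": 0, "DIVIDENDS": 0, "RENT": 0, "ROYALTIES": 0}
stored = store_occurrence(occurrence)
read_back = {name: read_field(stored, name) for name in INCOME_TYPES}

print(len(occurrence), len(stored))     # prints: 5 1
print(read_field(stored, "DIVIDENDS"))  # prints: 0
```

Only one of the five values is kept, yet reading any field still yields the value that was logically supplied.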
Field group identifiers
As mentioned above, every occurrence of a field group has a unique ID.
This ID can be read with the $FieldgroupId function, and it is also displayed by the Print All Information statement.
By itself, the ID exists only to guarantee uniqueness, but it has a use that is invaluable in debugging: by comparing IDs, you can determine the order in which things happened to a file.
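A minimal Python sketch of why IDs can reveal ordering follows. It rests on an assumption that the text above suggests but does not state outright: that IDs are handed out from a counter that only increases (consistent with each record holding the highest allocated field group ID). The `Record` class and its methods are hypothetical and exist only for this illustration.

```python
# Toy model only. Assumption: each new occurrence receives the next
# value of a counter that never decreases.

class Record:
    def __init__(self):
        self.highest_id = 0    # highest field group ID allocated so far
        self.occurrences = []  # physical order: list of (group_id, label)

    def insert(self, position, label):
        """Insert an occurrence at a physical position and give it the next ID."""
        self.highest_id += 1
        self.occurrences.insert(position, (self.highest_id, label))
        return self.highest_id


rec = Record()
rec.insert(0, "home address")     # gets ID 1
rec.insert(0, "work address")     # gets ID 2, placed physically first
rec.insert(1, "holiday address")  # gets ID 3, placed in the middle

# Physical order reflects where each occurrence was inserted.
physical_order = [label for _, label in rec.occurrences]
# Sorting by ID recovers the order in which the occurrences were created.
creation_order = [label for _, label in sorted(rec.occurrences)]

print(physical_order)
print(creation_order)
```

Under that assumption, the physical order of occurrences says nothing about when they were added, whereas sorting by ID reconstructs the sequence of inserts.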