File loading techniques

<p>
Good planning and correct technique are essential when loading files. The process of creating a file is complex, and failure to observe good practice can be both expensive and time-consuming. </p>
<p>
This page recommends a number of steps that you can follow to avoid problems while loading files. Techniques are also described for refining file parameter settings and space estimates based on the results of experiments made with sample data from the file being loaded. These experiments generally are not necessary if you know the exact format of the data in advance. </p>
 
==Verifying data formats and load programs==
<p>
Rocket Software recommends that you take the following steps to verify that the input data is properly formatted and that the load program is functioning properly: </p>
<ol>
<li>Check the format of the input data, if possible, using the IBM IEBPTPCH utility or a sample file load program with the P statement. </li>
<li>Run a sample load to create 100 to 1000 <var class="product">Model&nbsp;204</var> records.
<p>If records are being loaded with a file load program, check the counters printed at the end of the program.</p>
</li>
<li>After loading the records, issue the DISPLAY FIELD command:
<p class="code">DISPLAY FIELD (DEFN) ALL
</p>
<p>
Then run a [[SOUL]] request of the form:</p>
<p class="code">BEGIN
<p class="code">BEGIN
   GET.RECS: FIND ALL RECORDS
   GET.RECS: FIND ALL RECORDS
Line 21: Line 28:
END
END
</p>
</p>
<p>
This request helps to verify that field names are properly defined and that the data actually loaded into the <var class="product">Model&nbsp;204</var> file looks right. A Host Language Interface program can be substituted to print the records.</p>
</li>
<li>Complete the load for the sample 100 to 1000 records by running the last four job steps of the deferred update sequence.</li>
<li>Check <var>NUMERIC RANGE</var> retrieval fields for invalid input data. For example, if <code>AGE</code> and <code>DATE</code> have the <var>NUMERIC RANGE</var> retrieval option, this SOUL request prints those values that are not valid numbers:
<p class="code">BEGIN
<p class="code">BEGIN
FIND.AGE: FIND ALL RECORDS FOR WHICH
FIND.AGE: FIND ALL RECORDS FOR WHICH
AGE++ = NON NUMERICAL
  AGE++ = NON NUMERICAL
END FIND
END FIND
AGE.LOOP: FOR EACH RECORD IN FIND.AGE
AGE.LOOP: FOR EACH RECORD IN FIND.AGE
PRINT AGE
  PRINT AGE
SKIP 1 LINE
  SKIP 1 LINE
END FOR
END FOR
FIND.DATE: FIND ALL RECORDS FOR WHICH
FIND.DATE: FIND ALL RECORDS FOR WHICH
DATE++ = NON NUMERICAL
  DATE++ = NON NUMERICAL
END FIND
END FIND
DATE.LOOP: FOR EACH RECORD IN FIND.DATE
DATE.LOOP: FOR EACH RECORD IN FIND.DATE
PRINT DATE
  PRINT DATE
END FOR
END FOR
END
END
</p></li>
</p></li>
<li>Check fields that have both the <var>BINARY</var> and <var>FRV</var> attributes for noncompressible values with this request:
<p class="code">BEGIN
<p class="code">BEGIN
FOR.LOOP: FOR EACH VALUE OF <var class="term">fieldname
FOR.LOOP: FOR EACH VALUE OF <var class="term">fieldname</var>
</var>PRINT VALUE IN FOR.LOOP
  PRINT VALUE IN FOR.LOOP
END FOR
END FOR
END  
END  
</p></li>
</p></li>
</ol>
==Estimating load times and working storage==
<p>
There is no exact formula for calculating how long it takes to load a file. The most reliable method for arriving at an estimate is to extrapolate from the CPU and elapsed time used by a sample load. The larger the sample, the better the estimate. For very large files, a sample that fills one segment (49,152 records) is recommended. </p>
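<p>
For example, a minimal extrapolation from hypothetical sample figures (the numbers below are illustrative only, not measurements):</p>
<p class="code">Sample load:  49,152 records (one segment) = 12 CPU minutes, 30 elapsed minutes
Full file:    1,474,560 records (30 segments)

Estimated CPU time:      30 x 12 = 360 CPU minutes
Estimated elapsed time:  30 x 30 = 900 elapsed minutes
</p>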
 
===Estimating the load time of a deferred update data set===
<p>
The size of the deferred update data set can be estimated from the number of deferred update records produced by the sample load. If the deferred update data set is on tape, more than one volume can be specified in the JCL. </p>
 
===Estimating working storage requirements===
<p>
The size required for the sort work data sets can be estimated after the sample load, or it can be determined more accurately after each main load step from the actual number of deferred update records. </p>
 
==Loading large files==
<p>
For purposes of this discussion, a large file is one that contains more than one segment of records (that is, more than 49,152 records). Load files one segment at a time. This ensures the most efficient use of Table D space and provides convenient recovery points. Load files containing less than one segment all at once. </p>
 
===Loading files with ORDERED fields===
<p>
For very large files that have <var>ORDERED</var> fields, retrievals are most efficient for the <var>ORDERED</var> fields if the file is loaded multiple segments at a time. There is no advantage to loading more than 16 segments at a time. </p>
<p>
Take the following steps for each segment:</p>
<ol>
<li>Load the data into Tables A and B (Phase one).</li>
<li>Dump the file (see [[File dumping and restoring]]). The first step of the File Load sequence is the most expensive. Dumping after this step provides a backup in case of failures in the later index steps.</li>
<li>Run all the sort and index steps (Phase two).</li>
<li>Dump the file again. This preserves all the work done so far in case of an error in the main load step for the next segment. </li>
</ol>
<p>
Keep the output data set of each sort until the segment has been completely loaded and dumped. If Table D fills up during an index step, the file can be restored as of the end of the last main load step. After Table D is increased, the index step can be rerun without the sort. If Table C fills up, the entire File Load must be rerun with a larger value for <var>[[CSIZE parameter|CSIZE]]</var>. </p>
 
==Refining the file parameter settings==
<p>
After sample records have been loaded, a number of file parameters can be reevaluated based on actual data. A few parameters can be viewed to determine whether any incorrect estimates have been made. These parameters are grouped by file table and are described in the following sections. For more detailed information about setting the file parameters, see [[File size calculation in detail]]. </p>
 
==Loading Large Object data==
<p>A <var class="product">Model&nbsp;204</var> file that contains Large Object data can be reorganized using the standard PAI and FLOD procedure. This is identical to previous releases of <var class="product">Model&nbsp;204</var>, and no change to your existing unload or load code is required.</p>
<p>
<p>To display complete Large Object data, you must use a PAI statement, not a PAI INTO statement.</p>
A <var class="product">Model&nbsp;204</var> file that contains [[Field design#BLOB, CLOB, and MINLOBE attributes|Large Object]] data can be reorganized using the standard PAI and FLOD procedure. This is identical to previous releases of <var class="product">Model&nbsp;204</var>, and no change to your existing unload or load code is required.</p>
<p>The output from the PAI statement for a field that is not Large Object data looks like this:</p>
<p>
To display complete Large Object data, you must use a <var>PAI</var> statement, not a <var>PAI INTO</var> statement.</p>
<p>
The output from the <var>PAI</var> statement for a field that is not Large Object data looks like this:</p>
<p class="code">FIELDNAME = FIELDVALUE
<p class="code">FIELDNAME = FIELDVALUE
</p>
</p>
<p>
The output from the <var>PAI</var> statement for a Large Object field looks like this:</p>
<p class="code">FIELDNAME = &lt;descriptor>
<p class="code">FIELDNAME = &lt;descriptor>
LOB data line 1
LOB data line 1
Line 82: Line 115:
LOB data line n  
LOB data line n  
</p>
</p>
<p>
When you store a Large Object value in Table E, a Large Object descriptor is stored in Table B. The descriptor contains the length, reserve, and a pointer to the Large Object data in the Table E page. The descriptor is an internal <var class="product">Model&nbsp;204</var> structure. Usually it is of no concern to the File Manager or FLOD programmer, as the descriptor is built by the <var>PAI</var> statement that unloaded the data.</p>
<p>
However, if you are loading Large Object data from an external source, you may need to build the Large Object descriptor to be stored in Table B. </p>
<p>
The actual Large Object data is on subsequent records, as many as are required to hold the Large Object value. This depends on the OUTMRL setting of the output data set to which the PAI output is directed.</p>
<p>
When a FLOD executes a statement to load a field value, it checks whether the field is defined as Large Object; if so, the Large Object fieldname=value pair is created and stored in Table B. </p>
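<p>
For reference, the unload side of a PAI-based reorganization is an ordinary SOUL request. The following is a minimal sketch; the file name VEHICLES and the USE data set name OUTTAPE are hypothetical, and the USE command is assumed to direct the request output to that data set:</p>
<p class="code">OPEN VEHICLES
USE OUTTAPE
BEGIN
* Unload every record in PAI format
   GET.RECS: FIND ALL RECORDS
   END FIND
   PAI.LOOP: FOR EACH RECORD IN GET.RECS
      PRINT ALL INFORMATION
      SKIP 1 LINE
   END FOR
END
</p>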
 
==Partial file loads==
<p>
In some applications, only a portion of the records is available when the file is initially loaded. In some cases, records are added later in large batches; in other cases, records are added Online. When such files are created, set most parameters to the values they would have if the file were to be fully loaded. <var>[[BSIZE parameter|BSIZE]]</var> and <var>[[DSIZE parameter|DSIZE]]</var> might be set lower so that disk storage is not tied up until actually needed. </p>
 
==Table A parameters==
<p>
If the size of Table A (<code>ATRPG + FVFPG + MVFPG</code>) is too small, the value of <var>[[ARETRIES parameter|ARETRIES]]</var> is nonzero. The <var>ARETRIES</var> parameter (Table A page retries) indicates when one or more sections of Table A are filling up. </p>
 
====ARETRIES parameter====
<p>
<var>ARETRIES</var> is a counter that is incremented each time <var class="product">Model&nbsp;204</var> fails in an attempt to store an item on a page and attempts to store the item on another page. Without reevaluating the input data, you cannot determine which section is filling up. Each retry indicates a slight degradation in update or retrieval efficiency for field names or for coded or FRV fields. </p>
<p>
If the <var>[[ATRPG parameter|ATRPG]]</var> parameter is greater than 1, page retries may indicate inefficiency of all field name processing. If <var>ARETRIES</var> is very high (near <var>[[ASIZE parameter|ASIZE]]</var>) and new field names or values are anticipated in the file, increase the Table A size parameters to avoid an eventual table full condition. </p>
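<p>
To check these values after a sample load, you can view the Table A parameters from the command level. A minimal sketch, assuming a hypothetical file named CENSUS:</p>
<p class="code">OPEN CENSUS
VIEW ARETRIES
VIEW ASIZE,ATRPG,FVFPG,MVFPG
</p>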
 
==Table B parameters==
<p>
The <var>[[BRECPPG parameter|BRECPPG]]</var> parameter specifies the number of Table B records per page and the <var>[[BRESERVE parameter|BRESERVE]]</var> parameter specifies the Table B reserved space per page. You can view the value of these parameters with the <var>[[TABLEB command|TABLEB]]</var> command.</p>
 
===Viewing records and free space with the TABLEB command===
<p>
The <var>TABLEB</var> command displays information about the records and free space in Table B. The <var>TABLEB</var> output always includes the following items: </p>
<ul>
<li>Average free space per page</li>
<li>Average free record slots per page</li>
<li>Number of pages processed</li>
<li>Value of <var>BRECPPG</var> (number of Table B records per page) </li>
<li>Value of <var>BRESERVE</var> parameter (Table B reserved space per page)</li>
</ul>
====Syntax====
<p>
The format of <var>TABLEB</var> is: </p>
<p class="syntax">TABLEB [LIST | OVERFLOW | MASTER | RECORD LENGTH
       | PAGE <span class="term">page_number</span> | PAGES <span class="term">page_number1</span>
       TO <span class="term">page_number2</span>]
</p>
<p>
Where:</p>
<ul>
<li><var>LIST</var> produces one line for each page in Table B. If no other option is specified, <var>TABLEB</var> scans all of Table B, starting with the lowest active page and ending with the highest one. For each page, the page number, amount of free space, and number of free slots are shown. This detailed listing is followed by the list of averages and totals specified above.
<p>
If the file being scanned is sorted, each printed detail line includes an indication of the type of page that the line describes. The <var>OVERFLOW</var> and <var>MASTER</var> options list only pages from the OVERFLOW or MASTER area of a sorted file.</p></li>
<li><var>PAGE</var> and <var>PAGES</var> allow you to set the range of Table B pages to scan. If any of the pages specified are inactive, an error message is issued and the <var>TABLEB</var> command is ignored.</li>
<li><var>RECORD LENGTH</var> (which can be abbreviated as <var>RECLEN</var>) calculates the average length of a logical record (a record that includes all of its extensions) in Table B.
<p>
This length is exact unless <var>DELETE RECORDS</var> or <var>IFDSET</var> is used in the file. In this case, <var>RECLEN</var> is too long, because space from the deleted records is never released. </p>
<p>
<var>RECLEN</var> is valid only if all of Table B is scanned (that is, only if you do not specify the <var>PAGE</var> or <var>PAGES</var> options). </p></li>
</ul>
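<p>
For example, to display the Table B summary and then the average logical record length, assuming the hypothetical CENSUS file used elsewhere on this page:</p>
<p class="code">OPEN CENSUS
TABLEB
TABLEB RECLEN
</p>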
 
===Checking Table B parameters===
<p>
The two basic goals in loading Table B properly are:</p>
<table>
<tr class="head">
<tr class="head">
<th>Avoid...</th>
<th>If...</th>
<th>Then...</th>
</tr>
<tr>
<td>Wasting record numbers</td>
<td>Number of free slots per page shown by the <var>TABLEB</var> command is not zero.</td>
<td>
<p>Record numbers are being wasted. </p>
</td>
</tr>
<tr>
<td>Creating extension records</td>
<td>If, after loading Table B, the <var>[[EXTNADD parameter|EXTNADD]]</var> value is <var class="term">not</var> zero</td>
<td>Extension records have been created. </td>
</tr>
</table>
<p>
The following sections discuss how to prevent these two conditions.</p>
 
====Preventing wasted record numbers====
<p>
You can recover wasted record numbers in a file by setting <var>BRECPPG</var> to a smaller value or, less often, <var>BRESERVE</var>. </p>
<p>
If, according to the <var>TABLEB</var> output, space is being wasted on each Table B page and no record numbers have been skipped, then <var>BRECPPG</var> might be set too low.</p>
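<p>
A minimal sketch of the CREATE step for such a reload; the file name and parameter values below are hypothetical, not recommendations:</p>
<p class="code">CREATE FILE CENSUS
PARAMETER BRECPPG=22
PARAMETER BRESERVE=40
END
</p>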
 
====Preventing extension records====
<p>
If the value of <var>EXTNADD</var> is too high (that is, if you have too many extension records), reload the file with a smaller <var>BRECPPG</var> or a larger <var>BRESERVE</var>. </p>
<p class="note"><b>Note:</b>
If a few records within the file are much longer than the average record, it might be most efficient to allow the longest records to create extension records. For more information about sizing <var>BRESERVE</var> to avoid extension records, see [[File size calculation in detail#Computing BRESERVE (reserved Table B space)|Computing BRESERVE (reserved Table B space)]].</p>
 
===Sizing Table B===
<p>
This section describes the <var>BSIZE</var> and <var>BHIGHPG</var> parameters. For more information about sizing Table B, see [[File size calculation in detail#Sizing Table B|Sizing Table B]].</p>
 
====Using BHIGHPG to size BSIZE====
<p>
To check your estimate of <var>BSIZE</var>, look at the <var>BHIGHPG</var> parameter. In ordinary and sorted files, <var>BHIGHPG</var> is the number of the last page used, one less than the number of pages used. If this number is significantly different from the estimate of space required for the sample, then adjust <var>BSIZE</var> accordingly. </p>
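<p>
For example, assuming a hypothetical file named CENSUS:</p>
<p class="code">OPEN CENSUS
VIEW BHIGHPG,BSIZE
</p>
<p>
Because <var>BHIGHPG</var> is the number of the last page used, the number of Table B pages actually used is <code>BHIGHPG + 1</code>, which you can compare directly with your <var>BSIZE</var> estimate for the sample.</p>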
 
====Checking for overflow records in sorted files====
<p>
In sorted files, examine <var>[[OVFLADD parameter|OVFLADD]]</var> (overflow records) and <var>[[EOVFLADD parameter|EOVFLADD]]</var> (extra overflow records). The values of both of these parameters should be zero after the initial load. If they are not, either the last master area has been filled or, more likely, the input data was not properly sorted. The former condition occurs only if BHIGHPG is one less than BSIZE. For more information about sorted files, see [[Defining fields manually]]. </p>
 
====Checking for spill records in hash key files====
<p>
Check <var>[[SPILLADD parameter|SPILLADD]]</var> (spills) in hash key files. Spill records are retrieved by their hash keys more slowly than other records. If the number of spills is not zero, raise <var>BRECPPG</var>, <var>BSIZE</var>, or both. Only raise <var>BRECPPG</var> if the <var>TABLEB</var> command shows pages that have no unused record slots but lots of available space. </p>
 
==Table C parameters==
<p>
To monitor Table C, use the <var>[[CRETRIES parameter|CRETRIES]]</var> parameter and the <var>[[TABLEC command|TABLEC]]</var> command, as described in the following sections. </p>
 
===Monitoring Table C with the CRETRIES parameter===
<p>
As in Table A, a page retry statistic (the <var>CRETRIES</var> parameter) is provided. Each retry indicates a slight degradation in indexed retrieval speeds. Perhaps more significantly, page retries are a signal that Table C is becoming crowded. Although these figures vary from file to file, experiments have produced results similar to those shown in the following figure (from a file with <code>CSIZE=10</code>). </p>
<p class="caption" style="width:405px">Table C space vs. page retries</p>
<p class="figure">[[File:File_Loading_Techniques_2.gif]] </p>
<p>
As shown in the figure above, page retries usually are not significant when utilization is below 70 percent. At approximately 72 percent, the number of retries begins to grow more rapidly as utilization increases (new <var>KEY</var> values are added). Retries begin to increase sharply at 80 percent utilization, and at 90 percent the slope increases further. A rough guideline for users concerned with access performance is to aim for 75 percent utilization. If space is the only constraint, you should be able to achieve 85 percent utilization. Remember that this figure varies.</p>
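<p>
As a rough check against that guideline, you can work backward from the slot counts reported by the <var>TABLEC</var> command (described below). The figures here match the sample <var>TABLEC</var> output shown later on this page:</p>
<p class="code">Slots in use (from TABLEC)  =  76,800
Target utilization          =  75 percent
Slots Table C should hold   =  76,800 / 0.75  =  102,400
</p>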
 
====Utilization of Table C====
<p>
Table C never can be 100 percent utilized. A Table C full condition usually is reached when utilization exceeds 85 percent. Utilization of up to 95 percent has been achieved, but the precise point at which a Table C full condition results varies. </p>
 
===Table C space requirements===
<p>
New <var>KEY</var> values require only two slots in Table C for small files. If the file size multiplier, discussed in [[File size calculation in detail#Computing the file size multiplier (N)|Computing the file size multiplier (N)]], is greater than 1, <var>KEY</var> values might require more than two slots. Therefore, at a rate dependent upon the level and nature of update activity, the number of slots used increases.</p>
 
===TABLEC command===
<p>
Use the <var>[[TABLEC command|TABLEC]]</var> command to see what percentage of Table C space actually is used. The <var>TABLEC</var> command displays the following information about Table C usage: </p>
<ul>
<li>Number of slots available in Table C</li>
<li>Number of slots used</li>
<li>Percentage of Table C used</li>
<li>Total number of property entries</li>
<li>Number of pages with spills</li>
<li>Number of primary property entries with overflow</li>
<li>Percentage of property entries with overflow </li>
</ul>
<p>
For example: </p>
<p class="code"><var>OPEN CENSUS
<p class="code"><var>OPEN CENSUS
TABLEC</var>
TABLEC</var>
M204.0638: NUMBER OF SLOTS = 102400
M204.0639: SLOTS USED = 76800
M204.0640: PERCENTAGE OF TABLE C USED = 75
M204.1833: THE TOTAL NUMBER OF PROPERTY ENTRIES IS 7496
M204.1874: THE NUMBER OF PAGES WITH SPILLS IS 33
M204.1834: 2500 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW
M204.1871: THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 33
</p>
 
===Detecting overflows with the TABLEC command===
<p>
The <var>TABLEC</var> command reports on property entry overflows vs. number of pages with spills. As described in [[File size calculation in detail#Sizing Table C|Sizing Table C]], Table C contains property entries and segment entries. </p>
 
====About overflows====
<p>
A field name = value pair stored in Table C normally consists of one property entry for the pair, and as many segment entries as there are segments having records with that field value. If there is no space available to store an additional segment entry on the page where the property entry resides, the segment entry is created on the next page that has sufficient space. </p>
<p>
When this occurs, the property entry page is said to have spilled. <var class="product">Model&nbsp;204</var> creates a duplicate property entry on the page spilled onto along with the segment entry placed there. The original property entry is updated to point to this new property entry as an overflow entry. </p>
<p>
The effects of this overflow are:</p>
<ul>
<li>Additional space is required for the overflow property entry.</li>
<li>Additional I/O is involved in retrieving the entries during <var>FIND</var> processing. </li>
</ul>
====Decreasing the number of overflows====
<p>
Because property entry distribution in Table C is entirely dependent on the data values and the hashing algorithm, avoiding overflows might be impossible. However, increasing the size of Table C, which requires recreating the file, might provide a greater dispersal of the Table C entries and, therefore, fewer overflows. As Table C fills and overflows increase, you might have to choose between disk space requirements (not recreating the file) and <var>FIND</var> processing performance (recreating the file).</p>
<p>
The number of pages with spills, reported by <var>TABLEC</var>, indicates how uniformly the Table C space is being filled. If the percentage of property entries having overflow is high, and the number of pages with spills is also high, then the file is apparently filling uniformly and might become full if substantial additions to Table C are processed. In this case, you might need to re-create the file and increase Table C.</p>
<p>
If the percentage of property entries having overflow is high, but the number of pages with spills is low, then a "hot spot" or "cluster" exists due to many of the data values hashing to the same page(s). In this case, the file might not be in danger of filling, but performance might be further degraded by the cluster. Experimenting with alternative Table C sizes might alleviate the cluster, but some clusters are probably unavoidable.</p>
<p>
In the previous sample <var>TABLEC</var> output, both the percentage of overflows and the number of page spills are relatively high. If substantial additional updating to Table C is anticipated, the percentage of Table C used climbs, and any performance problems caused by too many overflows might be compounded by excessive page retries. If the concern for performance outweighs the need for disk space, re-create the file with a larger Table C.</p>
<p>
In the following sample output, the percentage of overflows is relatively high but the number of page spills is low and the values are clustered. Whatever performance degradation exists might or might not be improved by increasing Table C:</p>
<p class="code">NUMBER OF SLOTS = 102400
<p class="code">NUMBER OF SLOTS = 102400
SLOTS USED = 76800
SLOTS USED = 76800
PERCENTAGE OF TABLE C USED = 75
THE TOTAL NUMBER OF PROPERTIES IS 7496
800 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW
THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 10
THE NUMBER OF PAGES WITH SPILLS IS 2
</p>
==Table D parameters==
<p>
Only one parameter can be viewed to check the Table D size estimate. <var>[[DPGSUSED parameter|DPGSUSED]]</var> shows the number of Table D pages used and can be compared directly to estimates of <var>DSIZE</var> for the sample loaded. The amount of Table D used can be accurately determined only after a complete segment has been loaded. </p>
<p>
Table D usage does not increase linearly within a segment. </p>
 
====OINODES parameter====
<p>
The estimate for the Ordered Index B-tree node space (see [[File size calculation in detail#Calculating Ordered Index space|Calculating Ordered Index space]]) can be checked by viewing the <var>[[OINODES parameter|OINODES]]</var> parameter. <var>OINODES</var> shows the number of Table D pages used for the Ordered Index B-tree. The exact number of pages necessary for the Ordered Index B-tree can be accurately determined only after each of the values of all the <var>ORDERED</var> fields has been loaded. </p>
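<p>
To compare these estimates with actual usage after a segment has been loaded, you can view the Table D parameters together; the file name CENSUS is hypothetical:</p>
<p class="code">OPEN CENSUS
VIEW DPGSUSED,DSIZE,OINODES
</p>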


[[Category:File loading and reorganization]]
