Tuning the hash index: Difference between revisions

From m204wiki
Jump to navigation Jump to search
No edit summary
No edit summary
 
(18 intermediate revisions by 6 users not shown)
Line 1: Line 1:
==Monitoring Table C==
==Monitoring Table C==
To monitor Table C, use the <var>[[CRETRIES parameter|CRETRIES]]</var> parameter and the <var>[[TABLEC command|TABLEC]]</var> command. Their use is discussed below.
<p>
See also [[Field design#Indexing fields|Indexing fields]] for more information on choosing the type of indexes for your fields.</p>


To monitor Table C, use the CRETRIES parameter and the TABLEC
==Using CRETRIES in monitoring==
command.Their use is discussed below.
The <var>CRETRIES</var> parameter provides the count of the total number of page retries while storing index entries in Table C. A retry indicates a slight degradation in indexed retrieval speeds. Far more significantly (given the limitations in how Table C may be increased), page retries are a signal that Table C is becoming crowded.
 
<p>See also [[#Field Design (File Management) Indexing Fields|Indexing Fields]] for more information on choosing the type of indexes for your fields.</p>
 
 
== Using [[CRETRIES parameter|CRETRIES]] in Monitoring ==
 
<p>The page retry statistic (the CRETRIES parameter) is provided. Each retry indicates a slight degradation in indexed retrieval speeds. Far more significantly (given the limitations in hoe Table C may be increased), page retries are a signal that Table C is becoming crowded.</p>


<p>Although these figures vary from file to file, experiments have produced results
<p>Although these figures vary from file to file, experiments have produced results
similar to those in the following diagram. (This from a small file with CSIZE = 10).</p>
similar to those in the following diagram. (This is from a small file with a <var>[[CSIZE parameter|CSIZE]]</var> of 10.)</p>


<p class="caption">Effect of page retries on Table C utilization</p>
:::[[File:Table C Utilization.jpg]]
:::[[File:Table C Utilization.jpg]]


<p>As shown above, page retries usually are not significant
<p>As shown above, page retries usually are not significant
when utilization is below 70 percent. At approximately 72 percent, the number
when utilization is below 70 percent. At approximately 72 percent, the number
of retries begins to grow more rapidly as utilization increases (new KEY values
of retries begins to grow more rapidly as utilization increases (new <var>KEY</var> values
are added). The point at which retries sharply begin to increase is 80 percent;
are added). The point at which retries sharply begin to increase is 80 percent;
at 90 percent the slope further increases.</p>  
at 90 percent the slope further increases.</p>  
<p>A rough guideline for users concerned with access performance is to aim for 75 percent utilization. If space
<p>
is the only constraint, you should be able to achieve 85 percent utilization. Remember that this figure varies depending on the characteristics of the data.</p>
A rough guideline for users concerned with access performance is to aim for 75 percent utilization. If space is the only constraint, you should be able to achieve 85 percent utilization. Remember that this figure varies depending on the characteristics of the data.</p>
 
<p>Table C never can be 100 percent utilized. A Table C full condition may happen at any point after
utilization reaches 85 percent.</p>
 
<p>Because of this uncertainty, Rocket Software encourages users to strongly consider the use of the Ordered Index as a replacement for the hashed index. This can be done without code changes. See [[#Field Design (File Management) Indexing Fields|Indexing Fields]] for more information.</p>
 


== Using the [[TABLEC command]] in Monitoring ==
<p>Table C never can be 100 percent utilized. A Table C full condition may happen at any point after utilization reaches 85 percent. Because of this uncertainty, strongly consider the use of the Ordered Index as a replacement for the hashed index. This can be done without code changes. See [[Field design#Indexing fields|Indexing fields]] for more information.</p>


<p>Use the [[TABLEC command]] to see what percentage of Table C space actually
==Using the TABLEC command in monitoring==
is used.The [[TABLEC command]] displays the following information about
<p>
Use the <var>TABLEC</var> command to see what percentage of Table C space actually
is used. The <var>TABLEC</var> command displays the following information about
Table C usage:</p>
Table C usage:</p>
* Number of slots available in Table C
<ul>
* Number of slots used
<li>Number of slots available in Table C
* Percentage of Table C used
<li>Number of slots used
* Total number of property entries
<li>Percentage of Table C used
* Number of pages with spills
<li>Total number of property entries
* Number of primary property entries with overflow
<li>Number of pages with spills
* Percentage of property entries with overflow
<li>Number of primary property entries with overflow
<li>Percentage of property entries with overflow
</ul>


<p>For example:<br>
<p>For example:</p>
OPEN CENSUS<br>
<p class="code"><b>OPEN CENSUS
TABLEC<br>
TABLEC</b><br>
M204.0638: NUMBER OF SLOTS = 102400<br>
M204.0638: NUMBER OF SLOTS = 102400
M204.0639: SLOTS USED = 76800<br>
M204.0639: SLOTS USED = 76800
M204.0640: PERCENTAGE OF TABLE C USED = 75<br>
M204.0640: PERCENTAGE OF TABLE C USED = 75
M204.1833: THE TOTAL NUMBER OF PROPERTY ENTRIES IS 7496<br>
M204.1833: THE TOTAL NUMBER OF PROPERTY ENTRIES IS 7496
M204.1874: THE NUMBER OF PAGES WITH SPILLS IS 33<br>
M204.1874: THE NUMBER OF PAGES WITH SPILLS IS 33
M204.1834: 2500 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW<br>
M204.1834: 2500 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW
M204.1871: THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 33</p>
M204.1871: THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 33</p>


=== Using the TABLEC command to detect overflows===
===Using the TABLEC command to detect overflows===
The TABLEC command reports on property entry overflows vs. number of
The <var>TABLEC</var> command reports on property entry overflows versus number of
pages with spills. As described in [[TABLE C (File Architecture)|the file architecture discussion]], Table C contains
pages with spills. As described in [[Table C (File architecture)|the file architecture discussion]], Table C contains property entries and segment entries.  
property entries and segment entries. </P>
<p>
 
A field value pair stored in Table C normally consists of one property
<p>A field value pair stored in Table C normally consists of one property
entry for the pair, and as many segment entries as there are segments having
entry for the pair, and as many segment entries as there are segments having
records with that field value. If there is no space available to store an additional
records with that field value. If there is no space available to store an additional
segment entry on the page where the property entry resides, the segment entry
segment entry on the page where the property entry resides, the segment entry
is created on the next page that has sufficient space.</p>
is created on the next page that has sufficient space.</p>
<p>When this occurs, the property entry page is said to have spilled. Model 204
<p>
When this occurs, the property entry page is said to have spilled. Model 204
creates a duplicate property entry on the page spilled onto along with the
creates a duplicate property entry on the page spilled onto along with the
segment entry placed there. The original property entry is updated to point to
segment entry placed there. The original property entry is updated to point to
this new property entry as an overflow entry.</p>
this new property entry as an overflow entry.</p>
<p>The effects of this overflow are:</p>
<p>
* Additional space is required for the overflow property entry.
The effects of this overflow are:</p>
* Additional I/O is involved in retrieving the entries during FIND processing.
<ul>
 
<li>Additional space is required for the overflow property entry.
<p>Because property entry distribution in Table C is entirely dependent on the data
<li>Additional I/O is involved in retrieving the entries during <var>FIND</var> processing.
</ul>
<p>
Because property entry distribution in Table C is entirely dependent on the data
values and the hashing algorithm, avoiding overflows might be impossible.
values and the hashing algorithm, avoiding overflows might be impossible.
However, increasing the size of Table C, which requires recreating the file,
However, increasing the size of Table C, which requires recreating the file,
Line 79: Line 77:
between disk space requirements (not recreating the file) and FIND processing
between disk space requirements (not recreating the file) and FIND processing
performance (recreating the file).</p>
performance (recreating the file).</p>
<p>The number of pages with spills, reported by TABLEC, indicates how uniformly
<p>
The number of pages with spills, reported by <var>TABLEC</var>, indicates how uniformly
the Table C space is being filled. If the percentage of property entries having
the Table C space is being filled. If the percentage of property entries having
overflow is high, and the number of pages with spills is also high, then the file
overflow is high, and the number of pages with spills is also high, then the file
Line 85: Line 84:
Table C are processed. In this case, you might need to recreate the file and
Table C are processed. In this case, you might need to recreate the file and
increase Table C.</p>
increase Table C.</p>
<p>If the percercentage of property entries having overflow is high, but the number of
<p>
pages with spills is low, then a “hot spot” or “cluster” exists due to many of the
If the percercentage of property entries having overflow is high, but the number of
pages with spills is low, then a "hot spot" or "cluster" exists due to many of the
data values hashing to the same page(s). In this case, the file might not be in
data values hashing to the same page(s). In this case, the file might not be in
danger of filling, but performance might be further degraded by the cluster.
danger of filling, but performance might be further degraded by the cluster.
Experimenting with alternative Table C sizes might alleviate the cluster, but
Experimenting with alternative Table C sizes might alleviate the cluster, but
some clusters are probably unavoidable.</p>
some clusters are probably unavoidable.</p>
<p>In the previous sample TABLEC output, both the percentage of overflows and
<p>
In the previous sample <var>TABLEC</var> output, both the percentage of overflows and
the number of page spills are relatively high. If substantial additional updating
the number of page spills are relatively high. If substantial additional updating
to Table C is anticipated, the percentage of Table C used climbs, and any
to Table C is anticipated, the percentage of Table C used climbs, and any
performance problems caused by too many overflows might be compounded
performance problems caused by too many overflows might be compounded
by excessive page retries, or worse, the file will fill. Normally concern for performance (including availability) outweighs the need for disk space, so consider recreating the file with a larger Table C.</p>
by excessive page retries, or worse, the file will fill. Normally concern for performance (including availability) outweighs the need for disk space, so consider recreating the file with a larger Table C.</p>
<p>In the following sample output, the percentage of overflows is relatively high but
<p>
In the following sample output, the percentage of overflows is relatively high but
the number of page spills is low and the values are clustered. What
the number of page spills is low and the values are clustered. What
performance degradation exists might or might not be improved by increasing
performance degradation exists might or might not be improved by increasing
Table C:</p>
Table C:</p>
<p>NUMBER OF SLOTS = 102400<br>
<p class="code">NUMBER OF SLOTS = 102400
SLOTS USED = 76800<br>
SLOTS USED = 76800
PERCENTAGE OF TABLE C USED = 75<br>
PERCENTAGE OF TABLE C USED = 75
THE TOTAL NUMBER OF PROPERTIES IS 7496<br>
THE TOTAL NUMBER OF PROPERTIES IS 7496
800 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW<br>
800 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW
THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 10<br>
THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 10
THE NUMBER OF PAGES WITH SPILLS IS 2</p>
THE NUMBER OF PAGES WITH SPILLS IS 2</p>


 
[[Category:Model 204 files]]
 
 
 
 
 
 
[[Category:File Architecture and Management]]
[[Category:File Management]]

Latest revision as of 16:53, 12 May 2014

Monitoring Table C

To monitor Table C, use the CRETRIES parameter and the TABLEC command. Their use is discussed below.

See also Indexing fields for more information on choosing the type of indexes for your fields.

Using CRETRIES in monitoring

The CRETRIES parameter provides the count of the total number of page retries while storing index entries in Table C. A retry indicates a slight degradation in indexed retrieval speeds. Far more significantly (given the limitations in how Table C may be increased), page retries are a signal that Table C is becoming crowded.

Although these figures vary from file to file, experiments have produced results similar to those in the following diagram. (This is from a small file with a CSIZE of 10.)

Effect of page retries on Table C utilization

As shown above, page retries usually are not significant when utilization is below 70 percent. At approximately 72 percent, the number of retries begins to grow more rapidly as utilization increases (new KEY values are added). The point at which retries sharply begin to increase is 80 percent; at 90 percent the slope further increases.

A rough guideline for users concerned with access performance is to aim for 75 percent utilization. If space is the only constraint, you should be able to achieve 85 percent utilization. Remember that this figure varies depending on the characteristics of the data.

Table C never can be 100 percent utilized. A Table C full condition may happen at any point after utilization reaches 85 percent. Because of this uncertainty, strongly consider the use of the Ordered Index as a replacement for the hashed index. This can be done without code changes. See Indexing fields for more information.

Using the TABLEC command in monitoring

Use the TABLEC command to see what percentage of Table C space actually is used. The TABLEC command displays the following information about Table C usage:

  • Number of slots available in Table C
  • Number of slots used
  • Percentage of Table C used
  • Total number of property entries
  • Number of pages with spills
  • Number of primary property entries with overflow
  • Percentage of property entries with overflow

For example:

OPEN CENSUS TABLEC
M204.0638: NUMBER OF SLOTS = 102400 M204.0639: SLOTS USED = 76800 M204.0640: PERCENTAGE OF TABLE C USED = 75 M204.1833: THE TOTAL NUMBER OF PROPERTY ENTRIES IS 7496 M204.1874: THE NUMBER OF PAGES WITH SPILLS IS 33 M204.1834: 2500 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW M204.1871: THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 33

Using the TABLEC command to detect overflows

The TABLEC command reports on property entry overflows versus number of pages with spills. As described in the file architecture discussion, Table C contains property entries and segment entries.

A field value pair stored in Table C normally consists of one property entry for the pair, and as many segment entries as there are segments having records with that field value. If there is no space available to store an additional segment entry on the page where the property entry resides, the segment entry is created on the next page that has sufficient space.

When this occurs, the property entry page is said to have spilled. Model 204 creates a duplicate property entry on the page spilled onto along with the segment entry placed there. The original property entry is updated to point to this new property entry as an overflow entry.

The effects of this overflow are:

  • Additional space is required for the overflow property entry.
  • Additional I/O is involved in retrieving the entries during FIND processing.

Because property entry distribution in Table C is entirely dependent on the data values and the hashing algorithm, avoiding overflows might be impossible. However, increasing the size of Table C, which requires recreating the file, might provide a greater dispersal of the Table C entries and, therefore, fewer overflows. As Table C fills and overflows increase, you might have to choose between disk space requirements (not recreating the file) and FIND processing performance (recreating the file).

The number of pages with spills, reported by TABLEC, indicates how uniformly the Table C space is being filled. If the percentage of property entries having overflow is high, and the number of pages with spills is also high, then the file is apparently filling uniformly and might become full if substantial additions to Table C are processed. In this case, you might need to recreate the file and increase Table C.

If the percercentage of property entries having overflow is high, but the number of pages with spills is low, then a "hot spot" or "cluster" exists due to many of the data values hashing to the same page(s). In this case, the file might not be in danger of filling, but performance might be further degraded by the cluster. Experimenting with alternative Table C sizes might alleviate the cluster, but some clusters are probably unavoidable.

In the previous sample TABLEC output, both the percentage of overflows and the number of page spills are relatively high. If substantial additional updating to Table C is anticipated, the percentage of Table C used climbs, and any performance problems caused by too many overflows might be compounded by excessive page retries, or worse, the file will fill. Normally concern for performance (including availability) outweighs the need for disk space, so consider recreating the file with a larger Table C.

In the following sample output, the percentage of overflows is relatively high but the number of page spills is low and the values are clustered. What performance degradation exists might or might not be improved by increasing Table C:

NUMBER OF SLOTS = 102400 SLOTS USED = 76800 PERCENTAGE OF TABLE C USED = 75 THE TOTAL NUMBER OF PROPERTIES IS 7496 800 PRIMARY PROPERTY ENTRIES HAVE OVERFLOW THE PERCENTAGE OF PROPERTY ENTRIES WITH OVERFLOW IS 10 THE NUMBER OF PAGES WITH SPILLS IS 2