SirMon critical-file-resource monitoring

Latest revision as of 20:05, 6 June 2017

Model 204 critical file resources

Model 204 defines four "critical file resources," which are used to serialize access to important file structures. The names of these resources and the resources that they control are:

DIRECT Controls access to Table B records.
INDEX Controls access to index structures in both Table C and Table D.
EXISTS Controls access to the existence bit-map.
RECENQ Controls access to record enqueueing data structures representing found sets and record lists.

A set of internal rules governs access to critical file resources:

  • A thread must hold a resource in SHR mode to examine the associated file structure.
  • A thread must hold a resource in EXC mode to modify the associated file structure.
  • Multiple threads may hold a resource in SHR mode, but only a single thread may hold a resource in EXC mode.
  • A resource may not be held in SHR and EXC modes simultaneously.
  • A thread may hold any number of resources for a single file, but will never hold resources in more than one file at a time.

When a thread is prevented by the above rules from obtaining a required resource, the thread must wait until the resource becomes available: this is called a critical file resource "conflict." In this situation, the requesting thread is placed on a queue of users waiting for the resource. The thread is said to be "enqueued" on the critical file resource.
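The access rules and the notion of a "conflict" can be illustrated with a small toy model. This is only a sketch for building intuition, not a description of Model 204 internals: the class name, its methods, and the simple first-in, first-out wake-up policy are all assumptions made for the example, and the one-file-at-a-time rule is not modeled.

```python
from collections import deque

SHR, EXC = "SHR", "EXC"


class CriticalFileResource:
    """Toy model of one critical file resource (for example INDEX) in one file.

    Hypothetical illustration only; the names and the queueing policy are
    assumptions, not Model 204 internals.
    """

    def __init__(self, name):
        self.name = name
        self.holders = {}      # thread id -> mode currently held
        self.queue = deque()   # (thread id, mode) pairs waiting, oldest first
        self.conflicts = 0     # number of requests that had to wait

    def can_grant(self, mode):
        """SHR is compatible only with other SHR holders; EXC with nobody."""
        if not self.holders:
            return True
        return mode == SHR and all(m == SHR for m in self.holders.values())

    def request(self, thread, mode):
        """Return True if granted at once, False if the thread was enqueued."""
        if self.can_grant(mode):
            self.holders[thread] = mode
            return True
        self.conflicts += 1              # this request is a "conflict"
        self.queue.append((thread, mode))
        return False

    def release(self, thread):
        """Release the resource, then grant it to waiters in arrival order."""
        del self.holders[thread]
        while self.queue and self.can_grant(self.queue[0][1]):
            waiter, mode = self.queue.popleft()
            self.holders[waiter] = mode
```

In this model, two threads can hold INDEX in SHR mode together, while a third thread asking for EXC mode is counted as a conflict and stays on the queue until both sharers have released the resource.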

SirMon monitoring of critical file resources

SirMon supplies you with a variety of statistics for viewing queues and conflicts that have formed against critical file resources, and it supplies a special screen to help monitor critical file resource conflicts while they are happening:

Critical File Resource conflict display

This display is accessed from any file monitor screen in SirMon (including custom user-defined screens) by pressing the PF2 key with the cursor on a line containing file statistics. The Critical File Resource conflict display is not available in the RKWeb interface.

The top section of the CFRS screen displays a variety of file-related statistics (described in Critical File Resource statistics displayed in SirMon) showing the rate of conflicts and the current length of the queue of users waiting for each of the four critical file resources. Disk reads and writes (DKRD/DKWR) are also displayed, as are the total number of the file's pages currently in buffers (BUFPAGE) and the total number of users queued, waiting for any critical file resource for the file.

The lower portion of the screen displays users who currently hold enqueues on critical file resources for the selected file, and the type of enqueue each is holding. In addition, WT (WAITTYPE), WTIME (the length of time the user has been waiting) and PNAME (the procedure being run by the user) are displayed for each user.

Critical file resource conflicts and enqueuing problems are always a second-order effect indicating some other bottleneck. For instance, long queues of users or high numbers of conflicts on the resource called INDEX might point to an inefficient SOUL program that is unnecessarily locking up a file's index pages.

Note: Users waiting in queues, but not holding enqueues on critical file resources, are not displayed here. They can be seen in User Monitor displays showing WAITTYP and WAITTIM.

Waiting users

WT is a numeric code used by Model 204 to indicate the type of wait a user is experiencing. WAITTYPEs 24 and 25 indicate a wait on a critical file resource. SirMon displays the WT statistic as a two-digit code, ordinarily the numeric wait type. However, codes 24 and 25 are displayed as two alphabetic characters:

  • The first character indicates the particular critical file resource being waited on, as follows:
    D DIRECT
    I INDEX
    E EXISTS
    R RECENQ
  • The second character indicates the strength of enqueue being sought:
    E Exclusive
    S Share

For example, a WT value of DS indicates that the DIRECT resource is required in share mode, while a WT value of RE indicates that the RECENQ resource is required in exclusive mode. This translation of WAITTYPEs is vital in determining the root cause of a critical file resource enqueuing problem.
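The two-character translation described above can be expressed as a pair of lookup tables. The sketch below is a hypothetical helper (the function name `decode_wt` and its return format are assumptions, not part of SirMon); it uses only the letter codes listed in this section and returns `None` for anything else, such as an ordinary numeric wait type.

```python
# Letter codes taken from the tables in this section.
RESOURCES = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}
STRENGTHS = {"E": "exclusive", "S": "share"}


def decode_wt(wt):
    """Decode a WT value as displayed by SirMon (hypothetical helper).

    Returns a (resource, strength) tuple for a critical file resource wait,
    or None when the value is not one of the two-letter codes.
    """
    wt = wt.strip().upper()
    if len(wt) == 2 and wt[0] in RESOURCES and wt[1] in STRENGTHS:
        return RESOURCES[wt[0]], STRENGTHS[wt[1]]
    return None
```

For instance, `decode_wt("DS")` yields the DIRECT resource in share mode and `decode_wt("RE")` yields RECENQ in exclusive mode, matching the examples in the text.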

If a user is waiting for a critical file resource, the user that currently holds the requested critical file resource should be investigated. For example, if a user has a WT value of IS, you should find another user holding the INDEX resource in exclusive mode. Whatever this second user is waiting upon is generally the root cause of the first user's IS wait.
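The investigation described above amounts to a lookup: given a waiter's WT code, find the users who hold that resource in a conflicting mode, then look at what those users are themselves waiting on. The following is a minimal sketch under stated assumptions: the row layout (dictionaries with `user`, `holds`, `wt`, and `pname` keys), the user names, the procedure names, and the numeric wait type `"03"` are all invented for illustration and do not reflect an actual SirMon export format.

```python
RESOURCE_BY_LETTER = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}


def find_blockers(rows, waiter_wt):
    """Return rows for users whose enqueue conflicts with the waiter's request.

    rows      -- hypothetical list of dicts, one per user holding enqueues
    waiter_wt -- two-letter WT code of the waiting user, e.g. "IS"
    """
    resource = RESOURCE_BY_LETTER[waiter_wt[0]]
    wants_exclusive = waiter_wt[1] == "E"
    blockers = []
    for row in rows:
        held = row["holds"].get(resource)
        if held is None:
            continue  # this user does not hold the resource at all
        # An exclusive request conflicts with any holder;
        # a share request conflicts only with an exclusive holder.
        if wants_exclusive or held == "EXC":
            blockers.append(row)
    return blockers


# Invented sample data: USER7 holds INDEX exclusively, USER9 waits for it.
rows = [
    {"user": "USER7", "holds": {"INDEX": "EXC"}, "wt": "03", "pname": "UPDATE.PROC"},
    {"user": "USER9", "holds": {"DIRECT": "SHR"}, "wt": "IS", "pname": "REPORT.PROC"},
]
blockers = find_blockers(rows, "IS")
```

Here `blockers` contains only the USER7 row; that user's own WT value and PNAME are the next things to examine when looking for the root cause.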

Related statistics

Along with the statistics displayed on this special critical file resource screen, a number of other statistics at the System, File, and User levels are related to critical file resource monitoring. Understanding how these statistics relate to critical file resources is key to identifying and fixing related problems effectively.

  • CFRCONF and CFRQUEU exist at the system and file levels. They are, respectively, the number of conflicts occurring and the queue lengths, each summed over the resource types.

    CFRCONF is provided to flag situations where a critical file resource is being obtained and released relatively frequently, often producing short-lived conflicts. This situation could produce unnecessarily high CPU utilization because of extra scheduler overhead. Two updating "batch"-type jobs could produce this kind of problem.

    CFRQUEU is provided to flag situations where an application holds a critical file resource for excessive periods of time, producing long queues of users waiting for the resource.

    CFRCONF also exists at the user level, where it indicates the number of times a user thread has had to wait on a critical file resource. Also at the user level, CFRCWTT indicates the amount of time a user has waited on critical file resources.

  • The WAITFIL statistic indicates the name of the file being waited on, and is reported for both critical file resource waits and disk I/O.
  • The WAITCFR statistic indicates the abbreviated name of the critical file resource being waited on, and is only reported for critical file resource waits.

See also