SirMon critical-file-resource monitoring
Latest revision as of 20:05, 6 June 2017
Model 204 critical file resources
Model 204 defines four "critical file resources," which are used to serialize access to important file structures. The names of these resources and the resources that they control are:
| Resource | Description |
|---|---|
| DIRECT | Controls access to Table B records. |
| INDEX | Controls access to index structures in both Table C and Table D. |
| EXISTS | Controls access to the existence bit-map. |
| RECENQ | Controls access to record enqueueing data structures representing found sets and record lists. |
A set of internal rules governs access to critical file resources:
- A thread must hold a resource in SHR mode to examine the associated file structure.
- A thread must hold a resource in EXC mode to modify the associated file structure.
- Multiple threads may hold a resource in SHR mode, but only a single thread may hold a resource in EXC mode.
- A resource may not be held in SHR and EXC modes simultaneously.
- A thread may hold any number of resources for a single file, but will never hold resources in more than one file at a time.
When a thread is prevented by the above rules from obtaining a required resource, the thread must wait until the resource becomes available: this is called a critical file resource "conflict." In this situation, the requesting thread is placed on a queue of users waiting for the resource. The thread is said to be "enqueued" on the critical file resource.
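The access rules above can be sketched as a small compatibility check. This is an illustrative model only, not Model 204's internal implementation; the function name `can_grant` and the use of plain strings for modes are assumptions made for the example.

```python
def can_grant(requested, held_modes):
    """Return True if a request in mode `requested` ("SHR" or "EXC") is
    compatible with the modes other threads currently hold on the resource.

    `held_modes` is a list such as ["SHR", "SHR"]; an empty list means
    the resource is free. (Hypothetical helper, for illustration only.)
    """
    if requested not in ("SHR", "EXC"):
        raise ValueError(f"unknown mode: {requested!r}")
    if not held_modes:
        return True                     # nobody holds it: always grantable
    if requested == "EXC":
        return False                    # EXC cannot coexist with any holder
    # A SHR request is compatible only if every current holder is SHR.
    return all(mode == "SHR" for mode in held_modes)
```

When `can_grant` would return False, the requesting thread is the one that waits: that is the "conflict" described above, and the thread joins the queue for the resource.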
SirMon monitoring of critical file resources
SirMon supplies you with a variety of statistics for viewing queues and conflicts that have formed against critical file resources, and it supplies a special screen to help monitor critical file resource conflicts while they are happening:
[Figure: Critical File Resource conflict display]
This display is accessed from any file monitor screen in SirMon (including custom user-defined screens) by pressing the PF2 key with the cursor on a line containing file statistics. The Critical File Resource conflict display is not available in the RKWeb interface.
The top section of the CFRS screen displays a variety of file-related statistics (described in Critical File Resource statistics displayed in SirMon) showing the rate of conflicts and the current length of the queue of users waiting for each of the four critical file resources. Disk reads and writes (DKRD/DKWR) are also displayed, as are the total number of the file's pages currently in buffers (BUFPAGE) and the total number of users queued, waiting for any critical file resource for the file.
The lower portion of the screen displays users who currently hold enqueues on critical file resources for the selected file, and the type of enqueue each is holding. In addition, WT (WAITTYPE), WTIME (the length of time the user has been waiting) and PNAME (the procedure being run by the user) are displayed for each user.
Critical file resource conflicts and enqueuing problems are always a second-order effect indicating some other bottleneck. For instance, long queues of users or high numbers of conflicts on the resource called INDEX might point to an inefficient SOUL program that is unnecessarily locking up a file's index pages.
Note: Users waiting in queues, but not holding enqueues on critical file resources, are not displayed here. They can be seen in User Monitor displays showing WAITTYP and WAITTIM.
Waiting users
WT is a numeric code used by Model 204 to indicate the type of wait a user is experiencing. WAITTYPEs 24 and 25 indicate a wait on a critical file resource. SirMon displays the WT statistic as a two-digit code, ordinarily the numeric wait type. However, codes 24 and 25 are displayed as two alphabetic characters:
- The first character indicates the particular critical file resource being waited on, as follows:

| Code | Resource |
|---|---|
| D | DIRECT |
| I | INDEX |
| E | EXISTS |
| R | RECENQ |

- The second character indicates the strength of enqueue being sought:

| Code | Strength |
|---|---|
| E | Exclusive |
| S | Share |

For example, a WT value of DS indicates that the DIRECT resource is required in share mode, while a WT value of RE indicates that the RECENQ resource is required in exclusive mode. This translation of WAITTYPEs is vital in determining the root cause of a critical file resource enqueuing problem.
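The two-character translation can be expressed as a simple lookup. The sketch below assumes that WT values have been copied from a SirMon display as strings; `decode_wt` and the two dictionaries are hypothetical names introduced for illustration, not part of SirMon.

```python
# First character: the critical file resource being waited on.
RESOURCES = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}
# Second character: the strength of enqueue being sought.
STRENGTHS = {"E": "exclusive", "S": "share"}

def decode_wt(wt):
    """Translate a WT value as displayed by SirMon.

    Returns (resource, strength) for a two-letter critical file resource
    code, or None for an ordinary numeric wait type.
    """
    wt = wt.strip().upper()
    if wt.isdigit():
        return None                     # numeric wait type: not a CFR wait
    if len(wt) != 2 or wt[0] not in RESOURCES or wt[1] not in STRENGTHS:
        raise ValueError(f"unrecognized WT value: {wt!r}")
    return RESOURCES[wt[0]], STRENGTHS[wt[1]]
```

For instance, `decode_wt("DS")` yields `("DIRECT", "share")` and `decode_wt("RE")` yields `("RECENQ", "exclusive")`, matching the examples in the text.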
If a user is waiting for a critical file resource, the user that currently holds the requested critical file resource should be investigated. For example, if a user has a WT value of IS, you should find another user holding the INDEX resource in exclusive mode. Whatever this second user is waiting upon is generally the root cause of the first user's IS wait.
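The investigation step described above amounts to matching a waiter's WT code against the list of current holders. The following is a minimal sketch under the assumption that the holder lines from the lower part of the conflict display have been transcribed into `(user, resource, mode)` tuples; that data layout, the user names, and the function `find_blockers` are all hypothetical.

```python
RESOURCES = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}

def find_blockers(wt, holders):
    """Return the users whose held enqueue conflicts with a waiter's WT code.

    `holders` is a list of (user, resource, mode) tuples with mode
    "SHR" or "EXC" (an assumed transcription of the display).
    """
    resource = RESOURCES[wt[0]]
    wants_exclusive = wt[1] == "E"
    # A share request is blocked only by an EXC holder;
    # an exclusive request is blocked by any holder of the resource.
    return [user for user, held, mode in holders
            if held == resource and (wants_exclusive or mode == "EXC")]

holders = [("USER7", "INDEX", "EXC"), ("USER9", "DIRECT", "SHR")]
blockers = find_blockers("IS", holders)   # the user to investigate next
```

Once a blocker is identified, the next question is what that user is itself waiting on, since that is generally the root cause.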
Related statistics
Along with statistics displayed on this special critical file resource screen, there are quite a number of other statistics, at the System, File, and User levels, that are related to critical file resource monitoring. An understanding of how these statistics relate to critical file resources is key to effectively identifying and fixing related problems.
- CFRCONF and CFRQUEU exist on system and file levels. These statistics are, respectively, the number of conflicts occurring and the queue lengths, each summed over the resource types.
CFRCONF is provided to flag situations where a critical file resource is being obtained and released relatively frequently, often producing short-lived conflicts. This situation could produce unnecessarily high CPU utilization because of extra scheduler overhead. Two updating "batch" type jobs could produce this kind of problem.
CFRQUEU is provided to flag situations where an application holds a critical file resource for excessive periods of time, producing long queues of users waiting for the resource.
CFRCONF also exists on the user level, where it indicates the number of times a user thread has had to wait on a critical file resource. Also on the user level, CFRCWTT indicates the amount of time a user has waited on critical file resources.
- The WAITFIL statistic indicates the name of the file being waited on, and is reported for both critical file resource waits and disk I/O.
- The WAITCFR statistic indicates the abbreviated name of the critical file resource being waited on, and is only reported for critical file resource waits.
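The different roles of CFRCONF and CFRQUEU can be illustrated with a rough triage helper. This is only a sketch: the function name and, in particular, the default limits are arbitrary assumptions chosen for the example, not values recommended by SirMon or Model 204.

```python
def classify_cfr(cfrconf, cfrqueu, conf_limit=50, queue_limit=5):
    """Suggest which kind of problem a file's CFR statistics point to.

    `conf_limit` and `queue_limit` are hypothetical thresholds; suitable
    values depend entirely on the site and workload.
    """
    findings = []
    if cfrconf > conf_limit:
        # Resource obtained and released frequently: short-lived conflicts.
        findings.append("frequent conflicts: check for scheduler overhead")
    if cfrqueu > queue_limit:
        # Resource held for long periods: users pile up in the queue.
        findings.append("long queue: check who holds the resource")
    return findings
```

A file with a high conflict count but a short queue would return only the first finding, while a file with few conflicts but many queued users would return only the second.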
See also
- SirMon
- SirMon application structure
- SirMon main menu
- SirMon System Overview screen
- SirMon threshold setting
- SirMon background monitor
- SirMon System Monitor menu
- SirMon User Monitor menu
- SirMon File Monitor menu
- SirMon Subsystem Monitor menu
- SirMon Task Monitor menu
- SirMon Janus Monitor menu
- SirMon custom screens
- SirMon critical-file-resource monitoring
- SirMon user-initiated capturing of statistics
- System statistics displayed in SirMon
- User statistics displayed in SirMon
- File statistics displayed in SirMon
- Subsystem statistics displayed in SirMon
- Task statistics displayed in SirMon
- Critical File Resource statistics displayed in SirMon
- SirMon date processing