SirMon critical-file-resource monitoring
Revision as of 22:21, 3 November 2015
Model 204 critical file resources
Model 204 defines four "critical file resources," which are used to serialize access to important file structures. The names of these resources and the resources that they control are:
Resource | What it controls |
---|---|
DIRECT | Controls access to Table B records. |
INDEX | Controls access to index structures in both Table C and Table D. |
EXISTS | Controls access to the existence bit-map. |
RECENQ | Controls access to record enqueueing data structures representing found sets and record lists. |
A set of internal rules governs access to critical file resources:
- A thread must hold a resource in SHR mode to examine the associated file structure.
- A thread must hold a resource in EXC mode to modify the associated file structure.
- Multiple threads may hold a resource in SHR mode, but only a single thread may hold a resource in EXC mode.
- A resource may not be held in SHR and EXC modes simultaneously.
- A thread may hold any number of resources for a single file, but will never hold resources in more than one file at a time.
When a thread is prevented by the above rules from obtaining a required resource, the thread must wait until the resource becomes available: this is called a critical file resource "conflict." In this situation, the requesting thread is placed on a queue of users waiting for the resource. The thread is said to be "enqueued" on the critical file resource.
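The SHR/EXC rules and the resulting queueing can be illustrated with a small model. This is only a sketch for reasoning about the rules, not how Model 204 implements them: the class name, method names, and thread labels are hypothetical, and it ignores release handling and any queue-ordering policy the real scheduler applies.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriticalFileResource:
    """Toy model of one critical file resource (for example INDEX)."""
    name: str
    sharers: set = field(default_factory=set)    # threads holding SHR
    exclusive: Optional[str] = None              # thread holding EXC, if any
    queue: deque = field(default_factory=deque)  # (thread, mode) pairs waiting

    def can_grant(self, mode: str) -> bool:
        if mode == "SHR":
            # Share is compatible with other sharers, never with EXC.
            return self.exclusive is None
        if mode == "EXC":
            # Exclusive needs the resource completely free.
            return self.exclusive is None and not self.sharers
        raise ValueError(f"unknown mode: {mode}")

    def request(self, thread: str, mode: str) -> bool:
        """Grant the resource, or enqueue the thread (a "conflict")."""
        if self.can_grant(mode):
            if mode == "SHR":
                self.sharers.add(thread)
            else:
                self.exclusive = thread
            return True
        self.queue.append((thread, mode))
        return False


index = CriticalFileResource("INDEX")
index.request("user1", "SHR")  # granted
index.request("user2", "SHR")  # granted: SHR is compatible with SHR
index.request("user3", "EXC")  # conflict: user3 is enqueued
```

In this model, the length of `queue` corresponds to the queue of users waiting for the resource, and each `False` return corresponds to one conflict.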
SirMon monitoring of critical file resources
SirMon supplies you with a variety of statistics for viewing queues and conflicts that have formed against critical file resources, and it supplies a special screen to help monitor critical file resource conflicts while they are happening:

[Figure: Critical File Resource conflict display]
This display is accessed from any file monitor screen in SirMon (including custom user-defined screens) by pressing the PF2 key with the cursor on a line containing file statistics.
The top section of the CFRS screen displays a variety of file-related statistics (described in Critical File Resource statistics displayed in SirMon) showing the rate of conflicts and the current length of the queue of users waiting for each of the four critical file resources. Disk reads and writes (DKRD/DKWR) are also displayed, as are the total number of the file's pages currently in buffers (BUFPAGE) and the total number of users queued, waiting for any critical file resource for the file.
The lower portion of the screen displays users who currently hold enqueues on critical file resources for the selected file, and the type of enqueue each is holding. In addition, WT (WAITTYPE), WTIME (the length of time the user has been waiting) and PNAME (the procedure being run by the user) are displayed for each user.
- WT is a numeric code used by Model 204 to indicate the type of wait a user is experiencing. WAITTYPEs 24 and 25 indicate a wait on a critical file resource. SirMon displays the WT statistic as a two-digit code, ordinarily the numeric wait type. However, codes 24 and 25 are displayed as two alphabetic characters.

  The first character indicates the particular critical file resource being waited on, as follows:
  - D: DIRECT
  - I: INDEX
  - E: EXISTS
  - R: RECENQ

  The second character indicates the strength of enqueue being sought:
  - E: Exclusive
  - S: Share

  For example, a WT value of DS indicates that the DIRECT resource is required in share mode, while a WT value of RE indicates that the RECENQ resource is required in exclusive mode. This translation of WAITTYPEs is vital in determining the root cause of a critical file resource enqueuing problem.

  If a user is waiting for a critical file resource, the user that currently holds the requested critical file resource should be investigated. For example, if a user has a WT value of IS, you should find another user holding the INDEX resource in exclusive mode. Whatever this second user is waiting upon is generally the root cause of the first user's IS wait.
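The two-letter WT translation, and the step of looking for the holder that blocks a waiting user, can be sketched as follows. The function names and the `(user, resource, mode)` tuple layout are assumptions made for illustration; SirMon does not expose such an interface, and the holder list here stands for values read manually from the lower portion of the screen.

```python
RESOURCES = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}
MODES = {"E": "exclusive", "S": "share"}


def decode_wt(wt):
    """Return (resource, mode) for a two-letter WT code, else None."""
    wt = wt.strip().upper()
    if len(wt) == 2 and wt[0] in RESOURCES and wt[1] in MODES:
        return RESOURCES[wt[0]], MODES[wt[1]]
    return None  # ordinary numeric wait type, not a critical file resource wait


def likely_blockers(wt, holders):
    """List users whose held enqueue conflicts with the request encoded in wt.

    holders: iterable of (user, resource, mode) tuples (hypothetical layout).
    """
    decoded = decode_wt(wt)
    if decoded is None:
        return []
    resource, mode = decoded
    blockers = []
    for user, held_resource, held_mode in holders:
        if held_resource != resource:
            continue
        # A share request conflicts only with an exclusive holder;
        # an exclusive request conflicts with any holder.
        if mode == "exclusive" or held_mode == "exclusive":
            blockers.append(user)
    return blockers
```

For a waiter showing IS, `likely_blockers("IS", holders)` returns the users holding INDEX in exclusive mode; the next step is to examine what those users are themselves waiting on.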
In addition to the statistics displayed on this special critical file resource screen, many other statistics at the System, File, and User levels relate to critical file resource monitoring. Understanding how these statistics relate to critical file resources is key to identifying and fixing related problems.
- CFRCONF and CFRQUEU exist on the system and file levels. At these levels, CFRCONF is the number of conflicts occurring and CFRQUEU is the queue length, each summed over the resource types.

  CFRCONF is provided to flag situations where a critical file resource is being obtained and released relatively frequently, often producing short-lived conflicts. This situation could produce unnecessarily high CPU utilization because of extra scheduler overhead. Two updating "batch" type jobs could produce this kind of problem.

  CFRQUEU is provided to flag situations where an application holds a critical file resource for excessive periods of time, producing long queues of users waiting for the resource.

  CFRCONF also exists on the user level, where it indicates the number of times a user thread has had to wait on a critical file resource. Also on the user level, CFRCWTT indicates the amount of time a user has waited on critical file resources.
- The WAITFIL statistic indicates the name of the file being waited on, and is reported for both critical file resource waits and disk I/O.
- The WAITCFR statistic indicates the abbreviated name of the critical file resource being waited on, and is only reported for critical file resource waits.
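As a rough illustration of how the summed statistics relate to the per-resource figures, the sketch below adds up conflict counts and queue lengths for the four resources and applies two simple checks. The function names, the input layout, and the limits are all hypothetical; SirMon's own thresholds are configured within the product, and suitable values depend on the workload.

```python
def summarize_cfr(per_resource):
    """Sum per-resource figures into file-level totals.

    per_resource maps a resource name to a (conflicts, queue_length) pair,
    a hypothetical layout chosen for this example.
    """
    cfrconf = sum(conflicts for conflicts, _ in per_resource.values())
    cfrqueu = sum(queue for _, queue in per_resource.values())
    return {"CFRCONF": cfrconf, "CFRQUEU": cfrqueu}


def interpret(cfrconf, cfrqueu, conf_limit, queue_limit):
    """Return hints based on caller-supplied (hypothetical) limits."""
    hints = []
    if cfrconf > conf_limit:
        hints.append("frequent short-lived conflicts: check for competing update jobs")
    if cfrqueu > queue_limit:
        hints.append("long queue: look for a thread holding a resource too long")
    return hints


totals = summarize_cfr({
    "DIRECT": (3, 0),
    "INDEX": (10, 4),
    "EXISTS": (0, 0),
    "RECENQ": (2, 1),
})
```

Here `totals` holds the combined conflict count and queue length for the file, which can then be passed to `interpret` with limits chosen for the site.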
Critical file resource conflicts and enqueuing problems are always a second-order effect indicating some other bottleneck. For instance, long queues of users or high numbers of conflicts on the resource called INDEX might point to an inefficient SOUL program that is unnecessarily locking up a file's index pages.
Note: Users waiting in queues, but not holding enqueues on critical file resources, are not displayed here. They can be seen in User Monitor displays showing WAITTYP and WAITTIM.
See also
- SirMon
- SirMon application structure
- SirMon main menu
- SirMon System Overview screen
- SirMon threshold setting
- SirMon background monitor
- SirMon System Monitor menu
- SirMon User Monitor menu
- SirMon File Monitor menu
- SirMon Subsystem Monitor menu
- SirMon Task Monitor menu
- SirMon Janus Monitor menu
- SirMon custom screens
- SirMon critical-file-resource monitoring
- SirMon user-initiated capturing of statistics
- System statistics displayed in SirMon
- User statistics displayed in SirMon
- File statistics displayed in SirMon
- Subsystem statistics displayed in SirMon
- Task statistics displayed in SirMon
- Critical File Resource statistics displayed in SirMon
- SirMon date processing