SirMon critical-file-resource monitoring

From m204wiki
Revision as of 22:21, 3 November 2015 by JAL (talk | contribs) (misc formatting)

Model 204 critical file resources

Model 204 defines four "critical file resources," which are used to serialize access to important file structures. The names of these resources and the file structures that they control are:

DIRECT Controls access to Table B records.
INDEX Controls access to index structures in both Table C and Table D.
EXISTS Controls access to the existence bit-map.
RECENQ Controls access to record enqueueing data structures representing found sets and record lists.

A set of internal rules governs access to critical file resources:

  • A thread must hold a resource in SHR mode to examine the associated file structure.
  • A thread must hold a resource in EXC mode to modify the associated file structure.
  • Multiple threads may hold a resource in SHR mode, but only a single thread may hold a resource in EXC mode.
  • A resource may not be held in SHR and EXC modes simultaneously.
  • A thread may hold any number of resources for a single file, but will never hold resources in more than one file at a time.

When a thread is prevented by the above rules from obtaining a required resource, the thread must wait until the resource becomes available: this is called a critical file resource "conflict." In this situation, the requesting thread is placed on a queue of users waiting for the resource. The thread is said to be "enqueued" on the critical file resource.
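The access rules and the conflict queue described above can be sketched as a small toy model. This is an illustration only, not Model 204's implementation: the class name, its methods, and the first-in, first-out granting policy are all assumptions made for the example.

```python
from collections import deque

SHR, EXC = "SHR", "EXC"


class CriticalFileResource:
    """Toy model of one critical file resource (for example INDEX) in one file."""

    def __init__(self, name):
        self.name = name
        self.holders = {}      # thread id -> mode currently held
        self.queue = deque()   # (thread id, mode) requests waiting, oldest first
        self.conflicts = 0     # number of requests that had to wait

    def compatible(self, mode):
        # Any request succeeds when nobody holds the resource. Otherwise only
        # a SHR request can join, and only if every current holder is SHR.
        if not self.holders:
            return True
        return mode == SHR and all(m == SHR for m in self.holders.values())

    def request(self, thread, mode):
        """Grant the resource if the rules allow it, else enqueue the thread."""
        if self.compatible(mode):
            self.holders[thread] = mode
            return True
        self.conflicts += 1                  # this is a "conflict"
        self.queue.append((thread, mode))    # the thread is now "enqueued"
        return False

    def release(self, thread):
        """Drop a hold, then grant queued requests in arrival order.

        Arrival-order granting is an assumption of this sketch.
        """
        self.holders.pop(thread, None)
        while self.queue and self.compatible(self.queue[0][1]):
            waiter, mode = self.queue.popleft()
            self.holders[waiter] = mode
```

For instance, if two threads hold INDEX in SHR mode, a third thread requesting EXC is queued and is granted the resource only after both SHR holders release it.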

SirMon monitoring of critical file resources

SirMon supplies you with a variety of statistics for viewing queues and conflicts that have formed against critical file resources, and it supplies a special screen to help monitor critical file resource conflicts while they are happening:

Critical File Resource conflict display

This display is accessed from any file monitor screen in SirMon (including custom user-defined screens) by pressing the PF2 key with the cursor on a line containing file statistics.

The top section of the CFRS screen displays a variety of file-related statistics (described in Critical File Resource statistics displayed in SirMon) showing the rate of conflicts and the current length of the queue of users waiting for each of the four critical file resources. Disk reads and writes (DKRD/DKWR) are also displayed, as are the total number of the file's pages currently in buffers (BUFPAGE) and the total number of users queued, waiting for any critical file resource for the file.

The lower portion of the screen displays users who currently hold enqueues on critical file resources for the selected file, and the type of enqueue each is holding. In addition, WT (WAITTYPE), WTIME (the length of time the user has been waiting) and PNAME (the procedure being run by the user) are displayed for each user.

  • WT is a numeric code used by Model 204 to indicate the type of wait a user is experiencing. WAITTYPEs 24 and 25 indicate a wait on a critical file resource. SirMon displays the WT statistic as a two-digit code, ordinarily the numeric wait type. However, codes 24 and 25 are displayed as two alphabetic characters. The first character indicates the particular critical file resource being waited on, as follows:
    D DIRECT
    I INDEX
    E EXISTS
    R RECENQ

    The second character indicates the strength of enqueue being sought:

    E Exclusive
    S Share

    For example, a WT value of DS indicates that the DIRECT resource is required in share mode, while a WT value of RE indicates that the RECENQ resource is required in exclusive mode. This translation of WAITTYPEs is vital in determining the root cause of a critical file resource enqueuing problem.

    If a user is waiting for a critical file resource, the user that currently holds the requested critical file resource should be investigated. For example, if a user has a WT value of IS, you should find another user holding the INDEX resource in exclusive mode. Whatever this second user is waiting upon is generally the root cause of the first user's IS wait.
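The WT translation and the search for the blocking user can be sketched as follows. This is a minimal illustration, not part of SirMon: the function names and the `(user, resource, mode)` tuple layout are hypothetical stand-ins for the rows shown on the lower portion of the display.

```python
RESOURCES = {"D": "DIRECT", "I": "INDEX", "E": "EXISTS", "R": "RECENQ"}
MODES = {"E": "EXC", "S": "SHR"}


def decode_wt(wt):
    """Return (resource, mode) for an alphabetic WT code such as 'DS'.

    Returns None for anything else, such as an ordinary numeric wait type.
    """
    if len(wt) == 2 and wt[0] in RESOURCES and wt[1] in MODES:
        return RESOURCES[wt[0]], MODES[wt[1]]
    return None


def blocking_holders(wt, holders):
    """List the users whose current holds prevent the request described by wt.

    holders is a hypothetical list of (user, resource, mode) tuples.
    A share request is blocked only by an exclusive holder; an exclusive
    request is blocked by any holder of the same resource.
    """
    decoded = decode_wt(wt)
    if decoded is None:
        return []
    resource, wanted = decoded
    return [
        user
        for user, held_resource, held_mode in holders
        if held_resource == resource and (wanted == "EXC" or held_mode == "EXC")
    ]
```

With holders `[("USER1", "INDEX", "EXC"), ("USER2", "DIRECT", "SHR")]`, a waiter showing IS is blocked by USER1, so USER1's own wait is the next thing to examine.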

Along with the statistics displayed on this special critical file resource screen, a number of other statistics at the System, File, and User levels are related to critical file resource monitoring. An understanding of how these statistics relate to critical file resources is key to effectively identifying and fixing related problems.

  • CFRCONF and CFRQUEU exist at the system and file levels. At these levels, CFRCONF is the sum of the number of conflicts occurring and CFRQUEU is the sum of the queue lengths, each taken across the critical file resource types.

    CFRCONF is provided to flag situations where a critical file resource is being obtained and released relatively frequently, often producing short-lived conflicts. This situation could produce unnecessarily high CPU utilization because of extra scheduler overhead. Two updating "batch" type jobs could produce this kind of problem.

    CFRQUEU is provided to flag situations where an application holds a critical file resource for excessive periods of time, producing long queues of users waiting for the resource.

    CFRCONF also exists at the user level, where it indicates the number of times a user thread has had to wait on a critical file resource. Also at the user level, CFRCWTT indicates the amount of time a user has waited on critical file resources.

  • The WAITFIL statistic indicates the name of the file being waited on, and is reported for both critical file resource waits and disk I/O.
  • The WAITCFR statistic indicates the abbreviated name of the critical file resource being waited on, and is only reported for critical file resource waits.
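How the two totals relate to per-resource figures, and how each might be used as a flag, can be sketched as below. The dictionary layout, the function names, and both threshold values are made up for illustration; they are not SirMon settings or recommended limits.

```python
RESOURCES = ("DIRECT", "INDEX", "EXISTS", "RECENQ")


def summarize(per_resource):
    """Sum per-resource conflict counts and queue lengths into two totals.

    per_resource is a hypothetical mapping such as
    {"INDEX": {"conflicts": 30, "queue": 1}, ...} with one entry per resource.
    """
    cfrconf = sum(per_resource[r]["conflicts"] for r in RESOURCES)
    cfrqueu = sum(per_resource[r]["queue"] for r in RESOURCES)
    return cfrconf, cfrqueu


def classify(cfrconf, cfrqueu, conf_limit=100, queue_limit=5):
    """Flag a file using illustrative thresholds (both limits are assumptions)."""
    flags = []
    if cfrconf > conf_limit:
        # Resource obtained and released often: possible scheduler overhead.
        flags.append("high conflict count")
    if cfrqueu > queue_limit:
        # Resource held for long periods: users are piling up behind it.
        flags.append("long queues")
    return flags
```

A file with many conflicts but almost no queue would be flagged only for its conflict count, which matches the frequent obtain-and-release pattern rather than a long-held resource.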

Critical file resource conflicts and enqueuing problems are always a second-order effect indicating some other bottleneck. For instance, long queues of users or high numbers of conflicts on the resource called INDEX might point to an inefficient SOUL program that is unnecessarily locking up a file's index pages.

Note: Users who are waiting in queues but are not holding enqueues on critical file resources are not displayed on the critical file resource conflict screen. They can be seen in User Monitor displays showing WAITTYP and WAITTIM.

See also