SirTune user states
Revision as of 21:55, 12 November 2015
Model 204 user states
When the SirTune sampling program collects a sample, it scans all logged-on users. Each user is classified by its state. The user's state is a general indication of the type of activity occurring in a user thread. These states roughly correspond to the states reported by the Model 204 performance monitor, though broken down to a finer level of detail.
Primary states
The following primary states are distinguished by SirTune:
State | Description |
---|---|
BLKIN | This includes any user that is blocked, that is waiting for something, in a server and not waiting for user input. This is distinguished from BLKIU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
BLKIU | This includes any user that is blocked, that is waiting for something, in a server and waiting for user input. This is distinguished from BLKIN because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
BLKON | This includes any user that is blocked, that is waiting for something, not in a server and not waiting for user input. This is distinguished from BLKOU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
BLKOU | This includes any user that is blocked, that is waiting for something, not in a server and waiting for user input. This is distinguished from BLKON because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
REDY | This includes any user that is ready to run, that is, in a server and not waiting on anything but not actually being run. Generally a user is in state REDY because another user is currently running. |
RUNG | This includes any user that is running, that is, using CPU. Unless MP/204 is installed, there can never be more than one user in state RUNG per sample. |
RUNGM | If MP/204 is installed, this includes any user that is running, that is, using CPU, in maintask mode. There can never be more than one user in state RUNGM per sample. See The RUNGM and RUNGS states. |
RUNGS | If MP/204 is installed, this includes any user that is running, that is, using CPU, in subtask mode. See The RUNGM and RUNGS states. |
SWPGI | This includes any user that is in the process of being swapped into a server. |
SWPGOBN | This includes any user that is in the process of being swapped out of a server because it is waiting on something other than user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKON. |
SWPGOBU | This includes any user that is in the process of being swapped out of a server because it is waiting on user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKOU. |
SWPGOW | This includes any user that is in the process of being swapped out of a server because it has been server sliced. If no servers of appropriate size are available at the point the user is swapped out, the user switches to state WTSV. |
WPST | This includes any PST that is not running. |
WTSV | This includes any user that is waiting for a server to become available so that the user can be run. The only reason a user would be in the WTSV state is that all servers of appropriate size are occupied by other users that cannot be swapped out of a server. |
In this list, the phrase "waiting for user input" refers to a thread waiting for terminal or line input. In addition, a wait for a response to the console message issued by User 0 on a HALT command is also considered a user input wait. "Sleep" waits, that is, waits resulting from the *SLEEP command and the Pause statement, are not considered user input waits.
Composite states
In addition to the above primary states, several composite states are provided for convenience and report generation. For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. Thus any user in any of the indicated primary states is also considered to be in state SWPG. The following are the available composite states, their component primary states, and an explanation that suggests the meaning of the composite state.
State | Description |
---|---|
ALL | This is a composite state that includes all primary states. Any logged on user or PST is considered in state ALL. |
ALLI | This state is made up of RUNG, REDY, BLKIN, and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs. |
ALLN | This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN, and SWPGOW. It includes any user not blocked for user input. It does not include non-running PSTs. |
BLK | This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN, and SWPGOBU. It includes any user that is blocked on anything. |
BLKI | This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything. |
BLKN | This state is made up of BLKIN, BLKON, and SWPGOBN. It includes any user that is blocked for something other than user input. |
BLKO | This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something. |
BLKU | This state is made up of BLKIU, BLKOU, and SWPGOBU. It includes any user that is waiting for user input. |
OSERVN | This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input. |
OSERVU | This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input. |
OSERVW | This state is made up of SWPGOW and WTSV. It includes any user that is either waiting for a server or being swapped out of a server so that it can wait for a server to free up. This latter case only happens when a user is server sliced. |
REDYR | This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. Users in state REDYR can be either running or waiting for the Model 204 scheduler to provide CPU to run. |
RUNBL | This state is made up of RUNG, REDY, WTSV, and SWPGOW. It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the Model 204 scheduler to provide the resources (CPU and/or server) to run. |
SWPG | This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server. |
SWPGO | This state is made up of SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped out of a server. |
SWPGOB | This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something. |
Specifying states in COLLECT and REPORT STATE statements
Any of the above primary or composite states can be included on COLLECT statements for input to SIRTUNEI and on REPORT STATE statements for input to SIRTUNEREPORT or SIRTUNER. Some valid COLLECT statements are:
COLLECT BLKN SWPG
COLLECT ALLN
COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI
Some valid REPORT STATE statements are:
REPORT STATE BLKIN EVAL
REPORT STATE SWPG CHUNK 100
REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4
In addition to user states, SirTune's COLLECT statement lets you request information about DISKIO and CFR. The following is a valid COLLECT statement:
COLLECT DISKIO CFR
However, no REPORT STATE statement accepts DISKIO or CFR.
Any state requested in a REPORT STATE statement must have had the corresponding primary states explicitly or implicitly specified on COLLECT statements for SirTune. The simplest way to ensure this is to explicitly specify, on a COLLECT statement, every state that will be used in a REPORT STATE statement. For example, if you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER:
REPORT STATE BLKN CHUNK 10
REPORT STATE SWPG CHUNK 10
You can code the following COLLECT statement for SirTune:
COLLECT BLKN SWPG
This statement is functionally equivalent to:
COLLECT BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI
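The expansion of composite states into primary states can be sketched in a few lines of Python. This is an illustration only and not part of SirTune: `COMPOSITES` and `expand_states` are hypothetical names, the mapping is transcribed from the composite state table above, and the ALL state is left out for brevity.

```python
# Composite states and their component primary states, transcribed from
# the composite state table in this article (ALL is omitted).
COMPOSITES = {
    "ALLI": ["RUNG", "REDY", "BLKIN", "BLKIU"],
    "ALLN": ["RUNG", "REDY", "BLKIN", "BLKON", "WTSV",
             "SWPGI", "SWPGOBN", "SWPGOW"],
    "BLK": ["BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"],
    "BLKI": ["BLKIN", "BLKIU"],
    "BLKN": ["BLKIN", "BLKON", "SWPGOBN"],
    "BLKO": ["BLKON", "BLKOU"],
    "BLKU": ["BLKIU", "BLKOU", "SWPGOBU"],
    "OSERVN": ["SWPGOBN", "BLKON"],
    "OSERVU": ["SWPGOBU", "BLKOU"],
    "OSERVW": ["SWPGOW", "WTSV"],
    "REDYR": ["RUNG", "REDY"],
    "RUNBL": ["RUNG", "REDY", "WTSV", "SWPGOW"],
    "SWPG": ["SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"],
    "SWPGO": ["SWPGOBN", "SWPGOBU", "SWPGOW"],
    "SWPGOB": ["SWPGOBN", "SWPGOBU"],
}


def expand_states(states):
    """Return the primary states implied by a list of COLLECT operands.

    Names not found in COMPOSITES are assumed to be primary states and
    are passed through unchanged. Duplicates are dropped, and the order
    of first appearance is kept.
    """
    primaries = []
    for state in states:
        for primary in COMPOSITES.get(state, [state]):
            if primary not in primaries:
                primaries.append(primary)
    return primaries


# The operands of COLLECT BLKN SWPG expand to the six primary states
# listed in the equivalent COLLECT statement above.
print(expand_states(["BLKN", "SWPG"]))
```

The sketch returns the same six primary states as the equivalent COLLECT statement, though in a different order; the order of operands is not significant for this comparison.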
In general, if you are running a relatively small Online (an average of fewer than 20 logged-on users), this statement should not produce a prohibitively large amount of data, and it makes all reports possible:
COLLECT ALL
If you are running a midsize to large Online (an average of 20 or more logged-on users), the following statement should collect enough data to produce most interesting STATE reports without generating a prohibitively large sample data set:
COLLECT ALLN BLKIU SWPGOBU
Specifying the RUNGM and RUNGS states
When running the MP/204 feature with Model 204, a user that is in state RUNG can be further distinguished to be either running in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting. For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports:
REPORT STATE RUNGM EVAL
REPORT STATE RUNGS EVAL
The first report is a breakdown of users running in maintask mode by evaluating procedure, and the second is a breakdown of users running in subtask mode by evaluating procedure. Maintask mode is often referred to as "serial" mode, and subtask mode is often referred to as "parallel" mode.
The total observations for state RUNG in any sample is always equal to the total observations for state RUNGM plus the total observations for state RUNGS.
The distinction between maintask and subtask mode can be made either on the basis of the task on which a user is running (maintask or subtask), or on its virtual (or logical) MP mode (that is, whether it is capable of running in a subtask or not).
The default distinction is made on the basis of the actual task on which a user is running. This can be changed with the SIRTUNER MPVIRT statement, which makes the distinction on the basis of virtual MP mode instead. The MPVIRT setting is generally preferred when using the REPORT STATE RUNGM report to try to reduce the amount of maintask (serial) SOUL code.
Specifying reports by wait type
Users in state BLK (blocked on anything) always have a wait type associated with them. These wait types are the same wait types that appear next to the users in a Model 204 MONITOR command or in the SirMon WAITTYP statistic. STATE reports can be requested by these wait types.
To produce these STATE reports by wait type, COLLECT statements (collecting data for all states in which a wait type might occur) must be added to SirTune's input stream (SIRTUNEI). For example, disk I/O wait types are not swappable, so it is only necessary to collect state BLKIN to produce a REPORT STATE WDISK report. Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all be collected to produce a REPORT STATE WCFREX report.
The available wait type reports along with the corresponding Model 204 wait type number, a description of the wait type, and the required states to be collected are listed here:
Wait type | Number, description, and required states |
---|---|
WMISC | 0 - Miscellaneous waits. Requires BLKN. |
WDISK | 1 - Wait for disk I/O. Requires BLKIN. |
WUSERO | 2 - Wait for user output. Requires BLKU. |
WUSERI | 3 - Wait for user input. Requires BLKU. |
WOPERI | 4 - Wait for operator input. Requires BLKU. |
WDUMPO | 5 - Wait for dump write. Requires BLKIN. |
WDUMPI | 6 - Wait for restore read. Requires BLKIN. |
WENQUE | 7 - Wait for miscellaneous enqueue. Requires BLKN. |
WBUFF | 8 - Wait for disk buffer. Requires BLKIN. |
WPST | 10 - Wait on PST. Requires BLKN. |
WIFAM | 11 - IFAM waits. Requires BLKN. |
WSLEEP | 12 - Waits for a time interval, including Pause statements and *SLEEP commands. Requires BLKN. |
WJRNLO | 15 - Wait for journal output. Requires BLKIN. |
WCHKPO | 16 - Wait for checkpoint output. Requires BLKIN. |
WWRITE | 17 - Wait for a checkpoint DECB. Requires BLKIN. |
WARBMO | 18 - Waits for output arbitration. Requires BLKN. |
WCHKPR | 19 - Waits for a checkpoint request. Requires WPST. |
WDISK | 20 - Waits for checkpoint completion. Requires BLKIN. |
WDEAD | 21 - Wait forever (dead thread). Requires BLKU. |
WVSAMI | 22 - Wait for VSAM input. Requires BLKN. |
WLOGIN | 23 - Wait after login failure. Requires BLKN. |
WCFREX | 24 - Wait for critical file resource in exclusive mode. Requires BLKN. |
WCFRSH | 25 - Wait for critical file resource in share mode. Requires BLKN. |
WVTBUF | 26 - Wait for VTAM buffer. Requires BLKN. |
WCONVI | 27 - Wait for inter-process input. Requires BLKN. |
WCONVO | 28 - Wait for inter-process output. Requires BLKN. |
WSCTYI | 29 - Wait for security interface. Requires BLKN. |
WS$WAI | 30 - Swappable $Wait call. Requires BLKN. |
WN$WAI | 31 - Non-swappable $Wait call. Requires BLKIN. |
WULDB2 | 32 - Wait for DB2 subtask. Requires BLKN. |
Thus, to produce a breakdown of disk I/O waits by evaluating procedure and by individual lines within the procedures, code the following as input to SIRTUNEREPORT or SIRTUNER:
REPORT STATE WDISK EVAL CHUNK 4
To get a breakdown of waits for miscellaneous enqueues (including record locks) by evaluating procedure and by individual lines within the procedures, code the following as input to SIRTUNEREPORT or SIRTUNER:
REPORT STATE WENQUE EVAL CHUNK 4
In addition to these primary wait types, there are a few composite wait types for which reports can be generated. These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:
Composite wait type | Description |
---|---|
WCFR | This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control. |
WLOG | This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for Model 204 recovery, that is, all checkpoint and journal I/O related waits. |
To get a breakdown of waits for critical file resources by evaluating procedure and by individual lines within the procedures, code the following as input to SIRTUNEREPORT or SIRTUNER:
REPORT STATE WCFR EVAL CHUNK 4
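The rule that a wait type report needs every state in which that wait type can occur can be checked mechanically before a run. The following Python sketch is an illustration only, not a SirTune facility: `REQUIRED_STATES` and `missing_states` are hypothetical names, only a few rows of the wait type table are included, and the entry for the composite WCFR is an assumption (the union of the requirements of WCFREX and WCFRSH).

```python
# Primary states behind the composite states named in the "Requires"
# notes of the wait type table.
BLKN = {"BLKIN", "BLKON", "SWPGOBN"}
BLKU = {"BLKIU", "BLKOU", "SWPGOBU"}

# A few rows copied from the wait type table; extend as needed.
REQUIRED_STATES = {
    "WDISK": {"BLKIN"},   # disk I/O waits are not swappable
    "WUSERI": BLKU,
    "WENQUE": BLKN,
    "WCFREX": BLKN,       # critical file resource waits are swappable
    "WCFRSH": BLKN,
    "WCFR": BLKN,         # assumption: union of WCFREX and WCFRSH
}


def missing_states(wait_type, collected):
    """Return, sorted, the primary states still needed for a report.

    `collected` is an iterable of primary states already covered by
    COLLECT statements. An empty result means the report can be produced.
    """
    return sorted(REQUIRED_STATES[wait_type] - set(collected))


# Collecting only BLKIN is enough for WDISK but not for WCFREX.
print(missing_states("WDISK", ["BLKIN"]))
print(missing_states("WCFREX", ["BLKIN"]))
```

With only BLKIN collected, the sketch reports nothing missing for WDISK, and BLKON and SWPGOBN missing for WCFREX, which matches the two examples given earlier in this section.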
Critical file resource states
Critical file resources are used by Model 204 to provide multi-user concurrency control on a file level. This control mechanism will sometimes exacerbate some other performance bottleneck. A high value for number of users per sample with wait types CFREX and CFRSH in the SUMMARY report suggests that critical file resource enqueuing bears closer examination.
There are four critical file resources:
Resource | Description |
---|---|
DIRECT | Protects table B updates and accesses. |
INDEX | Protects accesses and updates of table C and the ordered index. |
EXISTS | Protects accesses and updates of the existence bit map. |
RECENQ | Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the Find Without Locks SOUL statement. |
Determining the cause of a wait
A first step in investigating a critical file resource enqueuing problem is to produce reports for the WCFR state. This will help isolate the programs or lines of code that encounter frequent or long critical file resource waits. Probably the most useful report would be produced by this statement:
REPORT STATE WCFR CHUNK 4
This will break down critical file resource waits by individual lines of SOUL code. Unfortunately, the problem with this type of analysis is that it focuses on the "victims" of critical file resource waits rather than the "culprits," the lines of code holding critical file resources causing other users to wait. While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the WCFR state report that this is indeed the case.
To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the SirTune data collector.
To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for SirTune. This parameter can be specified alone or with other COLLECT parameters, as in this statement:
COLLECT BLKN DISKIO CFR
After this additional CFR (Critical File Resource) data is collected, SirTune is able to produce several additional reports to help isolate the cause of critical file resource enqueuing. The first report that might be useful is the CFRROOT report. This report indicates the base wait types that are behind critical file resource waits. The CFRROOT report does not provide information about which lines of code cause critical file resource waits, so it is not helpful for application tuning.
The CFRROOT report can, however, indicate that application tuning (rather than system tuning) might be required to reduce critical file resource enqueuing. This would be indicated by a primary root cause of DISK (disk I/O waits) or possibly JRNLO (journal I/O waits).
Reducing wait times
You can attack a primary root cause for critical file resource waits by trying to reduce overall disk I/Os or journal I/Os (with application tuning), or by specifically targeting those instructions that hold critical file resources.
To facilitate this latter option, several CFR states can be requested on SirTune reports if CFR data has been collected by SirTune. These states are:
State | Description |
---|---|
CFRHANY | The state where a user holds any critical file resource. |
CFRHDIR | The state where a user holds the DIRECT critical file resource. |
CFRHIND | The state where a user holds the INDEX critical file resource. |
CFRHEXS | The state where a user holds the EXISTS critical file resource. |
CFRHREC | The state where a user holds the RECENQ critical file resource. |
CFRBANY | The state where a user holds any critical file resource and is preventing (blocking) another user from obtaining a critical file resource. |
CFRBDIR | The state where a user holds the DIRECT critical file resource and is preventing (blocking) another user from obtaining the DIRECT resource. |
CFRBIND | The state where a user holds the INDEX critical file resource and is preventing (blocking) another user from obtaining the INDEX resource. |
CFRBEXS | The state where a user holds the EXISTS critical file resource and is preventing (blocking) another user from obtaining the EXISTS resource. |
CFRBREC | The state where a user holds the RECENQ critical file resource and is preventing (blocking) another user from obtaining the RECENQ resource. |
It should be noted that the CFRBxxx states are weighted based on the number of other users holding the resource and the number of users waiting for the resource. For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state.
On the other hand, if a user at a line of code holds the DIRECT resource (in share mode) along with 4 other users, and a single user is waiting for the DIRECT resource, the line of code is considered to have 1/5th of an observation in the CFRBDIR state.
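Both examples are consistent with a weight equal to the number of waiting users divided by the number of users holding the resource. The Python sketch below encodes that reading. It is an inference from the two examples above rather than SirTune's documented calculation, and `cfrb_weight` is a hypothetical name.

```python
from fractions import Fraction


def cfrb_weight(holders, waiters):
    """Observations credited to one holder's line of code in a CFRBxxx state.

    holders: number of users holding the resource, including this one.
    waiters: number of users waiting for the resource.

    Assumption: the weight is waiters / holders, which reproduces both
    examples in the text. SirTune's actual calculation may differ.
    """
    if holders < 1:
        raise ValueError("at least one user must hold the resource")
    if waiters < 0:
        raise ValueError("waiters cannot be negative")
    return Fraction(waiters, holders)


# One holder and three waiters: 3 observations.
print(cfrb_weight(holders=1, waiters=3))
# Five holders in share mode and one waiter: 1/5 of an observation each.
print(cfrb_weight(holders=5, waiters=1))
```

`Fraction` is used instead of floating point so that the 1/5 case is represented exactly.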
Generally, the most useful reports for reducing critical file resource waits are the CFRBxxx reports. This statement breaks down, by lines of SOUL code, the state where a user is blocking another user from any critical file resource:
REPORT STATE CFRBANY CHUNK 4
This is probably the most useful of the STATE CFRxxxx reports. Once critical file resource blocking is isolated to specific SOUL instructions, critical file resource enqueuing can be reduced by:
- Reducing the number of times the offending instructions are executed.
- Reducing the amount of disk I/O performed by the offending instructions.
- Reducing the amount of CPU used by the offending instructions.
It might be tempting to use the Find Without Locks SOUL statement to reduce the critical file resource enqueuing associated with a statement. This will only work if the resource causing conflicts is the RECENQ resource. All other critical file resources are processed exactly the same way, whether or not a locked record set is being used.
However, if the resource causing the conflict is indeed the RECENQ resource, it is still not recommended that the solution be Find Without Locks. A high conflict rate on the RECENQ resource indicates that the environment has a high update activity level, which means that operating on unenqueued found sets is a questionable tactic at best. A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing found sets before any terminal I/O occurs.
The CFRHxxx reports can be useful for tracking potential critical file resource enqueuing problems (perhaps in a test environment) before they actually happen. These states include any user that holds a critical file resource, whether or not it is blocking anyone. These reports are difficult to interpret, however, since they require a fairly good estimate of expected future usage patterns to have any predictive value.
See also
- SirTune introduction
- SirTune data collection under MVS
- SirTune data collection under CMS
- SirTune data collection statements
- SirTune MODIFY and SMSG commands
- SirTune report generation
- SirTune reports
- SirTune user states
- SirTune and Model 204 quad types
- SirTune statement wildcards
- SirTune date processing