SirTune user states: Difference between revisions
m (→See also) |
RPuszewski (talk | contribs) No edit summary |
||
(7 intermediate revisions by 2 users not shown) | |||
Line 3: | Line 3: | ||
You've been warned. .. (Page built by JAL at the SIRIUS VM; file: FUNPGNEW SYSUT2) --> | You've been warned. .. (Page built by JAL at the SIRIUS VM; file: FUNPGNEW SYSUT2) --> | ||
<!-- Page name: SirTune user states--> | <!-- Page name: SirTune user states--> | ||
==Model 204 user states== | |||
When the <var class="product">SirTune</var> sampling program is collecting a sample it scans all | When the <var class="product">SirTune</var> sampling program is collecting a sample it scans all logged on users. | ||
logged on users. | |||
Each user is classified by its <b>state</b>. | Each user is classified by its <b>state</b>. | ||
The user's state is a general indication of the type of activity | The user's state is a general indication of the type of activity | ||
occurring in a user thread. | occurring in a user thread. | ||
These states roughly correspond to the states reported | These states roughly correspond to the states reported | ||
by the <var class="product">Model 204</var> performance monitor, though broken down to a finer level | by the <var class="product">Model 204</var> performance monitor, though broken down to a finer level of detail. | ||
of detail. | |||
===Primary states=== | |||
The following primary states are distinguished by <var class="product">SirTune</var>: | The following primary states are distinguished by <var class="product">SirTune</var>: | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
Line 30: | Line 28: | ||
<tr><th>REDY</th><td>This includes any user that is ready to run, that is, in a server and not waiting on anything but not actually being run. Generally a user is in state REDY because another user is currently running.</td></tr> | <tr><th>REDY</th><td>This includes any user that is ready to run, that is, in a server and not waiting on anything but not actually being run. Generally a user is in state REDY because another user is currently running.</td></tr> | ||
<tr><th>RUNG</th><td>This includes any user that is running, that is, using CPU. Unless MP/204 is installed, there can never be more than one user in state RUNG per sample.</td></tr> | <tr><th>RUNG</th><td>This includes any user that is running, that is, using CPU. Unless [[MP/204]] is installed, there can never be more than one user in state RUNG per sample.</td></tr> | ||
<tr><th>RUNGM</th><td>If MP/204 is installed, this includes any user that is running, that is, using CPU, in maintask mode. There can never be more than one user in state RUNGM per sample. See [[#rungms|The RUNGM and RUNGS states]].</td></tr> | <tr><th>RUNGM</th><td>If MP/204 is installed, this includes any user that is running, that is, using CPU, in maintask mode. There can never be more than one user in state RUNGM per sample. See [[#rungms|The RUNGM and RUNGS states]].</td></tr> | ||
Line 53: | Line 51: | ||
</table> | </table> | ||
In this list the | In this list, the phrase "waiting for user input" refers to a thread waiting for terminal or line input. In addition, a wait for a response to the console message issued by User 0 on a HALT command is also considered a user input wait. | ||
"Sleep" waits, that is, waits resulting from the *SLEEP command and | "Sleep" waits, that is, waits resulting from the <var>[[*SLEEP command|*SLEEP]]</var> command and | ||
the | the <var>[[Record level locking and concurrency control#pause|Pause]]</var> statement, are not considered user input waits. | ||
===Composite states=== | |||
In addition to the above primary states, several composite states | In addition to the above primary states, several composite states | ||
are provided for convenience and report generation. | are provided for convenience and report generation. | ||
For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU and | For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. | ||
SWPGOW. | |||
Thus any user in any of the indicated primary states is also considered | Thus any user in any of the indicated primary states is also considered | ||
to be in state SWPG. | to be in state SWPG. | ||
The following are the available composite states, their component primary states and an explanation that | The following are the available composite states, their component primary states, and an explanation that suggests the meaning of the composite state. | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
<tr><th>ALL</th><td>This is a composite state that includes all primary states. Any logged on user or PST is considered in state ALL.</td></tr> | <tr><th>ALL</th> | ||
<td>This is a composite state that includes all primary states. Any logged on user or PST is considered in state ALL.</td></tr> | |||
<tr><th>ALLI</th><td>This state is made up of RUNG, REDY, BLKIN and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs.</td></tr> | <tr><th>ALLI</th> | ||
<td>This state is made up of RUNG, REDY, BLKIN, and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs.</td></tr> | |||
<tr><th>ALLN</th><td>This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN and SWPGOW. It includes any user | <tr><th>ALLN</th> | ||
<td>This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN, and SWPGOW. It includes any user not blocked for user input. It does not include non-running PSTs.</td></tr> | |||
<tr><th>BLK</th><td>This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN and SWPGOBU. It includes any user that is blocked on anything.</td></tr> | <tr><th>BLK</th> | ||
<td>This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN, and SWPGOBU. It includes any user that is blocked on anything.</td></tr> | |||
<tr><th>BLKI</th><td>This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything.</td></tr> | <tr><th>BLKI</th> | ||
<td>This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything.</td></tr> | |||
<tr><th>BLKN</th><td>This state is made up of BLKIN, BLKON and SWPGOBN. It includes any user that is blocked for something other than user input.</td></tr> | <tr><th>BLKN</th> | ||
<td>This state is made up of BLKIN, BLKON, and SWPGOBN. It includes any user that is blocked for something other than user input.</td></tr> | |||
<tr><th>BLKO</th><td>This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something.</td></tr> | <tr><th>BLKO</th> | ||
<td>This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something.</td></tr> | |||
<tr><th>BLKU</th><td>This state is made up of BLKIU, BLKOU and SWPGOBU. It includes any user that is waiting for user input.</td></tr> | <tr><th>BLKU</th> | ||
<td>This state is made up of BLKIU, BLKOU, and SWPGOBU. It includes any user that is waiting for user input.</td></tr> | |||
<tr><th>OSERVN</th><td>This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input.</td></tr> | <tr><th>OSERVN</th> | ||
<td>This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input.</td></tr> | |||
<tr><th>OSERVU</th><td>This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input.</td></tr> | <tr><th>OSERVU</th> | ||
<td>This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input.</td></tr> | |||
<tr><th>OSERVW</th> | <tr><th>OSERVW</th> | ||
Line 90: | Line 98: | ||
<tr><th>REDYR</th> | <tr><th>REDYR</th> | ||
<td>This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. | <td>This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. | ||
Users in state REDYR can either running or waiting for the <var class="product">Model 204</var> scheduler to provide CPU to run.</td></tr> | Users in state REDYR can be either running or waiting for the <var class="product">Model 204</var> scheduler to provide CPU to run.</td></tr> | ||
<tr><th>RUNBL</th> | <tr><th>RUNBL</th> | ||
<td>This state is made up of RUNG, REDY, WTSV and SWPGOW. | <td>This state is made up of RUNG, REDY, WTSV, and SWPGOW. | ||
It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can either running or waiting for the <var class="product">Model 204</var> scheduler to provide the resources (CPU and/or server) to run.</td></tr> | It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the <var class="product">Model 204</var> scheduler to provide the resources (CPU and/or server) to run.</td></tr> | ||
<tr><th>SWPG</th><td>This state is made up of SWPGI, SWPGOBN, SWPGOBU and SWPGOW. It includes any user that is being swapped into or out of a server.</td></tr> | <tr><th>SWPG</th><td>This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server.</td></tr> | ||
<tr><th>SWPGO</th><td>This state is made up of SWPGOBN, SWPGOBU and SWPGOW. It includes any user that is being swapped out of a server.</td></tr> | <tr><th>SWPGO</th> | ||
<td>This state is made up of SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped out of a server.</td></tr> | |||
<tr><th>SWPGOB</th><td>This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something.</td></tr> | <tr><th>SWPGOB</th> | ||
<td>This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something.</td></tr> | |||
</table> | </table> | ||
===Specifying states in COLLECT and REPORT STATE statements=== | |||
Any of the above primary or composite states can be included on | Any of the above primary or composite states can be included on | ||
COLLECT statements for input to | [[SirTune data collection statements#colstat|COLLECT]] statements for input to SIRTUNEI and on [[SirTune reports#STATE reports|REPORT STATE]] statements for input to SIRTUNEREPORT or SIRTUNER. | ||
for input to SIRTUNER. | |||
Some valid COLLECT statements are: | Some valid COLLECT statements are: | ||
<p class="code" | <p class="code">COLLECT BLKN SWPG | ||
COLLECT ALLN | COLLECT ALLN | ||
COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI | COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI | ||
</p> | |||
Some valid REPORT STATE statements are: | Some valid REPORT STATE statements are: | ||
<p class="code" | <p class="code">REPORT STATE BLKIN EVAL | ||
REPORT STATE SWPG CHUNK 100 | REPORT STATE SWPG CHUNK 100 | ||
REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4 | REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4 | ||
</p> | |||
In addition to user states, <var class="product">SirTune</var>'s COLLECT statement | In addition to user states, <var class="product">SirTune</var>'s COLLECT statement lets you request information about DISKIO and CFR. | ||
you | The following is a valid COLLECT statement: | ||
The following is a valid COLLECT statement | <p class="code">COLLECT DISKIO CFR | ||
<p class="code" | </p> | ||
However, no REPORT STATE statement accepts DISKIO or CFR. | |||
Any state requested in a REPORT STATE statement must have had the corresponding | Any state requested in a REPORT STATE statement must have had the corresponding primary states explicitly or implicitly specified on COLLECT statements for <var class="product">SirTune</var>. | ||
primary states explicitly or implicitly specified on COLLECT statements for | |||
<var class="product">SirTune</var>. | |||
The simplest way to ensure this is by explicitly specifying any state | The simplest way to ensure this is by explicitly specifying any state | ||
to be used in a REPORT STATE statement | to be used in a REPORT STATE statement on a COLLECT statement. | ||
For example, if | For example, if you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER: | ||
<p class="code" | <p class="code">REPORT STATE BLKN CHUNK 10 | ||
REPORT STATE SWPG CHUNK 10 | REPORT STATE SWPG CHUNK 10 | ||
</p> | |||
You can code the following COLLECT statement for <var class="product">SirTune</var>: | |||
<p class="code" | <p class="code">COLLECT BLKN SWPG | ||
</p> | |||
This statement is functionally equivalent to | This statement is functionally equivalent to | ||
<p class="code" | <p class="code">COLLECT BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI | ||
</ | </p> | ||
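The equivalence above can be checked mechanically. The following is a minimal Python sketch, not part of SirTune: the names <code>COMPOSITE_STATES</code>, <code>expand_states</code>, and <code>can_report</code> are hypothetical, and the mapping covers only a few composite states transcribed from the table in this section.

```python
# Composite states and their component primary states, copied from the
# composite-state table in this section (a subset only; extend as needed).
COMPOSITE_STATES = {
    "ALLN": {"RUNG", "REDY", "BLKIN", "BLKON", "WTSV", "SWPGI", "SWPGOBN", "SWPGOW"},
    "BLK": {"BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"},
    "BLKN": {"BLKIN", "BLKON", "SWPGOBN"},
    "SWPG": {"SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"},
}


def expand_states(names):
    """Return the set of primary states named directly or through a composite."""
    primary = set()
    for name in names:
        # A name that is not a known composite is treated as a primary state.
        primary |= COMPOSITE_STATES.get(name, {name})
    return primary


def can_report(report_state, collected):
    """True if every primary state behind report_state was collected."""
    return expand_states([report_state]) <= expand_states(collected)


if __name__ == "__main__":
    short_form = expand_states("BLKN SWPG".split())
    long_form = expand_states("BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI".split())
    print(short_form == long_form)        # the two COLLECT statements cover the same states
    print(can_report("SWPG", ["ALLN"]))   # ALLN alone does not include SWPGOBU
```

The same subset test illustrates the rule that a state named on REPORT STATE must have all of its primary states collected, explicitly or implicitly.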
In general, if running a relatively small Online (an average of | |||
less than 20 logged on users), this statement should not produce a prohibitively large amount of data and | |||
makes all reports possible: | |||
<p class="code">COLLECT ALL | |||
</p> | |||
If running a midsize to large | If running a midsize to large | ||
Online (an average of 20 or more logged on users), the following statement | |||
should collect a sufficient quantity of data to produce most | should collect a sufficient quantity of data to produce most | ||
interesting STATE reports without generating a prohibitively large | interesting STATE reports without generating a prohibitively large | ||
sample | sample data set: | ||
<p class="code">COLLECT ALLN BLKIU SWPGOBU | |||
</p> | |||
<div id="rungms"></div> | <div id="rungms"></div> | ||
== | ===Specifying the RUNGM and RUNGS states=== | ||
<!--Caution: <div> above--> | <!--Caution: <div> above--> | ||
When running the MP/204 feature with <var class="product">Model 204</var> a user that is | When running the [[MP/204]] feature with <var class="product">Model 204</var>, a user that is in state RUNG can be further distinguished to be either running | ||
in state RUNG can be further distinguished to be either running | |||
in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting. | in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting. | ||
For example, | For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports: | ||
<p class="code" | <p class="code">REPORT STATE RUNGM EVAL | ||
REPORT STATE RUNGS EVAL | REPORT STATE RUNGS EVAL | ||
</p> | |||
The first report is a breakdown of users running in maintask | |||
The first is a breakdown of users running in maintask | mode by evaluating procedure, and the second is a breakdown of users running in subtask mode by evaluating procedure. | ||
mode by evaluating procedure and the second is a breakdown of users running | Maintask mode is often referred to as "serial" mode, and subtask mode is often referred to as "parallel" mode. | ||
in subtask mode by evaluating procedure. | |||
Maintask mode is often referred to as "serial" mode, and subtask mode is often | |||
referred to as "parallel" mode. | |||
The total number of observations for state RUNG in any sample is always equal to the | The total number of observations for state RUNG in any sample is always equal to the | ||
total number of observations for state RUNGM plus the total number of observations for state RUNGS. | total number of observations for state RUNGM plus the total number of observations for state RUNGS. | ||
The distinction between maintask and subtask mode can be made either on the | The distinction between maintask and subtask mode can be made either on the basis of the task on which a user is running (maintask or subtask), or on its virtual (or logical) MP mode (that is, whether it is capable of running in a subtask or not). | ||
basis of the task on which a user is running (maintask or subtask), or on its | |||
virtual (or logical) MP mode (that is, whether it is capable of running in | |||
a subtask or not). | |||
The default distinction is made on the basis of the | The default distinction is made on the basis of the | ||
actual task on which a user is running. | actual task on which a user is running. | ||
This can be changed with the SIRTUNER MPVIRT statement. | This can be changed with the SIRTUNER <code>[[SirTune report generation#mpvirt|MPVIRT]]</code> statement. | ||
The MPVIRT setting is generally preferred when using | The MPVIRT setting is generally preferred when using | ||
the REPORT STATE RUNGM report to try to reduce the amount of maintask (serial) | the <code>REPORT STATE RUNGM</code> report to try to reduce the amount of maintask (serial) SOUL code. | ||
== | ==Specifying reports by wait type== | ||
Users in state BLK (blocked on anything) always have a wait type | Users in state BLK (blocked on anything) always have a wait type | ||
associated with them. | associated with them. | ||
These wait types are the same wait types that appear | These wait types are the same wait types that appear | ||
next to the users in a <var class="product">Model 204</var> MONITOR command or in the | next to the users in a <var class="product">Model 204</var> <var>[[ONLINE monitoring#Wait type values|MONITOR]]</var> command or in the SirMon [[User statistics displayed in SirMon|WAITTYP statistic]]. | ||
STATE reports can be requested by these wait types. | STATE reports can be requested by these wait types. | ||
To produce these STATE reports by wait type, COLLECT statements (collecting | To produce these STATE reports by wait type, [[SirTune data collection statements#colstat|COLLECT statements]] (collecting data for all states in which a wait type might occur) must be added to <var class="product">SirTune</var>'s input stream (<code>SIRTUNEI</code>). | ||
data for all states in which a wait type might occur) must be added to <var class="product">SirTune</var>'s input stream (SIRTUNEI). | |||
For example, disk I/O wait types are not swappable, so it is only necessary to collect | For example, disk I/O wait types are not swappable, so it is only necessary to collect state BLKIN to produce a <code>REPORT STATE WDISK</code> report. | ||
state BLKIN to produce a REPORT STATE WDISK report. | Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all be collected to produce a <code>REPORT STATE WCFREX</code> report. | ||
Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all | |||
be collected to produce a REPORT STATE WCFREX report. | |||
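As an illustration of how the required states combine, here is a minimal Python sketch that builds the smallest COLLECT statement for a set of wait type reports. It is not a SirTune facility: the names <code>REQUIRED_STATES</code>, <code>states_to_collect</code>, and <code>collect_statement</code> are hypothetical, and the mapping holds only three wait types whose requirements are stated in this section.

```python
# Primary states that must be collected for a few wait type reports,
# as stated in this section. The mapping is deliberately partial.
REQUIRED_STATES = {
    "WDISK": {"BLKIN"},                       # disk I/O waits are not swappable
    "WJRNLO": {"BLKIN"},                      # journal output waits
    "WCFREX": {"BLKIN", "BLKON", "SWPGOBN"},  # critical file resource waits are swappable
}


def states_to_collect(reports):
    """Return the union of primary states needed by the given wait type reports."""
    needed = set()
    for report in reports:
        needed |= REQUIRED_STATES[report]
    return needed


def collect_statement(reports):
    """Build one COLLECT statement that covers every requested report."""
    return "COLLECT " + " ".join(sorted(states_to_collect(reports)))


if __name__ == "__main__":
    print(collect_statement(["WDISK", "WJRNLO"]))
    print(collect_statement(["WDISK", "WCFREX"]))
```

Because the three states needed for WCFREX are exactly the components of composite state BLKN, the second statement could equally be written as COLLECT BLKN.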
The available wait type reports along with the corresponding <var class="product">Model 204</var> wait type number, a description of the wait type, and the required states to be collected | The available wait type reports along with the corresponding <var class="product">Model 204</var> wait type number, a description of the wait type, and the required states to be collected | ||
are listed here: | are listed here: | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
<caption>Wait type reports</caption> | |||
<tr><th>WMISC</th><td>0 - Miscellaneous waits. Requires BLKN.</td></tr> | <tr><th>WMISC</th><td>0 - Miscellaneous waits. Requires BLKN.</td></tr> | ||
<tr><th>WDISK</th><td>1 - Wait for disk I/O. Requires BLKIN.</td></tr> | <tr><th>WDISK</th><td>1 - Wait for disk I/O. Requires BLKIN.</td></tr> | ||
Line 214: | Line 212: | ||
<tr><th>WPST</th><td>10 - Wait on PST. Requires BLKN.</td></tr> | <tr><th>WPST</th><td>10 - Wait on PST. Requires BLKN.</td></tr> | ||
<tr><th>WIFAM</th><td>11 - IFAM waits. Requires BLKN.</td></tr> | <tr><th>WIFAM</th><td>11 - IFAM waits. Requires BLKN.</td></tr> | ||
<tr><th>WSLEEP</th><td>12 - Waits for a time interval, including | <tr><th>WSLEEP</th><td>12 - Waits for a time interval, including <var>[[Record level locking and concurrency control#pause|Pause]]</var> statements and <var>[[*SLEEP command|*SLEEP]]</var> commands. Requires BLKN.</td></tr> | ||
<tr><th>WJRNLO</th><td>15 - Wait for journal output. Requires BLKIN.</td></tr> | <tr><th>WJRNLO</th><td>15 - Wait for journal output. Requires BLKIN.</td></tr> | ||
<tr><th>WCHKPO</th><td>16 - Wait for checkpoint output. Requires BLKIN.</td></tr> | <tr><th>WCHKPO</th><td>16 - Wait for checkpoint output. Requires BLKIN.</td></tr> | ||
Line 230: | Line 228: | ||
<tr><th>WCONVO</th><td>28 - Wait for inter-process output. Requires BLKN.</td></tr> | <tr><th>WCONVO</th><td>28 - Wait for inter-process output. Requires BLKN.</td></tr> | ||
<tr><th>WSCTYI</th><td>29 - Wait for security interface. Requires BLKN.</td></tr> | <tr><th>WSCTYI</th><td>29 - Wait for security interface. Requires BLKN.</td></tr> | ||
<tr><th>WS$WAI</th><td>30 - Swappable $ | <tr><th>WS$WAI</th><td>30 - Swappable <var>[[$Wait]]</var> call. Requires BLKN.</td></tr> | ||
<tr><th>WN$WAI</th><td>31 - Non-swappable $ | <tr><th>WN$WAI</th><td>31 - Non-swappable <var>$Wait</var> call. Requires BLKIN.</td></tr> | ||
<tr><th>WULDB2</th><td>32 - Wait for DB2 subtask. Requires BLKN.</td></tr> | <tr><th>WULDB2</th><td>32 - Wait for DB2 subtask. Requires BLKN.</td></tr> | ||
<tr><th>WOCSUB</th><td>33 - Waiting on Open/Close subtask. Requires BLKIN.</td></tr> | |||
<tr><th>WDBUGU</th><td>38 - Wait for user being debugged. Requires BLKN.</td></tr> | |||
<tr><th>WDBUGD</th><td>39 - Wait for user performing debugging. Requires BLKN.</td></tr> | |||
<tr><th>WMQTSK</th><td>40 - Wait for MQ subtask to become available. Requires BLKN.</td></tr> | |||
<tr><th>WMQAPI</th><td>41 - Wait for MQ subtask to run. Requires BLKIN.</td></tr> | |||
<tr><th>WMQGWT</th><td>42 - Wait for MQGET with wait time specified. Requires BLKN.</td></tr> | |||
<tr><th>WECLD</th><td>43 - Wait for ECF to load/delete a module. Requires BLKN.</td></tr> | |||
<tr><th>WECMOD</th><td>44 - Wait for external module to become free. Requires BLKN.</td></tr> | |||
<tr><th>WECTSK</th><td>45 - Wait for ECF subtask to become free. Requires BLKN.</td></tr> | |||
<tr><th>WECRUN</th><td>46 - Wait for external module to run. Requires BLKN.</td></tr> | |||
<tr><th>W$WTQZ</th><td>47 - User within $WAIT('CPQZ') wait; CHKPPST within extended quiesce. Requires BLKN.</td></tr> | |||
<tr><th>W$WTXS</th><td>48 - User within $WAIT('QZSIG') wait. Requires BLKN.</td></tr> | |||
<tr><th>W$NDEQ</th><td>49 - At end of extended quiesce, waiting for count of $WAIT('CPQZ') and $WAIT('QZSIG') users to go to zero. Requires BLKN.</td></tr> | |||
<tr><th>WHSM</th><td>50 - Wait for HSM recall of a migrated dataset. Requires BLKN.</td></tr> | |||
<tr><th>WCDS</th><td>51 - Wait for share mode constraints DB lock. Requires BLKIN.</td></tr> | |||
<tr><th>WCDX</th><td>52 - Wait for exclusive mode constraints DB lock. Requires BLKIN.</td></tr> | |||
<tr><th>WSBBOL</th><td>53 - Wait for SUB-TRANS CP processing to complete for this user. Requires BLKN.</td></tr> | |||
<tr><th>WSBBFC</th><td>54 - SUB-TRAN CP postponement - waiting on blocking file command to complete. Requires BLKIN.</td></tr> | |||
<tr><th>WSBTMR</th><td>55 - SUB-TRAN CP CPTS timer wait. Requires BLKIN.</td></tr> | |||
<tr><th>WSBARY</th><td>56 - SUB-TRAN CP scanner array wait. Requires BLKIN.</td></tr> | |||
<tr><th>WDMNM</th><td>57 - A daemon child waiting on its master. Requires BLK.</td></tr> | |||
<tr><th>WDMND</th><td>58 - A daemon master waiting on its daemon. Requires BLK.</td></tr> | |||
<tr><th>WCUSTn</th><td>80-89 - Customer reserved wait codes.</td></tr> | |||
<tr><th>WFUNLD</th><td>97 - Fast Unload request. Requires BLKN.</td></tr> | |||
<tr><th>WMAXAU</th><td>98 - MAXAUSER delay. Requires BLKN.</td></tr> | |||
<tr><th>WSFQUI</th><td>99 - SirFact quiesce wait. Requires BLKN.</td></tr> | |||
</table> | </table> | ||
Line 251: | Line 276: | ||
These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here: | These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here: | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
<tr><th>WCFR</th><td>This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.</td></tr> | <tr><th>WCFR</th> | ||
<td>This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.</td></tr> | |||
<tr><th>WLOG</th> | <tr><th>WLOG</th> | ||
<td>This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for <var class="product">Model 204</var> recovery, that is, all checkpoint and journal I/O related waits.</td></tr> | <td>This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for <var class="product">Model 204</var> recovery, that is, all checkpoint and journal I/O related waits.</td></tr> | ||
</table> | </table> | ||
Line 265: | Line 291: | ||
==Critical file resource states== | ==Critical file resource states== | ||
<p> | <p> | ||
Critical file resources are used by <var class="product">Model 204</var> to provide multi-user | Critical file resources are used by <var class="product">Model 204</var> to provide multi-user | ||
concurrency control on a file level. This control mechanism will sometimes | concurrency control on a file level. This control mechanism will sometimes | ||
exacerbate some other performance bottleneck. | exacerbate some other performance bottleneck. | ||
A high value for number of users per sample with wait types CFREX and CFRSH in the | A high value for number of users per sample with wait types CFREX and CFRSH in the [[SirTune reports#SUMMARY reports|SUMMARY report]] suggests that critical file resource enqueuing bears closer examination.</p> | ||
SUMMARY report suggests that critical file resource enqueuing bears closer examination.</p> | |||
There are four | There are four critical file resources: | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
<caption>Critical file resources</caption> | |||
<tr><th>DIRECT</th><td>Protects table B updates and accesses.</td></tr> | <tr><th>DIRECT</th><td>Protects table B updates and accesses.</td></tr> | ||
<tr><th>INDEX</th><td>Protects accesses and updates of table C and the ordered index.</td></tr> | <tr><th>INDEX</th><td>Protects accesses and updates of table C and the ordered index.</td></tr> | ||
<tr><th>EXISTS</th><td>Protects accesses and updates of the existence bit map.</td></tr> | <tr><th>EXISTS</th><td>Protects accesses and updates of the existence bit map.</td></tr> | ||
<tr><th>RECENQ</th><td>Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the | |||
<tr><th>RECENQ</th><td>Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the <var>[[Find Without Locks statement|Find Without Locks]]</var> SOUL statement.</td></tr> | |||
</table> | </table> | ||
===Determining the cause of a wait=== | |||
A first step in investigating a critical file resource enqueuing problem | A first step in investigating a critical file resource enqueuing problem | ||
is to produce reports for | is to produce reports for the WCFR state. | ||
This will help isolate the programs or lines of code that encounter frequent or long critical file | This will help isolate the programs or lines of code that encounter frequent or long critical file resource waits. | ||
resource waits. | |||
Probably the most useful report would be produced by this statement: | Probably the most useful report would be produced by this statement: | ||
<p class="code" | <p class="code">REPORT STATE WCFR CHUNK 4 | ||
</p> | |||
This will break down critical file resource waits by individual lines of | This will break down critical file resource waits by individual lines of | ||
SOUL code. Unfortunately, the problem with this type of analysis | |||
is that it focuses on the "victims" of critical file resource waits rather | is that it focuses on the "victims" of critical file resource waits rather | ||
than the "culprits," the lines of code holding critical file resources causing other users to wait. | than the "culprits," the lines of code holding critical file resources causing other users to wait. | ||
While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the | While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the WCFR state report that this is indeed the case. | ||
To determine the actual cause of critical file resource enqueuing, more data | To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the <var class="product">SirTune</var> data collector. | ||
needs to be collected by the <var class="product">SirTune</var> data collector. | To have this additional data collected, simply specify the parameter <code>CFR</code> on a <code>COLLECT</code> statement for | ||
To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for | <var class="product">SirTune</var>. This parameter can be specified alone or with other <code>COLLECT</code> parameters as in this statement: | ||
<var class="product">SirTune</var>. This parameter can be specified alone or with other COLLECT parameters as in this statement: | <p class="code">COLLECT BLKN DISKIO CFR | ||
<p class="code" | </p> | ||
After this additional CFR (Critical File Resource) data is collected, | After this additional CFR (Critical File Resource) data is collected, | ||
SirTune is able to produce several additional reports to help isolate the | |||
cause of critical file resource enqueuing. | cause of critical file resource enqueuing. | ||
The first report that might be useful is the CFRROOT report. | The first report that might be useful is the [[SirTune reports#CFRROOT reports|CFRROOT]] report. | ||
This report indicates the base wait types that are behind critical file resource waits. | This report indicates the base wait types that are behind critical file resource waits. The CFRROOT report does not provide | ||
The CFRROOT report does not provide | information about which lines of code cause critical file resource waits, so it is not helpful for application tuning. | ||
information | |||
is not helpful for application tuning. | |||
The CFRROOT report might indicate that | The CFRROOT report might indicate that | ||
Line 313: | Line 338: | ||
critical file resource enqueuing. | critical file resource enqueuing. | ||
This would be indicated by a primary root cause | This would be indicated by a primary root cause | ||
of DISK (disk I/O waits) or maybe JRNLO (journal I/O waits). | of <code>DISK</code> (disk I/O waits) or maybe <code>JRNLO</code> (journal I/O waits). | ||
===Reducing wait times=== | |||
You can attack a primary root cause for critical file resource waits | You can attack a primary root cause for critical file resource waits | ||
by trying to reduce overall disk I/O's or journal I/O's (with application tuning), or by specifically targeting those instructions that hold critical file resources. | |||
tuning), or by specifically targeting those instructions that hold critical file | |||
resources. | |||
To facilitate this latter option, several CFR states can be requested | To facilitate this latter option, several CFR states can be requested | ||
on | on SirTune reports if CFR data has been collected by <var class="product">SirTune</var>. | ||
These states are: | These states are: | ||
<table class="thJustBold"> | <table class="thJustBold"> | ||
<caption>CFR states</caption> | |||
<tr><th>CFRHANY</th> | <tr><th>CFRHANY</th> | ||
<td>The state where a user holds any critical file resource.</td></tr> | <td>The state where a user holds any critical file resource.</td></tr> | ||
Line 355: | Line 380: | ||
</table> | </table> | ||
It should be noted that the CFRB | It should be noted that the CFRB<i>xxx</i> states are weighted based on the number of other users holding the resource and the number of users waiting for the resource. | ||
other users holding the resource and the number of users waiting for the resource. | For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state. | ||
For example, if a user at a line of code holds the DIRECT resource and 3 other | |||
users are waiting for the resource, that line of code is considered to have 3 | |||
observations in the CFRBDIR state. | |||
On the other hand, if a user at a line of code | On the other hand, if a user at a line of code | ||
Line 366: | Line 388: | ||
considered to have 1/5th of an observation in the CFRBDIR state. | considered to have 1/5th of an observation in the CFRBDIR state. | ||
Generally, the most useful reports for reducing critical file resource waits | Generally, the most useful reports for reducing critical file resource waits are the CFRB reports. This statement breaks down the state where a user is blocking another user from any critical | ||
are the CFRB reports. This statement: | file resource by lines of SOUL code: | ||
<p class="code"><nowiki>REPORT STATE CFRBANY CHUNK 4 | <p class="code"><nowiki>REPORT STATE CFRBANY CHUNK 4 | ||
</nowiki></p> | </nowiki></p> | ||
This is probably the most useful of the STATE CFR<i>xxxx</i> reports. | |||
This is probably the most useful of the STATE CFR | |||
Once critical file resource blocking is isolated | Once critical file resource blocking is isolated | ||
to specific | to specific SOUL instructions, critical file resource enqueuing can be | ||
reduced by: | reduced by: | ||
<ul> | <ul> | ||
<li>Reducing the number of times the offending instructions are executed.</li> | <li>Reducing the number of times the offending instructions are executed.</li> | ||
<li>Reducing the amount of disk I/O performed by the offending instructions.</li> | <li>Reducing the amount of disk I/O performed by the offending instructions.</li> | ||
<li>Reducing the amount of CPU used by the offending instructions.</li> | <li>Reducing the amount of CPU used by the offending instructions.</li> | ||
</ul> | </ul> | ||
It might be tempting to use the | It might be tempting to use the <var>Find Without Locks</var> SOUL statement to | ||
reduce the critical file resource enqueuing associated with a statement. | reduce the critical file resource enqueuing associated with a statement. | ||
This will only work if the resource causing conflicts is the RECENQ resource. | This will only work if the resource causing conflicts is the RECENQ resource. | ||
Line 390: | Line 413: | ||
However, if the resource causing the conflict is indeed the RECENQ | However, if the resource causing the conflict is indeed the RECENQ | ||
resource, it is still <i>not</i> recommended that the solution be | resource, it is still <i>not</i> recommended that the solution be | ||
<var>Find Without Locks</var>. | |||
A high conflict rate on the RECENQ resource indicates that the | A high conflict rate on the RECENQ resource indicates that the | ||
environment has a high update activity level, which means that operating on | environment has a high update activity level, which means that operating on unenqueued found sets is a questionable tactic at best. | ||
unenqueued found sets is a questionable tactic at best. | A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing found sets before any terminal I/O occurs. | ||
A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing | |||
found sets before any terminal I/O occurs. | |||
The CFRH | The CFRH<i>xxx</i> reports can be useful for tracking potential critical file resource | ||
enqueuing problems (perhaps in a test environment) before they actually happen. | enqueuing problems (perhaps in a test environment) before they actually happen. | ||
These states include any user that holds a critical file resource, whether | These states include any user that holds a critical file resource, whether | ||
or not it is blocking anyone. | or not it is blocking anyone. | ||
These reports are difficult to interpret, however, | These reports are difficult to interpret, however, | ||
since they require a fairly good estimate of expected future usage patterns | since they require a fairly good estimate of expected future usage patterns to have any predictive value. | ||
to have any predictive value. | |||
==See also== | ==See also== |
Latest revision as of 17:53, 31 October 2019
Model 204 user states
When the SirTune sampling program is collecting a sample it scans all logged on users. Each user is classified by its state. The user's state is a general indication of the type of activity occurring in a user thread. These states roughly correspond to the states reported by the Model 204 performance monitor, though broken down to a finer level of detail.
Primary states
The following primary states are distinguished by SirTune:
BLKIN | This includes any user that is blocked, that is waiting for something, in a server and not waiting for user input. This is distinguished from BLKIU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
---|---|
BLKIU | This includes any user that is blocked, that is waiting for something, in a server and waiting for user input. This is distinguished from BLKIN because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
BLKON | This includes any user that is blocked, that is waiting for something, not in a server and not waiting for user input. This is distinguished from BLKOU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
BLKOU | This includes any user that is blocked, that is waiting for something, not in a server and waiting for user input. This is distinguished from BLKON because waits for things other than user input are generally viewed as a performance problem while waits for user input are not. |
REDY | This includes any user that is ready to run, that is, in a server and not waiting on anything but not actually being run. Generally a user is in state REDY because another user is currently running. |
RUNG | This includes any user that is running, that is, using CPU. Unless MP/204 is installed, there can never be more than one user in state RUNG per sample. |
RUNGM | If MP/204 is installed, this includes any user that is running, that is, using CPU, in maintask mode. There can never be more than one user in state RUNGM per sample. See The RUNGM and RUNGS states. |
RUNGS | If MP/204 is installed, this includes any user that is running, that is, using CPU, in subtask mode. See The RUNGM and RUNGS states. |
SWPGI | This includes any user that is in the process of being swapped into a server. |
SWPGOBN | This includes any user that is in the process of being swapped out of a server because it is waiting on something other than user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKON. |
SWPGOBU | This includes any user that is in the process of being swapped out of a server because it is waiting on user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKOU. |
SWPGOW | This includes any user that is in the process of being swapped out of a server because it has been server sliced. If no servers of appropriate size are available at the point the user is swapped out, the user switches to state WTSV. |
WPST | This includes any PST that is not running. |
WTSV | This includes any user that is waiting for a server to become available so that the user can be run. The only reason a user would be in the WTSV state is that all servers of appropriate size are occupied by other users that cannot be swapped out of a server. |
In this list, the phrase "waiting for user input" refers to a thread waiting for terminal or line input. In addition, a wait for a response to the console message issued by User 0 on a HALT command is also considered a user input wait. "Sleep" waits, that is, waits resulting from the *SLEEP command and the Pause statement, are not considered user input waits.
Composite states
In addition to the above primary states, several composite states are provided for convenience and report generation. For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. Thus any user in any of the indicated primary states is also considered to be in state SWPG. The following are the available composite states, their component primary states, and an explanation that suggests the meaning of the composite state.
ALL | This is a composite state that includes all primary states. Any logged on user or PST is considered in state ALL. |
---|---|
ALLI | This state is made up of RUNG, REDY, BLKIN, and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs. |
ALLN | This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN, and SWPGOW. It includes any user not blocked for user input. It does not include non-running PSTs. |
BLK | This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN, and SWPGOBU. It includes any user that is blocked on anything. |
BLKI | This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything. |
BLKN | This state is made up of BLKIN, BLKON, and SWPGOBN. It includes any user that is blocked for something other than user input. |
BLKO | This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something. |
BLKU | This state is made up of BLKIU, BLKOU, and SWPGOBU. It includes any user that is waiting for user input. |
OSERVN | This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input. |
OSERVU | This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input. |
OSERVW | This state is made up of SWPGOW and WTSV. It includes any user that is either waiting for a server or being swapped out of a server so that it can wait for a server to free up. This latter case only happens when a user is server sliced. |
REDYR | This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. Users in state REDYR can be either running or waiting for the Model 204 scheduler to provide CPU to run. |
RUNBL | This state is made up of RUNG, REDY, WTSV, and SWPGOW. It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the Model 204 scheduler to provide the resources (CPU and/or server) to run. |
SWPG | This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server. |
SWPGO | This state is made up of SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped out of a server. |
SWPGOB | This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something. |
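The composite states listed above can be expanded mechanically into their primary states. The following Python snippet is a minimal sketch of that expansion, not part of SirTune: the names `COMPOSITE_STATES` and `expand_states` are hypothetical, the mapping is transcribed from the table, and ALL is left out because the table describes it only as covering every primary state.

```python
# Composite SirTune states and their component primary states,
# transcribed from the table above. ALL is omitted (assumption: it
# simply covers every primary state, so no explicit list is given).
COMPOSITE_STATES = {
    "ALLI": ["RUNG", "REDY", "BLKIN", "BLKIU"],
    "ALLN": ["RUNG", "REDY", "BLKIN", "BLKON", "WTSV",
             "SWPGI", "SWPGOBN", "SWPGOW"],
    "BLK": ["BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"],
    "BLKI": ["BLKIN", "BLKIU"],
    "BLKN": ["BLKIN", "BLKON", "SWPGOBN"],
    "BLKO": ["BLKON", "BLKOU"],
    "BLKU": ["BLKIU", "BLKOU", "SWPGOBU"],
    "OSERVN": ["SWPGOBN", "BLKON"],
    "OSERVU": ["SWPGOBU", "BLKOU"],
    "OSERVW": ["SWPGOW", "WTSV"],
    "REDYR": ["RUNG", "REDY"],
    "RUNBL": ["RUNG", "REDY", "WTSV", "SWPGOW"],
    "SWPG": ["SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"],
    "SWPGO": ["SWPGOBN", "SWPGOBU", "SWPGOW"],
    "SWPGOB": ["SWPGOBN", "SWPGOBU"],
}


def expand_states(names):
    """Return the sorted primary states implied by a list of state names.

    Names that are not composite states are treated as primary states.
    """
    primary = set()
    for name in names:
        primary.update(COMPOSITE_STATES.get(name, [name]))
    return sorted(primary)


print(expand_states(["BLKN", "SWPG"]))
# ['BLKIN', 'BLKON', 'SWPGI', 'SWPGOBN', 'SWPGOBU', 'SWPGOW']
```

A user observed in any one of the returned primary states counts as being in each composite state that lists it.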
Specifying states in COLLECT and REPORT STATE statements
Any of the above primary or composite states can be included on COLLECT statements for input to SIRTUNEI and on REPORT STATE statements for input to SIRTUNEREPORT or SIRTUNER. Some valid COLLECT statements are:
COLLECT BLKN SWPG
COLLECT ALLN
COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI
Some valid REPORT STATE statements are:
REPORT STATE BLKIN EVAL
REPORT STATE SWPG CHUNK 100
REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4
In addition to user states, SirTune's COLLECT statement lets you request information about DISKIO and CFR. The following is a valid COLLECT statement:
COLLECT DISKIO CFR
However, no REPORT STATE statement accepts DISKIO or CFR.
Any state requested in a REPORT STATE statement must have had the corresponding primary states explicitly or implicitly specified on COLLECT statements for SirTune. The simplest way to ensure this is to explicitly specify, on a COLLECT statement, any state to be used in a REPORT STATE statement. For example, if you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER:
REPORT STATE BLKN CHUNK 10
REPORT STATE SWPG CHUNK 10
You can code the following COLLECT statement for SirTune:
COLLECT BLKN SWPG
This statement is functionally equivalent to:
COLLECT BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI
In general, if you are running a relatively small Online (an average of fewer than 20 logged on users), this statement should not produce a prohibitively large amount of data and makes all reports possible:
COLLECT ALL
If you are running a midsize to large Online (an average of 20 or more logged on users), the following statement should collect a sufficient quantity of data to produce most interesting STATE reports without generating a prohibitively large sample data set:
COLLECT ALLN BLKIU SWPGOBU
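Because a REPORT STATE request needs every one of its primary states to have been collected, it can help to check a planned COLLECT statement against the reports you intend to run. The sketch below is illustrative only: `COMPOSITES` and `missing_states` are hypothetical names, and the mapping holds just the few composite states needed for the example (taken from the composite state table).

```python
# A partial composite-to-primary mapping (assumption: only the states
# needed for this example are listed; see the composite state table).
COMPOSITES = {
    "ALLN": ["RUNG", "REDY", "BLKIN", "BLKON", "WTSV",
             "SWPGI", "SWPGOBN", "SWPGOW"],
    "BLK": ["BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"],
    "BLKN": ["BLKIN", "BLKON", "SWPGOBN"],
    "SWPG": ["SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"],
}


def primaries(state):
    """Primary states behind one state name (primary names map to themselves)."""
    return set(COMPOSITES.get(state, [state]))


def missing_states(collect_operands, report_state):
    """Primary states a REPORT STATE request needs but COLLECT did not cover."""
    collected = set()
    for operand in collect_operands:
        collected |= primaries(operand)
    return sorted(primaries(report_state) - collected)


collect = ["ALLN", "BLKIU", "SWPGOBU"]
print(missing_states(collect, "SWPG"))  # []
print(missing_states(collect, "BLK"))   # ['BLKOU']
```

Under these assumptions, the midsize-Online COLLECT statement supports a SWPG report but not a BLK report, since BLKOU is never collected.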
Specifying the RUNGM and RUNGS states
When running the MP/204 feature with Model 204, a user that is in state RUNG can be further distinguished to be either running in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting. For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports:
REPORT STATE RUNGM EVAL
REPORT STATE RUNGS EVAL
The first report is a breakdown of users running in maintask mode by evaluating procedure, and the second is a breakdown of users running in subtask mode by evaluating procedure. Maintask mode is often referred to as "serial" mode, and subtask mode is often referred to as "parallel" mode.
The total observations for state RUNG in any sample is always equal to the total observations for state RUNGM plus the total observations for state RUNGS.
The distinction between maintask and subtask mode can be made either on the basis of the task on which a user is running (maintask or subtask), or on its virtual (or logical) MP mode (that is, whether it is capable of running in a subtask or not).
The default distinction is made on the basis of the actual task on which a user is running. This can be changed with the SIRTUNER MPVIRT statement, which bases the distinction on virtual MP mode instead. The MPVIRT setting is generally preferred when using the REPORT STATE RUNGM report to try to reduce the amount of maintask (serial) SOUL code.
Specifying reports by wait type
Users in state BLK (blocked on anything) always have a wait type associated with them. These wait types are the same wait types that appear next to the users in a Model 204 MONITOR command or in the SirMon WAITTYP statistic. STATE reports can be requested by these wait types.
To produce these STATE reports by wait type, COLLECT statements (collecting data for all states in which a wait type might occur) must be added to SirTune's input stream (SIRTUNEI). For example, disk I/O wait types are not swappable, so it is only necessary to collect state BLKIN to produce a REPORT STATE WDISK report. Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all be collected to produce a REPORT STATE WCFREX report.
The available wait type reports along with the corresponding Model 204 wait type number, a description of the wait type, and the required states to be collected are listed here:
WMISC | 0 - Miscellaneous waits. Requires BLKN. |
---|---|
WDISK | 1 - Wait for disk I/O. Requires BLKIN. |
WUSERO | 2 - Wait for user output. Requires BLKU. |
WUSERI | 3 - Wait for user input. Requires BLKU. |
WOPERI | 4 - Wait for operator input. Requires BLKU. |
WDUMPO | 5 - Wait for dump write. Requires BLKIN. |
WDUMPI | 6 - Wait for restore read. Requires BLKIN. |
WENQUE | 7 - Wait for miscellaneous enqueue. Requires BLKN. |
WBUFF | 8 - Wait for disk buffer. Requires BLKIN. |
WPST | 10 - Wait on PST. Requires BLKN. |
WIFAM | 11 - IFAM waits. Requires BLKN. |
WSLEEP | 12 - Waits for a time interval, including Pause statements and *SLEEP commands. Requires BLKN. |
WJRNLO | 15 - Wait for journal output. Requires BLKIN. |
WCHKPO | 16 - Wait for checkpoint output. Requires BLKIN. |
WWRITE | 17 - Wait for a checkpoint DECB. Requires BLKIN. |
WARBMO | 18 - Waits for output arbitration. Requires BLKN. |
WCHKPR | 19 - Waits for a checkpoint request. Requires WPST. |
WDISK | 20 - Waits for checkpoint completion. Requires BLKIN. |
WDEAD | 21 - Wait forever (dead thread). Requires BLKU. |
WVSAMI | 22 - Wait for VSAM input. Requires BLKN. |
WLOGIN | 23 - Wait after login failure. Requires BLKN. |
WCFREX | 24 - Wait for critical file resource in exclusive mode. Requires BLKN. |
WCFRSH | 25 - Wait for critical file resource in share mode. Requires BLKN. |
WVTBUF | 26 - Wait for VTAM buffer. Requires BLKN. |
WCONVI | 27 - Wait for inter-process input. Requires BLKN. |
WCONVO | 28 - Wait for inter-process output. Requires BLKN. |
WSCTYI | 29 - Wait for security interface. Requires BLKN. |
WS$WAI | 30 - Swappable $Wait call. Requires BLKN. |
WN$WAI | 31 - Non-swappable $Wait call. Requires BLKIN. |
WULDB2 | 32 - Wait for DB2 subtask. Requires BLKN. |
WOCSUB | 33 - Waiting on Open/Close subtask. Requires BLKIN. |
WDBUGU | 38 - Wait for user being debugged. Requires BLKN. |
WDBUGD | 39 - Wait for user performing debugging. Requires BLKN. |
WMQTSK | 40 - Wait for MQ subtask to become available. Requires BLKN. |
WMQAPI | 41 - Wait for MQ subtask to run. Requires BLKIN. |
WMQGWT | 42 - Wait for MQGET with wait time specified. Requires BLKN. |
WECLD | 43 - Wait for ECF to load/delete a module. Requires BLKN. |
WECMOD | 44 - Wait for external module to become free. Requires BLKN. |
WECTSK | 45 - Wait for ECF subtask to become free. Requires BLKN. |
WECRUN | 46 - Wait for external module to run. Requires BLKN. |
W$WTQZ | 47 - User within $WAIT('CPQZ') wait; CHKPPST within extended quiesce. Requires BLKN. |
W$WTXS | 48 - User within $WAIT('QZSIG') wait. Requires BLKN. |
W$NDEQ | 49 - At end of extended quiesce, waiting for count of $WAIT('CPQZ') and $WAIT('QZSIG') users to go to zero. Requires BLKN. |
WHSM | 50 - Wait for HSM recall of a migrated dataset. Requires BLKN. |
WCDS | 51 - Wait for share mode constraints DB lock. Requires BLKIN. |
WCDX | 52 - Wait for exclusive mode constraints DB lock. Requires BLKIN. |
WSBBOL | 53 - Wait for SUB-TRANS CP processing to complete for this user. Requires BLKN. |
WSBBFC | 54 - SUB-TRAN CP postponement - waiting on blocking file command to complete. Requires BLKIN. |
WSBTMR | 55 - SUB-TRAN CP CPTS timer wait. Requires BLKIN. |
WSBARY | 56 - SUB-TRAN CP scanner array wait. Requires BLKIN. |
WDMNM | 57 - A daemon child waiting on its master. Requires BLK. |
WDMND | 58 - A daemon master waiting on its daemon. Requires BLK. |
WCUSTn | 80-89 - Customer reserved wait codes. |
WFUNLD | 97 - Fast Unload request. Requires BLKN. |
WMAXAU | 98 - MAXAUSER delay. Requires BLKN. |
WSFQUI | 99 - SirFact quiesce wait. Requires BLKN. |
Thus to produce a breakdown of disk I/O waits by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:
REPORT STATE WDISK EVAL CHUNK 4
To get a breakdown of waits for miscellaneous enqueues (including record locks) by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:
REPORT STATE WENQUE EVAL CHUNK 4
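The "Requires" column of the wait type table can be used to work out which COLLECT operands a set of planned wait type reports needs. The following is a rough sketch under stated assumptions: `REQUIRED_STATE` holds only a handful of entries copied from the table, and `collect_statement` is a hypothetical helper, not a SirTune facility.

```python
# Required state for a few wait type reports, copied from the table
# above (assumption: a small subset is enough for illustration).
REQUIRED_STATE = {
    "WDISK": "BLKIN",
    "WENQUE": "BLKN",
    "WUSERI": "BLKU",
    "WJRNLO": "BLKIN",
    "WCFREX": "BLKN",
    "WCFRSH": "BLKN",
}


def collect_statement(wait_types):
    """Build a COLLECT statement covering the given wait type reports."""
    states = []
    for wait_type in wait_types:
        state = REQUIRED_STATE[wait_type]
        if state not in states:  # keep first-seen order, drop repeats
            states.append(state)
    return "COLLECT " + " ".join(states)


print(collect_statement(["WDISK", "WENQUE"]))  # COLLECT BLKIN BLKN
```

Since BLKN already includes BLKIN, the generated statement is slightly redundant but still requests everything the two reports need.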
In addition to these primary wait types, there are a few composite wait types for which reports can be generated. These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:
WCFR | This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control. |
---|---|
WLOG | This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for Model 204 recovery, that is, all checkpoint and journal I/O related waits. |
To get a breakdown of waits for critical file resources by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:
REPORT STATE WCFR EVAL CHUNK 4
Critical file resource states
Critical file resources are used by Model 204 to provide multi-user concurrency control on a file level. This control mechanism will sometimes exacerbate some other performance bottleneck. A high value for number of users per sample with wait types CFREX and CFRSH in the SUMMARY report suggests that critical file resource enqueuing bears closer examination.
There are four critical file resources:
DIRECT | Protects table B updates and accesses. |
---|---|
INDEX | Protects accesses and updates of table C and the ordered index. |
EXISTS | Protects accesses and updates of the existence bit map. |
RECENQ | Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the Find Without Locks SOUL statement. |
Determining the cause of a wait
A first step in investigating a critical file resource enqueuing problem is to produce reports for the WCFR state. This will help isolate the programs or lines of code that encounter frequent or long critical file resource waits. Probably the most useful report would be produced by this statement:
REPORT STATE WCFR CHUNK 4
This will break down critical file resource waits by individual lines of SOUL code. Unfortunately, the problem with this type of analysis is that it focuses on the "victims" of critical file resource waits rather than the "culprits," the lines of code holding critical file resources causing other users to wait. While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the WCFR state report that this is indeed the case.
To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the SirTune data collector.
To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for SirTune. This parameter can be specified alone or with other COLLECT parameters, as in this statement:
COLLECT BLKN DISKIO CFR
After this additional CFR (Critical File Resource) data is collected, SirTune is able to produce several additional reports to help isolate the cause of critical file resource enqueuing. The first report that might be useful is the CFRROOT report. This report indicates the base wait types that are behind critical file resource waits. The CFRROOT report does not provide information about which lines of code cause critical file resource waits, so it is not helpful for application tuning.
The CFRROOT report might indicate that application tuning (rather than system tuning) might be required to reduce critical file resource enqueuing. This would be indicated by a primary root cause of DISK (disk I/O waits) or maybe JRNLO (journal I/O waits).
Reducing wait times
You can attack a primary root cause for critical file resource waits by trying to reduce overall disk I/Os or journal I/Os (with application tuning), or by specifically targeting those instructions that hold critical file resources.
To facilitate this latter option, several CFR states can be requested on SirTune reports if CFR data has been collected by SirTune. These states are:
CFRHANY | The state where a user holds any critical file resource. |
---|---|
CFRHDIR | The state where a user holds the DIRECT critical file resource. |
CFRHIND | The state where a user holds the INDEX critical file resource. |
CFRHEXS | The state where a user holds the EXISTS critical file resource. |
CFRHREC | The state where a user holds the RECENQ critical file resource. |
CFRBANY | The state where a user holds any critical file resource and is preventing (blocking) another user from obtaining a critical file resource. |
CFRBDIR | The state where a user holds the DIRECT critical file resource and is preventing (blocking) another user from obtaining the DIRECT resource. |
CFRBIND | The state where a user holds the INDEX critical file resource and is preventing (blocking) another user from obtaining the INDEX resource. |
CFRBEXS | The state where a user holds the EXISTS critical file resource and is preventing (blocking) another user from obtaining the EXISTS resource. |
CFRBREC | The state where a user holds the RECENQ critical file resource and is preventing (blocking) another user from obtaining the RECENQ resource. |
It should be noted that the CFRBxxx states are weighted based on the number of other users holding the resource and the number of users waiting for the resource. For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state.
On the other hand, if a user at a line of code holds the DIRECT resource (in share mode) along with 4 other users, and a single user is waiting for the DIRECT resource, the line of code is considered to have 1/5th of an observation in the CFRBDIR state.
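Both examples above are consistent with a weight equal to the number of waiting users divided by the number of users holding the resource. The Python sketch below illustrates that arithmetic; the formula is inferred from the two examples rather than taken from a SirTune specification, and `cfrb_weight` is a hypothetical name.

```python
from fractions import Fraction


def cfrb_weight(holders, waiters):
    """Observations credited to one holder's line of code in a CFRBxxx state.

    Assumption inferred from the two examples in the text: each holder
    is charged waiters / holders observations.
    """
    if holders < 1:
        raise ValueError("at least one user must hold the resource")
    return Fraction(waiters, holders)


print(cfrb_weight(1, 3))  # 3    (sole holder, 3 users waiting)
print(cfrb_weight(5, 1))  # 1/5  (5 share-mode holders, 1 user waiting)
```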
Generally, the most useful reports for reducing critical file resource waits are the CFRB reports. This statement breaks down the state where a user is blocking another user from any critical file resource by lines of SOUL code:
REPORT STATE CFRBANY CHUNK 4
This is probably the most useful of the STATE CFRxxxx reports. Once critical file resource blocking is isolated to specific SOUL instructions, critical file resource enqueuing can be reduced by:
- Reducing the number of times the offending instructions are executed.
- Reducing the amount of disk I/O performed by the offending instructions.
- Reducing the amount of CPU used by the offending instructions.
It might be tempting to use the Find Without Locks SOUL statement to reduce the critical file resource enqueuing associated with a statement. This will only work if the resource causing conflicts is the RECENQ resource. All other critical file resources are processed exactly the same way, whether or not a locked record set is being used.
However, if the resource causing the conflict is indeed the RECENQ resource, it is still not recommended that the solution be Find Without Locks. A high conflict rate on the RECENQ resource indicates that the environment has a high update activity level, which means that operating on unenqueued found sets is a questionable tactic at best. A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing found sets before any terminal I/O occurs.
The CFRHxxx reports can be useful for tracking potential critical file resource enqueuing problems (perhaps in a test environment) before they actually happen. These states include any user that holds a critical file resource, whether or not it is blocking anyone. These reports are difficult to interpret, however, since they require a fairly good estimate of expected future usage patterns to have any predictive value.
See also
- SirTune introduction
- SirTune data collection under MVS
- SirTune data collection under CMS
- SirTune data collection statements
- SirTune MODIFY and SMSG commands
- SirTune report generation
- SirTune reports
- SirTune user states
- SirTune and Model 204 quad types
- SirTune statement wildcards
- SirTune date processing