SirTune user states: Difference between revisions

m (misc cleanup)
m (misc cleanup)
Line 3: Line 3:
You've been warned.  ..  (Page built by JAL at the SIRIUS VM; file: FUNPGNEW SYSUT2) -->
You've been warned.  ..  (Page built by JAL at the SIRIUS VM; file: FUNPGNEW SYSUT2) -->
<!-- Page name: SirTune user states-->
<!-- Page name: SirTune user states-->
<p></p>
==Model 204 user states==
When the <var class="product">SirTune</var> sampling program is collecting a sample it scans all
When the <var class="product">SirTune</var> sampling program is collecting a sample it scans all logged on users.
logged on users.
Each user is classified by its <b>state</b>.
Each user is classified by its <b>state</b>.
The user's state is a general indication of the type of activity
The user's state is a general indication of the type of activity
occurring in a user thread.
occurring in a user thread.
These states roughly correspond to the states reported
These states roughly correspond to the states reported
by the <var class="product">Model 204</var> performance monitor, though broken down to a finer level
by the <var class="product">Model 204</var> performance monitor, though broken down to a finer level of detail.
of detail.
 
<p></p>
===Primary states===
<p></p>
The following primary states are distinguished by <var class="product">SirTune</var>:
The following primary states are distinguished by <var class="product">SirTune</var>:
<table class="thJustBold">
<table class="thJustBold">
Line 57: Line 55:
the <var>[[Record level locking and concurrency control#pause|Pause]]</var> statement, are not considered user input waits.
the <var>[[Record level locking and concurrency control#pause|Pause]]</var> statement, are not considered user input waits.


===Composite states===
In addition to the above primary states, several composite states
In addition to the above primary states, several composite states
are provided for convenience and report generation.
are provided for convenience and report generation.
For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU and SWPGOW.
For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW.
Thus any user in any of the indicated primary states is also considered
Thus any user in any of the indicated primary states is also considered
to be in state SWPG.
to be in state SWPG.
Line 99: Line 98:
<tr><th>REDYR</th>
<tr><th>REDYR</th>
<td>This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server.
<td>This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server.
Users in state REDYR can either running or waiting for the <var class="product">Model&nbsp;204</var> scheduler to provide CPU to run.</td></tr>
Users in state REDYR can be either running or waiting for the <var class="product">Model&nbsp;204</var> scheduler to provide CPU to run.</td></tr>


<tr><th>RUNBL</th>
<tr><th>RUNBL</th>
<td>This state is made up of RUNG, REDY, WTSV, and SWPGOW.
<td>This state is made up of RUNG, REDY, WTSV, and SWPGOW.
It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can either running or waiting for the <var class="product">Model&nbsp;204</var> scheduler to provide the resources (CPU and/or server) to run.</td></tr>
It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the <var class="product">Model&nbsp;204</var> scheduler to provide the resources (CPU and/or server) to run.</td></tr>


<tr><th>SWPG</th><td>This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server.</td></tr>
<tr><th>SWPG</th><td>This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server.</td></tr>
Line 114: Line 113:
</table>
</table>


===Specifying states in COLLECT and REPORT STATE statements===
Any of the above primary or composite states can be included on
Any of the above primary or composite states can be included on
[[SirTune configuration statements#colstat|COLLECT]] statements for input to <var class="product">SirTune</var> and [[SirTune reports#STATE reports|REPORT STATE]] statements for input to SIRTUNER.
[[SirTune configuration statements#colstat|COLLECT]] statements for input to SIRTUNEI and on [[SirTune reports#STATE reports|REPORT STATE]] statements for input to SIRTUNEREPORT or SIRTUNER.
Some valid COLLECT statements are:
Some valid COLLECT statements are:
<p class="code">COLLECT BLKN SWPG
<p class="code">COLLECT BLKN SWPG
Line 130: Line 130:
In addition to user states, <var class="product">SirTune</var>'s COLLECT statement lets you request information about DISKIO and CFR.
In addition to user states, <var class="product">SirTune</var>'s COLLECT statement lets you request information about DISKIO and CFR.
The following is a valid COLLECT statement:
The following is a valid COLLECT statement:
<p class="code"><nowiki>COLLECT DISKIO CFR
<p class="code">COLLECT DISKIO CFR
</nowiki></p>
</p>
But there is no REPORT STATE statement which allows DISKIO nor CFR.
But there is no REPORT STATE statement that allows DISKIO nor CFR.


Any state requested in a REPORT STATE statement must have had the corresponding primary states explicitly or implicitly specified on COLLECT statements for <var class="product">SirTune</var>.
Any state requested in a REPORT STATE statement must have had the corresponding primary states explicitly or implicitly specified on COLLECT statements for <var class="product">SirTune</var>.
The simplest way to ensure this is by explicitly specifying any state
The simplest way to ensure this is by explicitly specifying any state
to be used in a REPORT STATE statement or a COLLECT statement.
to be used in a REPORT STATE statement or a COLLECT statement.
For example, if you intend to produce the following reports with SIRTUNER:
For example, if you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER:
<p class="code">REPORT STATE BLKN CHUNK 10
<p class="code">REPORT STATE BLKN CHUNK 10
REPORT STATE SWPG CHUNK 10
REPORT STATE SWPG CHUNK 10
</p>
</p>
you can code the following COLLECT statement for <var class="product">SirTune</var>:
You can code the following COLLECT statement for <var class="product">SirTune</var>:
<p class="code">COLLECT BLKN SWPG
<p class="code">COLLECT BLKN SWPG
</p>
</p>
Line 163: Line 163:


<div id="rungms"></div>
<div id="rungms"></div>
==The RUNGM and RUNGS states==
===Specifying the RUNGM and RUNGS states===
<!--Caution: <div> above-->
<!--Caution: <div> above-->


When running the [[MP/204]] feature with <var class="product">Model&nbsp;204</var>, a user that is
When running the [[MP/204]] feature with <var class="product">Model&nbsp;204</var>, a user that is in state RUNG can be further distinguished to be either running
in state RUNG can be further distinguished to be either running
in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting.
in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting.
For example, these SIRTUNER statements generate two reports:
For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports:
<p class="code">REPORT STATE RUNGM EVAL
<p class="code">REPORT STATE RUNGM EVAL
REPORT STATE RUNGS EVAL
REPORT STATE RUNGS EVAL
Line 187: Line 186:
the <code>REPORT STATE RUNGM</code> report to try to reduce the amount of maintask (serial) SOUL code.
the <code>REPORT STATE RUNGM</code> report to try to reduce the amount of maintask (serial) SOUL code.


==Wait types==
==Specifying reports by wait type==
Users in state BLK (blocked on anything), always have a wait type
Users in state BLK (blocked on anything), always have a wait type
associated with them.
associated with them.
These wait types are the same wait types that appear
These wait types are the same wait types that appear
next to the users in a <var class="product">Model&nbsp;204</var> MONITOR command or in the SIRMON WAITTYP statistic.
next to the users in a <var class="product">Model&nbsp;204</var> <var>[[ONLINE monitoring#Wait type values|MONITOR]]</var> command or in the SirMon [[User statistics displayed in SirMon|WAITTYP statistic]].
STATE reports can be requested by these wait types.
STATE reports can be requested by these wait types.
To produce these STATE reports by wait type, COLLECT statements (collecting data for all states in which a wait type might occur) must be added to <var class="product">SirTune</var>'s input stream (SIRTUNEI).
To produce these STATE reports by wait type, [[SirTune_configuration_statements#colstat|COLLECT statements]] (collecting data for all states in which a wait type might occur) must be added to <var class="product">SirTune</var>'s input stream (<code>SIRTUNEI</code>).


For example, disk I/O wait types are not swappable, so it is only necessary to collect
For example, disk I/O wait types are not swappable, so it is only necessary to collect state BLKIN to produce a <code>REPORT STATE WDISK</code> report.
state BLKIN to produce a REPORT STATE WDISK report.
Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all be collected to produce a <code>REPORT STATE WCFREX</code> report.
Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all
be collected to produce a REPORT STATE WCFREX report.


The available wait type reports along with the corresponding <var class="product">Model 204</var> wait type number, a description of the wait type, and the required states to be collected
The available wait type reports along with the corresponding <var class="product">Model&nbsp;204</var> wait type number, a description of the wait type, and the required states to be collected
are listed here:
are listed here:
<table class="thJustBold">
<table class="thJustBold">
<caption>Wait type reports</caption>
<tr><th>WMISC</th><td>0 - Miscellaneous waits.  Requires BLKN.</td></tr>
<tr><th>WMISC</th><td>0 - Miscellaneous waits.  Requires BLKN.</td></tr>
<tr><th>WDISK</th><td>1 - Wait for disk I/O.  Requires BLKIN.</td></tr>
<tr><th>WDISK</th><td>1 - Wait for disk I/O.  Requires BLKIN.</td></tr>
Line 214: Line 212:
<tr><th>WPST</th><td>10 - Wait on PST.  Requires BLKN.</td></tr>
<tr><th>WPST</th><td>10 - Wait on PST.  Requires BLKN.</td></tr>
<tr><th>WIFAM</th><td>11 - IFAM waits.  Requires BLKN.</td></tr>
<tr><th>WIFAM</th><td>11 - IFAM waits.  Requires BLKN.</td></tr>
<tr><th>WSLEEP</th><td>12 - Waits for a time interval, including PAUSE and SLEEP statements.  Requires BLKN.</td></tr>
<tr><th>WSLEEP</th><td>12 - Waits for a time interval, including <var>[[Record level locking and concurrency control#pause|Pause]]</var> statements and <var>[[*SLEEP command|*SLEEP]]</var> commands.  Requires BLKN.</td></tr>
<tr><th>WJRNLO</th><td>15 - Wait for journal output.  Requires BLKIN.</td></tr>
<tr><th>WJRNLO</th><td>15 - Wait for journal output.  Requires BLKIN.</td></tr>
<tr><th>WCHKPO</th><td>16 - Wait for checkpoint output.  Requires BLKIN.</td></tr>
<tr><th>WCHKPO</th><td>16 - Wait for checkpoint output.  Requires BLKIN.</td></tr>
Line 230: Line 228:
<tr><th>WCONVO</th><td>28 - Wait for inter-process output.  Requires BLKN.</td></tr>
<tr><th>WCONVO</th><td>28 - Wait for inter-process output.  Requires BLKN.</td></tr>
<tr><th>WSCTYI</th><td>29 - Wait for security interface.  Requires BLKN.</td></tr>
<tr><th>WSCTYI</th><td>29 - Wait for security interface.  Requires BLKN.</td></tr>
<tr><th>WS$WAI</th><td>30 - Swappable $WAIT call.  Requires BLKN.</td></tr>
<tr><th>WS$WAI</th><td>30 - Swappable <var>[[$Wait]]</var> call.  Requires BLKN.</td></tr>
<tr><th>WN$WAI</th><td>31 - Non-swappable $WAIT call.  Requires BLKIN.</td></tr>
<tr><th>WN$WAI</th><td>31 - Non-swappable <var>$Wait</var> call.  Requires BLKIN.</td></tr>
<tr><th>WULDB2</th><td>32 - Wait for DB2 subtask.  Requires BLKN.</td></tr>
 
<tr><th>WULDB2</th>
<td>32 - Wait for DB2 subtask.  Requires BLKN.</td></tr>
</table>
</table>


Line 251: Line 251:
These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:
These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:
<table class="thJustBold">
<table class="thJustBold">
<tr><th>WCFR</th><td>This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.</td></tr>
<tr><th>WCFR</th>
<td>This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.</td></tr>


<tr><th>WLOG</th>
<tr><th>WLOG</th>
<td>This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for <var class="product">Model 204</var> recovery, that is, all checkpoint and journal I/O related waits.</td></tr>
<td>This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for <var class="product">Model&nbsp;204</var> recovery, that is, all checkpoint and journal I/O related waits.</td></tr>
</table>
</table>


Line 265: Line 266:
==Critical file resource states==
==Critical file resource states==
<p>
<p>
Critical file resources are used by <var class="product">Model 204</var> to provide multi-user
Critical file resources are used by <var class="product">Model&nbsp;204</var> to provide multi-user
concurrency control on a file level. This control mechanism will sometimes
concurrency control on a file level. This control mechanism will sometimes
exacerbate some other performance bottleneck.
exacerbate some other performance bottleneck.
A high value for number of users per sample with wait types CFREX and CFRSH in the
A high value for number of users per sample with wait types CFREX and CFRSH in the [[SirTune reports#SUMMARY reports|SUMMARY report]] suggests that critical file resource enqueuing bears closer examination.</p>
SUMMARY report suggests that critical file resource enqueuing bears closer examination.</p>


There are four different critical file resources:
There are four different critical file resources:
Line 275: Line 275:
<tr><th>DIRECT</th><td>Protects table B updates and accesses.</td></tr>
<tr><th>DIRECT</th><td>Protects table B updates and accesses.</td></tr>
<tr><th>INDEX</th><td>Protects accesses and updates of table C and the ordered index.</td></tr>
<tr><th>INDEX</th><td>Protects accesses and updates of table C and the ordered index.</td></tr>
<tr><th>EXISTS</th><td>Protects accesses and updates of the existence bit map.</td></tr>
<tr><th>EXISTS</th><td>Protects accesses and updates of the existence bit map.</td></tr>
<tr><th>RECENQ</th><td>Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the FIND WITHOUT LOCKS User Language statement.</td></tr>
 
<tr><th>RECENQ</th><td>Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the <var>[[Find Without Locks statement|Find Without Locks]]</var> SOUL statement.</td></tr>
</table>
</table>


A first step to investigating a critical file resource enqueueing problem
A first step to investigating a critical file resource enqueueing problem
is to produce reports for STATE WCFR.
is to produce reports for the WCFR state.
This will help isolate the programs or lines of code that encounter frequent or long critical file
This will help isolate the programs or lines of code that encounter frequent or long critical file resource waits.
resource waits.
Probably the most useful report would be produced by this statement:
Probably the most useful report would be produced by this statement:
<p class="code"><nowiki>REPORT STATE WCFR CHUNK 4
<p class="code">REPORT STATE WCFR CHUNK 4
</nowiki></p>
</p>


This will break down critical file resource waits by individual lines of
This will break down critical file resource waits by individual lines of
User Language code. Unfortunately, the problem with this type of analysis
SOUL code. Unfortunately, the problem with this type of analysis
is that it focuses on the "victims" of critical file resource waits rather
is that it focuses on the "victims" of critical file resource waits rather
than the "culprits," the lines of code holding critical file resources causing other users to wait.
than the "culprits," the lines of code holding critical file resources causing other users to wait.
While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the STATE WCFR report that this is indeed the case.
While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the WCFR state report that this is indeed the case.


To determine the actual cause of critical file resource enqueuing, more data
To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the <var class="product">SirTune</var> data collector.
needs to be collected by the <var class="product">SirTune</var> data collector.
To have this additional data collected, simply specify the parameter <code>CFR</code> on a <code>COLLECT</code> statement for
To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for
<var class="product">SirTune</var>. This parameter can be specified alone or with other <code>COLLECT</code> parameters as in this statement:
<var class="product">SirTune</var>. This parameter can be specified alone or with other COLLECT parameters as in this statement:
<p class="code">COLLECT BLKN DISKIO CFR
<p class="code"><nowiki>COLLECT BLKN DISKIO CFR
</p>
</nowiki></p>


After this additional CFR (Critical File Resource) data is collected,
After this additional CFR (Critical File Resource) data is collected,
SIRTUNER is able to produce several additional reports to help isolate the
SirTune is able to produce several additional reports to help isolate the
cause of critical file resource enqueuing.
cause of critical file resource enqueuing.
The first report that might be useful is the CFRROOT report.
The first report that might be useful is the CFRROOT report.
This report indicates the base wait types that are behind critical file resource waits.
This report indicates the base wait types that are behind critical file resource waits.
The CFRROOT report does not provide
The [[SirTune reports#CFRROOT reports|CFRROOT]] report does not provide
information on which lines of code cause critical file resource waits, so it
information on which lines of code cause critical file resource waits, so it is not helpful for application tuning.
is not helpful for application tuning.


The CFRROOT report might indicate that
The <code>CFRROOT</code> report might indicate that
application tuning (rather than system tuning) might be required to reduce
application tuning (rather than system tuning) might be required to reduce
critical file resource enqueuing.
critical file resource enqueuing.
Line 316: Line 315:


You can attack a primary root cause for critical file resource waits
You can attack a primary root cause for critical file resource waits
either by trying to reduce overall disk I/O's or journal I/O's (with application
either by trying to reduce overall disk I/O's or journal I/O's (with application tuning), or by specifically targeting those instructions that hold critical file resources.
tuning), or by specifically targeting those instructions that hold critical file
resources.


To facilitate this latter option, several CFR states can be requested
To facilitate this latter option, several CFR states can be requested
on SIRTUNER reports if CFR data had been collected by <var class="product">SirTune</var>.
on SirTune reports if CFR data had been collected by <var class="product">SirTune</var>.
These states are:
These states are:
<table class="thJustBold">
<table class="thJustBold">
Line 355: Line 352:
</table>
</table>


It should be noted that the CFRB??? states are weighted based on the number of
It should be noted that the CFRB<i>xxx</i> states are weighted based on the number of other users holding the resource and the number of users waiting for the resource.
other users holding the resource and the number of users waiting for the resource.
For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state.
For example, if a user at a line of code holds the DIRECT resource and 3 other
users are waiting for the resource, that line of code is considered to have 3
observations in the CFRBDIR state.


On the other hand, if a user at a line of code
On the other hand, if a user at a line of code
Line 366: Line 360:
considered to have 1/5th of an observation in the CFRBDIR state.
considered to have 1/5th of an observation in the CFRBDIR state.


Generally, the most useful reports for reducing critical file resource waits
Generally, the most useful reports for reducing critical file resource waits are the CFRB reports. This statement breaks down the state where a user is blocking another user from any critical
are the CFRB reports. This statement:
file resource by lines of SOUL code:
<p class="code"><nowiki>REPORT STATE CFRBANY CHUNK 4
<p class="code"><nowiki>REPORT STATE CFRBANY CHUNK 4
</nowiki></p>
</nowiki></p>
will break down the state where a user is blocking another user from any critical
 
file resource by lines of User Language code.
This is probably the most useful of the STATE CFR<i>xxxx</i> reports.
This is probably the most useful of the STATE CFR???? reports.
Once critical file resource blocking is isolated
Once critical file resource blocking is isolated
to specific User Language instructions, critical file resource enqueuing can be
to specific SOUL instructions, critical file resource enqueuing can be
reduced by:
reduced by:
<ul>
<ul>
<li>Reducing the number of times the offending instructions are executed.</li>
<li>Reducing the number of times the offending instructions are executed.</li>
<li>Reducing the amount of disk I/O performed by the offending instructions.</li>
<li>Reducing the amount of disk I/O performed by the offending instructions.</li>
<li>Reducing the amount of CPU used by the offending instructions.</li>
<li>Reducing the amount of CPU used by the offending instructions.</li>
</ul>
</ul>


It might be tempting to use the FIND WITHOUT LOCKS User Language statement to
It might be tempting to use the <var>Find Without Locks</var> SOUL statement to
reduce the critical file resource enqueuing associated with a statement.
reduce the critical file resource enqueuing associated with a statement.
This will only work if the resource causing conflicts is the RECENQ resource.
This will only work if the resource causing conflicts is the RECENQ resource.
Line 390: Line 385:
However, if the resource causing the conflict is indeed the RECENQ
However, if the resource causing the conflict is indeed the RECENQ
resource, it is still <i>not</i> recommended that the solution be
resource, it is still <i>not</i> recommended that the solution be
FIND WITHOUT LOCKS.
<var>Find Without Locks</var>.
A high conflict rate on the RECENQ resource indicates that the
A high conflict rate on the RECENQ resource indicates that the
environment has a high update activity level, which means that operating on
environment has a high update activity level, which means that operating on unenqueued found sets is a questionable tactic at best.
unenqueued found sets is a questionable tactic at best.
A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing found sets before any terminal I/O occurs.
A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing
found sets before any terminal I/O occurs.


The CFRH??? reports can be useful for tracking potential critical file resource
The CFRH<i>xxx</i> reports can be useful for tracking potential critical file resource
enqueuing problems (perhaps in a test environment) before they actually happen.
enqueuing problems (perhaps in a test environment) before they actually happen.
These states include any user that holds a critical file resource, whether
These states include any user that holds a critical file resource, whether
or not it is blocking anyone.
or not it is blocking anyone.
These reports are difficult to interpret, however,
These reports are difficult to interpret, however,
since they require a fairly good estimate of expected future usage patterns
since they require a fairly good estimate of expected future usage patterns to have any predictive value.
to have any predictive value.


==See also==
==See also==

Revision as of 23:24, 9 November 2015

Model 204 user states

When the SirTune sampling program is collecting a sample, it scans all logged-on users. Each user is classified by its state. The user's state is a general indication of the type of activity occurring in a user thread. These states roughly correspond to the states reported by the Model 204 performance monitor, though broken down to a finer level of detail.

Primary states

The following primary states are distinguished by SirTune:

BLKIN This includes any user that is blocked (that is, waiting for something), in a server, and not waiting for user input. This is distinguished from BLKIU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not.
BLKIU This includes any user that is blocked (that is, waiting for something), in a server, and waiting for user input. This is distinguished from BLKIN because waits for things other than user input are generally viewed as a performance problem while waits for user input are not.
BLKON This includes any user that is blocked (that is, waiting for something), not in a server, and not waiting for user input. This is distinguished from BLKOU because waits for things other than user input are generally viewed as a performance problem while waits for user input are not.
BLKOU This includes any user that is blocked (that is, waiting for something), not in a server, and waiting for user input. This is distinguished from BLKON because waits for things other than user input are generally viewed as a performance problem while waits for user input are not.
REDY This includes any user that is ready to run, that is, in a server and not waiting on anything but not actually being run. Generally a user is in state REDY because another user is currently running.
RUNG This includes any user that is running, that is, using CPU. Unless MP/204 is installed, there can never be more than one user in state RUNG per sample.
RUNGM If MP/204 is installed, this includes any user that is running, that is, using CPU, in maintask mode. There can never be more than one user in state RUNGM per sample. See The RUNGM and RUNGS states.
RUNGS If MP/204 is installed, this includes any user that is running, that is, using CPU, in subtask mode. See The RUNGM and RUNGS states.
SWPGI This includes any user that is in the process of being swapped into a server.
SWPGOBN This includes any user that is in the process of being swapped out of a server because it is waiting on something other than user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKON.
SWPGOBU This includes any user that is in the process of being swapped out of a server because it is waiting on user input. If what the user was waiting on is still not completed at the point the user is swapped out, the user switches to state BLKOU.
SWPGOW This includes any user that is in the process of being swapped out of a server because it has been server sliced. If no servers of appropriate size are available at the point the user is swapped out, the user switches to state WTSV.
WPST This includes any PST that is not running.
WTSV This includes any user that is waiting for a server to become available so that the user can be run. The only reason a user would be in the WTSV state is that all servers of appropriate size are occupied by other users that cannot be swapped out of a server.

In this list, the phrase "waiting for user input" refers to a thread waiting for terminal or line input. In addition, a wait for a response to the console message issued by User 0 on a HALT command is also considered a user input wait. "Sleep" waits, that is, waits resulting from the *SLEEP command and the Pause statement, are not considered user input waits.

Composite states

In addition to the above primary states, several composite states are provided for convenience and report generation. For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. Thus any user in any of the indicated primary states is also considered to be in state SWPG. The following are the available composite states, their component primary states, and an explanation that suggests the meaning of the composite state.

ALL This is a composite state that includes all primary states. Any logged on user or PST is considered in state ALL.
ALLI This state is made up of RUNG, REDY, BLKIN, and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs.
ALLN This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN, and SWPGOW. It includes any user not blocked for user input. It does not include non-running PSTs.
BLK This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN, and SWPGOBU. It includes any user that is blocked on anything.
BLKI This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything.
BLKN This state is made up of BLKIN, BLKON, and SWPGOBN. It includes any user that is blocked for something other than user input.
BLKO This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something.
BLKU This state is made up of BLKIU, BLKOU, and SWPGOBU. It includes any user that is waiting for user input.
OSERVN This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input.
OSERVU This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input.
OSERVW This state is made up of SWPGOW and WTSV. It includes any user that is either waiting for a server or being swapped out of a server so that it can wait for a server to free up. This latter case only happens when a user is server sliced.
REDYR This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. Users in state REDYR can be either running or waiting for the Model 204 scheduler to provide CPU to run.
RUNBL This state is made up of RUNG, REDY, WTSV, and SWPGOW. It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the Model 204 scheduler to provide the resources (CPU and/or server) to run.
SWPG This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server.
SWPGO This state is made up of SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped out of a server.
SWPGOB This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something.
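The composite states can also be written down as a lookup table, which makes it easy to confirm how they nest. The following is a minimal Python sketch, not part of SirTune; the names COMPOSITES, expand, and union are hypothetical, and the identities checked at the end are inferred from the component lists above rather than stated by the product.

```python
# Composite states and their primary components, transcribed from the list above.
COMPOSITES = {
    "ALLN":   {"RUNG", "REDY", "BLKIN", "BLKON", "WTSV", "SWPGI", "SWPGOBN", "SWPGOW"},
    "BLK":    {"BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"},
    "BLKI":   {"BLKIN", "BLKIU"},
    "BLKN":   {"BLKIN", "BLKON", "SWPGOBN"},
    "BLKO":   {"BLKON", "BLKOU"},
    "BLKU":   {"BLKIU", "BLKOU", "SWPGOBU"},
    "OSERVN": {"SWPGOBN", "BLKON"},
    "OSERVU": {"SWPGOBU", "BLKOU"},
    "OSERVW": {"SWPGOW", "WTSV"},
    "REDYR":  {"RUNG", "REDY"},
    "RUNBL":  {"RUNG", "REDY", "WTSV", "SWPGOW"},
    "SWPG":   {"SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"},
    "SWPGO":  {"SWPGOBN", "SWPGOBU", "SWPGOW"},
    "SWPGOB": {"SWPGOBN", "SWPGOBU"},
}


def expand(state):
    """Return the primary states behind a state name.

    A name that is not a known composite is assumed to be a primary state.
    """
    return set(COMPOSITES.get(state, {state}))


def union(*states):
    """Return the combined primary states of several state names."""
    result = set()
    for state in states:
        result |= expand(state)
    return result


# Identities that follow from the component lists (inferred, not documented).
assert expand("BLK") == union("BLKN", "BLKU")
assert expand("BLK") == union("BLKI", "BLKO", "SWPGOB")
assert expand("SWPG") == union("SWPGI", "SWPGO")
assert expand("RUNBL") == union("REDYR", "OSERVW")
assert expand("ALLN") == union("RUNBL", "BLKN", "SWPGI")
```

Reading the table this way shows, for example, that ALLN is RUNBL plus BLKN plus SWPGI, which matches its description as any user not blocked for user input.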

Specifying states in COLLECT and REPORT STATE statements

Any of the above primary or composite states can be included on COLLECT statements for input to SIRTUNEI and on REPORT STATE statements for input to SIRTUNEREPORT or SIRTUNER. Some valid COLLECT statements are:

COLLECT BLKN SWPG
COLLECT ALLN
COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI

Some valid REPORT STATE statements are:

REPORT STATE BLKIN EVAL
REPORT STATE SWPG CHUNK 100
REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4

In addition to user states, SirTune's COLLECT statement lets you request information about DISKIO and CFR. The following is a valid COLLECT statement:

COLLECT DISKIO CFR

However, no REPORT STATE statement accepts DISKIO or CFR.

Any state requested in a REPORT STATE statement must have had its corresponding primary states explicitly or implicitly specified on COLLECT statements for SirTune. The simplest way to ensure this is to explicitly specify, on a COLLECT statement, every state that will be used in a REPORT STATE statement. For example, if you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER:

REPORT STATE BLKN CHUNK 10
REPORT STATE SWPG CHUNK 10

You can code the following COLLECT statement for SirTune:

COLLECT BLKN SWPG

This statement is functionally equivalent to

COLLECT BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI
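The rule that a reported state must have all of its primary states collected can be checked mechanically. The following is a minimal Python sketch, not a SirTune facility; the names COMPOSITES, primaries, and missing_states are hypothetical, and only the composite states needed for this example are listed.

```python
# Excerpt of the composite state definitions (only those used in this example).
COMPOSITES = {
    "BLKN": {"BLKIN", "BLKON", "SWPGOBN"},
    "SWPG": {"SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"},
    "ALLN": {"RUNG", "REDY", "BLKIN", "BLKON", "WTSV", "SWPGI", "SWPGOBN", "SWPGOW"},
}


def primaries(states):
    """Expand a list of state names into the set of primary states they cover."""
    covered = set()
    for state in states:
        # A name that is not a listed composite is assumed to be a primary state.
        covered |= COMPOSITES.get(state, {state})
    return covered


def missing_states(report_state, collect_states):
    """Primary states a REPORT STATE needs that the COLLECT states do not supply."""
    return primaries([report_state]) - primaries(collect_states)


collected = "BLKN SWPG".split()

# COLLECT BLKN SWPG covers the same primary states as the longer statement.
assert primaries(collected) == primaries(
    "BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI".split()
)

print(missing_states("SWPG", collected))          # nothing missing
print(sorted(missing_states("ALLN", collected)))  # states still to be collected
```

An empty result means the report can be produced from the collected data; a non-empty result lists the primary states that would have to be added to a COLLECT statement.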

In general, if you are running a relatively small Online (an average of fewer than 20 logged on users), this statement should not produce a prohibitively large amount of data, and it makes all reports possible:

COLLECT ALL

If you are running a midsize to large Online (an average of 20 or more logged on users), the following statement should collect enough data to produce most interesting STATE reports without generating a prohibitively large sample data set:

COLLECT ALLN BLKIU SWPGOBU

Specifying the RUNGM and RUNGS states

When running the MP/204 feature with Model 204, a user that is in state RUNG can be further distinguished to be either running in maintask mode (RUNGM) or subtask mode (RUNGS) for the purposes of reporting. For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports:

REPORT STATE RUNGM EVAL
REPORT STATE RUNGS EVAL

The first report is a breakdown of users running in maintask mode by evaluating procedure, and the second is a breakdown of users running in subtask mode by evaluating procedure. Maintask mode is often referred to as "serial" mode, and subtask mode is often referred to as "parallel" mode.

The total observations for state RUNG in any sample is always equal to the total observations for state RUNGM plus the total observations for state RUNGS.

The distinction between maintask and subtask mode can be made either on the basis of the task on which a user is actually running (maintask or subtask), or on the basis of its virtual (or logical) MP mode (that is, whether it is capable of running in a subtask or not). By default, the distinction is based on the actual task on which a user is running. This can be changed with the SIRTUNER MPVIRT statement, which is generally the preferred setting when using the REPORT STATE RUNGM report to try to reduce the amount of maintask (serial) SOUL code.

Specifying reports by wait type

Users in state BLK (blocked on anything) always have a wait type associated with them. These are the same wait types that appear next to the users in a Model 204 MONITOR command or in the SirMon WAITTYP statistic. STATE reports can be requested by these wait types. To produce STATE reports by wait type, COLLECT statements that collect data for all states in which the wait type might occur must be added to SirTune's input stream (SIRTUNEI).

For example, disk I/O wait types are not swappable, so it is only necessary to collect state BLKIN to produce a REPORT STATE WDISK report. Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN must all be collected to produce a REPORT STATE WCFREX report.

The available wait type reports along with the corresponding Model 204 wait type number, a description of the wait type, and the required states to be collected are listed here:

Wait type reports
WMISC 0 - Miscellaneous waits. Requires BLKN.
WDISK 1 - Wait for disk I/O. Requires BLKIN.
WUSERO 2 - Wait for user output. Requires BLKU.
WUSERI 3 - Wait for user input. Requires BLKU.
WOPERI 4 - Wait for operator input. Requires BLKU.
WDUMPO 5 - Wait for dump write. Requires BLKIN.
WDUMPI 6 - Wait for restore read. Requires BLKIN.
WENQUE 7 - Wait for miscellaneous enqueue. Requires BLKN.
WBUFF 8 - Wait for disk buffer. Requires BLKIN.
WPST 10 - Wait on PST. Requires BLKN.
WIFAM 11 - IFAM waits. Requires BLKN.
WSLEEP 12 - Waits for a time interval, including Pause statements and *SLEEP commands. Requires BLKN.
WJRNLO 15 - Wait for journal output. Requires BLKIN.
WCHKPO 16 - Wait for checkpoint output. Requires BLKIN.
WWRITE 17 - Wait for a checkpoint DECB. Requires BLKIN.
WARBMO 18 - Waits for output arbitration. Requires BLKN.
WCHKPR 19 - Waits for a checkpoint request. Requires WPST.
WDISK 20 - Waits for checkpoint completion. Requires BLKIN.
WDEAD 21 - Wait forever (dead thread). Requires BLKU.
WVSAMI 22 - Wait for VSAM input. Requires BLKN.
WLOGIN 23 - Wait after login failure. Requires BLKN.
WCFREX 24 - Wait for critical file resource in exclusive mode. Requires BLKN.
WCFRSH 25 - Wait for critical file resource in share mode. Requires BLKN.
WVTBUF 26 - Wait for VTAM buffer. Requires BLKN.
WCONVI 27 - Wait for inter-process input. Requires BLKN.
WCONVO 28 - Wait for inter-process output. Requires BLKN.
WSCTYI 29 - Wait for security interface. Requires BLKN.
WS$WAI 30 - Swappable $Wait call. Requires BLKN.
WN$WAI 31 - Non-swappable $Wait call. Requires BLKIN.
WULDB2 32 - Wait for DB2 subtask. Requires BLKN.
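The "Requires" column can be used to work out which states to collect for a given set of wait type reports. The following is a minimal Python sketch, assuming a small excerpt of the table above; the names REQUIRED_STATE, COMPOSITE_STATES, and collect_statement are hypothetical and are not part of SirTune.

```python
# Composite states referenced by the "Requires" column, with their primary states.
COMPOSITE_STATES = {
    "BLKN": ("BLKIN", "BLKON", "SWPGOBN"),
    "BLKU": ("BLKIU", "BLKOU", "SWPGOBU"),
}

# Excerpt of the wait type table: wait type report -> state that must be collected.
REQUIRED_STATE = {
    "WDISK": "BLKIN",
    "WUSERI": "BLKU",
    "WENQUE": "BLKN",
    "WJRNLO": "BLKIN",
    "WCFREX": "BLKN",
    "WCFRSH": "BLKN",
}


def collect_statement(wait_types):
    """Build one COLLECT statement covering every requested wait type report."""
    primaries = set()
    for wait_type in wait_types:
        state = REQUIRED_STATE[wait_type]
        # Expand composite states so each primary state is listed only once.
        primaries.update(COMPOSITE_STATES.get(state, (state,)))
    return "COLLECT " + " ".join(sorted(primaries))


print(collect_statement(["WDISK"]))            # COLLECT BLKIN
print(collect_statement(["WDISK", "WCFREX"]))  # COLLECT BLKIN BLKON SWPGOBN
```

The sketch expands composite states to primary states so that overlapping requirements, such as BLKIN for WDISK and BLKN for WCFREX, do not produce a redundant list. Coding the composite name (for example, COLLECT BLKN) would be equally valid.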

Thus to produce a breakdown of disk I/O waits by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WDISK EVAL CHUNK 4

To get a breakdown of waits for miscellaneous enqueues (including record locks) by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WENQUE EVAL CHUNK 4

In addition to these primary wait types, there are a few composite wait types for which reports can be generated. These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:

WCFR This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.
WLOG This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for Model 204 recovery, that is, all checkpoint and journal I/O related waits.

To get a breakdown of waits for critical file resources by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WCFR EVAL CHUNK 4

Critical file resource states

Critical file resources are used by Model 204 to provide multi-user concurrency control at the file level. This control mechanism will sometimes exacerbate some other performance bottleneck. A high value for the number of users per sample with wait types CFREX and CFRSH in the SUMMARY report suggests that critical file resource enqueuing bears closer examination.

There are four different critical file resources:

DIRECT Protects table B updates and accesses.
INDEX Protects accesses and updates of table C and the ordered index.
EXISTS Protects accesses and updates of the existence bit map.
RECENQ Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the Find Without Locks SOUL statement.

A first step in investigating a critical file resource enqueuing problem is to produce reports for the WCFR state. This helps isolate the programs or lines of code that encounter frequent or long critical file resource waits. Probably the most useful report is produced by this statement:

REPORT STATE WCFR CHUNK 4

This will break down critical file resource waits by individual lines of SOUL code. Unfortunately, the problem with this type of analysis is that it focuses on the "victims" of critical file resource waits rather than the "culprits," the lines of code holding critical file resources causing other users to wait. While in some situations, the lines of code causing the critical file resource waits are the same lines that suffer from the waits, there is no way to be certain from the WCFR state report that this is indeed the case.

To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the SirTune data collector. To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for SirTune. This parameter can be specified alone or with other COLLECT parameters as in this statement:

COLLECT BLKN DISKIO CFR

After this additional CFR (Critical File Resource) data is collected, SirTune is able to produce several additional reports to help isolate the cause of critical file resource enqueuing. The first report that might be useful is the CFRROOT report. This report indicates the base wait types that are behind critical file resource waits. The CFRROOT report does not provide information on which lines of code cause critical file resource waits, so it is not helpful for application tuning.

The CFRROOT report might indicate that application tuning (rather than system tuning) might be required to reduce critical file resource enqueuing. This would be indicated by a primary root cause of DISK (disk I/O waits) or maybe JRNLO (journal I/O waits).

You can attack a primary root cause for critical file resource waits either by trying to reduce overall disk I/Os or journal I/Os (with application tuning), or by specifically targeting those instructions that hold critical file resources.

To facilitate this latter option, several CFR states can be requested on SirTune reports if CFR data has been collected by SirTune. These states are:

CFRHANY The state where a user holds any critical file resource.
CFRHDIR The state where a user holds the DIRECT critical file resource.
CFRHIND The state where a user holds the INDEX critical file resource.
CFRHEXS The state where a user holds the EXISTS critical file resource.
CFRHREC The state where a user holds the RECENQ critical file resource.
CFRBANY The state where a user holds any critical file resource and is preventing (blocking) another user from obtaining a critical file resource.
CFRBDIR The state where a user holds the DIRECT critical file resource and is preventing (blocking) another user from obtaining the DIRECT resource.
CFRBIND The state where a user holds the INDEX critical file resource and is preventing (blocking) another user from obtaining the INDEX resource.
CFRBEXS The state where a user holds the EXISTS critical file resource and is preventing (blocking) another user from obtaining the EXISTS resource.
CFRBREC The state where a user holds the RECENQ critical file resource and is preventing (blocking) another user from obtaining the RECENQ resource.

It should be noted that the CFRBxxx states are weighted based on the number of other users holding the resource and the number of users waiting for the resource. For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state.

On the other hand, if a user at a line of code holds the DIRECT resource (in share mode) along with 4 other users, and a single user is waiting for the DIRECT resource, the line of code is considered to have 1/5th of an observation in the CFRBDIR state.
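Both examples are consistent with a weight equal to the number of waiting users divided by the number of users holding the resource. The following minimal Python sketch illustrates that reading; the formula is inferred from the two examples above, not taken from SirTune itself, and the function name cfrb_weight is hypothetical.

```python
from fractions import Fraction


def cfrb_weight(holders, waiters):
    """Observations credited to one holder's line of code in a CFRBxxx state.

    holders: number of users holding the resource, including this user.
    waiters: number of users waiting for the resource.
    Assumes the weight is waiters / holders, as the two examples suggest.
    """
    if holders < 1:
        raise ValueError("at least one user must hold the resource")
    if waiters < 0:
        raise ValueError("waiters cannot be negative")
    return Fraction(waiters, holders)


# One holder, three waiters: 3 observations in CFRBDIR.
print(cfrb_weight(holders=1, waiters=3))  # 3

# Five share-mode holders (this user plus 4 others), one waiter: 1/5 observation.
print(cfrb_weight(holders=5, waiters=1))  # 1/5
```

Under this reading, the weights of all holders of a resource sum to the number of waiters, so each waiting user is counted once across the lines of code responsible.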

Generally, the most useful reports for reducing critical file resource waits are the CFRB reports. This statement breaks down the state where a user is blocking another user from any critical file resource by lines of SOUL code:

REPORT STATE CFRBANY CHUNK 4

This is probably the most useful of the STATE CFRxxxx reports. Once critical file resource blocking is isolated to specific SOUL instructions, critical file resource enqueuing can be reduced by:

  • Reducing the number of times the offending instructions are executed.
  • Reducing the amount of disk I/O performed by the offending instructions.
  • Reducing the amount of CPU used by the offending instructions.

It might be tempting to use the Find Without Locks SOUL statement to reduce the critical file resource enqueuing associated with a statement. This will only work if the resource causing conflicts is the RECENQ resource. All other critical file resources are processed exactly the same way, whether or not a locked record set is being used.

However, if the resource causing the conflict is indeed the RECENQ resource, it is still not recommended that the solution be Find Without Locks. A high conflict rate on the RECENQ resource indicates that the environment has a high update activity level, which means that operating on unenqueued found sets is a questionable tactic at best. A high conflict rate on the RECENQ resource might suggest examination of strategies for releasing found sets before any terminal I/O occurs.

The CFRHxxx reports can be useful for tracking potential critical file resource enqueuing problems (perhaps in a test environment) before they actually happen. These states include any user that holds a critical file resource, whether or not it is blocking anyone. These reports are difficult to interpret, however, since they require a fairly good estimate of expected future usage patterns to have any predictive value.

See also