SirTune user states

From m204wiki

Model 204 user states

When the SirTune sampling program collects a sample, it scans all logged-on users and classifies each user by its state. A user's state is a general indication of the type of activity occurring in that user's thread. These states roughly correspond to the states reported by the Model 204 performance monitor, but they are broken down to a finer level of detail.

Primary states

The following primary states are distinguished by SirTune:

BLKIN This includes any user that is blocked (waiting for something) while in a server, and not waiting for user input. This is distinguished from BLKIU because waits for things other than user input are generally viewed as a performance problem, while waits for user input are not.
BLKIU This includes any user that is blocked (waiting for something) while in a server, and waiting for user input. This is distinguished from BLKIN because waits for things other than user input are generally viewed as a performance problem, while waits for user input are not.
BLKON This includes any user that is blocked (waiting for something), not in a server, and not waiting for user input. This is distinguished from BLKOU because waits for things other than user input are generally viewed as a performance problem, while waits for user input are not.
BLKOU This includes any user that is blocked (waiting for something), not in a server, and waiting for user input. This is distinguished from BLKON because waits for things other than user input are generally viewed as a performance problem, while waits for user input are not.
REDY This includes any user that is ready to run, that is, in a server and not waiting on anything, but not actually running. Generally a user is in state REDY because another user is currently running.
RUNG This includes any user that is running, that is, using CPU. Unless MP/204 is installed, there can never be more than one user in state RUNG per sample.
RUNGM If MP/204 is installed, this includes any user that is running (using CPU) in maintask mode. There can never be more than one user in state RUNGM per sample. See Specifying the RUNGM and RUNGS states.
RUNGS If MP/204 is installed, this includes any user that is running (using CPU) in subtask mode. See Specifying the RUNGM and RUNGS states.
SWPGI This includes any user that is in the process of being swapped into a server.
SWPGOBN This includes any user that is in the process of being swapped out of a server because it is waiting on something other than user input. If what the user was waiting on has still not completed when the swap-out finishes, the user switches to state BLKON.
SWPGOBU This includes any user that is in the process of being swapped out of a server because it is waiting on user input. If what the user was waiting on has still not completed when the swap-out finishes, the user switches to state BLKOU.
SWPGOW This includes any user that is in the process of being swapped out of a server because it has been server sliced. If no server of appropriate size is available when the swap-out finishes, the user switches to state WTSV.
WPST This includes any PST that is not running.
WTSV This includes any user that is waiting for a server to become available so that it can run. The only reason a user would be in the WTSV state is that all servers of appropriate size are occupied by other users that cannot be swapped out of a server.

In this list, the phrase "waiting for user input" refers to a thread waiting for terminal or line input. A wait for a response to the console message issued by User 0 on a HALT command is also considered a user input wait. "Sleep" waits, that is, waits resulting from the *SLEEP command and the Pause statement, are not considered user input waits.
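The four BLK primary states amount to a two-way split (in a server or not; waiting for user input or not), and each swap-out state leads to a fixed follow-on state. The following Python sketch models only those rules as described above. It is not SirTune code, and the function name, its parameters, and the AFTER_SWAP_OUT table are hypothetical.

```python
# Hypothetical model of the SirTune classification rules described above.
# This is an illustration only, not SirTune's implementation.

def blocked_state(in_server: bool, user_input_wait: bool) -> str:
    """Return the primary state of a blocked user that is not being swapped."""
    if in_server:
        return "BLKIU" if user_input_wait else "BLKIN"
    return "BLKOU" if user_input_wait else "BLKON"


# State a user moves to if it still cannot run when its swap-out finishes:
# the wait has not completed (SWPGOBN, SWPGOBU), or no server of
# appropriate size is free after server slicing (SWPGOW).
AFTER_SWAP_OUT = {
    "SWPGOBN": "BLKON",
    "SWPGOBU": "BLKOU",
    "SWPGOW": "WTSV",
}

# A *SLEEP or Pause wait is not a user input wait, so a sleeping user
# that is still in a server is classified as BLKIN.
print(blocked_state(in_server=True, user_input_wait=False))  # BLKIN
print(AFTER_SWAP_OUT["SWPGOW"])                              # WTSV
```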

Composite states

In addition to the above primary states, several composite states are provided for convenience and for report generation. For example, composite state SWPG is made up of primary states SWPGI, SWPGOBN, SWPGOBU, and SWPGOW, so any user in any of those primary states is also considered to be in state SWPG. The available composite states, their component primary states, and a brief explanation of each are:

ALL This is a composite state that includes all primary states. Any logged-on user or PST is considered to be in state ALL.
ALLI This state is made up of RUNG, REDY, BLKIN, and BLKIU. It includes any user currently in a server and not being swapped out. It does not include non-running PSTs.
ALLN This state is made up of RUNG, REDY, BLKIN, BLKON, WTSV, SWPGI, SWPGOBN, and SWPGOW. It includes any user not blocked for user input. It does not include non-running PSTs.
BLK This state is made up of BLKIN, BLKIU, BLKON, BLKOU, SWPGOBN, and SWPGOBU. It includes any user that is blocked on anything.
BLKI This state is made up of BLKIN and BLKIU. It includes any user that is in a server and blocked on anything.
BLKN This state is made up of BLKIN, BLKON, and SWPGOBN. It includes any user that is blocked for something other than user input.
BLKO This state is made up of BLKON and BLKOU. It includes any user that is not in a server but is blocked on something.
BLKU This state is made up of BLKIU, BLKOU, and SWPGOBU. It includes any user that is waiting for user input.
OSERVN This state is made up of SWPGOBN and BLKON. It includes any user that is either not in a server or being swapped out of a server because it is blocked on something other than user input.
OSERVU This state is made up of SWPGOBU and BLKOU. It includes any user that is either not in a server or being swapped out of a server because it is blocked on user input.
OSERVW This state is made up of SWPGOW and WTSV. It includes any user that is either waiting for a server or being swapped out of a server so that it can wait for a server to free up. This latter case only happens when a user is server sliced.
REDYR This state is made up of RUNG and REDY. It includes any user that is not blocked on anything and is in a server. Users in state REDYR can be either running or waiting for the Model 204 scheduler to provide CPU to run.
RUNBL This state is made up of RUNG, REDY, WTSV, and SWPGOW. It includes any user that is not blocked on anything, that is, is runnable. Users in state RUNBL can be either running or waiting for the Model 204 scheduler to provide the resources (CPU and/or server) to run.
SWPG This state is made up of SWPGI, SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped into or out of a server.
SWPGO This state is made up of SWPGOBN, SWPGOBU, and SWPGOW. It includes any user that is being swapped out of a server.
SWPGOB This state is made up of SWPGOBN and SWPGOBU. It includes any user that is being swapped out of a server because it is blocked on something.
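Because every composite state is just a set of primary states, membership can be modeled as set containment. The following Python sketch transcribes the composite states listed above. The names PRIMARY, COMPOSITE, expand, and in_state are hypothetical. For simplicity, RUNGM and RUNGS, which subdivide RUNG under MP/204, are represented by RUNG alone.

```python
# Composite states transcribed from the list above (illustrative only).
# RUNG stands in for RUNGM and RUNGS, which subdivide it under MP/204.
PRIMARY = frozenset({
    "BLKIN", "BLKIU", "BLKON", "BLKOU", "REDY", "RUNG",
    "SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW", "WPST", "WTSV",
})

COMPOSITE = {
    "ALL": PRIMARY,
    "ALLI": {"RUNG", "REDY", "BLKIN", "BLKIU"},
    "ALLN": {"RUNG", "REDY", "BLKIN", "BLKON", "WTSV", "SWPGI", "SWPGOBN", "SWPGOW"},
    "BLK": {"BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"},
    "BLKI": {"BLKIN", "BLKIU"},
    "BLKN": {"BLKIN", "BLKON", "SWPGOBN"},
    "BLKO": {"BLKON", "BLKOU"},
    "BLKU": {"BLKIU", "BLKOU", "SWPGOBU"},
    "OSERVN": {"SWPGOBN", "BLKON"},
    "OSERVU": {"SWPGOBU", "BLKOU"},
    "OSERVW": {"SWPGOW", "WTSV"},
    "REDYR": {"RUNG", "REDY"},
    "RUNBL": {"RUNG", "REDY", "WTSV", "SWPGOW"},
    "SWPG": {"SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"},
    "SWPGO": {"SWPGOBN", "SWPGOBU", "SWPGOW"},
    "SWPGOB": {"SWPGOBN", "SWPGOBU"},
}


def expand(state: str) -> frozenset:
    """Return the primary states that make up a primary or composite state."""
    if state in PRIMARY:
        return frozenset({state})
    return frozenset(COMPOSITE[state])


def in_state(primary_state: str, state: str) -> bool:
    """True if a user in primary_state is also counted in state."""
    return primary_state in expand(state)


# Sanity check: every composite state is drawn from the primary states.
assert all(members <= PRIMARY for members in COMPOSITE.values())

print(in_state("SWPGOW", "RUNBL"))  # True
print(in_state("BLKOU", "ALLN"))    # False
```

A user being swapped out after server slicing (SWPGOW) is still runnable, so it counts in RUNBL. A user outside a server waiting for user input (BLKOU) is excluded from ALLN.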

Specifying states in COLLECT and REPORT STATE statements

Any of the above primary or composite states can be specified on COLLECT statements in SirTune's input stream (SIRTUNEI) and on REPORT STATE statements for SIRTUNEREPORT or SIRTUNER. Some valid COLLECT statements are:

COLLECT BLKN SWPG
COLLECT ALLN
COLLECT BLKIN BLKON SWPGOBN WTSV SWPGOW SWPGI

Some valid REPORT STATE statements are:

REPORT STATE BLKIN EVAL
REPORT STATE SWPG CHUNK 100
REPORT STATE ALLN EVAL CHUNK 1000 CHUNK 4

In addition to user states, SirTune's COLLECT statement lets you request information about DISKIO and CFR. The following is a valid COLLECT statement:

COLLECT DISKIO CFR

However, neither DISKIO nor CFR can be specified on a REPORT STATE statement.

Any state requested in a REPORT STATE statement must have had its corresponding primary states specified, explicitly or implicitly, on COLLECT statements for SirTune. The simplest way to ensure this is to specify on a COLLECT statement every state you intend to use in a REPORT STATE statement. For example, suppose you intend to produce the following reports with SIRTUNEREPORT or SIRTUNER:

REPORT STATE BLKN CHUNK 10
REPORT STATE SWPG CHUNK 10

You can code the following COLLECT statement for SirTune:

COLLECT BLKN SWPG

This statement is functionally equivalent to

COLLECT BLKIN BLKON SWPGOBN SWPGOBU SWPGOW SWPGI
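The coverage rule above is a subset test. A REPORT STATE operand is usable only if all of its primary states appear in the expansion of the COLLECT operands. The following Python sketch applies that test to the BLKN/SWPG example. Only the composite states needed here are defined, and the helper names are hypothetical. This is not how SirTune itself validates its input.

```python
# Sketch: does a set of COLLECT operands cover a REPORT STATE operand?
# Only the composite states used in this example are defined here.
COMPOSITE = {
    "BLKN": {"BLKIN", "BLKON", "SWPGOBN"},
    "BLKU": {"BLKIU", "BLKOU", "SWPGOBU"},
    "SWPG": {"SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW"},
}


def primaries(states):
    """Expand primary or composite state names to a set of primary states."""
    result = set()
    for name in states:
        result |= COMPOSITE.get(name, {name})  # primary names expand to themselves
    return result


def covers(collect_operands, report_state):
    """True if every primary state behind report_state was collected."""
    return primaries([report_state]) <= primaries(collect_operands)


collect = "COLLECT BLKN SWPG".split()[1:]
print(sorted(primaries(collect)))
# ['BLKIN', 'BLKON', 'SWPGI', 'SWPGOBN', 'SWPGOBU', 'SWPGOW']
print(covers(collect, "BLKN"), covers(collect, "SWPG"))  # True True
print(covers(collect, "BLKU"))  # False: BLKIU and BLKOU were not collected
```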

In general, if you are running a relatively small Online (an average of fewer than 20 logged-on users), the following statement should not produce a prohibitively large amount of data, and it makes all reports possible:

COLLECT ALL

If you are running a midsize to large Online (an average of 20 or more logged-on users), the following statement should collect enough data to produce most interesting STATE reports without generating a prohibitively large sample data set:

COLLECT ALLN BLKIU SWPGOBU
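Comparing the two recommendations as sets of primary states shows what the midsize-Online statement gives up. This Python sketch copies the ALL and ALLN component lists from the tables above, with RUNG standing in for RUNGM and RUNGS. The variable names are illustrative.

```python
# Primary states behind COLLECT ALL versus COLLECT ALLN BLKIU SWPGOBU.
ALL = {"BLKIN", "BLKIU", "BLKON", "BLKOU", "REDY", "RUNG",
       "SWPGI", "SWPGOBN", "SWPGOBU", "SWPGOW", "WPST", "WTSV"}
ALLN = {"RUNG", "REDY", "BLKIN", "BLKON", "WTSV", "SWPGI", "SWPGOBN", "SWPGOW"}

collected = ALLN | {"BLKIU", "SWPGOBU"}
print(sorted(ALL - collected))  # ['BLKOU', 'WPST']
```

Under this statement, users outside a server waiting for user input (BLKOU) and non-running PSTs (WPST) are not sampled. Reports that depend on those states, such as REPORT STATE BLKO or REPORT STATE BLKU, therefore cannot be produced from data collected this way.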

Specifying the RUNGM and RUNGS states

When Model 204 runs with the MP/204 feature, a user in state RUNG can be further classified, for reporting purposes, as running in either maintask mode (RUNGM) or subtask mode (RUNGS). For example, these SIRTUNEREPORT or SIRTUNER statements generate two reports:

REPORT STATE RUNGM EVAL
REPORT STATE RUNGS EVAL

The first report is a breakdown of users running in maintask mode by evaluating procedure, and the second is a breakdown of users running in subtask mode by evaluating procedure. Maintask mode is often referred to as "serial" mode, and subtask mode is often referred to as "parallel" mode.

The total number of observations for state RUNG in any sample always equals the number for state RUNGM plus the number for state RUNGS.

The distinction between maintask and subtask mode can be based either on the task on which a user is actually running (maintask or subtask) or on the user's virtual (or logical) MP mode, that is, whether it is capable of running in a subtask. By default, the distinction is based on the actual task on which a user is running. This can be changed with the SIRTUNER MPVIRT statement. The MPVIRT setting is generally preferred when using the REPORT STATE RUNGM report to try to reduce the amount of maintask (serial) SOUL code.
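The following Python sketch models how a RUNG observation could be split between RUNGM and RUNGS under the two bases described above. The RunningUser fields and the mpvirt flag are hypothetical stand-ins for the actual task and the virtual MP mode. In SirTune itself, the choice is made with the SIRTUNER MPVIRT statement.

```python
# Hypothetical model of splitting RUNG observations into RUNGM and RUNGS.
# Field names and the mpvirt flag are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class RunningUser:
    on_subtask: bool   # actually running on an MP/204 subtask at sample time
    mp_capable: bool   # virtual (logical) MP mode: able to run in a subtask


def mp_state(user: RunningUser, mpvirt: bool = False) -> str:
    """Classify a user in state RUNG as RUNGM (maintask) or RUNGS (subtask)."""
    in_subtask_mode = user.mp_capable if mpvirt else user.on_subtask
    return "RUNGS" if in_subtask_mode else "RUNGM"


sample = [RunningUser(False, False), RunningUser(False, True), RunningUser(True, True)]
for mpvirt in (False, True):
    states = [mp_state(u, mpvirt) for u in sample]
    # Every RUNG observation lands in exactly one of RUNGM or RUNGS.
    assert states.count("RUNGM") + states.count("RUNGS") == len(sample)
    print(mpvirt, states)
```

With the default (actual task) basis, the second user is counted as RUNGM, because it is capable of running in a subtask but happens to be on the maintask. With the virtual basis, the same user is counted as RUNGS. In both cases RUNGM plus RUNGS equals RUNG.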

Specifying reports by wait type

Users in state BLK (blocked on anything) always have a wait type associated with them. These are the same wait types that appear next to the users in a Model 204 MONITOR command or in the SirMon WAITTYP statistic. STATE reports can be requested by these wait types. To produce a STATE report by wait type, SirTune's input stream (SIRTUNEI) must contain COLLECT statements that collect data for all states in which that wait type might occur.

For example, disk I/O waits are not swappable, so only state BLKIN needs to be collected to produce a REPORT STATE WDISK report. Since critical file resource waits are swappable, states BLKIN, BLKON, and SWPGOBN (together, composite state BLKN) must all be collected to produce a REPORT STATE WCFREX report.

The available wait type reports are listed here, along with the corresponding Model 204 wait type number, the states that must be collected, and a description of the wait type:

Wait type reports
Report  Type   Requires  Description
WMISC   0      BLKN      Miscellaneous waits.
WDISK   1      BLKIN     Wait for disk I/O.
WUSERO  2      BLKU      Wait for user output.
WUSERI  3      BLKU      Wait for user input.
WOPERI  4      BLKU      Wait for operator input.
WDUMPO  5      BLKIN     Wait for dump write.
WDUMPI  6      BLKIN     Wait for restore read.
WENQUE  7      BLKN      Wait for miscellaneous enqueue.
WBUFF   8      BLKIN     Wait for disk buffer.
WPST    10     BLKN      Wait on PST.
WIFAM   11     BLKN      IFAM waits.
WSLEEP  12     BLKN      Waits for a time interval, including Pause statements and *SLEEP commands.
WJRNLO  15     BLKIN     Wait for journal output.
WCHKPO  16     BLKIN     Wait for checkpoint output.
WWRITE  17     BLKIN     Wait for a checkpoint DECB.
WARBMO  18     BLKN      Waits for output arbitration.
WCHKPR  19     WPST      Waits for a checkpoint request.
WDISK   20     BLKIN     Waits for checkpoint completion.
WDEAD   21     BLKU      Wait forever (dead thread).
WVSAMI  22     BLKN      Wait for VSAM input.
WLOGIN  23     BLKN      Wait after login failure.
WCFREX  24     BLKN      Wait for critical file resource in exclusive mode.
WCFRSH  25     BLKN      Wait for critical file resource in share mode.
WVTBUF  26     BLKN      Wait for VTAM buffer.
WCONVI  27     BLKN      Wait for inter-process input.
WCONVO  28     BLKN      Wait for inter-process output.
WSCTYI  29     BLKN      Wait for security interface.
WS$WAI  30     BLKN      Swappable $Wait call.
WN$WAI  31     BLKIN     Non-swappable $Wait call.
WULDB2  32     BLKN      Wait for DB2 subtask.
WOCSUB  33     BLKIN     Waiting on Open/Close subtask.
WDBUGU  38     BLKN      Wait for user being debugged.
WDBUGD  39     BLKN      Wait for user performing debugging.
WMQTSK  40     BLKN      Wait for MQ subtask to become available.
WMQAPI  41     BLKIN     Wait for MQ subtask to run.
WMQGWT  42     BLKN      Wait for MQGET with wait time specified.
WECLD   43     BLKN      Wait for ECF to load/delete a module.
WECMOD  44     BLKN      Wait for external module to become free.
WECTSK  45     BLKN      Wait for ECF subtask to become free.
WECRUN  46     BLKN      Wait for external module to run.
W$WTQZ  47     BLKN      User within $WAIT('CPQZ') wait; CHKPPST within extended quiesce.
W$WTXS  48     BLKN      User within $WAIT('QZSIG') wait.
W$NDEQ  49     BLKN      At end of extended quiesce, waiting for count of $WAIT('CPQZ') and $WAIT('QZSIG') users to go to zero.
WHSM    50     BLKN      Wait for HSM recall of a migrated dataset.
WCDS    51     BLKIN     Wait for share mode constraints DB lock.
WCDX    52     BLKIN     Wait for exclusive mode constraints DB lock.
WSBBOL  53     BLKN      Wait for SUB-TRANS CP processing to complete for this user.
WSBBFC  54     BLKIN     SUB-TRAN CP postponement: waiting on blocking file command to complete.
WSBTMR  55     BLKIN     SUB-TRAN CP CPTS timer wait.
WSBARY  56     BLKIN     SUB-TRAN CP scanner array wait.
WDMNM   57     BLK       A daemon child waiting on its master.
WDMND   58     BLK       A daemon master waiting on its daemon.
WCUSTn  80-89            Customer reserved wait codes.
WFUNLD  97     BLKN      Fast Unload request.
WMAXAU  98     BLKN      MAXAUSER delay.
WSFQUI  99     BLKN      SirFact quiesce wait.
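The table can be read as a lookup from report name to required state, combined with the composite-state definitions given earlier. The following Python sketch applies that check to a few rows copied from the table. The data structures and function names are hypothetical.

```python
# Sketch: can a set of COLLECT operands support a given wait type report?
# Rows are copied from the wait type table; the structures are hypothetical.
COMPOSITE = {
    "BLK": {"BLKIN", "BLKIU", "BLKON", "BLKOU", "SWPGOBN", "SWPGOBU"},
    "BLKN": {"BLKIN", "BLKON", "SWPGOBN"},
    "BLKU": {"BLKIU", "BLKOU", "SWPGOBU"},
}

# Report name -> (Model 204 wait type number, state that must be collected).
REQUIRES = {
    "WDISK": (1, "BLKIN"),
    "WUSERI": (3, "BLKU"),
    "WENQUE": (7, "BLKN"),
    "WCFREX": (24, "BLKN"),
    "WDMNM": (57, "BLK"),
}


def primaries(states):
    """Expand primary or composite state names to a set of primary states."""
    result = set()
    for name in states:
        result |= COMPOSITE.get(name, {name})
    return result


def can_report(collect_operands, report):
    """True if every primary state the report needs was collected."""
    _, needed = REQUIRES[report]
    return primaries([needed]) <= primaries(collect_operands)


print(can_report(["BLKIN"], "WDISK"))   # True: disk I/O waits are not swappable
print(can_report(["BLKIN"], "WCFREX"))  # False: CFR waits also occur in BLKON and SWPGOBN
print(can_report(["BLKN"], "WCFREX"))   # True
```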

Thus, to produce a breakdown of disk I/O waits by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WDISK EVAL CHUNK 4

To get a breakdown of waits for miscellaneous enqueues (including record locks) by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WENQUE EVAL CHUNK 4

In addition to these primary wait types, there are a few composite wait types for which reports can be generated. These composite wait types, their component primary wait types, and a description of what the composite wait types measure are listed here:

WCFR This is made up of WCFREX and WCFRSH. It measures all waits on critical file resources whether for exclusive or share control.
WLOG This is made up of WJRNLO, WCHKPO, WWRITE, and WARBMO. It measures all waits on activities associated with logging for Model 204 recovery, that is, all checkpoint and journal I/O related waits.
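The text does not state which states a composite wait type report requires. Assuming it needs every state required by each of its component wait types, the requirement is the union of the component requirements. The following Python sketch computes that union. The names are hypothetical, and the rows are copied from the wait type table.

```python
# Assumption (not stated in the text): a composite wait type report needs every
# state required by each of its component wait types. Names are hypothetical.
COMPOSITE_STATES = {"BLKN": {"BLKIN", "BLKON", "SWPGOBN"}}

COMPONENTS = {
    "WCFR": ["WCFREX", "WCFRSH"],
    "WLOG": ["WJRNLO", "WCHKPO", "WWRITE", "WARBMO"],
}

# Required state for each component, copied from the wait type table.
REQUIRES = {
    "WCFREX": "BLKN", "WCFRSH": "BLKN",
    "WJRNLO": "BLKIN", "WCHKPO": "BLKIN", "WWRITE": "BLKIN", "WARBMO": "BLKN",
}


def required_primaries(composite_wait):
    """Union of the primary states required by each component wait type."""
    needed = set()
    for wait in COMPONENTS[composite_wait]:
        state = REQUIRES[wait]
        needed |= COMPOSITE_STATES.get(state, {state})
    return needed


print(sorted(required_primaries("WLOG")))  # ['BLKIN', 'BLKON', 'SWPGOBN']
```

Under that assumption, most WLOG components need only BLKIN, but the WARBMO component means a WLOG report needs the full BLKN set.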

To get a breakdown of waits for critical file resources by evaluating procedure and by individual lines within the procedures, code the following in SIRTUNEI:

REPORT STATE WCFR EVAL CHUNK 4

Critical file resource states

Critical file resources are used by Model 204 to provide multi-user concurrency control at the file level. This control mechanism sometimes exacerbates another performance bottleneck. A high number of users per sample with wait types CFREX and CFRSH in the SUMMARY report suggests that critical file resource enqueuing bears closer examination.

There are four critical file resources:

Critical file resources
DIRECT Protects table B updates and accesses.
INDEX Protects accesses and updates of table C and the ordered index.
EXISTS Protects accesses and updates of the existence bit map.
RECENQ Protects accesses and updates of the record enqueuing table. This is the only critical file resource that can be eliminated by the use of the Find Without Locks SOUL statement.

Determining the cause of a wait

A first step in investigating a critical file resource enqueueing problem is to produce reports for the WCFR state. These help isolate the programs or lines of code that encounter frequent or long critical file resource waits. Probably the most useful report is produced by this statement:

REPORT STATE WCFR CHUNK 4

This breaks down critical file resource waits by individual lines of SOUL code. Unfortunately, this type of analysis focuses on the "victims" of critical file resource waits rather than the "culprits," the lines of code that hold critical file resources and cause other users to wait. In some situations the lines of code causing the critical file resource waits are the same lines that suffer from the waits, but there is no way to be certain of this from the WCFR state report.

To determine the actual cause of critical file resource enqueuing, more data needs to be collected by the SirTune data collector. To have this additional data collected, simply specify the parameter CFR on a COLLECT statement for SirTune. This parameter can be specified alone or with other COLLECT parameters as in this statement:

COLLECT BLKN DISKIO CFR

After this additional CFR (Critical File Resource) data is collected, SirTune is able to produce several additional reports to help isolate the cause of critical file resource enqueuing. The first report that might be useful is the CFRROOT report. This report indicates the base wait types that are behind critical file resource waits. The CFRROOT report does not provide information about which lines of code cause critical file resource waits, so it is not helpful for application tuning.

The CFRROOT report might, however, indicate that application tuning (rather than system tuning) is required to reduce critical file resource enqueuing. This would be indicated by a primary root cause of DISK (disk I/O waits) or possibly JRNLO (journal I/O waits).

Reducing wait times

You can attack a primary root cause of critical file resource waits either by trying to reduce overall disk I/Os or journal I/Os (with application tuning), or by specifically targeting the instructions that hold critical file resources.

To facilitate this latter option, several CFR states can be requested on SirTune reports if CFR data has been collected by SirTune. These states are:

CFR states
CFRHANY The state where a user holds any critical file resource.
CFRHDIR The state where a user holds the DIRECT critical file resource.
CFRHIND The state where a user holds the INDEX critical file resource.
CFRHEXS The state where a user holds the EXISTS critical file resource.
CFRHREC The state where a user holds the RECENQ critical file resource.
CFRBANY The state where a user holds any critical file resource and is preventing (blocking) another user from obtaining a critical file resource.
CFRBDIR The state where a user holds the DIRECT critical file resource and is preventing (blocking) another user from obtaining the DIRECT resource.
CFRBIND The state where a user holds the INDEX critical file resource and is preventing (blocking) another user from obtaining the INDEX resource.
CFRBEXS The state where a user holds the EXISTS critical file resource and is preventing (blocking) another user from obtaining the EXISTS resource.
CFRBREC The state where a user holds the RECENQ critical file resource and is preventing (blocking) another user from obtaining the RECENQ resource.

The CFRBxxx states are weighted based on the number of users holding the resource (including the user being observed) and the number of users waiting for it. For example, if a user at a line of code holds the DIRECT resource and 3 other users are waiting for the resource, that line of code is considered to have 3 observations in the CFRBDIR state.

On the other hand, if a user at a line of code holds the DIRECT resource (in share mode) along with 4 other users, and a single user is waiting for the DIRECT resource, the line of code is considered to have 1/5 of an observation in the CFRBDIR state.
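The two examples are consistent with a simple rule. Each holder's line of code is credited with the number of waiting users divided by the number of users holding the resource, counting the holder itself. This rule is inferred from the examples rather than stated by SirTune, so the following Python sketch should be read as an assumption. The function name is hypothetical.

```python
# Inferred (not documented) CFRBxxx weighting: each waiting user is charged
# equally to all current holders of the resource.
from fractions import Fraction


def cfrb_observations(holders: int, waiters: int) -> Fraction:
    """Observations credited to one holder's line of code in a CFRBxxx state.

    holders: users holding the resource, including the one being credited.
    waiters: users waiting for the resource.
    """
    if holders < 1:
        raise ValueError("the credited user must itself be a holder")
    return Fraction(waiters, holders)


print(cfrb_observations(holders=1, waiters=3))  # 3
print(cfrb_observations(holders=5, waiters=1))  # 1/5
```

Under this rule, a holder with no waiters contributes nothing to a CFRBxxx state, although it is still counted in the corresponding CFRHxxx state.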

Generally, the most useful reports for reducing critical file resource waits are the CFRB reports. This statement breaks down the state where a user is blocking another user from any critical file resource by lines of SOUL code:

REPORT STATE CFRBANY CHUNK 4

This is probably the most useful of the STATE CFRxxxx reports. Once critical file resource blocking is isolated to specific SOUL instructions, critical file resource enqueuing can be reduced by:

  • Reducing the number of times the offending instructions are executed.
  • Reducing the amount of disk I/O performed by the offending instructions.
  • Reducing the amount of CPU used by the offending instructions.

It might be tempting to use the Find Without Locks SOUL statement to reduce the critical file resource enqueuing associated with a statement. This works only if the resource causing conflicts is the RECENQ resource. All other critical file resources are processed exactly the same way whether or not a locked record set is being used.

However, even if the resource causing the conflict is the RECENQ resource, Find Without Locks is still not the recommended solution. A high conflict rate on the RECENQ resource indicates a high level of update activity, in which case operating on unenqueued found sets is a questionable tactic at best. A high conflict rate on the RECENQ resource might instead suggest examining strategies for releasing found sets before any terminal I/O occurs.

The CFRHxxx reports can be useful for tracking potential critical file resource enqueuing problems (perhaps in a test environment) before they actually happen. These states include any user that holds a critical file resource, whether or not it is blocking anyone. These reports are difficult to interpret, however, since they require a fairly good estimate of expected future usage patterns to have any predictive value.

See also