Checkpoints: Storing before-images of changed pages
Latest revision as of 21:02, 29 December 2016
Overview of the Checkpoint facility
The Model 204 Checkpoint facility stores before-images of changed pages; that is, it saves the image of a page before a change is applied to it. Model 204 supports two types of checkpoints: transaction and sub-transaction.
- Transaction checkpoints are identical to what were simply called checkpoints in Model 204 releases before V6R1.0. If a transaction checkpoint cannot be taken within the allowed period because of update activity (that is, it times out), the Checkpoint facility tries again at the next interval.
- Sub-transaction checkpoints guarantee that a checkpoint is taken at the specified interval, which involves switching back and forth between transaction and sub-transaction checkpoints.
You can elect to take only transaction checkpoints by setting the CPTYPE parameter to 0, or you can take both transaction and sub-transaction checkpoints by setting CPTYPE to 1.
- Transaction checkpoints are saved to CHKPOINT, a sequential data set that contains copies of file pages before updates are applied (before-images, also called preimages) and marker records (checkpoints) that record the date and time when the system is quiescent (no updating activity).
- Sub-transaction checkpoints are saved to CHKPNTS, a sequential data set of the same type that holds the same kinds of information.
During recovery, Model 204 uses either CHKPOINT or CHKPNTS to restore (roll back) the original contents of updated files as of a particular checkpoint, usually the most recent.
For more information
This page discusses the Checkpoint facility, preimage logging, and dumping the CHKPOINT file with the UTILC utility.
- Checkpoints in HLI are discussed in detail in HLI: Model 204 recovery and checkpoints.
- Recovery data sets and job control explains how to use CHKPOINT in the recovery process.
- Checkpoint data stream configurations are described in Configuring checkpoint and journal data streams.
Understanding the Checkpoint facility
Checkpoint markers and preimage copies of database pages are recorded on a sequential data set called CHKPOINT (and CHKPNTS). Checkpoints taken during a Model 204 run mark times when no updates are in progress. Between checkpoints, before-images of Model 204 file pages are recorded in a checkpoint file as files are updated. After a system crash, you can roll back the database to its status at the time of a specific checkpoint. Changes made in the files between the time of that checkpoint and the time of the system crash are removed.
Checkpoints are taken on all files simultaneously to preserve logical file consistency. When file changes are rolled back, all files, regardless of their condition, are automatically restored to a checkpoint. Unless otherwise specified, files are rolled back to the most recent checkpoint.
By using the Checkpoint facility in conjunction with the Roll Back and Roll Forward facilities, you can recover a valid copy of the Model 204 database after a system failure.
Checkpoint parameters
You control the operation of the Checkpoint facility by specifying parameters on User 0's parameter line. The following table lists the checkpoint parameters:
Parameter | Meaning |
---|---|
CPMAX | Maximum number of checkpoints saved during the run. The default is 32767. |
CPTIME | Time in minutes between attempts to take automatic checkpoints. The default is 0. |
CPTO | Amount of time in seconds allowed to quiesce updating User Language users, command users, and online IFAM jobs. The default is 0. |
CPTQ | Amount of time in seconds allowed to quiesce updating batch Host Language Interface jobs. The default is 0. |
CPTS | Time interval, in seconds, that in-flight transactions are given to complete before they are forced into a sub-transaction wait. |
CPTYPE | Checkpoint type: 0 (the default) for transaction checkpoints only, or 1 for both transaction and sub-transaction checkpoints. |
CPSORT | Maximum number of times to retry an attempt to take a checkpoint initiated by the start or end of an IFAM2 job. |
FRCVOPT | File recovery options. Settings indicate whether or not a file participates in checkpoint and/or roll forward logging. |
NSUBTKS | The Checkpoint facility uses three internal processes called pseudo subtasks (PSTs). If checkpointing is active, the setting of this parameter must reserve at least three slots for checkpoint use. In an environment that supports 31-bit processing, add another pseudo subtask for asynchronous checkpointing. |
RCVOPT | Recovery options. Settings activate the Checkpoint facility and govern checkpoint and journal logging. |
Taking a checkpoint
The process of taking a checkpoint is activated by one of the following events:
- Expiration of the specified time between checkpoints (CPTIME)
- CHECKPOINT command
- Host Language Interface IFCHKPT call
- Host Language Interface IFSTRT and IFFNSH calls
Note: Only the first IFSTRT call from an IFAM2 job activates the CHKPPST pseudo subtask. Subsequent IFSTRT calls for the same IFAM2 job do not initiate checkpoint attempts.
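The first-IFSTRT-only rule amounts to remembering which IFAM2 jobs have already connected. The following is a minimal sketch; the class name and the job identifiers are hypothetical.

```python
class IfstrtTracker:
    """Sketch: decide which IFSTRT calls start a checkpoint attempt."""

    def __init__(self):
        # Hypothetical IFAM2 job identifiers that have already issued IFSTRT.
        self._started_jobs = set()

    def ifstrt_triggers_checkpoint(self, job_id: str) -> bool:
        if job_id in self._started_jobs:
            return False  # later IFSTRT from the same job: no checkpoint attempt
        self._started_jobs.add(job_id)
        return True       # first IFSTRT from this job activates CHKPPST
```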
Checkpoint algorithm
Once activated, the checkpoint algorithm proceeds as follows:
- Prevents any new update units from starting until the checkpoint process is completed.
- Waits a preset number of seconds for updating threads to quiesce. The length of the wait depends on the type of threads that have active update units and the settings of the CPTQ and CPTO parameters:
  - If no update units are active, there is no wait.
  - If update units are active on batch Host Language Interface threads whose connection to the ONLINE region is initiated with an IFSTRT call, the task waits the number of seconds specified in the CPTQ parameter, or until all such threads have quiesced.
  - If update units are active on User Language threads or Host Language Interface threads whose connections to the ONLINE region are initiated with an IFDIAL call, the task waits the number of seconds specified in the CPTO parameter, or until all such threads have quiesced.
- Times out the checkpoint if the specified number of seconds passes and update units are still active.
- Writes a checkpoint record to the CHKPOINT file.
- Allows new update units to begin after a checkpoint is taken or timed out.
If a checkpoint has timed out, Model 204 attempts to take a checkpoint each time an update unit completes.
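The quiesce-and-timeout part of the checkpoint algorithm can be modeled in a few lines of Python. This is a minimal sketch, not Model 204 code: the `UpdateUnit` class, its `seconds_left` field, and `attempt_checkpoint` are hypothetical, and the sketch assumes that each active update unit is measured against the limit for its own thread type (CPTQ for IFSTRT batch threads, CPTO otherwise) and that the first unit still active is the one reported in the timeout message.

```python
from dataclasses import dataclass


@dataclass
class UpdateUnit:
    user: int           # user number, as in "DUE TO USER nn"
    ifstrt_batch: bool  # True: batch HLI thread connected with IFSTRT (CPTQ applies)
                        # False: User Language or IFDIAL thread (CPTO applies)
    seconds_left: int   # hypothetical: seconds until this update unit ends


def attempt_checkpoint(units, cpto, cptq):
    """Model one attempt; return ("COMPLETED", None) or ("TIMED OUT", user)."""
    # New update units are assumed to be held off for the whole attempt,
    # so only the units that are already active need to quiesce.
    if not units:
        return ("COMPLETED", None)  # nothing active: no wait at all
    for unit in units:
        limit = cptq if unit.ifstrt_batch else cpto
        if unit.seconds_left > limit:
            # Still active when its allowed wait expires: the attempt times out.
            return ("TIMED OUT", unit.user)
    # Every active unit ended within its allowed wait; the real facility
    # would now write the checkpoint record.
    return ("COMPLETED", None)
```

With the CPTO and CPTQ defaults of 0, any update unit that is still active when the attempt starts causes a timeout in this model.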
Taking asynchronous checkpoints (31-bit)
Taking asynchronous checkpoints is supported in operating environments with 31-bit processing. This reduces the amount of wait time for preimage writes to the checkpoint data set.
The pseudo subtask CHKPAWW performs the wait for preimage writes. If you are running in 31-bit mode, add one to the value of the NSUBTKS parameter. You might also need to increase the value of the LPDLST parameter.
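The NSUBTKS arithmetic for the Checkpoint facility's own pseudo subtasks can be written as a small function. It is a sketch with a hypothetical name, and it counts only the slots the Checkpoint facility needs; your NSUBTKS setting must also cover any other pseudo subtasks the Online uses.

```python
def checkpoint_pst_slots(checkpointing_active: bool, mode_31bit: bool) -> int:
    """Minimum NSUBTKS slots to reserve for the Checkpoint facility (a sketch)."""
    if not checkpointing_active:
        return 0
    slots = 3          # the three checkpoint pseudo subtasks
    if mode_31bit:
        slots += 1     # CHKPAWW, which waits for asynchronous preimage writes
    return slots
```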
Using the CHECKPOINT command
If a user issues the CHECKPOINT command to take a checkpoint, control returns immediately to the terminal of the user who issued the command. The terminal does not wait for the checkpoint to complete. When the checkpoint process is complete, a message indicating the status of the checkpoint is issued to the terminal of the user who entered the command. If update units are active, control is returned to the user without taking a checkpoint.
Checkpoint processing with IFSTRT or IFFNSH calls
Whenever the CHKPPST pseudo subtask is activated by a Host Language Interface IFSTRT or IFFNSH call, a nonzero value specified on CPSORT causes the CHKPPST task to loop through the waiting process the number of times specified on CPSORT. New update units must wait for the entire length of the looping process.
For more information about the effect of CPSORT values on the CHKPPST task, refer to the Rocket Model 204 Host Language Interface Reference Manual.
Determining the status of a checkpoint
Any terminal user can issue the CHKMSG command to obtain a copy of the most recent checkpoint message. Host language users can issue an IFCHKPT call to determine the status of the most recent checkpoint. One of the following checkpoint status information messages is broadcast to the operator's console each time CHKPPST is activated (attempts to take a checkpoint):
*** M204.0843: CHECKPOINT COMPLETED ON date/time
or
*** M204.0843: CHECKPOINT TIMED OUT ON date/time DUE TO USER nn
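A program or log scanner that needs to react to these messages could parse them as sketched below. The regular expression covers only the two M204.0843 forms shown above and, because the date/time format is not specified here, captures it as an opaque string; `parse_checkpoint_message` is a hypothetical helper.

```python
import re

# Matches the two M204.0843 forms; the date/time part is kept as-is.
_MSG = re.compile(
    r"\*\*\* M204\.0843: CHECKPOINT (COMPLETED|TIMED OUT) ON (.+?)"
    r"(?: DUE TO USER (\d+))?$"
)


def parse_checkpoint_message(line: str):
    """Return (status, date_time, user) or None if the line is not M204.0843."""
    m = _MSG.match(line.strip())
    if m is None:
        return None
    status, when, user = m.groups()
    return status, when, int(user) if user else None
```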
Aborting a checkpoint
If a pending checkpoint causes too great a delay in new request initiation or host language jobs, a user with system manager privileges can issue the CHKABORT command to cause the pending checkpoint to time out immediately. The CHKABORT command is issued without arguments.
M204CKPX checkpoint user exit
The checkpoint user exit, M204CKPX, can be invoked if it is linked in. This exit runs after the checkpoint records are written, and just before the CHECKPOINT COMPLETED message is issued. All Model 204 databases are physically consistent on disk at this time, because all updated pages have been flushed to disk to prepare for checkpoint processing.
The exit runs before any update users are allowed to run in the Online. This allows you to write an exit that, for example, backs up all your database files between the hours of x and y, with the timing controlled entirely by the user exit code.
When the exit abends or completes, the Online releases updating users and continues. Because this exit is invoked each time a checkpoint is taken, it should skip any extensive processing until the Online is in a low-use period, and only then back up the files.
Example
The following example is an M204CKPX ASSEMBLER exit. You can use the shell of this program to write your own user exit. If your user exit abends, Model 204 tries to continue. All registers can be used. You can safely copy the files, because the modified pages have all been flushed to disk and are physically consistent. The Online continues to service read-only users, and updating is suspended until this exit completes. You might want to include your own ESTAE exit macro to deal with abends.
A sample M204CKPX ASSEMBLER exit:
M204CKPX CSECT
M204CKPX AMODE 31
M204CKPX TITLE 'TEST THE MODEL 204 CHECKPOINT USER EXIT'
X10      EQU   10
X11      EQU   11
X12      EQU   12
X13      EQU   13
X14      EQU   14
X15      EQU   15
         STM   X14,X12,12(X13)        SAVE CALLERS REGISTERS
         LR    X12,X15                ESTABLISH BASE REGISTER
         USING M204CKPX,X12
         LA    X10,SAVEAREA           GET A(LOCAL REGISTER SAVEAREA)
         ST    X10,8(,X13)            CHAIN OUR SAVEAREA TO CALLERS
         ST    X13,SAVEAREA+4         CHAIN CALLERS SAVEAREA TO OURS
         LA    X13,SAVEAREA           SET A(OUR SAVEAREA)
         WTO   'M204CKPX CHECKPOINT EXIT INVOKED, UPDATERS SUSPENDED'
* ********************************************************** *
* *   Add code here that calls routines to back up your    * *
* *   Model 204 databases.                                 * *
* *                                                        * *
*        DC    X'000000000000'  TEST TO SEE WHAT HAPPENS WHEN
* *                             THE USER EXIT ABENDS       * *
* ********************************************************** *
         WTO   'M204CKPX EXIT ENDING, UPDATERS WILL BE RELEASED'
         L     X13,4(,X13)            RESTORE CALLERS SAVE AREA ADDRESS
         SR    X10,X10                RETURN CODE ZERO
         ST    X10,16(,X13)           SET RETURN CODE (R15 SLOT)
         LM    X14,X12,12(X13)        RESTORE CALLERS REGISTERS
         BR    X14                    RETURN TO CALLER
         DS    0D
SAVEAREA DS    18F                    REGISTER SAVE AREA
         LTORG
         END
If Model 204 detects an abend in the M204CKPX exit, the following message is written to the JES log or is displayed on the z/VM console:
M204.CKPX: ABEND IN M204CKPX IGNORED, ATTEMPTING TO CONTINUE
Note: If M204CKPX abends, Model 204 attempts to continue processing, but there is no guarantee that it can do so. Further processing might be prevented for one of the following reasons:
- M204CKPX destroyed storage that Model 204 depends on.
- M204CKPX left interrupts disabled, or did not restore the ESTAE or ESPIE macro routines.
Overview of sub-transaction checkpoints
A sub-transaction checkpoint is a checkpoint that can be taken while updating transactions are in progress and uncommitted, eliminating checkpoint timeout situations. A sub-transaction checkpoint is a guaranteed checkpoint; it can be taken under almost all updating situations.
The exceptions are long-running file-update commands, which must run to completion before a transaction or a sub-transaction checkpoint can be taken. Examples of potentially long-running file-update commands are CREATE, INITIALIZE, and REDEFINE FIELD. If any of these commands require an extended period of time to complete, a sub-transaction checkpoint will be postponed. Generally, these commands are issued infrequently in production Onlines.
Guaranteed checkpoints improve recoverability, reduce recovery time, and further improve 24x7 availability.
Transaction checkpoints are identical to what in previous Model 204 releases were simply called checkpoints.
Reviewing transaction checkpoints
At the time a transaction checkpoint is taken, all updates have ended and been written to their respective Model 204 disk files. A transaction checkpoint is indicated by a date/time stamp record on the checkpoint, journal, and deferred update data sets, and represents a state to which a restart rollback can restore a set of Model 204 files, in preparation for using roll forward to reapply later updates.
For a transaction checkpoint to be taken, user threads must end their update units. This can be done with some form of COMMIT or IFCHKPT.
Introducing sub-transaction checkpoints
At the time a sub-transaction checkpoint is taken, not all updates will have ended (although a nonzero value of CPTS gives them time to do so, and can therefore turn a sub-transaction checkpoint attempt into a transaction checkpoint).
A typical update unit is made up of a sequence of sub-transactions. A typical sub-transaction consists of reading a Model 204 page to be updated, writing the image of the page to the checkpoint stream, making an update to the page, creating back out and constraint entries in CCATEMP (in case the update is backed out), and writing the update to the journal stream.
At the time a sub-transaction checkpoint is taken, all sub-transactions will have ended and been written to their respective Model 204 disk files. However, there will still be active update units with associated non-zero back out and constraint entries. These are written to the checkpoint stream. So, a sub-transaction checkpoint is indicated by a date/time stamp record on the checkpoint stream, along with all back out and constraint entries for active updates. This is followed by a sub-transaction checkpoint completion record. The checkpoint stream therefore contains a snapshot of active update units at the time of checkpoint.
For sub-transaction checkpoints only the initial date/time stamp record is written to the journal stream and deferred update data sets.
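The record sequence described above can be sketched as follows. The record labels (`CHECKPOINT`, `BACKOUT/CONSTRAINT`, `SUBTRAN COMPLETE`) and the `subtransaction_checkpoint` function are hypothetical illustrations of the order of events, not actual Model 204 record formats.

```python
def subtransaction_checkpoint(stamp, active_units):
    """Sketch of the records a sub-transaction checkpoint writes.

    active_units maps a (hypothetical) update-unit id to the list of
    back out and constraint entries it currently holds in CCATEMP.
    """
    checkpoint_stream = [("CHECKPOINT", stamp)]  # date/time stamp record
    # Snapshot of every active update unit's back out/constraint entries.
    for unit_id in sorted(active_units):
        for entry in active_units[unit_id]:
            checkpoint_stream.append(("BACKOUT/CONSTRAINT", unit_id, entry))
    checkpoint_stream.append(("SUBTRAN COMPLETE", stamp))  # completion record
    # Only the initial date/time stamp goes to the journal stream.
    journal_stream = [("CHECKPOINT", stamp)]
    return checkpoint_stream, journal_stream
```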
Using sub-transaction checkpoints in recovery
At restart recovery, the back out and constraint logs from the input restart streams are used to recreate the update environment needed to recover transactions interrupted by the original sub-transaction checkpoint.
Sub-transaction enabled Onlines use two checkpoint streams: CHKPOINT and CHKPNTS. At any given time during the Online run, one stream or the other is the current checkpoint stream — the one to which preimages are written.
A sub-transaction checkpoint causes the current stream to be switched and checkpoint information to be written to the beginning of the new stream. If the Online goes down before the sub-transaction checkpoint completes, that checkpoint is unusable, and the previous good checkpoint must be used for restart recovery.
CCATEMP, NDIR, and NFILES must be the same as or greater than the values used in the run being recovered.
Using transaction or sub-transaction checkpoints
If a transaction checkpoint is taken on a sub-transaction enabled Online, the current checkpoint stream is rewound instead of switched.
Note: The initialization and termination checkpoints are always transaction checkpoints. The command CHECKPOINT E Q forces the next checkpoint to be a transaction checkpoint. The checkpoint at the end of a roll forward is always a transaction checkpoint.
IFAM SIGNON/SIGNOFF and IFCHKPT checkpoints are always transaction checkpoints. In addition, certain run configurations, checkpoint stream configurations, and file type configurations require all checkpoints to be transaction checkpoints, so sub-transaction checkpoints cannot be enabled for them. An example is a run with NUSERS=1.
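The switch-versus-rewind behavior of the two checkpoint streams can be modeled as a small state machine. This is a simplified sketch with hypothetical names: it ignores journals, CPTS, and actual record layouts, and it treats a sub-transaction checkpoint as usable only when its completion record is present, following the rule that an incomplete sub-transaction checkpoint cannot be used for restart recovery.

```python
class CheckpointStreams:
    """Sketch of how a sub-transaction enabled Online uses CHKPOINT and CHKPNTS."""

    def __init__(self):
        self.streams = {"CHKPOINT": [], "CHKPNTS": []}
        self.current = "CHKPOINT"  # the stream to which preimages are written

    def log_preimage(self, page):
        self.streams[self.current].append(("PREIMAGE", page))

    def transaction_checkpoint(self, stamp):
        # Rewind the current stream (no switch) and start it with the checkpoint.
        self.streams[self.current] = [("TRAN CHECKPOINT", stamp)]

    def subtransaction_checkpoint(self, stamp, backout_entries, completes=True):
        # Switch to the other stream and write checkpoint information at its start.
        self.current = "CHKPNTS" if self.current == "CHKPOINT" else "CHKPOINT"
        records = [("SUBTRAN CHECKPOINT", stamp)]
        records += [("BACKOUT/CONSTRAINT", e) for e in backout_entries]
        if completes:  # completes=False models the Online going down mid-checkpoint
            records.append(("SUBTRAN COMPLETE", stamp))
        self.streams[self.current] = records

    def usable(self, name):
        """True if the checkpoint at the start of this stream can be used for restart."""
        records = self.streams[name]
        if not records:
            return False
        kind, stamp = records[0]
        if kind == "TRAN CHECKPOINT":
            return True
        return kind == "SUBTRAN CHECKPOINT" and ("SUBTRAN COMPLETE", stamp) in records
```

In the usage below, an interrupted sub-transaction checkpoint leaves CHKPNTS unusable, while the earlier transaction checkpoint at the start of CHKPOINT remains the previous good checkpoint.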
Considerations for implementing sub-transaction checkpoints
Sites where checkpoints regularly time out, with no intervening successful checkpoint for a significant period, might consider enabling sub-transaction checkpoints.
The best way to rectify this situation is to modify the applications that cause checkpoints to time out, by improving their use of the Commit statement. However, this may not be a reasonable approach given time and resource constraints or the nature of the application.
An Online with sub-transaction checkpoints enabled uses multiple KOMMs.
Implementing sub-transaction checkpoints in your job
The CPTYPE parameter designates the type of checkpoints to use:
- CPTYPE=0, the default, indicates that only transaction checkpoints are activated.
- CPTYPE=1 indicates that both transaction and sub-transaction checkpoints are activated.
Note: CPTYPE is resettable by the system manager.
Requirements for CPTYPE=1
To enable sub-transaction checkpoints, you must make the following adjustments to your data sets and Model 204 parameters. Otherwise, the run terminates with one of the M204.2684 messages, for example:
M204.2684 CHECKPOINT CONFIGURATION CONFLICT
- The data set CHKPNTS must be defined to the job, in addition to CHKPOINT. Rocket Software recommends that you allocate the CHKPNTS data set with the same space parameters as the CHKPOINT data set.
If the CHKPNTS data set is not defined or cannot be opened, an error message is issued during initialization and the job is terminated.
- The NUSERS parameter must be greater than one. If NUSERS equals one, then any stream definition for CHKPNTS is ignored and the run is not a sub-transaction enabled run.
- If you set CPMAX to greater than one, the following message is issued, and Model 204 resets CPMAX to one:
M204.2685 CHKPNTS IS OPEN SO CPMAX SET TO 1
- The DKUPDTWT parameter must be set to zero, which is the default.
- You can set the CPTS parameter to establish the number of seconds to delay the start of new updating transactions. A non-zero CPTS time interval gives in-flight transactions a chance to end before they are forced into a sub-transaction wait.
If all transactions end during the CPTS interval, a transaction checkpoint is taken instead of a sub-transaction checkpoint.
- The CPTIME parameter is still required to establish the time interval, in minutes, between automatic checkpoints. It must be greater than zero.
- Set RCVOPT to nine to enable checkpoints and journals.
CPTO and CPTQ can be left in your CCAIN stream even if you set CPTYPE=1. However, they are ignored unless CPTYPE is reset to 0.
As long as all data sets and parameters are appropriately defined and requirements are met within your job, you can reset back and forth between CPTYPE=0 and CPTYPE=1.
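The requirements above can be expressed as a configuration check. This sketch assumes z/OS DD names, represents the User 0 parameters as a Python dictionary that must contain NUSERS, and applies the text literally (for example, it requires RCVOPT=9 exactly); `check_cptype1` is hypothetical and does not reproduce Model 204 message texts other than the CPMAX reset.

```python
def check_cptype1(params: dict, dd_names: set):
    """Return (unmet, notes, effective) for a run that sets CPTYPE=1 (a sketch)."""
    effective = dict(params)
    unmet, notes = [], []
    if params["NUSERS"] <= 1:
        # CHKPNTS definitions are ignored; the run is not sub-transaction enabled.
        notes.append("NUSERS=1: CHKPNTS ignored, run is not sub-transaction enabled")
        return unmet, notes, effective
    for dd in ("CHKPOINT", "CHKPNTS"):
        if dd not in dd_names:
            unmet.append(f"{dd} DD statement missing")
    if params.get("DKUPDTWT", 0) != 0:      # default is 0
        unmet.append("DKUPDTWT must be 0")
    if params.get("CPTIME", 0) <= 0:        # default is 0, which is not allowed here
        unmet.append("CPTIME must be greater than 0")
    if params.get("RCVOPT") != 9:
        unmet.append("RCVOPT should be 9 (checkpoints and journals)")
    if not unmet and params.get("CPMAX", 32767) > 1:
        effective["CPMAX"] = 1              # Model 204 resets CPMAX to one
        notes.append("CHKPNTS IS OPEN SO CPMAX SET TO 1")
    return unmet, notes, effective
```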
Checkpoint definition restrictions for sub-transaction checkpoints
Both CHKPOINT and CHKPNTS must be present. Each may be defined as a stream but may not contain:
- A ring stream
- More than 16 levels of recursive stream definitions
- A CMS formatted file
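A definition can be checked against these restrictions recursively. The nested-dictionary representation of a stream definition and the `check_checkpoint_stream` function are hypothetical, and the sketch assumes that the outermost stream counts as level 1.

```python
def check_checkpoint_stream(defn, max_levels=16):
    """Return a list of violations for a CHKPOINT/CHKPNTS definition (a sketch).

    defn is a hypothetical nested dict: a data set is {"kind": "dataset", "cms": bool};
    a stream is {"kind": "ring" | "parallel" | ..., "members": [...]}.
    """
    problems = []

    def walk(node, level):
        if node["kind"] == "dataset":
            if node.get("cms", False):
                problems.append("CMS formatted file")
            return
        if level > max_levels:
            problems.append(f"more than {max_levels} levels of stream definitions")
            return
        if node["kind"] == "ring":
            problems.append("ring stream")
        for member in node["members"]:
            walk(member, level + 1)

    walk(defn, 1)
    return problems
```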
Before-image logging
Every updated database page is copied (logged) into the CHKPOINT data set before any change is performed. Whenever a request to update a page is received, Model 204 first checks to see if the page was logged since the last checkpoint. If the page has not been logged, the page is logged and the update allowed to proceed. If the page has already been logged, the update proceeds immediately.
You can disable logging of before-images on a file-by-file basis by using the file parameter FRCVOPT. If logging is disabled, the file can be recovered only by using the media recovery procedures. (See Media recovery NonStop/204 for more information on media recovery.)
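The log-once-per-checkpoint rule can be sketched with a set of pages already logged since the last checkpoint. `PreimageLog` is hypothetical; its `files_without_logging` argument stands in for files whose FRCVOPT setting disables preimage logging and does not model the actual FRCVOPT bits.

```python
class PreimageLog:
    """Sketch of before-image logging: log each page once per checkpoint interval."""

    def __init__(self, files_without_logging=()):
        self.checkpoint_records = []          # stands in for the CHKPOINT data set
        self.logged_since_checkpoint = set()  # (file, page) pairs already logged
        # Hypothetical stand-in for files whose FRCVOPT disables preimage logging.
        self.no_logging = set(files_without_logging)

    def update(self, file, page, before_image):
        key = (file, page)
        if file not in self.no_logging and key not in self.logged_since_checkpoint:
            # First update to this page since the checkpoint: log it first.
            self.checkpoint_records.append(("PREIMAGE", file, page, before_image))
            self.logged_since_checkpoint.add(key)
        # The update to the page itself would be applied here.

    def checkpoint(self, stamp):
        self.checkpoint_records.append(("CHECKPOINT", stamp))
        self.logged_since_checkpoint.clear()  # every page must be logged again
```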
Creating the CHKPOINT/CHKPNT (and CHKPNTS) data set
The checkpoint data set is named CHKPOINT in z/OS and z/VM; in z/VSE, use the name CHKPNT. If your site uses the sub-transaction capability as well, that checkpoint data set is named CHKPNTS for all operating systems.
Notes:
- Due to changes in Journal record layouts, CCAJRNL and CHKPOINT/CHKPNT data sets are not compatible with previous releases of Model 204. For details, see Journal block header information for SWITCH STREAM.
- If your site has sub-transaction checkpoints enabled, you must create a CHKPNTS data set in addition to the original checkpoint data set. The following discussion applies equally to creating a CHKPNTS data set.
To create a checkpoint data set:
- Include a CHKPOINT (or CHKPNT, and as needed, CHKPNTS) DD statement in the JCL.
- Set RCVOPT to include the X'01' bit on User 0's parameter line.
A checkpoint data set is a sequential, unblocked data set with a record length equal to the Model 204 page size (6184 bytes). You can store it only on disk.
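Two of the facts above lend themselves to a quick calculation: whether an RCVOPT value includes the X'01' checkpoint bit, and how many bytes a given number of unblocked 6184-byte checkpoint records occupy. The helper names are hypothetical.

```python
PAGE_SIZE = 6184       # checkpoint record length = Model 204 page size, in bytes
CHECKPOINT_BIT = 0x01  # RCVOPT X'01' bit: activates checkpoint logging


def checkpointing_requested(rcvopt: int) -> bool:
    """True if this RCVOPT setting includes the X'01' checkpoint bit."""
    return (rcvopt & CHECKPOINT_BIT) != 0


def preimage_bytes(records: int) -> int:
    """Bytes occupied by a number of unblocked checkpoint records."""
    return records * PAGE_SIZE
```

For example, RCVOPT=9 (X'09') includes the X'01' bit, while RCVOPT=8 does not.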
You may define the CHKPOINT and CHKPNTS data sets using parallel streams. The only restriction on CHKPOINT and CHKPNTS stream definitions is that they may not contain CMS data sets. There are no restrictions on the type of stream you can use to define the CCAJRNL and CCAJLOG journals.
Calculating disk space required
You can determine the amount of disk space required by the CHKPOINT data set in any given run by using the high water mark of records (page preimages) written to the CHKPOINT data set. You can obtain the high water mark via a User Language request. For example, you may want to include the following procedure as part of your CCAIN input stream just prior to EOJ:
BEGIN
PRINT 'HIGH WATER MARK FOR CHECKPOINT RECORDS' AND -
   $CHKPINF(8)
END
To calculate the maximum disk space, in tracks, used by the CHKPOINT data set in this run, divide the high water mark of records (page preimages) written to the CHKPOINT data set by the number of pages per track for your device type. To determine pages per track, see Data set allocation.
You can also monitor the size of the CHKPOINT data set using any existing utility appropriate for your operating system. MONITOR SIZE CHKP is a useful command for this purpose.
Generous sizing of the CHKPOINT data set is important, because Model 204 terminates whenever the CHKPOINT data set becomes full. For this reason, make the primary extent at least 50% greater than that calculated above. Further, define the data set with a substantial secondary extent, say 25% of the primary extent, and make sure that 15 extents of this size are guaranteed to be available. For information about configuring the CHKPOINT data set, see Configuring checkpoint and journal data streams.
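The sizing arithmetic in this section can be combined as follows. This is a sketch of the rules of thumb above, not an official formula; `chkpoint_allocation` is hypothetical, the high water mark is the value printed by $CHKPINF(8), and `pages_per_track` must come from the Data set allocation table for your device. The value 8 used in the example is only a placeholder.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def chkpoint_allocation(high_water_mark: int, pages_per_track: int):
    """Return (used, primary, secondary, guaranteed) CHKPOINT sizes in tracks."""
    used = _ceil_div(high_water_mark, pages_per_track)  # tracks actually written
    primary = _ceil_div(used * 3, 2)                    # at least 50% headroom
    secondary = _ceil_div(primary, 4)                   # about 25% of the primary
    guaranteed = primary + 15 * secondary               # 15 secondary extents available
    return used, primary, secondary, guaranteed
```

For example, `chkpoint_allocation(10000, 8)` returns `(1250, 1875, 469, 8910)`. Integer ceiling division is used so that each figure rounds up rather than down.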
Maintaining single volume CHKPOINT data sets
In Model 204 version 7.6 and earlier, multivolume CHKPOINT data sets are not supported for roll back recovery because BSAM does not support read-backward for those kinds of data sets. If a CHKPOINT data set is written to multiple volumes, then it is necessary to copy the data set onto a single volume before running RESTART ROLL BACK recovery.
In Model 204 version 7.7 and later, multivolume CHKPOINT data sets are supported for roll back recovery because roll back pass 1 and pass 2 both read the recovery data set forward. See ROLL BACK facility for details.
Large CHKPOINT data sets are usually not required when CPMAX=1 is set in CCAIN. This is explained in Limiting the size of CHKPOINT, below. GDGs (described in Configuring checkpoint and journal data streams) can be used for the CHKPOINT data set when long-running update transactions are common.
If checkpoint timeouts (which can require large CHKPOINT data sets) are common, the use of sub-transaction checkpoints is recommended. Sub-transaction checkpoints eliminate checkpoint timeouts and help to keep the size of the CHKPOINT data set to a minimum.
Limiting the size of CHKPOINT
Model 204 writes to the CHKPOINT data set until the number of checkpoints specified by CPMAX has been taken and the next checkpoint is about to be taken. At that point, Model 204 rewinds the CHKPOINT data set and writes the next checkpoint at its beginning, overwriting the data previously written to CHKPOINT.
If you set CPMAX to one and set CPTIME to less than 15, the CHKPOINT data set will generally be small enough to fit on a single volume and usually occupies no more than about 20 cylinders. Setting CPMAX to one ensures that you keep only the most recent checkpoint and the preimages logged since that checkpoint. When checkpoint number CPMAX plus one is about to be taken, the CHKPOINT data set rewinds and the new checkpoint is written at the beginning of the data set. This keeps the data set small and ensures faster recovery.
To preserve all checkpoints and all preimages in the CHKPOINT data set, omit CPMAX on User 0's parameter line.
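The rewind behavior controlled by CPMAX can be pictured with a small simulation. This Python sketch is a toy model under stated assumptions, not Model 204 code: records are plain strings rather than real 6184-byte records, the class and method names are hypothetical, and "rewinding" is modeled as discarding everything written so far, since that data is overwritten. Passing cpmax=None models omitting CPMAX.

```python
from typing import List, Optional


class ChkpointModel:
    """Toy model of CHKPOINT wraparound controlled by CPMAX (hypothetical)."""

    def __init__(self, cpmax: Optional[int]) -> None:
        if cpmax is not None and cpmax < 1:
            raise ValueError("cpmax must be >= 1 or None")
        self.cpmax = cpmax
        self.records: List[str] = []
        self.checkpoints_in_pass = 0
        self.high_water_mark = 0

    def take_checkpoint(self) -> None:
        # CPMAX checkpoints are already written and another is about to be
        # taken: rewind, so the earlier data is overwritten.
        if self.cpmax is not None and self.checkpoints_in_pass == self.cpmax:
            self.records.clear()
            self.checkpoints_in_pass = 0
        self._write("checkpoint")
        self.checkpoints_in_pass += 1

    def log_preimage(self, page: str) -> None:
        self._write("preimage " + page)

    def _write(self, record: str) -> None:
        self.records.append(record)
        self.high_water_mark = max(self.high_water_mark, len(self.records))


ds = ChkpointModel(cpmax=1)
ds.take_checkpoint()
for page in ("A1", "A2", "A3"):
    ds.log_preimage(page)
ds.take_checkpoint()   # rewinds before the second checkpoint is written
ds.log_preimage("B1")
print(ds.records)          # ['checkpoint', 'preimage B1']
print(ds.high_water_mark)  # 4
```

With CPMAX=1 only the latest checkpoint and the preimages logged after it survive, while the high water mark still reflects the largest extent the data set reached, which is the figure $CHKPINF(8) is used for above.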
Obtaining checkpoint information (UTILC)
The UTILC utility is a standalone batch utility that provides information about the Model 204 checkpoint process by interpreting a checkpoint file and printing out checkpoint records in dump format. UTILC is a debugging aid for Model 204 users working with Technical Support staff on a specific problem. However, the information provided by UTILC can also be useful for performance analysis and capacity planning.
Data in a checkpoint data set consists of the record types shown in the following table:
Type code | Contents |
---|---|
1 (01) | File page preimage |
2 (02) | Checkpoint |
3 (03) | Copy of the roll forward file directory |
4 (04) | Unused |
5 (05) | File Parameter List (FPL) page preimage |
6 (06) | Unused |
7 (07) | Batch (IFAM2) job sign on |
8 (08) | Batch (IFAM2) job sign off |
9 (09) | Deferred update open |
10 (0A) | Discontinuity |
11 (0B) | RESET FISTAT (turn off physically broken bit) |
12 (0C) | Dynamic allocation |
Checkpoint record formats
In checkpoint records (type 2), the date and time of the checkpoint are stored in the first 8 bytes of the record, offset 0 (0), in the format:
00yydddFhhmmssth
Where:
- yyddd is the Julian date.
- hhmmssth is the time.
For example, 88.335 15:04:27.88 is represented as 0088335F15042788.
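To make the layout concrete, this Python sketch decodes the 8-byte field from the start of a type 2 record into its yy, ddd, hh, mm, ss, and th parts by reading the hexadecimal digits directly. It is an illustration of the documented format, not a Model 204 utility: the function and type names are hypothetical, and it deliberately rejects any field that does not start with 00 or lacks the F in the eighth digit position.

```python
from typing import NamedTuple


class CheckpointTime(NamedTuple):
    yy: int   # two-digit year, as stored
    ddd: int  # day of the year (Julian)
    hh: int
    mm: int
    ss: int
    th: int   # hundredths of a second


def decode_checkpoint_time(field: bytes) -> CheckpointTime:
    """Decode the 00yydddFhhmmssth field at offset 0 of a type 2 record."""
    if len(field) != 8:
        raise ValueError("expected 8 bytes")
    digits = field.hex().upper()  # for example '0088335F15042788'
    if digits[:2] != "00" or digits[7] != "F":
        raise ValueError("unexpected layout: " + digits)
    if not (digits[2:7] + digits[8:]).isdigit():
        raise ValueError("non-decimal digit in field: " + digits)
    return CheckpointTime(
        yy=int(digits[2:4]), ddd=int(digits[4:7]),
        hh=int(digits[8:10]), mm=int(digits[10:12]),
        ss=int(digits[12:14]), th=int(digits[14:16]),
    )


t = decode_checkpoint_time(bytes.fromhex("0088335F15042788"))
print(f"{t.yy:02d}.{t.ddd:03d} {t.hh:02d}:{t.mm:02d}:{t.ss:02d}.{t.th:02d}")
# 88.335 15:04:27.88
```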
Each record ends with a 40-byte trailer that includes the 1-byte numeric type code at offset 6172 (181C), followed by the date and time of the posted checkpoint in the format:
yydddFxxxxxxxx
Where:
- yyddd is the Julian date.
- xxxxxxxx is the millisecond count of the day.
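The trailer timestamp differs from the checkpoint-record timestamp: the time of day is a millisecond count rather than decimal digits. The Python sketch below decodes the 7-byte yydddFxxxxxxxx field under two assumptions that this page does not state outright: the millisecond count is an unsigned big-endian binary integer, and the date immediately follows the type code, which would place the field at offsets 6173 through 6179 (consistent with the page identifier starting at 6180). The function name is hypothetical.

```python
import struct
from typing import Tuple


def decode_trailer_time(field: bytes) -> Tuple[int, int, str]:
    """Decode yydddF (packed decimal) plus an assumed 4-byte binary ms count.

    Returns (yy, ddd, 'hh:mm:ss.mmm').
    """
    if len(field) != 7:
        raise ValueError("expected 7 bytes")
    date = field[:3].hex().upper()  # for example '88335F'
    if date[5] != "F" or not date[:5].isdigit():
        raise ValueError("unexpected date field: " + date)
    (ms,) = struct.unpack(">I", field[3:])  # assumption: big-endian unsigned
    if ms >= 86_400_000:
        raise ValueError("millisecond count exceeds one day")
    secs, millis = divmod(ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return int(date[:2]), int(date[2:5]), f"{hours:02d}:{mins:02d}:{secs:02d}.{millis:03d}"


# Synthetic sample: day 88.335, 54,267,880 ms after midnight.
sample = bytes.fromhex("88335F") + struct.pack(">I", 54_267_880)
print(decode_trailer_time(sample))  # (88, 335, '15:04:27.880')
```

Given a full record held in a bytes object, the call under the stated offset assumption would be decode_trailer_time(record[6173:6180]).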
Trailer information identifies preimages (record types 1 and 5):
- The first 8 bytes of the trailer, at offset 6144 (1800) from the start of the record, contain the name of the file in EBCDIC.
- The last 4 bytes of each preimage record, at offset 6180 (1824) from the start of the record, contain a page identifier in the format:
  TTPPPPPP
  Where:
  - TT is a 1-byte table identifier:
    Identifier code | Table |
    ---|---|
    00 | File Control Table (FCT) |
    01 | Table A |
    02 | Table B |
    03 | Table C |
    04 | Table D |
  - PPPPPP is a 3-byte hexadecimal table-relative page number.
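Putting the preimage trailer offsets together, the following Python sketch extracts the file name, record type, table, and table-relative page number from one record. It is a hedged illustration, not UTILC: the 6184-byte record length is inferred from the 6144-byte trailer offset plus the 40-byte trailer (and matches LRECL 6184 in the z/VM FILEDEF example), EBCDIC is decoded with Python's cp037 code page as an assumption about the code page in use, and the file name PAYROLL in the usage example is hypothetical.

```python
from typing import NamedTuple

RECORD_LENGTH = 6184   # 6144-byte trailer offset + 40-byte trailer
TRAILER_OFFSET = 6144  # X'1800': file name, 8 bytes of EBCDIC
TYPE_OFFSET = 6172     # X'181C': 1-byte record type code
PAGE_ID_OFFSET = 6180  # X'1824': TTPPPPPP page identifier

TABLES = {0x00: "FCT", 0x01: "Table A", 0x02: "Table B",
          0x03: "Table C", 0x04: "Table D"}
PREIMAGE_TYPES = {1: "File page preimage", 5: "FPL page preimage"}


class PreimageTrailer(NamedTuple):
    file_name: str
    record_type: int
    table: str
    page: int  # table-relative page number


def parse_preimage_trailer(record: bytes) -> PreimageTrailer:
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    rtype = record[TYPE_OFFSET]
    if rtype not in PREIMAGE_TYPES:
        raise ValueError(f"record type {rtype} is not a preimage")
    # Assumption: the EBCDIC file name uses code page 037.
    name = record[TRAILER_OFFSET:TRAILER_OFFSET + 8].decode("cp037").rstrip()
    tt = record[PAGE_ID_OFFSET]
    page = int.from_bytes(record[PAGE_ID_OFFSET + 1:PAGE_ID_OFFSET + 4], "big")
    return PreimageTrailer(name, rtype, TABLES.get(tt, f"unknown X'{tt:02X}'"), page)


# Build a synthetic Table B preimage record for file PAYROLL, page X'000010'.
rec = bytearray(RECORD_LENGTH)
rec[TRAILER_OFFSET:TRAILER_OFFSET + 8] = "PAYROLL ".encode("cp037")
rec[TYPE_OFFSET] = 1
rec[PAGE_ID_OFFSET:PAGE_ID_OFFSET + 4] = bytes.fromhex("02000010")
print(parse_preimage_trailer(bytes(rec)))
```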
UTILC input data sets
You can use only single data sets as input to the UTILC utility. You must process members of concatenated, parallel, or ring streams one at a time.
UTILC options
The following options control the execution and output of UTILC:
- FROMDATE and TODATE specify the range of dates for which checkpoint records are printed. FROMDATE and TODATE must be 5-digit Julian dates (for example, 88062).
- FROMTIME and TOTIME specify the time range for which checkpoint records are printed. FROMTIME and TOTIME must be in 24-hour clock (hhmmss) format (for example, 151003 = 3:10:03 PM).
- RECTYPE specifies the type of records to be printed. For a list of record types, see the table of checkpoint record types above. The default is all record types.
Repeat RECTYPE as necessary. The following example prints all file page preimages and all File Parameter List page preimages.
RECTYPE=1,RECTYPE=5
- FILENAME specifies the file whose pages are to be printed, using the ddname of the file (the name specified in the CREATE command). Records without file names, such as checkpoint records, are also printed unless they are excluded by record type.
- FROMPAGE and TOPAGE specify the range of Model 204 file pages printed. Page numbers must be hexadecimal. The following example prints all Table B pages for all files:
FROMPAGE=02000000,TOPAGE=02FFFFFF
Use the FILENAME option to further restrict printing to selected pages from specific files.
- TRAILERS=ONLY specifies that only record trailers are printed. If you do not specify TRAILERS=ONLY, entire checkpoint records are printed.
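Because the option formats are strict (5-digit Julian dates, hhmmss times, hexadecimal page ranges), it can help to assemble the option string programmatically. The Python helper below is hypothetical and produces only the forms shown on this page; it does not claim that UTILC requires this option order, and it leaves out FILENAME because its exact syntax is not shown here. The table argument uses the TTPPPPPP layout to cover every page of one table.

```python
from typing import Iterable, List, Optional


def _julian(value: str) -> str:
    """Accept a 5-digit Julian date yyddd (day 001-366)."""
    if len(value) != 5 or not value.isdigit() or not 1 <= int(value[2:]) <= 366:
        raise ValueError(f"not a 5-digit Julian date: {value!r}")
    return value


def _hhmmss(value: str) -> str:
    """Accept a 24-hour clock time hhmmss."""
    if (len(value) != 6 or not value.isdigit() or int(value[:2]) > 23
            or int(value[2:4]) > 59 or int(value[4:]) > 59):
        raise ValueError(f"not a 24-hour hhmmss time: {value!r}")
    return value


def utilc_options(fromdate: Optional[str] = None, todate: Optional[str] = None,
                  fromtime: Optional[str] = None, totime: Optional[str] = None,
                  rectypes: Iterable[int] = (), table: Optional[int] = None,
                  trailers_only: bool = False) -> str:
    """Assemble a comma-separated UTILC option string (hypothetical helper)."""
    opts: List[str] = []
    if fromdate is not None:
        opts.append("FROMDATE=" + _julian(fromdate))
    if todate is not None:
        opts.append("TODATE=" + _julian(todate))
    if fromtime is not None:
        opts.append("FROMTIME=" + _hhmmss(fromtime))
    if totime is not None:
        opts.append("TOTIME=" + _hhmmss(totime))
    for rt in rectypes:
        if not 1 <= rt <= 12:
            raise ValueError(f"unknown record type: {rt}")
        opts.append(f"RECTYPE={rt}")  # RECTYPE is repeated once per type
    if table is not None:
        if not 0 <= table <= 0xFF:
            raise ValueError("table identifier must fit in one byte")
        # TT followed by the lowest and highest 3-byte page numbers.
        opts.append(f"FROMPAGE={table:02X}000000,TOPAGE={table:02X}FFFFFF")
    if trailers_only:
        opts.append("TRAILERS=ONLY")
    return ",".join(opts)


print(utilc_options(fromtime="120513", totime="130513", trailers_only=True))
# FROMTIME=120513,TOTIME=130513,TRAILERS=ONLY
```

Calling utilc_options(rectypes=[1, 5]) gives RECTYPE=1,RECTYPE=5, and utilc_options(table=2) gives the Table B range FROMPAGE=02000000,TOPAGE=02FFFFFF shown above.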
UTILC examples
The following examples provide the job control statements to run UTILC in z/OS, z/VSE, and z/VM environments. Each example requests printing of checkpoint record trailers for the hour between 12:05:13 and 13:05:13.
z/OS
//UTILC    EXEC PGM=UTILC,PARM='FROMTIME=120513,TOTIME=130513,
// TRAILERS=ONLY'
//STEPLIB  DD DSN=M204.LOADLIB,DISP=SHR
//CCAPRINT DD SYSOUT=A
//SYSU     DD SYSOUT=A
//CHKPOINT DD DSN=M204.CHKPOINT,DISP=OLD
z/VSE
The input stream, CHKPNT, for running UTILC is defined in the JCL by DLBL and EXTENT statements or a TLBL statement:
// JOB UTILC PRINT DISK CHECKPOINT FILE
// DLBL CHKPNT,'M204.CHKPOINT.FILE'
// EXTENT SYS001,SYSWK1,,,1390,1000
// EXEC UTILC,SIZE=AUTO,FROMTIME=120513,TOTIME=130513,
// TRAILERS=ONLY
/&
If the input file is on tape, assign the symbolic unit SYS004 to the tape drive on which the tape is mounted. The output is printed on the symbolic unit SYSLST:
// JOB UTILC PRINT TAPE CHECKPOINT FILE
// TLBL CHKPNT,'M204.CHKPOINT'
// ASSGN SYS004,X'300'
// EXEC UTILC,SIZE=AUTO,FROMTIME=120513,TOTIME=130513,
// TRAILERS=ONLY
/&
z/VM
The UTILC EXEC procedure runs UTILC. The command format is:
UTILC dsn mode (FROMTIME hhmmss TOTIME hhmmss TRAILERS=ONLY
Where:
- dsn is the data set name.
For z/OS format data sets, the DSN is specified with spaces instead of periods between the qualifiers.
For CMS data sets, the DSN is specified as filename filetype.
- mode specifies the access mode of the disk containing the file to be processed.
- For tape files, the DSN and MODE parameters are replaced by the keyword TAPE.
Tape files also require issuing a FILEDEF, and a LABELDEF if necessary, for the CHKPOINT data set before issuing the UTILC command. For example:
FILEDEF CHKPOINT TAP1 SL VOLID 12345 (RECFM FB LRECL 6184 BLKSIZE 6184
LABELDEF CHKPOINT standard labeldef parameters
UTILC D TAPE (FROMTIME 120513 TOTIME 130513 TRAILERS=ONLY
Note: The length of optional parameters you can specify is limited by the z/VM console line length of 130 characters.
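If you generate UTILC commands from a script, a length check avoids hitting the console limit. This Python sketch is hypothetical: it follows the command format described above (periods in z/OS data set names become spaces, and the keyword TAPE replaces the DSN and mode for tape files) and conservatively checks the whole command line, not just the optional parameters, against 130 characters. The data set name and access mode in the usage example are placeholders.

```python
from typing import Optional

CONSOLE_LIMIT = 130  # z/VM console line length


def utilc_cms_command(dsn: Optional[str], mode: Optional[str], options: str) -> str:
    """Build a UTILC EXEC command line (hypothetical helper).

    dsn=None means a tape file, which uses the keyword TAPE instead of a
    DSN and access mode. For CMS files, pass dsn as 'filename filetype'.
    """
    if dsn is None:
        target = "TAPE"
    else:
        if not mode:
            raise ValueError("an access mode is required for disk files")
        # z/OS-format names use spaces instead of periods between qualifiers.
        target = f"{dsn.replace('.', ' ')} {mode}"
    cmd = f"UTILC {target} ({options}"
    if len(cmd) > CONSOLE_LIMIT:
        raise ValueError(f"command is {len(cmd)} characters; limit is {CONSOLE_LIMIT}")
    return cmd


print(utilc_cms_command("M204.CHKPOINT", "A", "FROMTIME 120513 TOTIME 130513 TRAILERS=ONLY"))
# UTILC M204 CHKPOINT A (FROMTIME 120513 TOTIME 130513 TRAILERS=ONLY
```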