File integrity and recovery

Overview

After files have been designed and loaded, one of the file manager's most important responsibilities is to maintain the integrity of the files. This topic describes some of the error conditions that threaten file structures and the steps that you can take to safeguard the files.

Error recovery features

Model 204 provides many features to ensure data integrity, including:

  • Extensive command syntax checking and compiler diagnostics.
  • Double-checking of files and tape backups to ensure that files are correctly mounted.
  • SNA Communications Server (formerly VTAM) error recovery routines for data transmission from user terminals.
  • Trailers on each page with information such as file name, page number, and table number, which Model 204 checks every time the page is read or written to protect against loss of integrity due to a disk error.

In addition, Model 204 provides error recovery facilities as described in the following sections.

Transaction back out facility

Data integrity and logical consistency of the files are protected by the Model 204 Transaction back out facility, which can undo the effects of incomplete transactions on file data. Model 204 automatically backs out an incomplete transaction for a transaction back out file if a user's request is canceled, if Model 204 detects a file problem such as a table full condition, or if Model 204 is restarting the user. See Transaction back out for details.

RESTART recovery facility

The system manager must include a RESTART command as part of User 0's input if an installation is to use system recovery facilities. The RESTART command and System and media recovery pages discuss the syntax of this command and describe the Roll Back, Roll Forward, and Ignore features.

Media recovery

In the event of a media failure, such as a disk head crash, Model 204 allows you to recover files using the media recovery feature. Media recovery works by restoring files from a previously made backup copy, and then using the Roll Forward feature to reapply the updates that were made to the files since the time the copy was made.

Note: The REGENERATE command invokes the Restore and Roll Forward facilities automatically. In a media recovery run, the RESTORE and RESTART commands are not specified.
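
The restore-then-reapply idea behind media recovery can be illustrated with a small Python model. This is only a conceptual sketch: the function name, the dictionary used as a "file", and the journal entry layout are all assumptions made for illustration and do not reflect Model 204's actual journal format or the REGENERATE command's processing.

```python
def media_recover(backup, backup_time, journal):
    """Conceptual model of media recovery (hypothetical data layout).

    backup      -- dict mapping record id -> value, as of backup_time
    backup_time -- timestamp at which the backup copy was made
    journal     -- iterable of (timestamp, record_id, value) update entries
    """
    # Restore step: start from the backup copy.
    restored = dict(backup)
    # Roll Forward step: reapply, in time order, every update
    # that was made after the backup was taken.
    for timestamp, record_id, value in sorted(journal, key=lambda e: e[0]):
        if timestamp > backup_time:
            restored[record_id] = value
    return restored
```

Updates logged before the backup time are skipped because the backup copy already reflects them.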

FRCVOPT and FOPT parameters

The FRCVOPT (file recovery options) parameter determines whether or not an installation logs checkpoint and/or Roll Forward information. The system manager also must include the RESTART command in a Model 204 job in order for recovery to run. If an installation uses the system recovery facilities, the file manager can control how files are affected.

The default setting of FRCVOPT is X'00'. The file participates in checkpointing, and batch Model 204 jobs that update the file can be run while the Online job is up. If your installation uses Roll Forward, the file participates. The back out mechanism of transaction back out is enabled.

If the FOPT parameter is also set to X'00' (enabling lock pending updates), the file is a transaction back out file. If the FOPT setting includes the X'02' option, disabling lock pending updates, Model 204 automatically disables transaction back out by turning on the X'08' bit of FRCVOPT.

FRCVOPT parameter

File recovery options are controlled by the FRCVOPT parameter. The possible FRCVOPT settings are:

Bit Meaning
X'80' File cannot be updated if Roll Forward logging is not active.
X'40' File cannot be updated if checkpoint logging is not active.
X'20' File does not participate in checkpoint logging.
X'10' Discontinuities not allowed (hold enqueuing while file is closed if the file has been updated in this run).
X'08' Transaction back out is disabled. If X'01' is specified, this option is automatically set.
X'04' File does not participate in Roll Forward logging.
X'02' File does not participate in Roll Forward.
X'01' All updates are reapplied to the file during Roll Forward, without regard to update unit boundaries. Because transactions are explicitly handled differently between transaction back out (X'08' off) and Roll Forward all the way (X'01' on), this option forces the X'08' option; that is, transaction back out is automatically turned off.
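
Because FRCVOPT is a bit mask, a setting such as X'24' is simply the sum of individual options. The following Python sketch decodes a setting and applies the forced X'08' rules described above (X'01' on, or FOPT X'02' on). The bit values come from the table; the function names and the shortened meaning strings are illustrative assumptions, not part of Model 204.

```python
# Bit values are from the FRCVOPT table; the descriptions are abbreviated.
FRCVOPT_BITS = {
    0x80: "updates require active Roll Forward logging",
    0x40: "updates require active checkpoint logging",
    0x20: "no checkpoint logging",
    0x10: "discontinuities not allowed",
    0x08: "transaction back out disabled",
    0x04: "no Roll Forward logging",
    0x02: "no Roll Forward participation",
    0x01: "Roll Forward reapplies all updates",
}

FOPT_LOCK_PENDING_DISABLED = 0x02  # FOPT X'02'


def normalize_frcvopt(frcvopt, fopt=0x00):
    """Turn on X'08' when X'01' is set or FOPT disables lock pending updates."""
    if frcvopt & 0x01 or fopt & FOPT_LOCK_PENDING_DISABLED:
        frcvopt |= 0x08
    return frcvopt


def describe_frcvopt(frcvopt):
    """List the meanings of the bits that are on, highest bit first."""
    return [meaning for bit, meaning in FRCVOPT_BITS.items() if frcvopt & bit]
```

For example, `describe_frcvopt(0x24)` lists the checkpoint logging and Roll Forward logging options, and `normalize_frcvopt(0x01)` returns `0x09`.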

Eliminating checkpoints and Roll Forward logging

You can eliminate the overhead of checkpoint and Roll Forward logging if system crashes are infrequent and if manual backup facilities provide sufficient protection. To do this, turn on the X'24' bits of FRCVOPT. This strategy also works well for scratch files.

Logging all changes

At the other extreme, you might want to log all changes for recovery purposes. You can enforce this by setting the X'C0' bits of FRCVOPT. With these bits set, OPEN disables all updating of the file if the corresponding logging is not active. The X'80' and X'40' bits are ignored if the OPEN command or statement is issued by User 0 and the file privileges include file manager. This override capability is designed to allow you to run batch update jobs, such as the File Load utility, without the overhead of logging.
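
The effect of the X'80' and X'40' bits at OPEN time can be summarized as a small decision function. This Python sketch models only the rule stated in this section; the function and argument names are hypothetical, and other conditions that might prevent updating are not modeled.

```python
REQUIRE_ROLL_FORWARD_LOGGING = 0x80  # FRCVOPT X'80'
REQUIRE_CHECKPOINT_LOGGING = 0x40    # FRCVOPT X'40'


def updates_allowed(frcvopt, roll_forward_logging_active,
                    checkpoint_logging_active,
                    user0_with_file_manager=False):
    """Return True if the X'80'/X'40' bits do not block updating at OPEN."""
    # User 0 with file manager privileges: both bits are ignored.
    if user0_with_file_manager:
        return True
    if frcvopt & REQUIRE_ROLL_FORWARD_LOGGING and not roll_forward_logging_active:
        return False
    if frcvopt & REQUIRE_CHECKPOINT_LOGGING and not checkpoint_logging_active:
        return False
    return True
```

With FRCVOPT set to X'C0', updating is allowed only when both kinds of logging are active, unless the User 0 override applies.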

Setting and clearing the X'10' bit

The FRCVOPT X'10' bit (discontinuities not allowed) is somewhat independent of the recovery facilities, although preventing discontinuities is less important when checkpointing is not being used. After the file has been updated, if the X'10' bit is set, Model 204 retains a share enqueue for the file, even if all users have closed the file. A separate retrieval job runs, but an updating job waits.

To allow a specific job to run, clear the X'10' bit of FRCVOPT and then close the file. If no other users have the file open, the batch job runs. After the batch job has finished, set the X'10' bit again.

Crash recovery features

In case of failure, Model 204 makes every effort to clean up files and shut down softly. If Model 204 or the operating system under which it is running crashes, any Model 204 file being updated is flagged by a special setting of the FISTAT parameter. The FILE PHYSICALLY INCONSISTENT message is issued whenever the file is opened thereafter until recovery procedures have been executed. In this case, or in the case of a hard crash, integrity can be recovered using the DUMP/RESTORE utility or restart/recovery facility.

FISTAT parameter

FISTAT is a flag parameter that provides control of update completion and other file status information. Generally, the meaning of the FISTAT setting and the actual state of the file determine any special action to be taken.

It is important to understand the different types of errors that can occur during processing, the effect each error has on the file and the setting of FISTAT, and the separation of updates into "update units."

The possible settings of FISTAT are:

Bit Meaning
X'40' File might be logically inconsistent. (Soft restart might have occurred.) Logical inconsistencies are described in more detail in Logically inconsistent files.
X'20' File is in deferred update mode. Either a file load program or an OPEN in deferred update mode was applied to the file, and the Z command has not yet been issued. See Deferred update feature for more information about deferred update mode.
X'10' File has been recovered. This notifies you that the file is intact but that some updates might have been undone. You might need to do some additional work to get the file to its most up-to-date state. Messages that describe the last updates applied are displayed when the file is opened.
X'08' File is full. One of the Tables A, B, C, or D has filled up. If Table A or C fills up, or if Table B fills up in a hashed file, a reorganization is required. Otherwise, the INCREASE command can be used to add more space.
X'02' File is physically inconsistent. The file was being updated when some form of severe error or a system crash occurred. This is described in detail in Hard restarts. This setting produces the FILE PHYSICALLY INCONSISTENT message. (A hard restart might have occurred.)
X'01' File is not initialized. After a file is created, it must be initialized before data can be added. See Initializing files for more information.
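
FISTAT settings can combine several of these bits. As a reading aid, the following Python sketch decodes a FISTAT value into the conditions listed above. The bit values are taken from the table; the function name and the abbreviated descriptions are assumptions made for illustration.

```python
# Bit values are from the FISTAT table; the descriptions are abbreviated.
FISTAT_BITS = {
    0x40: "might be logically inconsistent",
    0x20: "in deferred update mode",
    0x10: "recovered",
    0x08: "full",
    0x02: "physically inconsistent",
    0x01: "not initialized",
}


def describe_fistat(fistat):
    """List the conditions flagged in a FISTAT value, highest bit first."""
    return [meaning for bit, meaning in FISTAT_BITS.items() if fistat & bit]
```

For example, a value of X'42' decodes to both the logically inconsistent and the physically inconsistent conditions.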

Restarts

Model 204 provides two kinds of restarts:

  • Soft
  • Hard

Soft restarts

Soft restarts occur when Model 204 recognizes a severe error that occurs between (as opposed to during) active update operations.

Causes of soft restarts

Severe errors include nonrecoverable terminal I/O errors such as:

  • Phone is hung up
  • No input is coming in on a thread (with the inactive thread timeout feature enabled) for a specified number of seconds. See User environment control parameters for information on the TIMEOUT parameter.
  • Full server tables

Full server tables in transaction back out files

If any server tables are full and the file is a transaction back out file, Model 204 automatically backs out the incomplete transaction without a restart and informs the user through an error message that the transaction has been backed out. Otherwise, Model 204 performs a soft restart.

Effects of soft restarts

In a soft restart:

  • User receives the USER RESTARTED SOFTLY message and is automatically logged out.
  • File-physically-inconsistent flag is cleared (unless other users are updating the file) and the file-logically-inconsistent flag is set.

The file is physically consistent (that is, file structural elements are in order: entries in the index correspond to entries in the data, and so on). However, there is a possibility that the file might be logically inconsistent. The restarted user might have been in the middle of an update transaction that would cause multiple updates to the file. Some updates might have been made before the soft restart occurred, while other updates were not made.

Hard restarts

Hard restarts occur for severe errors during active updating operations.

Causes of hard restarts

Hard restarts can be caused by:

  • Disk I/O errors
  • Table A, B, C, or D full conditions if transaction back out is not in effect

For a description of how these errors are handled when transaction back out is in effect, see Transaction back out. If a severe error occurs, the user receives the USER RESTARTED message and is automatically logged out. The file-physically-inconsistent flag is set.
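
The FISTAT changes caused by soft and hard restarts on a file that is not a transaction back out file can be modeled as simple bit operations. This Python sketch is an illustration of the rules stated in this and the preceding section, not Model 204 code; the function names are hypothetical.

```python
LOGICALLY_INCONSISTENT = 0x40   # FISTAT X'40'
PHYSICALLY_INCONSISTENT = 0x02  # FISTAT X'02'


def after_soft_restart(fistat, other_users_updating):
    """Soft restart: clear X'02' unless others are updating, then set X'40'."""
    if not other_users_updating:
        fistat &= ~PHYSICALLY_INCONSISTENT
    return fistat | LOGICALLY_INCONSISTENT


def after_hard_restart(fistat):
    """Hard restart: the file-physically-inconsistent flag is set."""
    return fistat | PHYSICALLY_INCONSISTENT
```

Note that a soft restart leaves X'02' on when other users are still updating the file, because their updates are still in progress.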

Effects of hard restarts

One hard restart can affect other users. For example:

  1. User 1 and user 2 are each updating the same non-transaction-back out file.
  2. User 1 updates KEY fields and gets a message that Table D is full.
  3. File-physically-inconsistent flag (X'02') is set and user 1 is restarted.
  4. User 1 now knows that there is a problem with the file.
  5. User 2, however, continues updating NON-KEY fields and is not notified that Table D is full.
  6. User 2 finishes the request and closes the file without ever knowing that the file-physically-inconsistent flag was set.

Logically inconsistent files

A file is logically inconsistent when an incomplete update unit is left applied to the file. For example, consider an application that generates purchase orders.

Records are generated using the User Language STORE RECORD/END STORE construct and the PO number is the internal record number ($CURREC). The internal record number (PO number) is displayed on the screen along with the input items to be added to the record (through FOR RECORD NUMBER and ADD statements) when the transaction is committed.

If you enter an EOJ command and Model 204 terminates before the transaction is committed, the file is marked logically inconsistent (FISTAT X'40' is set), because an update is active and a user is waiting for terminal input.

Physically inconsistent files

If the user is waiting for disk I/O during an ADD operation and an EOJ is issued, it is possible that some, but not all, tables were updated. In this case the file is marked physically inconsistent (FISTAT X'02' is set).

Preventing inconsistent files

In the previous examples, you could prevent the file from becoming logically or physically inconsistent by bumping all users before terminating Model 204.

File manager responsibilities

It is the file manager's responsibility to determine which users have been affected by a file integrity problem and notify them of the status of the file. You can use the BROADCAST FILE command described in BROADCAST command to do so.

System failure

When the operating system or Model 204 crashes during an updating request, the file-physically-inconsistent flag is set. The file might or might not actually be broken, depending on whether the modified file pages were written out before the crash. There is no way to predict whether or not this happens. Even if the pages were written out, the file might be logically inconsistent, depending on the point in the processing at which the crash occurred.

Premature system termination

Model 204 can be terminated prematurely (by operator cancellation or because of certain error conditions) while requests are still running. When Model 204 is prematurely terminated, modified pages are not automatically written. This means that a user might have been updating and some or all updates might not appear on the file. In addition, if any other users were updating the same file shortly before termination, their updates might not appear either, even though their requests ran to completion.

Request cancellation

If a serious user error occurs during evaluation of a User Language request, the request is canceled. Examples of serious user errors are:

  • Incorrect use of a field name variable
  • Field-level security violations
  • Attempt to store too many values or too long a value for a preallocated field

Request cancellation functions in the same manner as a soft user restart. The user is notified that the request has been canceled, and the file is marked with the file-logically-inconsistent flag (FISTAT X'40'). If the file is a transaction back out file, Model 204 cancels the request, backs out the incomplete transaction, and does not mark the file logically inconsistent. The user is notified that the transaction has been backed out.

Recovery methods

Methods for recovering from file problems are presented in the following sections.

Taking the necessary precautions

Take the following precautions to allow your site to recover from a file problem:

  • Use the DUMP/RESTORE utility to take regular backups. If a recent DUMP of the file is available, the file can be restored to its state at the time of the dump using the RESTORE command. See File dumping and restoring for more information.
  • Dump the files. If journaling is ordinarily active, and all subsequent file updates are available, then you can use media recovery to restore file integrity. See System and media recovery for information about using journals to recover files.
  • Activate checkpointing during Online processing. In the event of system crash, Model 204 crash, or hard restart, you can use the RESTART command to roll back to the last valid checkpoint. If journaling is also active during the Online processing, you can use the Roll Forward feature to restore as many file changes as possible. These system recovery procedures are described in System and media recovery.

Warning: To ensure file integrity, Rocket Software strongly recommends that you never reset the FISTAT (file status) parameter when it is set to the file-physically-inconsistent flag until just before you reorganize the file. However, you must reset FISTAT before you reorganize a physically inconsistent file.

Reloading broken files

When the precautions recommended above have been neglected and the file is broken, recovery still might be possible. The safest method consists of resetting FISTAT temporarily and then running the following User Language request:

    USE OUTFILE
    BEGIN
    ALL: FIND ALL RECORDS
         END FIND
    PRINT.LOOP: FOR EACH RECORD IN ALL
         PRINT '*'
         PRINT ALL INFORMATION
         END FOR
    END

You can then use a file load program with OUTFILE to reload the file (see the discussion of file reorganization and PAI FLOD in "Using PAI FLOD for files with varying record formats").

Reducing integrity problems

In addition to procedures used by the file and system managers for backup purposes, applications can be designed to reduce file integrity problems.

Using COMMIT statements for manageable update units

Use a COMMIT statement between logical updates so that the system can turn off the file-physically-inconsistent flag and write out updated file pages frequently. In addition, Model 204 can take checkpoints only between update units, so long update units can severely inhibit the taking of checkpoints. (Update units are discussed in detail in Model 204 update units.)

To accomplish transaction back outs, Model 204 maintains a transaction back out and constraint log for each active update unit, built in the system file CCATEMP. Sizable update units can greatly increase the amount of CCATEMP space used by the system. Because insufficient CCATEMP space terminates the run, commit update units frequently to minimize the CCATEMP space requirement.
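
The relationship between commit frequency and back out log size can be illustrated with a deliberately simplified Python model. It assumes one log entry per update and assumes that the log for an update unit is no longer needed once that unit is committed; both are simplifications for illustration, and the function name is hypothetical. Actual CCATEMP usage depends on the updates performed.

```python
def peak_backout_log_entries(total_updates, commit_every):
    """Return the largest number of pending log entries in any update unit.

    Simplified model: each update adds one entry to the back out log of
    the active update unit, and a COMMIT after every `commit_every`
    updates ends the unit and discards its log.
    """
    peak = pending = 0
    for _ in range(total_updates):
        pending += 1
        peak = max(peak, pending)
        if pending == commit_every:
            pending = 0  # COMMIT: update unit ends, log released
    return peak
```

Under these assumptions, 10,000 updates in a single update unit peak at 10,000 pending entries, while committing every 100 updates keeps the peak at 100.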

Updates to TBO and non-TBO files in the same request

You can update a non-TBO file, commit it, and then update a TBO file (or do so in the reverse order) in the same request.

Requests that attempt to update TBO and non-TBO files without an intervening COMMIT will compile, but will fail during evaluation with the following message:

*** 1 CANCELLING REQUEST; M204.2771: ATTEMPT TO UPDATE TBO AND NON-TBO FILES IN THE SAME TRANSACTION
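
The rule behind this message can be expressed as a check over the sequence of updates and commits in a request. The following Python sketch is a model of the rule only; the operation encoding, the function name, and the exception class are assumptions for illustration and are unrelated to how Model 204 evaluates requests.

```python
class MixedUpdateError(Exception):
    """Raised when TBO and non-TBO files are updated in one transaction."""


def check_request(operations):
    """Validate a sequence of ("update", is_tbo_file) and ("commit",) steps.

    Raises MixedUpdateError if a TBO file and a non-TBO file are updated
    without an intervening commit.
    """
    current_kind = None  # None: no update since the last commit
    for op in operations:
        if op[0] == "commit":
            current_kind = None
        elif op[0] == "update":
            is_tbo = op[1]
            if current_kind is None:
                current_kind = is_tbo
            elif current_kind != is_tbo:
                raise MixedUpdateError(
                    "attempt to update TBO and non-TBO files "
                    "in the same transaction")
        else:
            raise ValueError("unknown operation: %r" % (op,))
```

A sequence such as non-TBO update, commit, TBO update passes the check, while the same sequence without the commit raises the error.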

Designing Host Language Interface jobs

For Host Language Interface jobs, set the update indicator in the IFSTRT function only when updating is really occurring, and issue the IFCHKPT call at various points during a long updating program.

Retrieval-only Host Language Interface threads are not included in update units. Passing data from a retrieval-only thread to an update thread can result in logical inconsistencies to the updated file during Roll Forward. To prevent these inconsistencies, start the thread as an update thread and use a retrieval-only password to open the file, thus providing share-mode enqueuing and preventing updating from this thread. The file also is prevented from being marked physically inconsistent in the event of a hard restart or system crash. See Recovery considerations for single and multicursor IFAM2 threads for further description.

Update units and transactions

In general data processing terms, a transaction is a sequence of operations that access a database. The order of these operations is defined by the application or the user.

Model 204 transaction types

Model 204 recognizes the following types of transactions:

Type Transactions that...
1 Cannot update the database
2 Can update the database but cannot be backed out
3 Can update the database and can be backed out

Model 204 update units

In the Model 204 environment, the term update unit refers to any sequence of operations that is allowed to update the database. The two types of update transactions described above, types 2 and 3, are both update units in Model 204:

  • Update units that cannot be backed out (type 2) are called non-backoutable update units.
  • Update units that can be backed out (type 3) are called transactions or backoutable update units.

Non-backoutable update units

The following types of updates cannot be backed out and must be part of non-backoutable update units:

  • Updates to non-transaction-back out files
  • Updates resulting from file updating commands (such as RENAME field)
  • Host Language functions that correspond to the updates in the previous two items
  • Updates resulting from the EDIT subcommands END and GO
  • Procedure definitions

Backoutable update units

Backoutable update units are User Language updates to a transaction back out file and their Host Language Interface counterparts (record or record set updating calls). These are the only types of update operations that can occur inside a Model 204 transaction. A transaction can either be completed so that it persists in the file, or backed out so that the update is logically undone in the file.

Detailed descriptions of the boundaries of User Language backoutable and non-backoutable update units are presented in the following sections, followed by a discussion of the Host Language Interface update unit boundaries.

Boundaries of non-backoutable update units

Updating commands must be in non-backoutable update units. Some updating commands end any previous active update unit, whether backoutable or not.

The following updating commands always end active update units:

INITIALIZE
TRANSFORM
REDEFINE FIELD
RENAME FIELD
DELETE FIELD
CREATE FILE or CREATE PERM GROUP
EDIT permanent-procedure
INCREASE +++
DECREASE +++
FLOD +++
FILELOAD +++
REGENERATE +++
REORGANIZE +++
RESTORE
RESTOREG
Z +++

The commands followed by +++ cannot be issued inside a procedure.

Updating commands that sometimes end active update units

Other updating commands end the previous update unit only when that update unit is a User Language update. If a command non-backoutable update unit is in progress, the following commands are included in that non-backoutable update unit:

SECURE
BROADCAST FILE
DEFINE
PROCEDURE or PROC
DELETE PROCEDURE or DELETE PERM GROUP
RESET
RENAME PROCEDURE +++
ASSIGN
DEASSIGN
DESECURE
CREATEG +++

Nonupdating commands that end active update units

Nonupdating commands that end any previous update unit are:

CLOSE FILE
ALLOCATE
FREE
DUMP
DUMPG
LOGOUT/LOGOFF
DISCONNECT
EOJ +++

When a BEGIN or MORE command follows another command that does not automatically end its own update unit (for example, DEFINE), BEGIN or MORE ends the current update unit.

SOUL statements that end active update units or transactions

SOUL statements that end the current update unit or transaction are:

COMMIT
COMMIT RELEASE
BACKOUT
END (only when END returns the user to include level 0 or terminal command level)

For information about the commit exits feature, which enables you to set up SOUL code to run at commit time, see Application Subsystem development.

Other ways to end active update units

Other situations that end the current update unit are:

  • Return to terminal command level
  • End of an application subsystem procedure, when control is transferred through the communications global variable, unless the Auto Commit option was turned off in the subsystem definition
  • Request cancellation
  • User restart

Starting non-backoutable update units

Non-backoutable update units in User Language start with the first FIND statement or the first User Language statement that performs an update operation on a file that is not a transaction back out file.

In a procedure, non-backoutable update units start with the first FIND statement, the first User Language statement that performs an update operation on a non-transaction back out file, or the first updating command. Each line of input to a procedure definition is treated as a separate update unit, but only one update ID is assigned to a single procedure definition.

Update units for the EDIT command start and end when the edited procedure is copied to the Model 204 file during the END, GO, or SAVE subcommand.

Boundaries of transactions

A transaction (backoutable update unit) begins when the first SOUL statement performing an update operation on a transaction back out file in a User Language request or a procedure is executed.

SOUL statements that perform update operations

The SOUL statements that perform update operations are:

STORE RECORD
DELETE RECORD
DELETE RECORDS
FILE RECORDS
ADD fieldname = value
DELETE fieldname = value
DELETE EACH fieldname
INSERT fieldname = value
CHANGE fieldname TO value

These are the only update operations that can occur inside transactions or can be backed out. A non-backoutable update unit must end before a transaction begins.

Commands that end active transactions

Commands that end an active transaction are:

Any file updating command
CHECKPOINT
CLOSE FILE
RESET
ALLOCATE
FREE
LOGOUT/LOGOFF
DISCONNECT
DUMP
DUMPG

Other events that can end active transactions

Other events that end a transaction are:

  • Return to terminal command level
  • End of an application subsystem procedure, when control is transferred through the communications global variable, unless the Auto Commit option was turned off in the subsystem definition
  • Request cancellation
  • User restart

In User Language, ending a transaction does not start a new transaction. A new transaction does not start until a User Language statement performs an update operation on a transaction back out file.
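
The boundary rules above amount to a two-state machine: a transaction starts at the first update on a transaction back out file and ends at a COMMIT, BACKOUT, or similar event, after which nothing is active until the next update. This Python sketch counts transactions in a simplified event stream. The event names are hypothetical tokens, all updates are assumed to be against a transaction back out file, and only a subset of the ending events is modeled.

```python
# Hypothetical event tokens; "UPDATE" stands for any SOUL updating
# statement (STORE RECORD, ADD, CHANGE, and so on) on a TBO file.
ENDING_EVENTS = {"COMMIT", "COMMIT RELEASE", "BACKOUT", "CLOSE FILE"}


def count_transactions(events):
    """Count the transactions started by a sequence of events."""
    active = False
    started = 0
    for event in events:
        if event == "UPDATE":
            if not active:       # first update begins a transaction
                active = True
                started += 1
        elif event in ENDING_EVENTS:
            active = False       # ending does not start a new transaction
    return started
```

Consecutive COMMIT statements with no update between them do not create empty transactions in this model, which mirrors the rule that a new transaction starts only with an update operation.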

Host Language Interface update units

For details about update units in the Host Language Interface environment, see HLI: Transactions.