SirFact post hoc debugging

Model 204 debugging

Software errors are typically referred to as bugs, and removing errors from software is called debugging. A bug is typically a problem with software that results in a display of incorrect data to the end-user or a problem that causes premature termination of a program. There is also a class of bugs that can cause poor software performance. In any case, if uncorrected, most bugs are ultimately noticed by end-users as incorrect data, abnormal termination of a system, or poor performance.

Debugging essentially consists of three steps:

Determining an error has occurred.
Determining the cause of the error.
Correcting the cause of the error.

The last step is a programming task, so it is beyond the scope of debugging tools (and probably of any other kind of tool). Working out why something is an error, which logic flaws caused it, and how the logic needs to be corrected requires understanding, so these are essentially human activities. It is unlikely that this process can be significantly improved by automation in the near future.

One might also consider managing and distributing the fixes to errors to be part of the debugging process. Whether or not this is accurate, fix management and distribution are different enough in their nature that they are rarely handled by debugging facilities. They are more typically handled by change control facilities, such as SirLib, which is available for Model 204.

Detect errors early

Software logic errors can cause problems that are noticed immediately, or they can cause problems that surface long after the original logic error. For example, a bug in a piece of code might cause a global to be incorrectly set. This incorrectly set global might cause another piece of code to store some invalid data in the database. Months later, that data might be loaded by a different piece of code, which might then terminate abnormally because of the invalid data, or might display incorrect data on the end-user's screen. Clearly, in this sort of situation, especially if the incorrect data might have been stored by any one of dozens of procedures, determining the cause of the problem can be very difficult.

The further from the original logic error one catches the error, the more difficult it is to determine the cause of the error. Because of this, one of the goals of debugging is to catch errors as early as possible. SirFact provides the SIRFACT CANCEL command and the Assert and SirFact statements to facilitate catching errors earlier rather than later.
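The difference between catching an error early and catching it late can be sketched in a few lines. The example below is a Python analogy only, not SOUL or SirFact syntax, and every name in it (database, store_order_unchecked, store_order_checked, total_quantity) is hypothetical. It assumes a simple in-memory list standing in for a database file.

```python
# Hypothetical in-memory "database"; it stands in for a real database file.
database = []

def store_order_unchecked(quantity):
    # No check: an invalid value is stored silently and surfaces much later.
    database.append({"quantity": quantity})

def store_order_checked(quantity):
    # Fail where the bad value first becomes visible,
    # before it can reach the database.
    if not (isinstance(quantity, int) and quantity > 0):
        raise ValueError(f"invalid quantity: {quantity!r}")
    database.append({"quantity": quantity})

def total_quantity():
    # Runs "months later"; any failure here is far from the original error.
    return sum(record["quantity"] for record in database)
```

With the unchecked version, storing None succeeds and the failure only appears when total_quantity is called, with nothing to indicate which caller stored the bad value. With the checked version, the failure occurs in the storing code itself, which is the kind of early detection an assertion facility aims to provide.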

Gather abundant error information

A second goal of debugging is to collect as much information as possible when an error occurs. A classic example of the antithesis of this is the Model 204 message M204.0553 SUBSCRIPT RANGE ERROR. This message causes request cancellation but provides virtually no information as to the cause of the problem. Until recently, this message didn't even include the name of the array to which the invalid subscript applied. Even now, many key pieces of information are still missing, such as the line of code on which the error occurred, the value of the array's subscript, the values of other variables or fields from which the subscript was derived, and so on. SirFact provides the SIRFACT system parameter and the SIRFACT MAXDUMP and SIRFACT DUMP commands to collect as much information about application errors as possible.

A tremendous amount of information about the application at the time of an error is collected in SirFact dumps, under control of these commands. The data in these dumps can then be viewed using SirFact $functions or a ready-to-use application subsystem called FACT.
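As a rough illustration of the idea of capturing context at the moment of failure, the following Python sketch records the error, the line number, and the local variables of the failing routine. It is an analogy only: it does not reflect the format or contents of a SirFact dump, and the names run_with_dump and lookup are hypothetical.

```python
def run_with_dump(func, *args):
    """Run func; return None on success, or a dict describing the failure."""
    try:
        func(*args)
        return None
    except Exception as exc:
        tb = exc.__traceback__
        # Walk to the innermost frame, where the error was actually raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return {
            "error": repr(exc),
            "function": tb.tb_frame.f_code.co_name,
            "line": tb.tb_lineno,
            "locals": dict(tb.tb_frame.f_locals),
        }

def lookup(values, offset):
    index = offset * 2  # subscript derived from another variable
    return values[index]

# An out-of-range subscript: the dump keeps the subscript and its source.
dump = run_with_dump(lookup, [10, 20, 30], 4)
```

Here the captured locals include both the invalid subscript (index) and the variable it was derived from (offset), which is exactly the information the bare subscript range message leaves out.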

Ad hoc versus post hoc debugging

Ad hoc debugging

When most people think of debugging tools, they think of tools used by a programmer during the development process or perhaps when a programmer is trying to reproduce in a test environment a problem that occurred in a production environment. These types of tools are typically interactive and allow things like setting of breakpoints, examination and "manual" modification of values of variables, and tracing of code paths. Because the user of such interactive debugging tools must be familiar with the code being debugged, and because an application being debugged by such a tool must run "inside" the debugging environment, these tools are generally only useful when a programmer is running the application.

Because of the interactive nature of these types of debugging tools, they are sometimes referred to as ad hoc debugging tools. The Rocket Model 204 Janus Debugger, TN/3270 Debugger, and SoftSpy are accomplished ad hoc debugging tools available for Model 204.

As desirable as it is to catch all errors during the development process, this is simply not possible:

All but the simplest code has too many possible combinations of user and external inputs, database values, and environmental variables for every combination to be tested. Often a bug can only be induced by a specific combination of these variables. Because of this complexity, bugs can and will be detected when no programmer is present, and so outside the environment of an ad hoc debugger. Even worse, some of these bugs will not be reproducible in a test environment, so a programmer cannot use an ad hoc debugger to attack the problem.
Very often, a user will not remember the exact sequence of inputs they entered to cause an error, or the error was caused by a combination of user inputs and environmental problems beyond the view of the user. Often, the combination of factors that caused a bug will not be understood until the actual cause of the bug is understood. Occasionally, with timing-related bugs, even understanding all the factors required to cause a bug might not be sufficient to reproduce the problem consistently.

Post hoc debugging

Ad hoc debuggers are basically useless for problems that occur in production or unit-test away from the view of a programmer. Instead, a different debugging tool is needed for these kinds of problems: a post hoc debugging tool. A post hoc debugging tool is useful in solving problems that occur outside of development because:

It contains facilities to trap errors earlier rather than later.
It collects and stores as much information as possible at the time of an error.

The only well-developed post hoc debugging tool available in the Model 204 environment is SirFact.

Even in the later stages of development, a post hoc debugging facility might be preferable to an ad hoc debugger. This is because a programmer might wish to quickly go through many code paths without an intrusive ad hoc debugger in the way. Because they run in production systems, post hoc debugging tools must be extremely unobtrusive. In any case, a programmer might be willing to pay a price in getting slightly less interactive debugging capabilities for the benefit of having the debugger "out of the way" most of the time.