Sorted files

From m204wiki
Jump to navigation Jump to search

Overview

In an ordinary Model 204 file, each new record is stored in the next available space in Table B. When a set of records is retrieved by a SOUL FIND statement (or an HLI call that performs a Find function) and is processed one by one by a FOR statement (or successive IFFTCH or IFGET calls), Model 204 processes the records in the (usually chronological) order in which they are stored.

You might, however, want to process a set of Model 204 records in alphabetic or numeric order depending upon the value of one or more fields of those records. To accomplish this, you can use:

  • SOUL statements (or HLI function calls)
  • Sorted file structure

These methods are described in this topic.

For Parallel Query Option/204 (PQO) sites, you can use the SORT RECORDS and SORT RECORD KEYS statements to sort data in remote files. See the Parallel Query Option/204 User's Guide for details.

Sorting files using SOUL statements

Use the following SOUL statements to sort Model 204 records:

  • FOR EACH RECORD IN ORDER BY fieldname
  • FOR EACH VALUE
  • SORT RECORDS
  • SORT RECORD KEYS

FOR EACH RECORD IN ORDER BY fieldname statement

For ORDERED fields, whose values are stored in order in the Ordered Index, you can use the FOR EACH RECORD IN ORDER BY fieldname statement to print records in a found set in order using any ORDERED field in the file.

FOR EACH VALUE statement

For ORDERED fields, you can use the FOR EACH VALUE statement to very efficiently process the values in ascending or descending order.

SORT RECORDS statement

The SOUL SORT RECORDS statement allows you to temporarily sort a set of records into ascending or descending order by the values of any number of arbitrary fields. For example, you can print three separate reports from one set of Model 204 records: one sorted by last name, another by salary, and a third by department and last name.

The SORT RECORDS statement does have the drawback that the sort is performed each time you run the User Language procedure. Because each sort takes time and system resources, you might find it more efficient to use a sort key, ordered index, or an external sort package to sort large amounts of data.

For the Host Language Interface, user programs can invoke the Model 204 internal sort to sort data passed to them by a call that performs a Find function by calling IFSORT.

SORT RECORD KEYS statement

You can use a variation of the SORT RECORDS statement, SORT RECORD KEYS. In cases where the quantity of data to be sorted is very large, or where the CCATEMP space available is relatively small, it might be advantageous to limit the data sorted to the sort key and the record number. SORT RECORD KEYS accomplishes this, by building, in effect, a sorted index to the database for the designated key(s).

The Host Language Interface call, IFSKEY, performs the same function and is documented in the Rocket Model 204 Host Language Interface Reference Manual.

Sorting files using a sorted file structure

Instead of using the ordinary Model 204 file structure, you can use the sorted file structure. In a sorted file, you can designate one (and only one) field as the sort key when the file is initialized (for example, Social Security number, last name, or part number). When a record is stored in Table B, it is stored in such a way that any set of records can be processed efficiently in the order of their sort keys. For example, if a file is sorted, you can append the IN ORDER phrase to the FOR EACH RECORD statement. For sorted files, the Host Language Interface IFFTCH and IFGET calls automatically retrieve records in sorted order.

You can also use the SORT statement with sorted files if special reports require ordering on some field other than the designated sort key.

Sorted file structure characteristics

Files that have sorted structure have the following characteristics:

  • The order of processing is the standard ascending, left-aligned, EBCDIC character set collating sequence (blanks first, special characters next, letters next, and numbers last). If the sort key is a number, remember that "10" comes before "9" in this collating sequence. You might need to retain leading zeroes to achieve the desired order, such that "09" comes before "10".
  • At least half of the records in the file must be loaded in sort key sequence to allow effective use of the sorted storage organization and to avoid excessive overhead during ordered retrieval operations.

The remainder of this page describes sort keys, structure, parameters, and how to create and load sorted files.

Sort key characteristics

The following statements summarize the characteristics of keys in a sorted file and ways in which these keys can be specified:

  • Any given file can have only one sort key, which cannot be changed without reloading the entire file.
  • Record can have only one value for the sort key field.
  • Several different records can have the same value for their sort key fields.
  • Value of a record's sort key cannot be changed in any way. You can, however, delete the entire record and store it again with a different sort key.
  • Special forms of the SOUL STORE RECORD statement and the Host Language Interface IFBREC function allow a sort key to be specified for a new record. See the Rocket Model 204 Host Language Interface Reference Manual for IFBREC details.

Like ordinary Model 204 files, the logical records of a sorted file are stored on the pages of Table B. In a sorted file, the pages are combined into sort groups, each of which contains the records whose sort keys lie within a particular small range of values.

Sort groups

A sort group consists of a fixed number of pages of master area in which the records are stored in strict order, and a fixed number of pages of overflow area that contains records of the sort group that could not be inserted into the master area in order. The file generally contains many sort groups. In addition, a number of extra overflow areas are reserved at the beginning of Table B.

Initial record loading

When a file is initially loaded, the input data must be presorted into the correct order. During this initial load, the records are stored sequentially on master area pages.

For example, in a file in which each master area has two pages, each overflow area has one page, and there are two extra overflow areas, Table B looks like the following figure after the initial load:

Table B after initial load

This initial load has determined that the first sort group (pages 2 through 4) is to contain all records that have sort keys from ABBEY to AHERN.

After the initial load, Model 204 attempts to store a new record first in the appropriate master area. If it cannot be inserted in correct order, it is stored instead on a page of that sort group's overflow area. If all the pages of this preferred overflow area are full, the record is said to "spill" and Model 204 attempts to store it in the previous overflow or extra overflow area.

Parameters for sorted files

The following subsections describe parameters that are relevant to sorted files.

FILEORG parameter

FILEORG controls various sorted file options. It normally defaults to 0, indicating an ordinary file. The following options are available for sorted files. Hash key file options are summarized in Hash key files.

Option Meaning
X'04' Reuse record numbers (RRN). Record numbers of deleted records are reused for new records added, if the record number is available on the Table B page to which the new record is being added.
X'02' Sort key required. Every record in the file must have a value for the sort key, or a compilation error results.
X'01' Sorted file. Records that do not have sort keys are stored in the next available master area space in Table B.

Set FILEORG to the sum of the desired options.

BPGPMSTR and BPGPOVFL parameters

BPGPMSTR and BPGPOVFL establish the number of pages per master area in each sort group and per overflow area in each sort group, respectively. These parameters' values depend upon how large a percentage of the file is loaded in sorted order. Recommended values are listed in BPGPMSTR and BPGPOVFL parameters.

Recommended master and overflow area pages
Percentage (%)
of file to be
loaded in
sorted order
BPGPMSTR area BPGPOVFL area
50 1 1
51-60 3 2
61-67 2 1
68-72 5 2
73-75 3 1
76-80 4 1
81-83 5 1
84-86 6 1
87-89 9 1
90-93 14 1
94-97 30 1
98-100 50 1

BEXTOVFL parameter

The BEXTOVFL parameter establishes the number of extra overflow areas. BEXTOVFL defaults to 0. Each extra overflow area contains the same number of pages as a normal overflow area.

SORTKEY parameter

The VIEW SORTKEY command displays the sort key field of the file, if the file is a sorted file and the user has a sufficient security level to view the field name. Model 204 sets the parameter at file initialization according to the sort key specified for the file.

Creating a sorted file

A sorted file is created in the normal way, as described in Creating a file. At this time, set the five parameters discussed in Parameters for sorted files.

Initializing sorted files

A special form of the INITIALIZE command is provided for sorted files. This form enables you to establish the field name of the file's sort key.

This special INITIALIZE command must be used each time the file is initialized.

Defining a field name with a sort key

Unlike ordinary field names, the field name of the sort key is not defined with a DEFINE command but with the INITIALIZE command.

Specify the desired field description for the sort key in the INITIALIZE command.

Rules for defining sort key fields

Sort key fields cannot have the CODED, INVISIBLE, FLOAT, or BINARY attributes and cannot have an UPDATE option specified for them. The sort key is not required to be KEY. Initializing files provides a full discussion of the INITIALIZE command and its use with sort key fields.

The following sequence creates a sorted PEOPLE file with LAST NAME as the sort key:

CREATE FILE PEOPLE PARAMETER ASTRPPG=200,ATRPG=2,FVFPG=3 PARAMETER MVFPG=8,BRECPPG=45,BSIZE=1200 PARAMETER BRESERVE=135,FILEORG=5,BPGPMSTR=3 PARAMETER BPGPOVFL=1,BEXTOVFL=2 PARAMETER CSIZE=220,DSIZE=920 END OPEN PEOPLE INITIALIZE SORT LAST NAME (KEY)

Loading a sorted file

The efficiency of processing records in order in a sorted file depends on the percentage of the records that can be stored in sorted order. At least 50 percent of the records are required for good performance. If the percentage is higher, the efficiency of in-order processing more closely approaches the efficiency of ordinary processing.

Initial loads normally are done with the File Load utility. The input data must first be sorted by sort key value, usually with an IBM sort utility.

File Load utility restrictions

The File Load utility handles sorted files in the same way that it handles ordinary files, with the following restrictions:

  • If the record being loaded has a sort key value, the sort key must be the first field loaded. That is, the read-and-load-a-field statement that loads a sort key must have the X'8000' mode bit specified. For example:

    LAST NAME=1,15,X'8000'

  • If the record being loaded does not have a sort key value, the read-and-load-a-field statement that loads the first field in the record must specify the X'2000' mode bit, indicating no sort key, as well as X'8000'. For example:

    ADDR=16,30,X'A000'