Hash key files

From m204wiki
[[Category:File management]]
[[Category:Sorted and hash key FILEORG files]]

Revision as of 21:43, 22 January 2014

Overview

Some Model 204 applications require each record to contain a unique or nearly unique identifier field, such as a serial number or a Social Security number. These applications often retrieve and process records one at a time according to this field.

Model 204 typically requires two disk transfers to accomplish this: one to the index and one to the record in Table B. By making the file a hash key file, the disk read of the index is eliminated and the whole operation is quicker and more efficient. However, if the retrieval is for more than a single hash key value, or if the records are not processed immediately, a hash key file might not provide any savings.

Hash key field

In a hash key file, one and only one field is designated the hash key when the file is initialized. When a record is stored in a hash key file, it is stored on an apparently random page of Table B; the page number actually depends upon the value of the record's hash key field. This hash key field is similar to the sort key of sorted Model 204 files and often appears syntactically where a sort field would appear (see Sorted files for a discussion of sorted files). A file can be either a sorted file or a hash key file, but not both.

This chapter describes the hash keys, parameters, creation, and loading of hashed files.

Characteristics of hash key files

Hash keys have the following characteristics:

  • Any one record can have only one value for the hash key field.
  • Several different records can have the same value for their hash key fields. These records are stored near each other in Table B.
  • The value of a record's hash key field cannot be changed in any way. You can, however, delete the entire record and store it again with a different hash key value.
  • Special forms of the User Language STORE RECORD statement and the Host Language IFSTOR and IFBREC functions allow a hash key to be specified for a new record. See the Rocket Model 204 User Language Manual and the Rocket Model 204 Host Language Interface Reference Manual for details.

Hash key file parameters

The following parameters are relevant to hash key files.

BSIZE parameter

BSIZE determines the number of pages to be assigned to Table B. Its computation is described in File size calculation.

BSIZE must be computed particularly carefully, because the size of a hash key file's Table B cannot be changed with an INCREASE command. If Table B fills up, the entire file must be reloaded.

FILEORG parameter

FILEORG controls various hash key file options. It normally defaults to 0, indicating an ordinary file. Sorted file options are summarized in Sorted files. The available options for hash key files are:

Option Meaning
X'08' Hash key file. User designates the field that becomes the hash key. Model 204 generates a hash key for each record supplied without one. The generated key determines where to place the record but is not stored with the record.
X'04' Reuse record numbers (RRN). Record numbers of deleted records are reused for new records added, if available on the Table B page to which the new record is being added.
X'02' Hash key required. Every record in the file must have a value for the hash key, or a compilation error results.

Set FILEORG to the sum of the desired options. For a complete description of the FILEORG parameter, see the Model 204 Parameter and Command Reference.

HASHKEY parameter

The VIEW HASHKEY command displays the hash key field of the file, if the file is a hashed file and the user has a sufficient security level to view the field name. This parameter is set by Model 204 at the time of file initialization according to the hash key specified for the file.

Creating a hash key file

A hash key file is created in the normal way, as described in the file creation chapter. Set the FILEORG parameter at this time.

Initializing hash key files

A special form of the INITIALIZE command is provided for hash key files. This form enables you to establish the field name of the file's hash key. This special INITIALIZE command must be used each time the file is initialized.

Defining hash key fields

Unlike ordinary field names, the field name of the hash key is not defined with a DEFINE command. Specify the desired field description for the hash key in the INITIALIZE command. Hash key fields cannot have the CODED, INVISIBLE, BINARY, FLOAT, or KEY attributes, and they cannot have the UPDATE option specified for them. The file initialization chapter provides a full description of the INITIALIZE command and its use with hash key fields.

Storing records in a hash key file

Special forms of the User Language STORE RECORD statement and the Host Language IFBREC function allow a hash key to be specified for a new record. They are identical to the special forms used with sorted files. Refer to the Rocket Model 204 User Language Manual and Rocket Model 204 Host Language Interface Reference Manual for details.

File Load utility restrictions

If the records are stored with the File Load utility (see the File Load chapter), the following restrictions apply:

  • If the record being loaded has a hash key value, the hash key must be the first field loaded. That is, the read-and-load-a-field statement that loads the hash key must have the X'8000' mode bit specified. For example:

SOC.SEC.NO=1,9,X'8000'

  • If the record being loaded does not have a hash key value, the read-and-load-a-field statement that loads the first field in the record must specify the X'2000' mode bit, indicating no hash key, as well as X'8000'. For example:

NAME=10,15,X'A000'

The following discussion describes a Model 204 utility, M204HASH, that is designed to expedite the process of storing records in a hash key file with the File Load utility.

M204HASH utility

The M204HASH utility optimizes the performance of loading Table B during the file load program step of the File Load utility for Model 204 hash key files. The utility accomplishes this by sorting the file load program input records by the Table B page to which they will hash during the program. When the output file from M204HASH is used as the TAPEI file for the file load program, records are added to the Model 204 file one page at a time. The file is thereby loaded in one pass of Table B.

The utility is composed of a user exit to the standard IBM Sort/Merge or any compatible sort package such as SYNCSORT. It can be used on the z/OS, z/VM, and z/VSE operating systems, and supports fixed and variable record format files.

M204HASH and Table B page numbers

To determine the Table B page number for a given record, the hash key input field is hashed and divided by the number of pages in Table B (BSIZE). The resulting Table B page number is a 3-byte binary field, which is appended to the front of the input record before sort processing.

Removing the Table B page number

This 3-byte field is used as the sort key. If the file load statements used to load records from the output data set correspond to the original input record format, the 3-byte field must be removed before the record is written to the output data set.
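The prefix-sort-strip sequence described above can be sketched in Python. This is an illustration only, not the M204HASH implementation: the source does not document the hash function, so CRC-32 is used here as a stand-in, and taking the remainder of the division by BSIZE as the page number is an assumption. The function names are hypothetical.

```python
import zlib

def table_b_page(hash_key: bytes, bsize: int) -> int:
    # ASSUMPTION: CRC-32 stands in for Model 204's hash function, and the
    # remainder of the division by BSIZE is taken as the Table B page number.
    return zlib.crc32(hash_key) % bsize

def prefix_page_number(record: bytes, key_pos: int, key_len: int, bsize: int) -> bytes:
    # key_pos is 1-based, as in the HASH KEY=position,length statement.
    key = record[key_pos - 1 : key_pos - 1 + key_len]
    page = table_b_page(key, bsize)
    # The page number is a 3-byte binary field appended to the front of the record.
    return page.to_bytes(3, "big") + record

def sort_for_load(records, key_pos, key_len, bsize, strip=True):
    prefixed = [prefix_page_number(r, key_pos, key_len, bsize) for r in records]
    # Sort on the 3-byte prefix only; Python's sort is stable, so records that
    # hash to the same page keep their input order.
    prefixed.sort(key=lambda r: r[:3])
    # Optionally remove the prefix so the output matches the input record format.
    return [r[3:] for r in prefixed] if strip else prefixed

if __name__ == "__main__":
    sample = [f"{i:08d}K{i:03d}".encode().ljust(80) for i in range(5)]
    for rec in sort_for_load(sample, key_pos=9, key_len=4, bsize=397):
        print(rec[:12])
```

With `strip=False` each output record is 3 bytes longer than its input, which is the case where the file load statements must skip the prefix.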

For z/OS

In the z/OS environment, an additional user exit is provided for removing the 3-byte page number from each record. The additional user exit is necessary, because the IBM z/OS Sort/Merge program does not recognize any control statement for reformatting records before output.

For z/VM and z/VSE

In the z/VM and z/VSE environments, a sort program control statement can be used to remove the 3-byte page number before the records are written to the output data set.

M204HASH utility input requirements

Input record requirements

Every input record to M204HASH must contain at least the hash key for the Model 204 record to which the data on that sequential record belongs. The output of the standard PAI FLOD method of reorganizing a file does not meet this requirement and is unsuitable for use with M204HASH.

Using multiple sequential records

The input to M204HASH typically is a sequential file of records, each record containing all data for a single Model 204 record. If any Model 204 record contains more data than can fit on a single 256-byte record (256 being the maximum record length allowed for sequential input), multiple sequential records, each containing the hash key, are required for that Model 204 record. In this case, modify the SORT FIELDS sort control statement (described in SORT statement) to include the position of the hash key as well as the position of the three-byte page number, to keep in order all sequential records that contain data for each single Model 204 record.

Note

The position of the hash key for the SORT FIELDS statement appears three bytes later on the record than it appears on input to M204HASH because of the three-byte page number appended to the front of the record.
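As a worked example of the three-byte shift for fixed-format input, the helper below builds a SORT FIELDS statement that sorts first on the page number and then on the hash key. It is a sketch under assumptions: the function name is hypothetical, the hash key is assumed to sort as character data (CH) in ascending order, and variable-format input is not covered.

```python
def sort_fields_fixed(key_pos: int, key_len: int) -> str:
    """Build a SORT FIELDS statement for fixed-format input.

    key_pos and key_len are the values coded on the HASH KEY statement,
    that is, the key's position on the record as input to M204HASH.
    """
    # The 3-byte page number is appended to the front of the record, so the
    # hash key appears three bytes later when the sort sees the record.
    shifted_pos = key_pos + 3
    return f"SORT FIELDS=(1,3,CH,A,{shifted_pos},{key_len},CH,A)"

print(sort_fields_fixed(9, 4))  # SORT FIELDS=(1,3,CH,A,12,4,CH,A)
```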

$HSH function

The User Language $HSH function allows you to explicitly call the M204HASH utility. For information about using $HSH, see the Rocket Model 204 User Language Manual.

Using M204HASH in the z/OS environment

The object library distributed with the z/OS installation contains the object modules HA15OS and HA35OS. Link-edit HA15OS and place it in a load library.

The examples in MODS statement and in Using M204HASH in the z/VM environment assume that the link-edit module has been given the name HASH15. If you use the IBM z/OS Sort/Merge and you remove the 3-byte page number from the front of the output records, you must link-edit the HA35OS object module and place it in a load library. The following examples assume that this link-edit module has been given the name HASH35.

Required DD statements

Follow the instructions in the sort package documentation for setting up the sort job. Five additional DD statements are required:

Statement Defines...
CCAIN Control statement input data set containing 80-byte records. The control statements are described in CCAIN control statements under z/OS.
CCAPRINT Error message output data set, which contains 80-byte records.
UNSORTED Data set that contains the records to be sorted. Any SORTIN DD statement present is ignored by the M204HASH utility.
SORTED Data set into which the sorted records are placed as output. The M204HASH utility overwrites the record format, logical record length, and block size of the SORTED data set with corresponding values from the UNSORTED file.
CCAEXITS Library that contains the link-edited exit modules HASH15 and HASH35.

CCAIN control statements under z/OS

The CCAIN data set must contain two control statements, the BSIZE and HASH KEY statements. It can contain one additional statement, the MODE statement. The statements need not begin in the first column, and they can appear in any order. Comment lines beginning with an asterisk are allowed. Descriptions of these statements follow.

BSIZE statement

The syntax for the BSIZE statement is:

Syntax

BSIZE=number

where:

number is the actual number of Table B pages of the file into which the records are loaded. This number must be in decimal format.

HASH KEY statement

The syntax for the HASH KEY statement is:

Syntax

HASH KEY=position,length

where:

position is a decimal number that defines the position of the hash key field on the input record

length is its length.

HASH KEY statement with fixed length records

For fixed length records, these values must be the same as the position and length values that appear in the file load statement for loading the hash key field (the statement on which the mode bits for starting a new record appear).

HASH KEY statement with variable length records

For variable length records, do not consider the 4-byte Record Descriptor Word as part of the record. That is, if the hash key field begins on the first data byte of the record, code 1 for position, not the 5 that is coded in the file load statement.
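The relationship between the file load position and the HASH KEY position can be expressed as a small helper. This is a sketch: the function name and its parameters are hypothetical, and it assumes that the file load statement counts the 4-byte Record Descriptor Word for variable length records, as the example above implies.

```python
RDW_LEN = 4  # length of the Record Descriptor Word on variable length records

def hash_key_statement(flod_position: int, length: int, variable: bool = False) -> str:
    """Derive a HASH KEY statement from the position used in the file load statement."""
    # For variable length records the file load position includes the RDW,
    # but the HASH KEY position does not.
    position = flod_position - RDW_LEN if variable else flod_position
    if position < 1:
        raise ValueError("hash key cannot start inside the Record Descriptor Word")
    return f"HASH KEY={position},{length}"

print(hash_key_statement(9, 4))                 # HASH KEY=9,4
print(hash_key_statement(5, 9, variable=True))  # HASH KEY=1,9
```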

MODE statement

The syntax for the MODE statement is:

Syntax

MODE=X'hexadecimal_digits'

where:

hexadecimal_digits within the quotes are the same as the mode bits in the file load statement, as described in "File load statements: mode bits."

Because the hash field is always the first field of each record loaded into a hash file, the first digit of the mode bits in the file load statement is always eight. The eight means start a new record.

File load mode bits

The mode bits indicate whether or not to strip leading and trailing blanks and/or leading zeroes, and whether to store an all-zero field as 0 or not to store it at all. Valid values for the mode bits in hash key files are listed in Mode bits for hash key files.

Mode bits for hash key files

The following mode bits can be used with hash key files. For a complete description of the mode bits, see "File load statements: mode bits."

Mode Bit Meaning
X'8000' Begin a new Model 204 record.
X'2000' Sort or hash key omitted (specified with X'8000')
X'0800' Suppress deletion of blanks
X'0200' Load all-zero fields as '0'
X'0100' Strip leading zeroes

Mode bits can be summed.
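For example, summing the values from the table gives the X'8B00' and X'A000' settings used elsewhere on this page. The following Python sketch shows the arithmetic; the dictionary keys and the function name are illustrative labels, not Model 204 keywords.

```python
# Mode bit values from the table above; the names are illustrative only.
MODE_BITS = {
    "begin_record": 0x8000,
    "key_omitted": 0x2000,
    "suppress_blank_deletion": 0x0800,
    "load_all_zero_as_0": 0x0200,
    "strip_leading_zeroes": 0x0100,
}

def mode_bits_hex(*names: str) -> str:
    """Sum the selected mode bits and format them as X'hhhh'."""
    value = 0
    for name in names:
        # Each option is a distinct bit, so OR-ing is the same as summing.
        value |= MODE_BITS[name]
    return f"X'{value:04X}'"

print(mode_bits_hex("begin_record", "suppress_blank_deletion",
                    "load_all_zero_as_0", "strip_leading_zeroes"))  # X'8B00'
print(mode_bits_hex("begin_record", "key_omitted"))                 # X'A000'
```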

SYSIN control statements under z/OS

The sort input control file SYSIN requires three control statements:

  • RECORD statement
  • SORT statement
  • MODS statement

These statements are described in the following sections.

RECORD statement

Two parameters of the RECORD statement are required:

Parameter Set...
TYPE To either F or V, depending on whether the input file is in fixed or variable record format.
LENGTH At least the first three values (the fourth through seventh are optional):

  • Set L1 to the maximum record length in the input file. For fixed length records, this is equivalent to the logical record length (LRECL).
  • Set L2 to the value of L1 plus 3 bytes for the appended Table B page number.
  • Set L3 to the same value as L1, if the Table B page number is removed before the record is written to the output file.
  • If you choose to provide an L4 setting, set it to a value greater than or equal to 3 bytes.

For variable record format, add 4 bytes to L1, L2, L3, and L4 to include the 4-byte Record Descriptor Word.
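The L1, L2, and L3 rules can be checked with a short calculation. The sketch below covers only the case in which the page number is removed before output, because that is the only case for which L3 is defined above; the function names are hypothetical.

```python
PAGE_NUMBER_LEN = 3  # Table B page number appended to the front of each record
RDW_LEN = 4          # Record Descriptor Word on variable format records

def record_lengths(max_data_len: int, variable: bool = False) -> tuple:
    """Return (L1, L2, L3), assuming the page number is removed before output."""
    rdw = RDW_LEN if variable else 0
    l1 = max_data_len + rdw        # maximum input record length
    l2 = l1 + PAGE_NUMBER_LEN      # input record plus the page number
    l3 = l1                        # output record with the page number removed
    return (l1, l2, l3)

def record_statement(max_data_len: int, variable: bool = False) -> str:
    l1, l2, l3 = record_lengths(max_data_len, variable)
    rec_type = "V" if variable else "F"
    return f"RECORD TYPE={rec_type},LENGTH=({l1},{l2},{l3})"

print(record_statement(80))                 # RECORD TYPE=F,LENGTH=(80,83,80)
print(record_statement(24, variable=True))  # RECORD TYPE=V,LENGTH=(28,31,28)
```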

SORT statement

For fixed record format files

For fixed record format the syntax is:

Syntax

SORT FIELDS=(1,3,CH,A)

For variable record format files

For variable record format the syntax is:

Syntax

SORT FIELDS=(5,3,CH,A)

MODS statement

Using MODS to remove the page number

To remove the 3-byte page number from the front of the output records the syntax is:

Syntax

MODS E15=(HASH15,4000,CCAEXITS,N),E35=(HASH35,900,CCAEXITS,N)

Using MODS without removing the page number

If you do not need to remove the 3-byte page number, omit the E35 parameter from the MODS statement. The syntax is:

Syntax

MODS E15=(HASH15,4000,CCAEXITS,N)

Output data set

The output data set DD name used by the HASH35 exit is SORTED. If the E35 parameter is omitted, the output data set DD name must be SORTOUT, which is the sort program's standard output data set.

If the 3-byte page number is not removed from the output record, the file load statements used for loading the file must refer to the fourth byte of each record (eighth byte for variable-length data sets) as the beginning of the actual data.
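The offsets in the preceding paragraph follow from the record layout when the page number is left in place. The Python sketch below illustrates that layout; the helper names are hypothetical, and the sample variable-format record assumes the usual Record Descriptor Word layout (a 2-byte length followed by 2 bytes of zeros).

```python
PAGE_NUMBER_LEN = 3
RDW_LEN = 4

def data_start(variable: bool) -> int:
    """1-based position of the first data byte when the page number is kept."""
    return (RDW_LEN if variable else 0) + PAGE_NUMBER_LEN + 1

def field_bytes(record: bytes, position: int, length: int, variable: bool) -> bytes:
    """Extract a field whose position is 1-based within the original data."""
    start = data_start(variable) - 1 + (position - 1)
    return record[start : start + length]

page = (42).to_bytes(3, "big")
data = b"ABCDEFGH"

fixed = page + data
rdw = (RDW_LEN + len(page) + len(data)).to_bytes(2, "big") + b"\x00\x00"
variable = rdw + page + data

print(data_start(False), data_start(True))   # 4 8
print(field_bytes(fixed, 1, 3, False))       # b'ABC'
print(field_bytes(variable, 1, 3, True))     # b'ABC'
```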

z/OS JCL

The following JCL shows a sample SORT step:

//SORT EXEC PGM=SORT
//CCAPRINT DD SYSOUT=A
//SYSOUT DD SYSOUT=A
//SYSUDUMP DD SYSOUT=A
//UNSORTED DD DSN=UNSORTED.DATA,DISP=SHR
//SORTED DD DSN=FLOD.INPUT,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,500)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,500)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,500)
//CCAEXITS DD DSN=M204.LINKLIB,DISP=SHR
//CCAIN DD *
BSIZE=397
HASH KEY=9,4
MODE=X'8B00'
/*
//SYSIN DD *
 SORT FIELDS=(1,3,CH,A)
 RECORD TYPE=F,LENGTH=(80,83,80)
 MODS E15=(HASH15,4000,CCAEXITS,N),E35=(HASH35,900,CCAEXITS,N)
/*

Using M204HASH in the z/VSE environment

Instructions for using M204HASH on z/VSE are similar to those for z/OS, with appropriate Job Control Language changes. Additional SYSIN control statements might also be required to define the data sets being used. In addition, the IBM z/VSE Sort/Merge program allows use of the OUTREC statement in the SYSIN file to remove the 3-byte page number from the front of the output record. This replaces the function of the HASH35 exit used under z/OS.

Note the following about using M204HASH in a z/VSE environment:

  • The syntax for the OUTREC statement for fixed record format is:

OUTREC FIELDS=(4,length)

where length is the length of the original input record.

The OUTREC statement syntax for variable record format is:

OUTREC FIELDS=(1,4,8)

  • The HA15DOS module is supplied in object format in the object library. Link-edit HA15DOS and place it in a load library, giving it the name HASH15. The HASH35 exit is not included with the installation software, because its function is performed by the OUTREC statement.
  • CCAIN control statements must be included as in-stream data in the JCL. CCAPRINT messages appear in the output.

z/VSE JCL

The following JCL shows a sample SORT step:

* $$ JOB JNM=LOAD,CLASS=G
// JOB LOAD
// DLBL UNSORT,'UNSORTED.VARIABLE.RECORDS',60
// EXTENT SYS023,SYSWK3
// DLBL SORT,'SORTED.VARIABLE.RECORDS',60
// EXTENT SYS023,SYSWK3,,,33409,200
// DLBL HASHCL,'M204.LOAD.LIBRARY'
// EXTENT ,SYSWK3
// LIBDEF CL,SEARCH=HASHCL
// EXEC SORT,SIZE=100K
 SORT FIELDS=(5,3,CH,A),FILES=1,WORK=1
 RECORD TYPE=V,LENGTH=(28,31)
 INPFIL BLKSIZE=2400
 OUTFIL BLKSIZE=2400
 OPTION SORTIN=(023),SORTOUT=(023),FILNM=(SORT,UNSORT)
 OUTREC FIELDS=(1,4,8)
 MODS PH1=(HASH15,L4000,E15)
/*
BSIZE=397
MODE=X'8B00'
HASH KEY=9,4
/*
/&
* $$ EOJ

Using M204HASH in the z/VM environment

The M204HASH utility can be used in the z/VM environment if SYNCSORT CMS is installed.

The procedure for installing M204HASH using SYNCSORT CMS is the same as for the standard IBM z/OS Sort/Merge described earlier in this section with the following exceptions:

  • The HA15CMS exit appears on the tape in TEXT format. The HASH35 exit is not required when using SYNCSORT CMS and, therefore, is not included with the distributed software.
  • FILEDEF commands replace DD statements. For variable record format, set the file mode number on the FILEDEF command for the SORTED file to 0, 1, 2, or 5 (or allow it to default to 1). You must change the file mode number to 4 before using the file as input to the file load program.
  • The HA15CMS TEXT module does not need to be link-edited before it is used, and no FILEDEF command is required for it. SYNCSORT searches for the TEXT module in the order described in the SYNCSORT CMS Programmer's Guide under the description of the MODS control statement.

SYNCSORT z/VM issues

Note the following about SYNCSORT CMS:

  • The SSORT command initiates SYNCSORT. When using SSORT, specify the E15 option, giving an exit name of HA15CMS and a length of L4000.
  • SYNCSORT allows use of the OUTREC control statement in the SYSIN file to remove the 3-byte page number from the record before writing the record to the output file. This replaces the function of the HASH35 exit used in the z/OS environment.

    Syntax for the OUTREC statement, for fixed record format, is:

OUTREC FIELDS=(4,length)

where length is the length of the original input record.

The OUTREC statement syntax for variable record format is:

OUTREC FIELDS=(1,4,8)

  • Omit the MODS control statement in the SYSIN file, because it is ignored by SYNCSORT.

z/VM EXEC

The following sample EXEC defines the files and initiates SYNCSORT:

 

&CONTROL OFF
FILEDEF UNSORTED DISK UNSORTED DATA A (LRECL 80 BLKSIZE 80)
FILEDEF SORTED DISK SORTED DATA A (LRECL 80 BLKSIZE 80)
FILEDEF CCAIN DISK CCAIN DATA A
FILEDEF CCAPRINT DISK CCAPRINT DATA A (LRECL 80 BLKSIZE 80)
SSORT E15 HASH15 L4000 SORTOUT DATA A SYSIN DATA A

If the previous EXEC is used, the CCAIN DATA A file is:

BSIZE=397
MODE=X'8B00'
HASH KEY=9,4

The SYSIN DATA A file is:

 RECORD TYPE=F,LENGTH=(80,83)
 SORT FIELDS=(1,3,CH,A)
 OUTREC FIELDS=(4,80)