Hash key files
Revision as of 14:12, 18 April 2013
Overview
Some Model 204 applications require each record to contain a unique or nearly unique identifier field, such as a serial number or a Social Security number. These applications often retrieve and process records one at a time according to this field.
Model 204 typically requires two disk transfers to accomplish this: one to the index and one to the record in Table B. By making the file a hash key file, the disk read of the index is eliminated and the whole operation is quicker and more efficient. If the retrieval is for more than a single hash key value, or if the records are not processed immediately, however, a hash key file might not provide any savings.
Hash key field
In a hash key file, one and only one field is designated the hash key when the file is initialized. When a record is stored in a hash key file, it is stored on an apparently random page of Table B; the page number actually depends upon the value of the record's hash key field. This hash key field is similar to the sort key of sorted Model 204 files and often appears syntactically where a sort field would appear (see Sorted Files for a discussion of sorted files). A file can be either a sorted file or a hash key file, but not both.
This chapter describes the hash keys, parameters, creation, and loading of hashed files.
Characteristics of hash key files
Hash keys have the following characteristics:
- Any one record can have only one value for the hash key field.
- Several different records can have the same value for their hash key fields. These records are stored near each other in Table B.
- The value of a record's hash key field cannot be changed in any way. You can, however, delete the entire record and store it again with a different hash key value.
- Special forms of the User Language STORE RECORD statement and the Host Language IFSTOR and IFBREC functions allow a hash key to be specified for a new record. See the Rocket Model 204 User Language Manual and the Rocket Model 204 Host Language Interface Reference Manual for details.
Hash key file parameters
The following parameters are relevant to hash key files.
BSIZE parameter
BSIZE determines the number of pages to be assigned to Table B. Its computation is described in File Size Calculation.
BSIZE must be computed particularly carefully, because the size of a hash key file's Table B cannot be changed with an INCREASE command. If Table B fills up, the entire file must be reloaded.
FILEORG parameter
FILEORG controls various hash key file options. It normally defaults to 0, indicating an ordinary file. Sorted file options are summarized in Sorted Files. The available options for hash key files are:
Option | Meaning |
---|---|
X'08' | Hash key file. User designates the field that becomes the hash key. Model 204 generates a hash key for each record supplied without one. The generated key determines where to place the record but is not stored with the record. |
X'04' | Reuse record numbers (RRN). Record numbers of deleted records are reused for new records added, if available on the Table B page to which the new record is being added. |
X'02' | Hash key required. Every record in the file must have a value for the hash key, or a compilation error results. |
Set FILEORG to the sum of the desired options. For a complete description of the FILEORG parameter, see the Model 204 Parameter and Command Reference.
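As an illustration of summing options, the following Python sketch combines the three values from the table above. The constant and function names are hypothetical and exist only for this example; the sketch does not check which combinations Model 204 actually accepts.

```python
# FILEORG option values as listed in the table above.
HASH_KEY_FILE = 0x08         # X'08': hash key file
REUSE_RECORD_NUMBERS = 0x04  # X'04': reuse record numbers (RRN)
HASH_KEY_REQUIRED = 0x02     # X'02': hash key required


def fileorg(*options: int) -> int:
    """Combine FILEORG options. Each option is a distinct bit,
    so a bitwise OR gives the same result as the sum."""
    value = 0
    for option in options:
        value |= option
    return value


def as_hex_literal(value: int) -> str:
    """Format a value in the X'..' notation used in the text."""
    return f"X'{value:02X}'"


# A hash key file in which every record must supply a hash key.
print(as_hex_literal(fileorg(HASH_KEY_FILE, HASH_KEY_REQUIRED)))  # X'0A'
```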
HASHKEY parameter
The VIEW HASHKEY command displays the hash key field of the file, if the file is a hashed file and the user has a sufficient security level to view the field name. This parameter is set by Model 204 at the time of file initialization according to the hash key specified for the file.
Creating a hash key file
A hash key file is created in the normal way, as described in the file creation chapter. Set the FILEORG parameter at this time.
Initializing hash key files
A special form of the INITIALIZE command is provided for hash key files. This form enables you to establish the field name of the file's hash key. This special INITIALIZE command must be used each time the file is initialized.
Defining hash key fields
Unlike ordinary field names, the field name of the hash key is not defined with a DEFINE command. Specify the desired field description for the hash key in the INITIALIZE command. Hash key fields cannot have the CODED, INVISIBLE, BINARY, FLOAT, or KEY attributes, and they cannot have the UPDATE option specified for them. The file initialization chapter provides a full description of the INITIALIZE command and its use with hash key fields.
Storing records in a hash key file
Special forms of the User Language STORE RECORD statement and the Host Language IFBREC function allow a hash key to be specified for a new record. They are identical to the special forms used with sorted files. Refer to the Rocket Model 204 User Language Manual and Rocket Model 204 Host Language Interface Reference Manual for details.
File Load utility restrictions
If the records are stored with the File Load utility (see the File Load chapter), the following restrictions apply:
- If the record being loaded has a hash key value, the hash key must be the first field loaded. That is, the read-and-load-a-field statement that loads the hash key must have the X'8000' mode bit specified. For example:
SOC.SEC.NO=1,9,X'8000'
- If the record being loaded does not have a hash key value, the read-and-load-a-field statement that loads the first field in the record must specify the X'2000' mode bit, indicating no hash key, as well as X'8000'. For example:
NAME=10,15,X'A000'
The following discussion describes a Model 204 utility, M204HASH, that is designed to expedite the process of storing records in a hash key file with the File Load utility.
M204HASH utility
The M204HASH utility optimizes the performance of loading Table B during the file load program step of the File Load utility for Model 204 hash key files. The utility accomplishes this by sorting the file load program input records by the Table B page to which they will hash during the program. When the output file from M204HASH is used as the TAPEI file for the file load program, records are added to the Model 204 file one page at a time. The file is thereby loaded in one pass of Table B.
The utility is composed of a user exit to the standard IBM Sort/Merge or any compatible sort package such as SYNCSORT. It can be used on the z/OS, z/VM, and z/VSE operating systems, and supports fixed and variable record format files.
M204HASH and Table B page numbers
To determine the Table B page number for a given record, the hash key input field is hashed and divided by the number of pages in Table B (BSIZE). The resulting Table B page number is a 3-byte binary field, which is appended to the front of the input record before sort processing.
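The page-number step can be sketched in Python as follows. This is a minimal illustration, not the utility's implementation. The text does not describe Model 204's hash function, so `zlib.crc32` is used purely as a stand-in. Taking the remainder of the division by BSIZE and storing the page number big-endian are also assumptions, and all function names are hypothetical.

```python
import zlib

PAGE_PREFIX_LEN = 3  # the text specifies a 3-byte binary page number


def table_b_page(hash_key: bytes, bsize: int) -> int:
    """Map a hash key to a page number in the range 0..bsize-1.

    ASSUMPTION: zlib.crc32 stands in for Model 204's real hash function,
    and the page number is the remainder after division by BSIZE.
    """
    return zlib.crc32(hash_key) % bsize


def add_page_prefix(record: bytes, position: int, length: int, bsize: int) -> bytes:
    """Prepend the 3-byte page number. position is 1-based, as in HASH KEY=."""
    key = record[position - 1:position - 1 + length]
    page = table_b_page(key, bsize)
    # ASSUMPTION: big-endian, so byte-wise sort order equals numeric order.
    return page.to_bytes(PAGE_PREFIX_LEN, "big") + record


def sort_by_page(records, position, length, bsize):
    """Prefix every record, then sort on the first three bytes only."""
    prefixed = [add_page_prefix(r, position, length, bsize) for r in records]
    return sorted(prefixed, key=lambda r: r[:PAGE_PREFIX_LEN])


# Hypothetical 12-byte records with a 4-byte hash key at position 9.
records = [b"NAME-A  0001", b"NAME-B  0002", b"NAME-C  0001"]
for rec in sort_by_page(records, position=9, length=4, bsize=397):
    print(int.from_bytes(rec[:PAGE_PREFIX_LEN], "big"), rec[PAGE_PREFIX_LEN:])
```

Records that share a hash key value receive the same prefix, so they end up adjacent after the sort.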
Removing the Table B page number
This 3-byte field is used as the sort key. If the file load statements used to load records from the output data set correspond to the original input record format, the field must be removed before each record is written to the output data set.
For z/OS
In the z/OS environment, an additional user exit is provided for removing the 3-byte page number from each record. The additional user exit is necessary, because the IBM z/OS Sort/Merge program does not recognize any control statement for reformatting records before output.
For z/VM and z/VSE
In the z/VM and z/VSE environments, a sort program control statement can be used to remove the 3-byte page number before the records are written to the output data set.
M204HASH utility input requirements
Input record requirements
Every input record to M204HASH must contain at least the hash key for the Model 204 record to which the data on that sequential record belongs. The output of the standard PAI FLOD method of reorganizing a file does not meet this requirement and is unsuitable for use with M204HASH.
Using multiple sequential records
The input to M204HASH typically is a sequential file of records, each record containing all data for a single Model 204 record. If any Model 204 record contains more data than can fit on a single 256-byte record (256 being the maximum record length allowed for sequential input), multiple sequential records, each containing the hash key, are required for that Model 204 record. In this case, modify the SORT FIELDS sort control statement (described in SORT statement) to include the position of the hash key as well as the position of the three-byte page number, to keep in order all sequential records that contain data for each single Model 204 record.
Note
The position of the hash key for the SORT FIELDS statement appears three bytes later on the record than it appears on input to M204HASH because of the three-byte page number appended to the front of the record.
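The position shift can be worked through with a small Python helper that builds a two-field SORT FIELDS statement. This is a sketch under stated assumptions: the function name is hypothetical, sorting the hash key as ascending character data (CH,A) is assumed, and the extra 4-byte offset for variable format is inferred from the Record Descriptor Word handling described for the SORT statement.

```python
PAGE_PREFIX_LEN = 3  # page number appended to the front of each record
RDW_LEN = 4          # Record Descriptor Word on variable-format records


def sort_fields(key_position: int, key_length: int, variable: bool = False) -> str:
    """Build a SORT FIELDS statement that sorts on the page number first
    and on the hash key second.

    key_position is the 1-based position of the hash key on the input
    record, excluding any RDW (the same value coded on HASH KEY=).
    """
    rdw = RDW_LEN if variable else 0
    page_position = 1 + rdw
    # The hash key moves three bytes to the right because of the prefix.
    shifted_key_position = key_position + PAGE_PREFIX_LEN + rdw
    return (
        f"SORT FIELDS=({page_position},{PAGE_PREFIX_LEN},CH,A,"
        f"{shifted_key_position},{key_length},CH,A)"
    )


print(sort_fields(9, 4))                 # SORT FIELDS=(1,3,CH,A,12,4,CH,A)
print(sort_fields(9, 4, variable=True))  # SORT FIELDS=(5,3,CH,A,16,4,CH,A)
```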
$HSH function
The User Language $HSH function allows you to explicitly call the M204HASH utility. For information about using $HSH, see the Rocket Model 204 User Language Manual.
Using M204HASH in the z/OS environment
The object library distributed with the z/OS installation contains the object modules HA15OS and HA35OS. Link-edit HA15OS and place it in a load library.
The examples in MODS statement and in Using M204HASH in the z/VM environment assume that the link-edit module has been given the name HASH15. If you use the IBM z/OS Sort/Merge and you remove the 3-byte page number from the front of the output records, you must link-edit the HA35OS object module and place it in a load library. The following examples assume that this link-edit module has been given the name HASH35.
Required DD statements
Follow the instructions in the sort package documentation for setting up the sort job. Five additional DD statements are required:
Statement | Defines... |
---|---|
CCAIN | Control statement input data set containing 80-byte records. The control statements are described in CCAIN control statements under z/OS. |
CCAPRINT | Error message output data set, which contains 80-byte records. |
UNSORTED | Data set that contains the records to be sorted. Any SORTIN DD statement present is ignored by the M204HASH utility. |
SORTED | Data set into which the sorted records are placed as output. The M204HASH utility overwrites the record format, logical record length, and block size of the SORTED data set with corresponding values from the UNSORTED file. |
CCAEXITS | Library that contains the link-edited exit modules HASH15 and HASH35. |
CCAIN control statements under z/OS
The CCAIN data set must contain two control statements, the BSIZE and HASH KEY statements. It can contain one additional statement, the MODE statement. The statements need not begin in the first column, and they can appear in any order. Comment lines beginning with an asterisk are allowed. Descriptions of these statements follow.
BSIZE statement
The syntax for the BSIZE statement is:
Syntax
BSIZE=number
where:
number is the actual number of Table B pages of the file into which the records are loaded. This number must be in decimal format.
HASH KEY statement
The syntax for the HASH KEY statement is:
Syntax
HASH KEY=position,length
where:
position is a decimal number that defines the position of the hash key field on the input record
length is its length.
HASH KEY statement with fixed length records
For fixed length records, these values must be the same as the position and length values that appear in the file load statement for loading the hash key field (the statement on which the mode bits for starting a new record appear).
HASH KEY statement with variable length records
For variable length records, do not consider the 4-byte Record Descriptor Word as part of the record. That is, if the hash key field begins on the first data byte of the record, code 1 for position, not 5, as in the file load statement.
MODE statement
The syntax for the MODE statement is:
Syntax
MODE=X'hexadecimal_digits'
where:
hexadecimal_digits within the quotes are the same as the mode bits in the file load statement, as described in "File load statements: mode bits."
Because the hash field is always the first field of each record loaded into a hash file, the first digit of the mode bits in the file load statement is always eight. The eight means start a new record.
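To make the rules for the three CCAIN statements concrete, here is a hedged Python sketch of a parser for them. It is not part of M204HASH. The function name is hypothetical, and the handling of blank lines and of indented comment lines is an assumption.

```python
import re


def parse_ccain(lines):
    """Parse BSIZE, HASH KEY, and MODE statements from CCAIN-style lines."""
    settings = {}
    for raw in lines:
        line = raw.strip()  # statements need not begin in the first column
        if not line or line.startswith("*"):
            continue  # skip blank lines (assumed) and comment lines
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a statement: {raw!r}")
        name, value = name.strip().upper(), value.strip()
        if name == "BSIZE":
            settings["bsize"] = int(value)  # decimal number of Table B pages
        elif name == "HASH KEY":
            position, length = (int(part) for part in value.split(","))
            settings["hash_key"] = (position, length)
        elif name == "MODE":
            match = re.fullmatch(r"X'([0-9A-Fa-f]+)'", value)
            if match is None:
                raise ValueError(f"bad MODE value: {value!r}")
            settings["mode"] = int(match.group(1), 16)
        else:
            raise ValueError(f"unknown statement: {name!r}")
    # BSIZE and HASH KEY are required; MODE is optional.
    missing = {"bsize", "hash_key"} - settings.keys()
    if missing:
        raise ValueError(f"missing required statement(s): {sorted(missing)}")
    return settings


sample = ["* sample CCAIN input", "  MODE=X'8B00'", "HASH KEY=9,4", "BSIZE=397"]
print(parse_ccain(sample))
```

The statements are accepted in any order, matching the description of the CCAIN data set.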
File load mode bits
The mode bits indicate whether or not to strip leading and trailing blanks and/or leading zeroes, and whether to store an all-zero field as 0 or not to store it at all. Valid values for the mode bits in hash key files are listed in Mode bits for hash key files.
Mode bits for hash key files
The following mode bits can be used with hash key files. For a complete description of the mode bits, see "File load statements: mode bits."
Mode Bit | Meaning |
---|---|
X'8000' | Begin a new Model 204 record. |
X'2000' | Sort or hash key omitted (specified with X'8000') |
X'0800' | Suppress deletion of blanks |
X'0200' | Load all-zero fields as '0' |
X'0100' | Strip leading zeroes |
Mode bits can be summed.
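Because the mode bits are summed, a value such as X'8B00' can be decomposed back into the entries of the table above. The following Python sketch shows one way to do that. The names are hypothetical, and rejecting bits that are not in the table is a choice made for this example.

```python
# Mode bits and meanings, copied from the table above.
MODE_BITS = {
    0x8000: "Begin a new Model 204 record",
    0x2000: "Sort or hash key omitted",
    0x0800: "Suppress deletion of blanks",
    0x0200: "Load all-zero fields as '0'",
    0x0100: "Strip leading zeroes",
}


def describe_mode(value: int):
    """Return the meanings of the bits set in a summed mode value."""
    known = sum(MODE_BITS)  # all listed bits combined (X'AB00')
    if value & ~known:
        raise ValueError(f"mode contains bits not in the table: X'{value:04X}'")
    return [meaning for bit, meaning in MODE_BITS.items() if value & bit]


for meaning in describe_mode(0x8B00):
    print(meaning)
```

For X'8B00' this lists the new-record bit together with the three field-handling bits. X'A000' decomposes into X'8000' plus X'2000'.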
SYSIN control statements under z/OS
The sort input control file SYSIN requires three control statements:
- RECORD statement
- SORT statement
- MODS statement
These statements are described in the following sections.
RECORD statement
Two parameters of the RECORD statement are required:
Parameter | Set... |
---|---|
TYPE | To either F or V, depending on whether the input file is in fixed or variable record format. |
LENGTH | At least the first three values (the fourth through seventh are optional). For variable record format, add 4 bytes to L1, L2, L3, and L4 to include the 4-byte Record Descriptor Word. |
SORT statement
For fixed record format files
For fixed record format the syntax is:
Syntax
SORT FIELDS=(1,3,CH,A)
For variable record format files
For variable record format the syntax is:
Syntax
SORT FIELDS=(5,3,CH,A)
MODS statement
Using MODS to remove the page number
To remove the 3-byte page number from the front of the output records the syntax is:
Syntax
MODS E15=(HASH15,4000,CCAEXITS,N),E35=(HASH35,900,CCAEXITS,N)
Using MODS without removing the page number
If you do not need to remove the 3-byte page number, omit the E35 parameter from the MODS statement. The syntax is:
Syntax
MODS E15=(HASH15,4000,CCAEXITS,N)
Output data set
The output data set DD name used by the HASH35 exit is SORTED. If the E35 parameter is omitted, the output data set DD name must be SORTOUT, which is the sort program's standard output data set.
If the 3-byte page number is not removed from the output record, the file load statements used for loading the file must refer to the fourth byte of each record (eighth byte for variable-length data sets) as the beginning of the actual data.
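The offsets mentioned above follow from simple arithmetic, shown here as a short Python sketch. The function names are hypothetical, and the sketch only restates the byte counts given in the text.

```python
PAGE_PREFIX_LEN = 3  # page number left on the front of the record
RDW_LEN = 4          # Record Descriptor Word on variable-length records


def data_start(variable: bool, page_number_kept: bool) -> int:
    """Return the 1-based byte at which the actual data begins."""
    start = 1
    if variable:
        start += RDW_LEN
    if page_number_kept:
        start += PAGE_PREFIX_LEN
    return start


def file_load_position(field_position: int, variable: bool, page_number_kept: bool) -> int:
    """Translate a 1-based field position within the data into the
    position to code on the file load statement."""
    return field_position + data_start(variable, page_number_kept) - 1


print(data_start(variable=False, page_number_kept=True))  # 4
print(data_start(variable=True, page_number_kept=True))   # 8
```

With the page number kept on a fixed-format record, a hash key that starts at byte 9 of the data would be loaded from byte 12.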
z/OS JCL
The following JCL shows a sample SORT step:
//SORT EXEC PGM=SORT
//CCAPRINT DD SYSOUT=A
//SYSOUT DD SYSOUT=A
//SYSUDUMP DD SYSOUT=A
//UNSORTED DD DSN=UNSORTED.DATA,DISP=SHR
//SORTED DD DSN=FLOD.INPUT,DISP=SHR
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,500)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,500)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,500)
//CCAEXITS DD DSN=M204.LINKLIB,DISP=SHR
//CCAIN DD *
BSIZE=397
HASH KEY=9,4
MODE=X'8B00'
/*
//SYSIN DD *
 SORT FIELDS=(1,3,CH,A)
 RECORD TYPE=F,LENGTH=(80,83,80)
 MODS E15=(HASH15,4000,CCAEXITS,N),E35=(HASH35,900,CCAEXITS,N)
/*
Using M204HASH in the z/VSE environment
Instructions for using M204HASH on z/VSE are similar to those for z/OS, with appropriate Job Control Language changes. Additional SYSIN control statements might also be required to define the data sets being used. In addition, the IBM z/VSE Sort/Merge program allows use of the OUTREC statement in the SYSIN file to remove the 3-byte page number from the front of the output record. This replaces the function of the HASH35 exit used under z/OS.
Note the following about using M204HASH in a z/VSE environment:
- The syntax for the OUTREC statement for fixed record format is:
OUTREC FIELDS=(4,length)
where length is the length of the original input record.
The OUTREC statement syntax for variable record format is:
OUTREC FIELDS=(1,4,8)
- The HA15DOS module is contained in the object library in object format. Link-edit HA15DOS and place it in a load library, giving it the name HASH15. The HASH35 exit is not included with the installation software, because its function is performed by use of the OUTREC statement.
- CCAIN control statements must be included as in-stream data in the JCL. CCAPRINT messages appear in the output.
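The effect of the two OUTREC statements described in the list above can be pictured as byte slicing. The Python sketch below is only an analogy with hypothetical function names. In particular, it copies the Record Descriptor Word bytes unchanged and does not model how the sort program maintains the length stored in them.

```python
def outrec_fixed(record: bytes, length: int) -> bytes:
    """Mimic OUTREC FIELDS=(4,length): keep `length` bytes starting at
    byte 4, which drops the 3-byte page number."""
    return record[3:3 + length]


def outrec_variable(record: bytes) -> bytes:
    """Mimic OUTREC FIELDS=(1,4,8): keep bytes 1-4 (the RDW), then
    everything from byte 8 onward, which drops bytes 5-7."""
    return record[:4] + record[7:]


page_number = (5).to_bytes(3, "big")  # hypothetical page number 5
fixed = page_number + b"A" * 80
variable = b"RDW_" + page_number + b"DATA"  # "RDW_" is a 4-byte placeholder

print(outrec_fixed(fixed, 80) == b"A" * 80)  # True
print(outrec_variable(variable))             # b'RDW_DATA'
```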
z/VSE JCL
The following JCL shows a sample SORT step:
* $$ JOB JNM=LOAD,CLASS=G
// JOB LOAD
// DLBL UNSORT,'UNSORTED.VARIABLE.RECORDS',60
// EXTENT SYS023,SYSWK3
// DLBL SORT,'SORTED.VARIABLE.RECORDS',60
// EXTENT SYS023,SYSWK3,,,33409,200
// DLBL HASHCL,'M204.LOAD.LIBRARY'
// EXTENT ,SYSWK3
// LIBDEF CL,SEARCH=HASHCL
// EXEC SORT,SIZE=100K
 SORT FIELDS=(5,3,CH,A),FILES=1,WORK=1
 RECORD TYPE=V,LENGTH=(28,31)
 INPFIL BLKSIZE=2400
 OUTFIL BLKSIZE=2400
 OPTION SORTIN=(023),SORTOUT=(023),FILNM=(SORT,UNSORT)
 OUTREC FIELDS=(1,4,8)
 MODS PH1=(HASH15,L4000,E15)
/*
BSIZE=397
MODE=X'8B00'
HASH KEY=9,4
/*
/&
* $$ EOJ
Using M204HASH in the z/VM environment
The M204HASH utility can be used in the z/VM environment if SYNCSORT CMS is installed.
The procedure for installing M204HASH using SYNCSORT CMS is the same as for the standard IBM z/OS Sort/Merge described earlier in this section with the following exceptions:
- The HA15CMS exit appears on the tape in TEXT format. The HASH35 exit is not required when using SYNCSORT CMS and, therefore, does not appear with the distributed software.
- FILEDEF commands replace DD statements. For variable record format, set the file mode number on the FILEDEF command for the SORTED file to 0, 1, 2, or 5 (or allow it to default to 1). You must change the file mode number to 4 before using the file as input to the file load program.
- The HA15CMS TEXT module does not need to be link-edited before it is used, nor is a FILEDEF command required for it, because a search for the TEXT module is performed in the order described in the SYNCSORT CMS Programmer's Guide under the description of the MODS control statement.
SYNCSORT z/VM issues
Note the following about SYNCSORT CMS:
- The SSORT command initiates SYNCSORT. When using SSORT, specify the E15 option, giving an exit name of HA15CMS and a length of L4000.
- SYNCSORT allows use of the OUTREC control statement in the SYSIN file to remove the 3-byte page number from the record before writing the record to the output file. This replaces the function of the HASH35 exit used in the z/OS environment.
The syntax for the OUTREC statement for fixed record format is:
OUTREC FIELDS=(4,length)
where length is the length of the original input record.
The OUTREC statement syntax for variable record format is:
OUTREC FIELDS=(1,4,8)
- Omit the MODS control statement in the SYSIN file, because it is ignored by SYNCSORT.
z/VM EXEC
The following sample EXEC defines the files and initiates SYNCSORT:
&CONTROL OFF
FILEDEF UNSORTED DISK UNSORTED DATA A (LRECL 80 BLKSIZE 80)
FILEDEF SORTED DISK SORTED DATA A (LRECL 80 BLKSIZE 80)
FILEDEF CCAIN DISK CCAIN DATA A
FILEDEF CCAPRINT DISK CCAPRINT DATA A (LRECL 80 BLKSIZE 80)
SSORT E15 HASH15 L4000 SORTOUT DATA A SYSIN DATA A
If the previous EXEC is used, the CCAIN DATA A file is:
BSIZE=397
MODE=X'8B00'
HASH KEY=9,4
The SYSIN DATA A file is:
RECORD TYPE=F,LENGTH=(80,83)
SORT FIELDS=(1,3,CH,A)
OUTREC FIELDS=(4,80)