File design

From m204wiki
Revision as of 01:18, 1 February 2014 by JAL (talk | contribs)
Jump to navigation Jump to search

Model 204 provides many features that allow you, as the file designer, a great deal of flexibility in creating files that optimize the efficiency of Model 204 at your site.

This topic covers the options available to you when designing files, along with the associated performance issues. Understanding these options, and understanding the database and application needs of your site, helps you select the correct options for file design, and the associated data structures described in:

Record design
Field group design
Field design

As the file designer, your responsibility is to make informed decisions concerning these issues based on knowledge of the data and required results. In so doing, you can have a dramatic effect on the efficiency with which Model 204 runs.

Design and planning overview

When creating a Model 204 file, you can:

  • Use the default file settings. (See CREATE command for information about its default parameter values.) Model 204 has been designed so that the default options meet the needs of most users; however it is useful to be aware of what you can customize.
  • Customize the file for your environment with settings for organization, recovery and security.

If you customize your file, consider the following questions, discussed in this topic:

  • Which file organization type provides the most efficient data retrieval for my needs?
  • What file recovery options are needed for this file?
  • What level of security does this file need?
  • What file sizing options do I need?
  • How do I want to set the non-resettable parameters?

Choosing a file organization

During the design process, once you determine the requirement for a new file you can look at the file design. The FILEORG parameter lets you choose a file organization.

This parameter controls two things:

  • The physical organization of the files Table B
  • A variety of options on how record numbers are managed, and the physical data storage.

File types

Entry order is the default. See the Model 204 Parameter and Command Reference, "FILEORG: File organization" for more information.

The file organization you choose determines the organization of the file's Table B. The following types of file organization are available:

  • Entry order
  • Unordered
  • Sorted
  • Hash key

Entry order, unordered, sorted, and hash key files can be used in any combination to make up a database. Any Model 204 file can be cross-referenced to any other Model 204 file regardless of the type of organization.

Criteria in choosing a file organization

Choose Table B organization based upon these factors:

  • Frequency of retrievals
  • Ordering of reports
  • Volatility or stability of the data
  • Available disk space

Entry Order Files

In ordinary Model 204 files, each new record is simply stored in the available space after any other records in Table B. When a set of records is retrieved from such files, the records are processed in the order in which they were stored. This order is chronological. These files are known as entry order files.

Entry order files are the most widely used Model 204 files. They provide full inverted file capabilities but do not support the unordered file, sort file, and hash key file facilities described later in this section. If sorted output is desired, you can include the SORT statement in a User Language request, use the IFSORT call in a Host Language Interface program, or specify the ORDERED attribute.

In an entry order file, new records are added only after other records in the file. A fixed number of pages is assigned to Table B of the file by the CREATE command. You can increase the number of pages any time, subject only to the limits of the total disk space available (or 16M pages). You can decrease them at any time to the value of the BHIGHPG parameter plus one. When an attempt is made to store another record in a file that has insufficient space available at the end of the file, the transaction is backed out if possible, and an error message is issued; the file might also be marked full. This occurs even though numerous records in the file might have been deleted.

In an entry order file, space made available by deleting records can be reused for the expansion of existing Table B records, but cannot be reused for adding new records to the file. These records must be added in whatever space is available after all the other records in the file.

Unordered Files

The unordered file organization makes most deleted Table B space available for reuse for new records. A queue of pages formed from deleted Table B space is maintained and called the reuse queue. Most of the available space on these pages is reused before the file is considered full.

Assuming random deletions, the longer a page has been on the reuse queue, the more likely it is that additional space has become available on it. When a record is to be added, Model 204 first tries the current Table B appends page. The appends page is the page to which records are currently being added; its page number is equal to the value of the BHIGHPG parameter. Page space is evaluated in the following sequence:

  1. If the appends page has insufficient space, the oldest page from the head of the reuse queue is examined.
  2. If that page is full, Model 204 removes it from the reuse queue and tries the next page.
  3. If the reuse queue is empty, Model 204 starts a new page (BHIGHPG+1).
  4. If no more pages are available, Model 204 looks for available space on as many as sixteen randomly selected already-used pages
  5. If this fails the file is considered full.

Reusing record numbers

In order to use space on any page for a new record, a record number must be available on that page. To be useful, an unordered file must either be designed with many extra record numbers per page, or must use the reuse record number (RRN) option. See Reuse record number (RRN) files.

Planning open space

For unordered files the BREUSE parameter specifies how much space a page must have available before being placed on the reuse queue. BREUSE is expressed as a percentage of the space available over and above BRESERVE.

The BRESERVE parameter specifies the size of an area at the end of each Table B page in which new records cannot be started. This area is reserved for the expansion of existing records on the page. When deletions occur, if there are enough additional bytes of free space available beyond BRESERVE to provide the free space required for reuse (parameter BREUSE), and at least one record number is available, the page is added to the reuse queue.

You can set BREUSE for files with any file organization, but the reuse queue is built for unordered files only. Files, other than unordered, may have BREUSE set but no reuse queue present.

Maintaining the reuse queue

The reuse queue is a single queue of pages chained together by a 4-byte pointer at the bottom of each page. The total number of pages on the queue, along with pointers to the top and bottom of the queue, are maintained on the FPL page.

Searching for pages with reusable record numbers begins at the top of the queue. As pages qualify for the reuse queue due to deleted records, those pages are added to the bottom of the queue. Starting a search at the top of the queue, where the pages are the oldest, allows additional time for constrained space to be released and helps guarantee that a record can be reused on that page.

A page is added to the reuse queue if free space on the page is larger than:

(PAGE SIZE - 8 - BRESERVE) * BREUSE/100

When a page is removed from the queue, indicating that it no longer contains reusable record numbers, the reuse queue chain pointer on that page is set to zero and the pointer on the previous page in the queue is adjusted accordingly.

Understanding eligible and ineligible reuse queue pages

A page is added to the reuse queue when deletions occur on a Table B page and the page has BREUSE space available over and above BRESERVE, which is the space reserved only for expanding existing records. New records cannot be started in BRESERVE space. Also, there must be at least one available record number on the page. A page that meets this criteria as an eligible page.

Once an eligible page is on the reuse queue it remains there even if it becomes ineligible, until one of the following events occurs:

  • During updating, if a page is selected from the reuse queue that is ineligible, it is removed from the queue.
  • You issue a BLDREUSE command. For complete syntax, see the BLDREUSE command in the Model 204 Parameter and Command Reference.
  • The file is reorganized.

Generating ineligible reuse queue pages

Adjustment to file parameters BREUSE and/or BRESERVE does not immediately remove a page from the reuse queue although the new values may make the page ineligible.

If the page is eligible with room to add the entire new record (physical base or extension record) to the page, the addition is made and the page remains on the reuse queue, even if it is now ineligible.

Sorted files

If most processing or output is required in order by a certain key, consider a sorted file organization. Sorted files store and can retrieve records in order by a sort key field. Thus, when processing records in order by the sort key, a large amount of Online sorting can be eliminated. Because a sort key can be alphabetic, it also provides a convenient method of doing alphabetical ranging on a single key without doing a direct search on the data.

The sorted organization in Table B is similar to ISAM (Indexed Sequential Access Method), where there are master areas and overflow areas. Because of this, use a sorted organization only when at least half of the total records can be loaded in sorted order and the data is not volatile.

See Sorted files for more information.

Hash Key Files

If all records contain a unique or fairly unique key and if most retrievals are done on the basis of that key alone, consider a hash key file organization. In a hash key file, Model 204 hashes directly to the Table B record, which can drastically reduce disk I/O. However, because Table B is ordered randomly in a hash key file, Table B must be made large enough to allow for the expected growth of the file; it cannot be expanded without reloading the entire file.


Fine Tuning a File Type

In addition to the basic structure of the file's Table B, there are four other settings which can change the way in which Model 204 stores and processed records.

  • x'04' - Reuse Record Numbers
  • x'80' - Optimized Field Extraction Files
  • x'100' - Enhanced Data Handling Files
  • x'200' - Large files (up to 48 million records

x'04' - Reuse Record Numbers

The reuse record number (RRN) feature allows Model 204 to reuse the record number of a deleted record for a new record, if a record number is available on the Table B page to which the new record is added. Consider the following points during the design of applications that allow record number reuse:

  • Record number order might not correspond to chronological order. If record A is added to a file, and record B is added later, record A might have a record number higher than that of record B.
  • RRN is optional for unordered, sorted, and hash key files. It is activated by turning on the X'04' bit of the FILEORG parameter. Setting the RRN bit without setting the sorted or hash bits automatically sets the unordered file bit of FILEORG.

The Reuse Record Number option may be used with unordered, sort, or hash files.

Using INVISIBLE fields in RRN files

Special care must be taken in the use of INVISIBLE fields within files that reuse record numbers. Because INVISIBLE fields reside only in the index portion of a file, they are not automatically deleted from Model 204 files when the records with which they are associated have been deleted.

To avoid the possibility of new records inheriting the INVISIBLE fields of records previously deleted, it is necessary to delete explicitly a record's INVISIBLE fields at the time that the record itself is deleted. Use the User Language statement DELETE field name = value or the Host Language Interface IFDVAL function to perform this task.

Deleting records properly in RRN files

The User Language statement DELETE ALL RECORDS IN and the Host Language Interface IFDSET function make record sets inaccessible (that is, they logically delete the records), but neither physically remove the records from the file nor release the related record numbers. Therefore, neither the space that contains the deleted records nor the record numbers can be reused. Only record numbers freed by the DELETE RECORD statement or the IFDREC function can be reused.

Committing deleted records

When deleting records in a found set, you must use COMMIT RELEASE after deleting the records to remove records from the found set and to make the record numbers available to be reused. The COMMIT RELEASE (rather than a simple COMMIT) is required to prevent future record-locking conflicts between the deleted records and any new records being stored.

Using the RRN option in hash key files

The efficiency of the RRN option with hash key files is that there is no reuse queue, because one is not built, and therefore not maintained, for hash key files. In a hash key file, when Model 204 hashes to a page and there is an available record number, the record is stored there. If the RRN option is defined for the file, Model 204 can reuse record numbers from previously deleted records on the hashed to page.

RRN files and deferred update mode

Model 204 customer support suggests that you do not place RRN files in deferred update mode. Using deferred update mode might cause your site to lose updates without any warning. FLOD jobs are particularly susceptible, because they automatically open files for deferred update and require additional steps to override this behavior. However, any RRN file used in deferred update mode is in danger.

The following scenario demonstrates what can happen in a FLOD job:

Input record Key Action
A 1
  • Delete all existing records with Key = 1. Space used by these records is deleted from Table B. There is sufficient free space on the page, so it is put on the reuse queue.
  • Add record A to file.
many records later....
J 90
  • Delete all existing records with key = 90.
  • Add record J to file. Appends page is full; the records are placed on the page from the reuse queue that originally contained records with key = 1.
K 91
  • Add another record to Table B in a slot deleted by A.
L 92
  • Add another record to Table B in a slot deleted by A.
many records later....
X 1
  • Delete all existing records with Key = 1. Table B records added by J, K, and L are deleted.

When records are deleted with Key = 1, Model 204 immediately updates the Existence Bit Pattern (EBP) in Table D. However, because the file is in deferred update mode, the index is not updated

When input record J is added, it finds space on the Table B page previously occupied by records with Key = 1. So input records J, K, and L are added in the available slots. Again, index updates are deferred.

Later, input record X deletes all records with key = 1. It finds the records that originally had key = 1 (the index has not been updated). It also finds that these records do exist (because the EBP was updated when records J, K, and L were added), so they are deleted. When the file is checked after the FLOD job, you find that input records J, K, and L were never added to the file.

Exceptions: You can use deferred update mode for RRN files if you are doing simple record adds, such as a file reorganization, or if record adds and deletes are performed as separate jobs.

Locating RRN records

Some Model 204 features locate records through explicit use of record numbers, notably the User Language POINT$ retrieval condition and the Host Language Interface IFPOINT function. Avoid dependence upon record numbers wherever possible, because a record number that at one time pointed to a record that was deleted can, at some later time, again point to a valid record, which is not the same record originally stored with that record number.

X'80' - Optimized Field Extraction files

When this option is set, all nonpreallocated (non-OCCURS) fields are preceded by a field-value length byte.

With a length byte on every field, even FLOAT, CODED or BINARY fields, several instructions and one IF test are eliminated from the internal field scan loop. Having a length byte also allows some simple compression of BINARY, CODED, and FLOAT values which offsets, to some extent, the possible increase in space that this feature entails.

If, for example, you have a large number of 'amount' fields defined as BIN, and they are predominantly small numbers, you might not only improve performance but save space by taking this option. Alternately, if you are storing mostly dates, for example, each field value pair will take an additional byte.

x'100' - Enhanced Data Handling Files

Available in Model 204 V7R5.

Enables a number of enhancements to the file structure, including:

  • support for Field groups
  • the definition of up to 32000 fields in a file
  • system maintained Automatic fields and additional Field constraints providing content validation
  • improved space management of fields containing Large Objects
  • If set, the X'80' bit is automatically also set.

    There is one mlimitation of note: at this time FILEORG=X'100' files may not be accessed via FLOD or FILELOAD. The following message is produced if this is attempted:

    M204:2903: FILEORG=X'100' IS NOT SUPPORTED DURING FLOD/FILELOAD

    x'200' - Large File Enhancement

    Available in Model 204 V7R5.

    Allows for a file to hold up to 48 million records.

    File recovery options

    If a file transaction does not complete and the data is not fully updated, do you want the data transactions backed out? How much logging of file transactions do you want done in case recovery is needed?

    When creating each of your files, you can specify whether you want the file to use recovery features such as Transaction Back out (TBO) and logging. These features are enabled by default. In an online environment, it is recommended that you use the defaults so that the file you are creating can participate in recovery if the system crashes.

    File recovery options are determined by the FOPT and FRCVOPT parameters.

    Transaction back out is a file integrity facility that can logically undo the effects of incomplete transactions on file data. You can use checkpoints and TBO to restore file data. In the event of a system crash, Model 204 crash, or hard restart, you can use the RESTART command to roll back to the last valid checkpoint. If roll forward logging to the journal is also active during the Online processing, you can use the roll forward feature to restore as many file changes as possible.

    See File integrity and recovery and Transaction back out for more information on TBO, logging, and other recovery features.

    To use file recovery, you will want to design your files to take best advantage of the recovery functionality and minimize any performance disadvantages of logging. See Transaction back out for more information on file considerations when using the TBO feature.

    File security levels

    There are three options for defining file security, as determined by the OPENCTL parameter:

    • public file (OPENCTL=X'80')
      This option allows anyone to open the file without a password and everyone gets the default privileges depending on the value of PRIVDEF. See the Rocket Model 204 Parameter and Command Reference Manual for details on PRIVDEF.
    • semi-public file (OPENCTL=X'40)
      This option allows anyone to open the file and if they do not provide a password they get privileges associated with PRIVDEF. If they do provide a password, the privileges are determined by the password.
    • private file (OPENCTL=X'00')
      The user must provide a password and privileges are determined from the password.

    You can also specify record security for individual records in a file and supply a record security key. See Security for details on Security and PRIVDEF options.

    Note: Most of the file access will likely be done through Application Subsystems (APSY). The security options for APSY access are set in its definition.

    File sizing options

    Decide on file sizing and whether you want to add pages dynamically to tables as needed. This is done with Auto Increase parameters for tables B, D, and X (BAUTOINC, DAUTOINC, and XAUTOINC parameter) as described in Managing file and table sizes which also describes how this maintenance may be done manually.

    Which tables do you need?

    Data files

    Rules for the sizing of files are presented in File size calculation, but there is, before that, a more basic question: which tables are needed in your file?

    Table A is required for the internal filed directory; Table B for the Base records (at least) of data; and Table D for the list and bit map pages; and the B-tree for any ordered indices.

    Table C can be left to its default (of a single page) if you do not have any Key fields.

    Rocket software recommends that for almost any file, Table X should be turned on. The advantages of using Table X are set forth here.

    If you expect to store data in Large Objects, then you should implement Table E. Because, once enabled (by setting Table E to a size greater than 0), Table E can be enlarged at any time, Rocket Software recommends that you err on the side of caution: set Table E to a single page (FILEORG X'100 files) in any file where there is even the possibility of wanting to store data in LOBs.

    Procedure files

    Because procedure code is stored in Table D this is the only Table that is required for files which will only contain procedures. (Note that Tables A, B and C have minimums, so they will exist in procedure files.)

    Non-resettable parameters

    Of the parameters that can be set by the file manager at file creation and file initialization, some can later be reset. The parameters below cannot be reset.

    Decide how you want to set the following parameters when you create and initialize your file.

    FPARMS set during file creation

    The following parameters can be set by the file manager during file creation and cannot be reset:

    FILEORG
    IVERIFY
    LANGFILE

    Nonresettable FPARMS set during file initialization

    The following parameters are set during file initialization (the INITIALIZE command), if applicable, and cannot be reset:

    HASHKEY
    SORTKEY
    RECSCTY

    For more information on this topic, refer to: FPARMS and TABLES file parameters

    Designing for Scalability

    The next decision that the file designer must make is whether to store all data in one physical file or to separate data into several files. This section discusses the merits of storage in a single physical file, in multiple files, or in a file group.

    Combined logical files

    Because records in a Model 204 file are completely variable, a single physical file can contain more than one logical record type (for example, personnel and payroll records). Each record type can have its own distinct collection of fields. Each record might or might not contain a field such as RECTYPE, which identifies the type of record.

    If records of different types are to be related, each record should contain fields whose values form the relationship.

    Example

    For example, suppose that the payroll and personnel records are related by employee number. You can write the following request to display personnel information for employees who earn between $20,000 and $30,000:

    BEGIN FIND.PAY: FIND ALL RECORDS FOR WHICH RECTYPE = PAYROLL SALARY IS BETWEEN 19999 AND 30001 END FIND PAY.LOOP: FOR EACH RECORD IN FIND.PAY EMPL.NO: NOTE EMPL NO FIND.PRSNL: FIND ALL RECORDS FOR WHICH RECTYPE = PERSONNEL EMPL NO = VALUE IN EMPL.NO END FIND INFO.LOOP: FOR EACH RECORD IN FIND.PERSNL PRINT: PRINT ALL INFORMATION END FOR END FOR END

    The advantages of incorporating several logical files in one physical file include the following:

    • Many requests can be done without explicit cross-referencing, thereby saving disk accesses. For example:

      BEGIN ALL: FIND ALL RECORDS FOR WHICH EMPL NO = 1234 END FIND FOR.LOOP: FOR EACH RECORD IN ALL PRINT: PRINT ALL INFORMATION END FOR END

      The preceding request displays information from both the personnel and payroll records with only one FIND statement.

    • Duplication of data can sometimes be avoided. For example, the payroll and personnel records both contain the field EMPL NO. Instead of having index entries for all values of EMPL NO in two files, one index entry for each value has pointers to all the records where it is found. Also, suppose that STATE, a CODED field, is part of the address in the personnel records and is carried for tax purposes on the payroll records. Maintaining these records in a single file allows coded values to be stored only once in Table A, but enables STATE to be decoded for both record types.
    • When files are small (that is, less than 50,000 records), maintaining different kinds of data in a single file can reduce disk storage overhead.

    Separate logical files

    If related data is organized into separate physical files, you can easily cross-reference the data according to values of fields in each of the files. Some of the advantages to organizing related data into separate physical files are:

    • Retrievals are usually faster if the RECTYPE condition is eliminated.
    • If online storage space is scarce, some files can be left off-line if the data they contain is not required.
    • Heavily updated data can be separated from data that is rarely or never updated, thus simplifying checkpointing and backup procedures.
    • User access to logically separate data can be controlled easily by means of file security (See Security).

    File groups

    The file group feature provides a compromise between a single physical file and many separate files, and incorporates some of the advantages of both. File groups offer a separation of the physical aspects from the logical aspects of operating with complex databases.

    A group is a collection of physically distinct files that appear to the end user as a single logical entity. For example, the file group PEOPLE can be defined to contain the files PERSONEL and PAYROLL. The personnel and payroll files are accessible directly by their own names, and, in addition, the union of all of the data in both files is available under the name PEOPLE. The data are not duplicated; a special table relates the group PEOPLE to its member files.

    File groups can be created either locally, with all files in the group owned by one copy of Model 204 or, with the Parallel Query Option/204 product, your file groups can be spread, or scattered, across multiple copies of Model 204. Scattered file groups are discussed later in this section.

    Groups provide several important benefits:

    • File group facility is ideal for data aging applications. Each member file is one aging unit that can be replaced easily.
    • Data can be combined into a number of different and possibly overlapping categories. For instance, files for each state in the United States can be members of multiple regional groups.
    • With Parallel Query Option/204 (PQO) you can create file groups that include remote files. See the Parallel Query Option/204 User's Guide for more information.
    • File groups can provide an alias facility, allowing users to address a file or collection of files by different names. Because of the alias facility, file groups can provide a simple environment for testing and conversion.
    • You can define a group of files, each with the same fields. After the group is formed, you can add new fields to one of the files, and the group continues to function as before without error messages. Files containing newly defined fields can be grouped with others that do not share the same field definitions. A search for a field does not return an error, if the field for which you are searching is in at least one file in the group. If none of the files contains a specified field, any statement referring to that field is rejected.

    For information about creating a file group, see Managing file groups. For information about storing and using file group definitions (CCAGRP), see the Model 204 System Manager's Guide.