File size calculation in detail
Calculating a precise size for a Model 204 file is difficult because:
- Model 204 is so flexible that you are unlikely to know your data in the detail the calculation needs
- During the application design process, it is highly likely that the data structures and field attributes will change
- Model 204 performs so well that there is no advantage to having such precise sizes
Rocket Software recommends a more flexible, ad-hoc approach, as discussed in File sizing introduction.
What follows is a detailed calculation that a file manager is unlikely to perform more than once. That said, the detail is useful and can be consulted to support the ad-hoc design approach.
The detailed design process
After choosing the fields and field attributes for a file, you need to calculate how much disk space the file requires and then to allocate the space. After being calculated, the values of file parameters are set when the file is created. Before you can calculate the space, you need to know:
- Types of fields in the input data for the file (such as ORDERED or FRV)
- Number of fields that the average record contains
- Number of records you expect to be in the file
Use this information to calculate the file parameters, and then use the file parameters to calculate the expected disk space.
This topic contains:
- Detailed instructions to help you calculate the file parameters and disk space
- Information about allocating disk space for your operating system
- Complete space estimation example using the steps shown in the first section of this topic
- Space calculation and file parameter worksheets to help you calculate file sizes for your data.
This topic shows you how to find the total number of Model 204 pages you need for a file, that is, to resolve the following equation:
Number of pages = ASIZE + BSIZE + CSIZE + DSIZE + ESIZE + XSIZE + 8
Note: The Model 204 Dictionary/204 FILEMGMT subsystem facility can automatically calculate file spacing allocations, as described in File sizing overview.
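As a minimal sketch of the page-count equation above, the total can be expressed as a simple sum. The function name and the sample table sizes are illustrative assumptions, not values taken from any real file.

```python
def total_pages(asize, bsize, csize, dsize, esize=0, xsize=0):
    """Sum the table sizes plus the constant 8 from the equation."""
    return asize + bsize + csize + dsize + esize + xsize + 8


# Hypothetical table sizes, for illustration only.
print(total_pages(asize=3, bsize=5, csize=1, dsize=15))
```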
Testing your file design
Even a detailed calculation must be validated. Load a representative sample of your records into a test file (for larger files, at least one segment's worth). This lets you test the accuracy of the space calculations and parameter settings before loading the entire file.
Many of the formulas used to calculate parameters contain a constant (for example, 1.1 in the formula for ATRPG) multiplied by an expression. The constants generally allow for inaccuracies in knowledge about data in the file and for file expansion. If you know in advance what values are going to be stored, and that the amount of data in the file will remain static, you can reduce the multipliers (to a minimum value of 1).
Model 204 usable page size constant
The standard Model 204 page size is 6184 bytes. Although Model 204 has accepted other page sizes in previous releases (to accommodate hardware no longer supported by IBM), the 6184-byte size is currently the only valid page size. Therefore, the calculation for usable page size is:
6184 - 40 = 6144
Sizing Table A
Table A is an internal file dictionary in which character strings and their corresponding codes are recorded. Table A contains the following information:
|Attribute||Field names of all fields in the file.|
|FEW-VALUED||Character string values of all fields with the FEW-VALUED field attribute, and either the CODED attribute or the FRV (for-each-value) attribute. Values for fields that have both the CODED and FRV attributes appear only once, as do values used for more than one field.|
|MANY-VALUED||Character string values of all fields that have the MANY-VALUED attribute and either the CODED attribute or the FRV attribute.|
The Table A parameters you need as part of the total Model 204 number of pages are as follows:
|This parameter||Specifies the number of Table A...|
|ATRPG||Attribute pages|
|FVFPG||FEW-VALUED pages|
|MVFPG||MANY-VALUED pages|
ASIZE, the total size of Table A, is calculated by Model 204.
After it has been allocated, Table A cannot be expanded. However, because Table A is always small in relation to the rest of the file, be generous when allocating space.
Computing ASTRPPG (character strings per Table A page)
Before you can compute the Table A parameters, you need to know ASTRPPG, which is the number of character strings per Table A page. First, estimate the average length (L) of all character strings you will store in Table A. After you compute L, you can compute ASTRPPG.
Computing L (the length of each string)
In computing L, the length of each string must include system overhead. Increase the basic character string lengths using the following rules:
- For each CODED or FRV value, add 3 bytes.
- For each field name, regardless of attributes, add 2 bytes. In addition:
- If the field has any of the following attributes, add 1 more byte: OCCURS, LEVEL, FLOAT, UPDATE IN PLACE, or ORDERED.
- If the field is OCCURS, add 2 more bytes.
- If the field is LEVEL, add 1 more byte.
- If the field is FLOAT, add 1 more byte.
- If the field is ORDERED, add 4 more bytes.
- If the field is UNIQUE, add 1 more byte.
- If the field is NUMERIC RANGE, it requires a number of auxiliary field names. For each NUM RANGE field, add:
((4 + field_name_length) * (# of digits of the longest value + 3)) bytes
|A||Number of field names|
|B||Number of FEW-VALUED FRV or FEW-VALUED CODED values|
|C||Number of MANY-VALUED FRV or MANY-VALUED CODED values|
|D||Maximum number of digits in a NUM RANGE field + 3|
|S||Sum of all D's for all NUMERIC RANGE fields|
|T||Total number of strings: A + B + S + C|
|V||Space needed by FEW-VALUED FRV or FEW-VALUED CODED value and value overhead|
|W||Space needed by MANY-VALUED FRV or MANY-VALUED CODED values and value overhead|
|N||Space needed by field names and names overhead|
L = (V + N + W) / T
Computing ASTRPPG (character strings per Table A page)
After you have estimated the length of the average character string for this file, you can compute ASTRPPG as follows:
ASTRPPG = 6144 / L
The default value of ASTRPPG is 400, which corresponds to an average string length plus overhead of 15 bytes.
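The two steps above (estimate L, then derive ASTRPPG) can be sketched as follows. The function names are hypothetical, and truncating ASTRPPG to a whole number is an assumption, since the text does not state a rounding rule. The sample inputs are invented.

```python
USABLE_PAGE = 6144  # usable bytes per Model 204 page (6184 - 40)


def numeric_range_overhead(field_name_length, max_digits):
    """Extra bytes for the auxiliary names of one NUMERIC RANGE field."""
    return (4 + field_name_length) * (max_digits + 3)


def average_string_length(v, n, w, a, b, c, s):
    """L = (V + N + W) / T, where T = A + B + S + C."""
    t = a + b + s + c
    return (v + n + w) / t


def astrppg(avg_len):
    """Strings per Table A page; rounded down (assumption)."""
    return int(USABLE_PAGE // avg_len)


# Invented sample: 50 field names, 100 FEW-VALUED and 200 MANY-VALUED values.
L = average_string_length(v=1200, n=900, w=3000, a=50, b=100, c=200, s=0)
print(round(L, 2), astrppg(L))
```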
Computing ATRPG (the number of attribute pages)
ATRPG specifies the number of pages to be assigned to the attribute section of Table A. Compute ATRPG as follows:
|N||Total amount of space consumed by field names|
|A||Number of field names|
|S||Number of extra NUMERIC RANGE fields (as computed above for ASTRPPG)|
Next, compute the following equations:
ATRPG = 1.1 * N / (6144 - (ASTRPPG * 2) - 2)
ATRPG = 1.1 * (A + S) / ASTRPPG
Round up to the nearest integer and use the larger of the two numbers for ATRPG.
ATRPG has a default value of 1 (its minimum value), which allows as many as 400 field names when the default value of ASTRPPG (400) is also used.
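A minimal sketch of the ATRPG calculation follows. It assumes the space-based formula has the same shape as the FVFPG and MVFPG formulas later in this topic, that is, space divided by (6144 - (ASTRPPG * 2) - 2). The function name and sample values are illustrative.

```python
import math

USABLE_PAGE = 6144


def atrpg(name_space, field_names, extra_num_range, astrppg, multiplier=1.1):
    """Larger of the space-based and count-based estimates, rounded up."""
    by_space = multiplier * name_space / (USABLE_PAGE - astrppg * 2 - 2)
    by_count = multiplier * (field_names + extra_num_range) / astrppg
    return max(1, math.ceil(by_space), math.ceil(by_count))  # minimum is 1


# Invented sample: 120 field names using 2000 bytes, 10 extra NUM RANGE names.
pages = atrpg(name_space=2000, field_names=120, extra_num_range=10, astrppg=400)
print(pages, pages * 400 <= 4000)  # second value checks the ATRPG * ASTRPPG limit
```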
The multiplier of 1.1 in the ATRPG formula allows room for adding field names that were not originally part of the file, as well as for redefining field names. When the REDEFINE command is used, one or two bytes can be added to or deleted from a Table A entry, if the LEVEL or UPDATE option is changed. The amount of overhead required for a redefined field is computed according to the rules for the original definition (see ASTRPPG above). When you delete a field definition, all but two bytes are made available for reuse.
If you are sure that field names will not be added to a file, you can use a multiplier closer to 1. The size of the multiplier is important if ATRPG comes out to be just over one page. A one-page attribute section of Table A provides much better performance than a multiple-page section. This performance difference can be seen in the amount of disk I/O required to compile a User Language request or Host Language Interface call that refers to many fields.
Note: The product of ATRPG and ASTRPPG must not exceed 4000.
Computing FVFPG (the number of FEW-VALUED pages)
FVFPG specifies the number of pages to be assigned to the FEW-VALUED section of Table A. The number of FEW-VALUED pages depends upon the total number of distinct values to be taken on by the various FEW-VALUED fields that are either CODED or FRV.
Examine your data to estimate the following:
|V||Total amount of space consumed by FEW-VALUED fields.|
|B||Number of FEW-VALUED values (as computed for ASTRPPG).|
FVFPG = 1.2 * V / (6144 - (ASTRPPG * 2) - 2)
FVFPG = 1.2 * B / ASTRPPG
Round up to the nearest integer and use the larger of the two numbers for FVFPG. FVFPG must not exceed 65,535. FVFPG has a default value of 1, which is its minimum value. Even if the file has no FEW-VALUED fields, set FVFPG to 1 to avoid error conditions caused by incorrect or unforeseen field definitions in the future.
Like the attribute section of Table A, the FEW-VALUED section is most effective when it is very small. The value sections of Table A are accessed most heavily by retrieving or updating CODED fields. CODED fields are retrieved as a result of User Language PRINT and arithmetic statements or IFGET calls.
Keeping FVFPG small
If FVFPG is larger than two pages, you might want to reevaluate the choice of FEW-VALUED fields to reduce the number of distinct values. If you cannot reduce the number of distinct values, try to redesign the FEW- and MANY-VALUED sections of Table A so that one of the sections is one page, if possible. Sometimes moving a field from one section to the other can reduce the size of one section to less than a page.
Computing MVFPG (the number of MANY-VALUED pages)
MVFPG specifies the number of pages to be assigned to the MANY-VALUED section of Table A. The number of MANY-VALUED pages depends upon the total number of distinct values to be taken on by the various MANY-VALUED fields that are either CODED or FRV.
Examine your data to estimate the following:
|W||Total amount of space consumed by MANY-VALUED fields|
|C||Number of MANY-VALUED values (as computed for ASTRPPG)|
MVFPG = 1.2 * W / (6144 - (ASTRPPG * 2) - 2)
MVFPG = 1.2 * C / ASTRPPG
Round up to the nearest integer and use the larger of the two numbers for MVFPG. MVFPG must not exceed 65,535. MVFPG has a default value of 1, which is its minimum value.
As discussed in the preceding description of FVFPG, Model 204 achieves the best performance when either the FEW-VALUED or MANY-VALUED section of Table A is small. If both MVFPG and FVFPG are larger than two pages, place most of the fields in one of the sections or the other so that either the FEW-VALUED section or the MANY-VALUED section is one page.
ASIZE (Table A size)
ASIZE is calculated by Model 204 and is the sum of the ATRPG, MVFPG, and FVFPG parameters. Because each of these parameters has a default value of 1, the default value of ASIZE is 3.
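The FVFPG and MVFPG formulas share one shape, so a single helper can illustrate both, followed by the ASIZE sum. This is a sketch under stated assumptions: the function names are hypothetical, the sample inputs are invented, and MVFPG is computed from W and C as defined in its table.

```python
import math

USABLE_PAGE = 6144
MAX_VALUE_PAGES = 65535  # upper limit for FVFPG and MVFPG


def value_pages(value_space, value_count, astrppg, multiplier=1.2):
    """Pages for one value section of Table A (FVFPG or MVFPG)."""
    by_space = multiplier * value_space / (USABLE_PAGE - astrppg * 2 - 2)
    by_count = multiplier * value_count / astrppg
    pages = max(1, math.ceil(by_space), math.ceil(by_count))
    if pages > MAX_VALUE_PAGES:
        raise ValueError("value section exceeds 65,535 pages")
    return pages


def asize(atrpg, fvfpg, mvfpg):
    """ASIZE is the sum of the three Table A section sizes."""
    return atrpg + fvfpg + mvfpg


fvfpg = value_pages(value_space=1200, value_count=100, astrppg=400)    # V, B
mvfpg = value_pages(value_space=30000, value_count=2000, astrppg=400)  # W, C
print(fvfpg, mvfpg, asize(1, fvfpg, mvfpg))
```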
Sizing Table B
Table B consists of either the full logical records (a base record plus any extensions, which together contain the values of all VISIBLE fields) or, if Table X is enabled, the visible fields in the base record. This section discusses Table B by itself; the impact of Table X is discussed in the next section.
Either way, to size the data area correctly, you need a good idea of what an average record will look like after all of the data has been loaded. More precisely, you need to know, for each record type in the file:
- Number of fields in the average record
- Number of records
When calculating Table B space, remember that some fields can be missing entirely in some records and can occur more than once in others.
To calculate the total disk space you need for a file, you need to know the size of Table B: the BSIZE parameter. To calculate BSIZE, you need:
|R||Average record size|
|BRECPPG||Number of records per Table B page|
Instructions for calculating these parameters are discussed in this section.
Estimating space for hash key files
The method for calculating Table B space is the same for all file organizations. Because Table B cannot be expanded in a hash key file, Table B calculations for hash key files must be based on the total number of records that the file will ultimately contain. The final count of records is less critical for ordinary and sorted Table B organizations. For the settings of the FILEORG, BPGPMSTR, BPGPOVFL, and BEXTOVFL parameters, refer to Sorted files and Hash key files.
Achieving the best performance
Model 204 achieves the fullest use of Table B space when different record types are uniformly distributed on each Table B page. Uniformly distributing record types also increases retrieval speed when related records of different types are processed together.
Storing records on Table B pages
The following conditions must be met before a new record is stored on a Table B page:
- Record number must be available.
- Basic record overhead must be available without using any reserved space. In a sorted or hash key file, the sort or hash key, unless it is preallocated, must also fit without using the reserved space.
- If any fields are preallocated, the space for all such fields must be available on the page. Preallocated fields can extend into reserved space.
Computing R (the average record size)
Before calculating BSIZE, you need to compute R, the Table B space required for the average record, according to these rules:
- Start with five bytes of basic overhead for the record (or eight bytes for overflow records in sorted files).
- Ignore any field that has the INVISIBLE attribute.
- Compute the space needed for non-preallocated fields (fields that do not have an OCCURS clause) as follows:
- For each compressible occurrence of each BINARY field, add six bytes. Leading zeros or nonnumeric characters override the compress option.
- For each occurrence of each CODED field, add six bytes.
- For each occurrence of each NON-CODED field, add three bytes plus the average length of the values of that field.
- For each occurrence of each FLOAT field, add two bytes plus the defined LENGTH for the values of that field.
- Compute the space needed for preallocated fields as follows:
- For each CODED or BINARY field, add (4 * n) bytes, where n is the number of occurrences.
- For each field defined with the LENGTH option (including FLOAT fields), add (m * n) bytes, where m is the length and n is the number of occurrences.
- Add 30 bytes for each occurrence of a non-preallocated BLOB or CLOB field descriptor. If the BLOB or CLOB field is preallocated, add 27 bytes for each occurrence of a BLOB or CLOB field descriptor.
The total number of bytes used by all preallocated fields in one record must be less than the page size and must leave space on the page for the basic record overhead.
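The rules above can be sketched as a small estimator. The field description format (dictionaries with `kind`, `occurrences`, `length`, `avg_length`, `preallocated`, and `invisible` keys) is a hypothetical convention chosen for this example, and `"LOB"` stands for a BLOB or CLOB descriptor. BINARY occurrences are assumed to be compressible.

```python
def record_size(fields, overflow_in_sorted_file=False):
    """Estimate R, the Table B bytes needed by an average record."""
    total = 8 if overflow_in_sorted_file else 5  # basic record overhead
    for field in fields:
        if field.get("invisible"):
            continue  # INVISIBLE fields are ignored
        kind = field["kind"]
        n = field.get("occurrences", 1)
        if field.get("preallocated"):
            if kind in ("CODED", "BINARY"):
                total += 4 * n
            elif kind == "LOB":
                total += 27 * n
            else:  # defined with the LENGTH option, including FLOAT
                total += field["length"] * n
        elif kind in ("CODED", "BINARY"):
            total += 6 * n
        elif kind == "FLOAT":
            total += (2 + field["length"]) * n
        elif kind == "LOB":
            total += 30 * n
        else:  # NON-CODED string value
            total += (3 + field["avg_length"]) * n
    return total


# Invented record layout, for illustration only.
example_fields = [
    {"kind": "STRING", "avg_length": 20},
    {"kind": "CODED"},
    {"kind": "FLOAT", "length": 8, "occurrences": 2},
    {"kind": "CODED", "preallocated": True, "occurrences": 2},
    {"kind": "STRING", "avg_length": 12, "invisible": True},
]
print(record_size(example_fields))
```

The average number of occurrences can be fractional (for example, 0.5 for a field present in half the records), which accounts for fields that are missing in some records.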
Computing BRECPPG (the number of records per Table B page)
BRECPPG specifies the maximum number of logical records that Model 204 will store on one Table B page. Compute BRECPPG as follows:
BRECPPG = 1.1 * (6144 - 4) / R
BRECPPG has a default value of 256, which corresponds to an average record length of 26 bytes.
Calculating BRECPPG accurately is important, because it can affect the way storage is utilized in Tables B, C, and D, which in turn affects efficient Model 204 operation. If you estimate that fewer records fit on a page than actually do fit, you might waste a great deal of storage space (although the resulting unused space per page allows you to add new fields to existing records and, in hash key and unordered files, to create new records).
By estimating that more records fit than actually do fit, performance can be adversely affected in two ways:
- One or more extension records per page might be created. Extension records are described in Computing BRESERVE (reserved Table B space); BRESERVE is the other parameter that affects their creation.
- Record numbers might be wasted. Record numbers are assigned sequentially, starting with 0 for the first record on the first page of Table B. Each page has BRECPPG numbers allocated to it. If fewer than BRECPPG records actually fit on the page, the extra record numbers are wasted.
Wasted record numbers do not take space in Table B, but in certain cases they can affect inverted retrieval speeds and the sizes of Tables C and D. Wasted record numbers are a concern if they cause you to increase the size of the file size multiplier, described in Tables C and D indexing structure. For small files (under 50,000 records), wasted record numbers have no effect.
Computing BSIZE (Table B size)
BSIZE specifies the number of pages to be assigned to Table B. Compute BSIZE using the following equation:
BSIZE = 1.2 * Total-Number-of-Records / BRECPPG
Round the result up to an integer. You can change the value of BSIZE (except in a hash key file) with the INCREASE and DECREASE commands.
BSIZE has a default value of 5, which corresponds to 1280 record slots if the BRECPPG default is taken.
BSIZE cannot exceed 16,777,216, nor can the product of BRECPPG and BSIZE exceed 16,777,216, the maximum number of record slots.
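The BRECPPG and BSIZE formulas, together with the record-slot limit, can be sketched as below. Truncating BRECPPG to a whole number is an assumption (the text specifies rounding only for BSIZE), and the function names and sample figures are illustrative.

```python
import math

MAX_SLOTS = 16_777_216  # limit on BSIZE and on BRECPPG * BSIZE


def brecppg(avg_record_size, multiplier=1.1):
    """Records per Table B page; rounded down (assumption)."""
    return int(multiplier * (6144 - 4) / avg_record_size)


def bsize(total_records, records_per_page, multiplier=1.2):
    """Table B pages, rounded up, checked against the documented limits."""
    pages = math.ceil(multiplier * total_records / records_per_page)
    if pages > MAX_SLOTS or pages * records_per_page > MAX_SLOTS:
        raise ValueError("exceeds the maximum number of record slots")
    return pages


per_page = brecppg(54)  # invented average record size of 54 bytes
print(per_page, bsize(123_456, per_page))
```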
Computing BRESERVE (reserved Table B space)
BRESERVE reserves a number of bytes on each Table B page for the expansion of records on that page. Model 204 allows you to add fields to records virtually without limit. Reserved space is used for new fields, if it is available on the page. Otherwise, an extension record is created in the next available space in Table B. Thus, records are infinitely expandable, subject only to Table B space limitations (BSIZE).
For example, suppose that an estimated six records fit on a 6144-byte page and reserved space is 17 bytes. If Model 204 has loaded five records that are each 1200 bytes long, it begins a sixth record on the same page because the amount of space left (144 bytes) is greater than the reserved space. Only the first few fields of the sixth record fit on the page. The extra fields are placed on another page in an extension record, which uses up another record number.
While extension records are transparent to the user, access to the fields in extensions can be much less efficient than access to fields contained in the basic portions of records. To avoid extension records during initial file loading, set BRESERVE to the average record length (R). That is:
BRESERVE = R
If, in the example above, you set reserved space to 1200, only five records are placed on the page. The fifth record begins with 1344 bytes remaining on the page. Fields are added, crossing the reserved space boundary, until the record is complete. The sixth record then begins on a new page, avoiding an extension record.
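The two scenarios above can be reproduced with a deliberately simplified simulation. It assumes fixed-length records and models only one rule: a new record starts on the page while free space exceeds BRESERVE and record numbers remain. The function name is hypothetical, and real page handling involves more conditions than this sketch covers.

```python
def load_page(record_len, breserve, brecppg, page_size=6144):
    """Return (records started on the page, records needing an extension)."""
    free = page_size
    started = 0
    extensions = 0
    while started < brecppg and free > breserve:
        started += 1
        if record_len > free:
            extensions += 1  # remainder spills into an extension record
            free = 0
        else:
            free -= record_len
    return started, extensions


print(load_page(record_len=1200, breserve=17, brecppg=6))    # sixth record is split
print(load_page(record_len=1200, breserve=1200, brecppg=6))  # only five records start
```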
Sizing BRESERVE to avoid extension records
If all the records in the file are less than about 1000 bytes, set BRESERVE to the average record length. If you set BRESERVE to the maximum record length (and at least one complete record fits on each Table B page), Model 204 does not build extension records unless new fields are added or inserted, or variable-length fields are changed to be longer.
For files in which you initially load skeleton records and add the bulk of the fields later, set BRESERVE to a value much higher than the average record length. You can reset BRESERVE after some or all of the records have been loaded.
Too many extension records can have a serious negative impact on performance. However, for very large records, or for files in which the size of records varies dramatically, you might need to have some extension records and set BRESERVE to a smaller value.
The default value of BRESERVE is 17, which can be changed any time when the file is not being updated by another user.
Sizing Tables B and X
Creating a file with a Table X
Table X is allocated for a file when XSIZE is set to a value greater than zero at file creation.
In the following example, when XSIZE is set greater than zero, Table X is established for the VEHICLES file.
CREATE FILE VEHICLES
PARAMETER FILEORG=X'24'      */Unordered, RRN file organization     /*
PARAMETER BSIZE=128
PARAMETER BRESERVE=100       */100 free bytes are required to store /*
                             */a new record on page                 /*
PARAMETER BREUSE=30          */when 30% or more page space is free, /*
                             */put page on reuse queue              /*
PARAMETER XSIZE=600
PARAMETER XRESERVE=800       */800 free bytes are required to store /*
                             */a new record for Table X on page     /*
PARAMETER XREUSE=15          */when 15% or more page space is free, /*
                             */put page on reuse queue              /*
END
Considerations for Table X
If you want to add a Table X to a Model 204 file created prior to V7R1.0, you must re-create the file and reload it in V7R1.0 or later.
You can implement Table X for files created in V7R1.0 or later that are unordered or entry order, but Table X is not supported for sort key and hash key files.
When you issue a VIEW TABLES command against a file that does not have a Table X, the Table X parameters are displayed with zero values.
If XAUTOINC is set to a nonzero value, Model 204 automatically increases Table X as needed when the file is opened by the first user.
Preallocated fields can reside only in Table B records. Model 204 will never store them in Table X.
Model 204 will store non-preallocated fields in Table B records. However, when a given Table B record has no more room for additional non-preallocated fields, those fields will be stored in Table X extension records. The fields stored in Table X records have exactly the same format and therefore space requirements as fields stored in Table B records.
Dividing data between Tables B and X
The requirement is simply to have enough data storage using either Table B alone or Table B and Table X:
- When XSIZE is set to 0, Table B must be sized such that it can contain all visible fields in all records.
- When XSIZE is greater than 0, the total size of Table B and Table X must be such that each visible field in all records will be stored in Table B or Table X.
There are many possible combinations of BSIZE and XSIZE that meet this requirement. So, for a file with a Table X, there is no one formula for determining a unique BSIZE or XSIZE, but there are a number of approaches you can take.
- If you have records with a generally consistent size you may be able to keep most of your data in Table B and have only a small Table X for the occasional overflow.
- If your records vary widely in size, size Table B so that the vast majority of the smaller records fit in Table B and only the largest ones create extensions.
- If you have records that start small and then grow dramatically over time, consider keeping only a very small base record in Table B (perhaps only large enough to hold the preallocated fields), with the rest stored as extensions.
As long as you first understand the overall size you would need if you were storing the data only in Table B, splitting it into the two parts is straightforward. (If RECRDOPT is set to one, sizing Table B is trivial: how many records do you expect to have?)
Table X overhead
The purpose of Table X is to free page slots in Table B that might have been used for extension records. There could be a performance side effect with using Table X. By experimenting with different values of XRECPPG, it might be possible to reduce the size of record extension chains: that is, have fewer but larger extension records instead of many smaller extension records. Having fewer extension records would potentially reduce I/O normally required to read in very large records, such as those with many extensions.
Sizing tables with XSIZE greater than zero
Setting a default for XSIZE depends on the difference in the size of your records. The more variation in the length of your records, the more likely that you will have extension records and, therefore, need more Table X pages. Rocket Software recommends the following: if the size of your records varies by 10%, then allocate 10% of the pages in Table B for Table X.
If XSIZE is greater than 0, the following formula can be used to size Table B:
BSIZE = 1.2 * (total number of base records) / BRECPPG
And the following formula can be used to size Table X:
XSIZE = 1.2 * (total number of extension records) / XRECPPG
Note: Table X slots are always reused after extension records are deleted. Table B slots are reused only for Reuse Record Number (RRN) files.
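As a sketch of the two formulas above, the pair of sizes can be computed together. Rounding both results up is an assumption carried over from the BSIZE rule, and the function name, the XRECPPG value, and the record counts are invented for illustration.

```python
import math


def table_b_and_x_pages(base_records, extension_records, brecppg, xrecppg,
                        multiplier=1.2):
    """Return (BSIZE, XSIZE) for a file that uses Table X."""
    bsize = math.ceil(multiplier * base_records / brecppg)
    xsize = math.ceil(multiplier * extension_records / xrecppg)
    return bsize, xsize


print(table_b_and_x_pages(base_records=123_456, extension_records=12_345,
                          brecppg=125, xrecppg=10))
```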
Tables C and D indexing structure
Tables C and D comprise the indexing structure of a Model 204 file. Only fields defined with the KEY, NUMERIC RANGE, or ORDERED attribute generate entries within the indexing structure:
|Entries in...||Are made for each distinct value of...|
|Table C||KEY or NUMERIC RANGE field.|
|Table D||ORDERED field, and for each record that contains a particular value of a KEY, NUMERIC RANGE, or ORDERED field, if that value occurs in more than one record in the file.|
The two indexes are:
|Hashed Index||Composed of Table C, which indexes KEY and NUMERIC RANGE fields, plus a secondary index (located in Table D) containing Table B record numbers pointed to by Table C entries.|
|Ordered Index||Stored in Table D, is composed of the Ordered Index B-tree, which indexes ORDERED fields, plus a secondary index (located in Table D) containing Table B record numbers pointed to by B-tree entries.|
In addition to these tables, some free space might be available to the file on unassigned pages in a free-space pool.
FRV attribute entries
In addition, Tables C and D contain extra entries for fields that have the FRV attribute. However, the space for these entries generally is insignificant in relation to the other entries, and so formulas for calculating FRV entries are not provided. To allow for FRV entries and to compensate for imprecise knowledge of data values and their distribution, the following formulas result in generous space estimates.
Computing the file size multiplier (N)
To minimize disk storage space and to optimize record retrieval techniques, the records in Table B are divided into internal file segments that are transparent to the user. The maximum number of records stored in one file segment is 49,152 (that is, eight times the usable page size of 6144).
Both Table C and Table D space estimation formulas depend upon the file size multiplier N, which represents the number of internal file segments. Use the following equation to calculate N:
N = Number-of-Records-in-the-File / (8 * 6144)
Round the result up to an integer. If BRECPPG is set too high or if a large number of extension records exists, there can be fewer actual records per segment. In this case, base N on the number of record numbers used in the file (EXTNADD + MSTRADD), rather than on the number of records actually stored.
For space estimation purposes, the records are considered to be distributed evenly among the segments. If the records are not distributed evenly, make separate estimates for each segment individually.
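A minimal sketch of the file size multiplier, using the 49,152 records-per-segment figure stated above. The function name is hypothetical. Pass the number of record numbers used rather than the number of records stored when BRECPPG is set high or many extension records exist.

```python
import math

RECORDS_PER_SEGMENT = 8 * 6144  # 49,152 record numbers per file segment


def file_size_multiplier(record_numbers_used):
    """N: the number of internal file segments, rounded up."""
    return max(1, math.ceil(record_numbers_used / RECORDS_PER_SEGMENT))


print(file_size_multiplier(200_000))
```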
Sizing Table C
Table C organization
Table C is a hashed table divided into entries of seven bytes each. Table C entries store index information for fields that have the KEY or the NUMERIC RANGE attributes. Model 204 creates a chain of entries in Table C for each value stored in a KEY field and several chains of entries for each value stored in a NUMERIC RANGE field.
Table C property entries
The head of each chain is called the property entry. The property entry identifies the field name = value pair that is indexed by the other entries in the chain. Model 204 places one entry in the chain for each segment of the file containing records that have the field name = value pair identified in the property entry.
PROJECT, a 4-segment file, contains a field named STAGE. STAGE is defined with the KEY attribute. One of the values stored in the field is PLANNING. In the first and second segments of the PROJECT file, there are records containing the field name = value pair, STAGE = PLANNING.
Therefore, in Table C of the PROJECT file, there is a chain of three entries:
- Property entry for STAGE = PLANNING
- Entry for the first segment of the PROJECT file
- Entry for the second segment of the PROJECT file
Storing segment and property entries
Model 204 attempts to store segment entries on the same page as the property entry. When this is not possible, Model 204 continues chains of entries in Table C across Table C page boundaries, ensuring uniform use of the pages in Table C by reducing the likelihood of one page filling while other pages are relatively empty.
The CSIZE parameter specifies the number of pages to be assigned to Table C. After it has been allocated, the size of Table C cannot change until you re-create the file.
Compute CSIZE as follows:
- Place the distinct values of each KEY or NUMERIC RANGE field into one of two categories:
- Category u contains those field name = value pairs that usually appear in only one record in the file, such as Social Security number.
- Category n contains those field name = value pairs that occur in more than one record in the file, such as the values of SEX or AGE. For simplicity, field name = value pairs in this category are assumed to occur in records in every segment. This is the worst-case assumption and results in slightly high estimates.
- Then let Vu = total number of pairs in category u and Vn = total number of pairs in category n.
For fields that have both the KEY and NUMERIC RANGE attributes, count the values twice, as if there were two distinct fields. Calculate the number of extra entries required for NUMERIC RANGE retrieval fields. For each NUMERIC RANGE field:
- Determine the maximum number of significant digits the field will have. Include digits on both sides of the decimal point.
- Multiply by 10.
- Add 2.
- Let Vr = total number of extra entries required for all NUMERIC RANGE retrieval fields.
When calculated this way, Vr is the maximum number of extra entries required. You can reduce this number slightly if some digits never take on all the values between 0 and 9. For example, in a 3-digit age field, the first digit never goes above 1. Refining the estimate of Vr is usually unimportant because Vr is usually outweighed by Vn.
CSIZE = 1.2 * ((14 * Vu) + 7 * (N + 1) * (Vn + Vr)) / (6144 - 4)
Round up to the nearest integer. Do not reduce the multiplier, even if you can determine the exact number of entries required in Table C, because it is not possible to use all the space available. CSIZE must not exceed 16,777,216. CSIZE has a default value of 1.
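The CSIZE steps can be sketched as follows, taking the last term of the formula to be Vr, the extra NUMERIC RANGE entries defined in the steps above. The function names and the sample counts are illustrative assumptions.

```python
import math

MAX_CSIZE = 16_777_216


def numeric_range_extra_entries(significant_digits):
    """Maximum extra Table C entries for one NUMERIC RANGE field."""
    return significant_digits * 10 + 2


def csize(vu, vn, vr, n):
    """Table C pages from Vu, Vn, Vr, and the file size multiplier N."""
    entry_bytes = (14 * vu) + 7 * (n + 1) * (vn + vr)
    pages = math.ceil(1.2 * entry_bytes / (6144 - 4))
    if pages > MAX_CSIZE:
        raise ValueError("CSIZE must not exceed 16,777,216")
    return max(1, pages)


# Invented sample: 100,000 unique pairs, 500 repeated pairs,
# one 3-digit NUMERIC RANGE field, 5 segments.
vr = numeric_range_extra_entries(3)
print(vr, csize(vu=100_000, vn=500, vr=vr, n=5))
```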
Sizing Table D
Table D data
Table D contains a number of different types of data. The principal types are:
- Ordered Index B-tree pages
- Lists or bit patterns of indexing information for KEY, NUMERIC RANGE, and ORDERED fields that appear in multiple records
- Existence bit pattern pages: bit patterns that specify which records currently exist in the file segment
- Preallocated field record descriptions
- Text of stored procedures
- Procedure names and aliases (procedure dictionary)
- Access Control Table (ACT) pages
- Sorted file group index pages, if the file is a sorted file
- Reserved area: a pool of pages kept available for transaction back out use. The size of the reserved area is controlled by the DPGSRES file parameter.
In most files, indexing entries constitute the major portion of the table, but in files that have very few KEY, NUMERIC RANGE, and ORDERED fields, procedures can overshadow the indexing data.
Data storage in Table D
Table B record locating information is stored in Table D record number lists and bit patterns for Ordered Index fields and for KEY and NUMERIC RANGE field name = value pairs that occur in more than one record in the file.
Record list pages contain Model 204 record numbers for a given file segment, stored in 2-byte entries. Lists that grow too large are converted into bit patterns. Bit pattern pages are Model 204 pages where each bit on the usable page represents a single record number for a given file segment.
The total amount of space required for Table D is the sum of the space computed for the Ordered Index pages, the index lists, the preallocated field record descriptions, the procedure texts, the procedure dictionary, the ACT, and the reserved area:
DSIZE = OIT + IT + F + P + (K * PDSIZE) + Q + DPGSRES
|OIT||Size of the Ordered Index|
|IT||Size of index list space|
|F||Number of preallocated fields|
|P||Number of procedures|
|K||Number of blocks of pages required for the procedure dictionary|
|PDSIZE||Size of the procedure dictionary|
|Q||Number of pages required for the Access Control Table (ACT)|
|DPGSRES||Size of the Table D reserved area|
The space requirements of the principal components of Table D are discussed in the following sections.
Calculating the size of the Ordered Index (OIT)
About Ordered Index space
The Ordered Index is stored in Table D. Record location information is stored on list or bit pattern pages when an ORDERED field value occurs a greater number of times than the IMMED parameter allows to be held locally in a segment of the Ordered Index B-tree. The space requirements for these list pages are the same as for the KEY field lists, and are discussed in detail in Computing the total index list space (IT). The Ordered Index B-tree space calculations follow.
The following formulas yield an approximation for the total amount of space used by the Ordered Index B-tree structure. The formula variables are field specific; you need to calculate the space for each field in the Ordered Index.
Estimating Ordered Index space (OI) for each ORDERED field
For each field in the file that has the ORDERED attribute, the number of Table D pages required for the section of the Ordered Index B-tree structure that indexes the field is estimated as follows.
- Estimate the following numbers:
|This value||Equals...|
|NE||Number of distinct values (or elements) in the field|
|N||Number of segments in the file|
- Estimate the average length (AV).
First estimate the average length of the distinct values stored in the ORDERED field. For numeric values of ORDERED NUMERIC fields, the average length of the numeric values is 8. Compute the following:
AV = estimated average length of ORDERED values + 1
- Divide the ORDERED values into categories. To estimate space for the Ordered Index, perform separate calculations on each of the following categories of distinct field value:
|This category||Equals values that occur in...|
|A||One and only one record in the file. ValA = the number of values in category A|
|B||More than one record in the file and in a number of records per segment less than or equal to the setting of the field's IMMED parameter. ValB = the number of values in category B|
|C||A greater number of records per segment than the setting of the field's IMMED parameter. ValC = the number of values in category C|
- For each category of distinct values, use the following appropriate formula:
- Calculate category A.
Total length of the Ordered Index entries placed in category A is:
ENa = ValA * (AV + 3)
- Calculate category B.
For the values in category B, first estimate the average number of records per segment that has one of the values in category B.
Let AB represent the average number of records per segment with one of the values in category B. AB is between 1 and the value of the IMMED parameter for that field.
The total length of the Ordered Index entries placed in category B is:
ENb = ValB * (AV + (2 * AB) + (2 * N))
If (AV + (2 * AB) + (2 * N)) is greater than 3000, substitute 3000.
- Calculate category C.
The total length of the Ordered Index entries placed in category C is:
ENc = ValC * (AV + (5 * N))
- Calculate OIB.
Assuming that the values of the ORDERED field are distributed evenly over the segments of the file, the estimated total length of all the Ordered Index entries is:
OIB = ENa + ENb + ENc
If the values are not evenly distributed, estimate ENa, ENb, and ENc (as appropriate) for each segment in which the values occur.
Note: The value calculated as OIB should roughly correspond to the value of the OINBYTES parameter after the file is fully loaded. OINBYTES is a file table parameter that displays the current number of Ordered Index B-tree entry bytes.
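The category formulas above can be sketched in Python as follows. This is an illustrative sketch only, not part of Model 204: the function and argument names are hypothetical, and the inputs are the values used in the personnel example later in this topic.

```python
def ordered_index_entry_bytes(val_a, val_b, val_c, av, ab, n_segments):
    """Estimate OIB (bytes of Ordered Index entries) for one ORDERED field.

    val_a, val_b, val_c: number of distinct values in categories A, B, C
    av: average value length + 1
    ab: average records per segment for a category B value
    n_segments: N, the number of segments in the file
    """
    en_a = val_a * (av + 3)
    # Per-value length for category B is capped at 3000, as stated in the text.
    per_value_b = min(av + (2 * ab) + (2 * n_segments), 3000)
    en_b = val_b * per_value_b
    en_c = val_c * (av + (5 * n_segments))
    return en_a + en_b + en_c


# Values from the personnel example: AV = 12, AB = 2, N = 2
print(ordered_index_entry_bytes(60000, 5000, 500, av=12, ab=2, n_segments=2))  # 1011000
```

The sketch assumes the values are evenly distributed over the segments; for uneven distributions, the text suggests estimating each segment separately.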
Estimating leaf page overhead (LOa)
To estimate the actual amount of overhead space on each leaf page, first calculate the amount of overhead expected on each leaf page, then the minimum amount of overhead necessary for each leaf page, and use the larger of the two.
- Calculate the expected leaf page overhead (LOe)
The amount of overhead expected on each leaf page, LOe, depends on the mode in which the ORDERED field is usually updated:
- If most updates are in deferred update mode (using either the deferred update feature or the File Load utility), then use the setting of the field's LRESERVE parameter to calculate LOe:
LOe = 6144 * (LRESERVE / 100)
- If you expect most updates to be in non-deferred update mode then use the setting of the field's SPLITPCT parameter to calculate LOe:
LOe = 6144 * ((100 - SPLITPCT) / 100)
- Calculate the minimum leaf page overhead
To determine the minimum amount of overhead for each leaf page, LOmin, first calculate the average number of bytes per Ordered Index entry:
AE = OIB / NE
Then calculate LOmin using the following formula:
LOmin = 2 * (6144 / AE)
- Estimate leaf page overhead (LOa)
The estimate of the overhead for each leaf page, LOa, is the larger of LOe and LOmin:
LOa = max(LOe, LOmin)
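As a rough illustration, the three overhead steps can be written as small Python functions. The names are hypothetical. AE is taken as OIB / NE, as in the worked example later in this topic, and no intermediate rounding is applied.

```python
PAGE_BYTES = 6144  # usable bytes on a Model 204 page


def expected_leaf_overhead(lreserve=None, splitpct=None):
    """LOe: pass lreserve for mostly deferred updates, splitpct otherwise."""
    if lreserve is not None:
        return PAGE_BYTES * lreserve / 100
    if splitpct is None:
        raise ValueError("supply either lreserve or splitpct")
    return PAGE_BYTES * (100 - splitpct) / 100


def minimum_leaf_overhead(oib, ne):
    """LOmin: 2 * (6144 / AE), where AE is the average bytes per entry."""
    ae = oib / ne
    return 2 * (PAGE_BYTES / ae)


def leaf_overhead(oib, ne, lreserve=None, splitpct=None):
    """LOa: the larger of LOe and LOmin."""
    return max(expected_leaf_overhead(lreserve, splitpct),
               minimum_leaf_overhead(oib, ne))


print(leaf_overhead(1011000, 65500, lreserve=15))
```

Because the sketch does not round AE before computing LOmin, its LOmin can differ slightly from a hand calculation that rounds AE first; in the personnel example LOe is the larger value either way.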
Estimating the number of required leaf pages (LP)
The number of leaf pages required for the ORDERED field is:
LP = OIB / (6144 - 24 - LOa)
Round up to the nearest integer.
Calculating the size of the index for each ORDERED field
The number of Table D pages required for the ORDERED field's section of the Ordered Index B-tree is:
OI = (LP * 1.01) rounded up to the nearest integer
This formula assumes conservatively that the number of intermediate pages is 1% of the number of leaf pages.
Calculating the total size of the Ordered Index (OIT)
If there is more than one ORDERED field in the file, the total number of pages required for the Ordered Index B-tree is the sum of the pages required for each ORDERED field.
OIT = OI1 + OI2 + ... + OIn
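The page counts LP, OI, and OIT follow directly from the formulas above. The sketch below is one possible way to compute them; the function names are hypothetical, and the 1% allowance is applied with integer arithmetic so that the round-up is exact.

```python
import math

PAGE_BYTES = 6144   # usable bytes on a Model 204 page
FIXED_BYTES = 24    # constant subtracted in the LP formula


def leaf_pages(oib, lo_a):
    """LP = OIB / (6144 - 24 - LOa), rounded up."""
    return math.ceil(oib / (PAGE_BYTES - FIXED_BYTES - lo_a))


def field_index_pages(lp):
    """OI = LP * 1.01, rounded up (computed as ceil(LP * 101 / 100))."""
    return (lp * 101 + 99) // 100


def total_ordered_index_pages(per_field_oi):
    """OIT = OI1 + OI2 + ... + OIn."""
    return sum(per_field_oi)


lp = leaf_pages(1011000, 922)        # 195 in the personnel example
oi = field_index_pages(lp)           # 197
print(lp, oi, total_ordered_index_pages([oi]))
```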
Computing the total index list space (IT)
If a record number list grows to exceed the available space on a Table D list page, but is still less than 30% of the Table D page, the list is moved to a Table D page that has enough space to hold the list. If a list grows longer than 30% of a Table D list page, it is converted into a bit pattern. Bit patterns are not converted back to lists.
Model 204 deletes empty lists. If a Table D list page becomes empty because the lists originally stored on the page have been deleted, moved onto another page, or converted into bit patterns, Model 204 makes the empty page available for reuse.
The amount of Table D space used by index lists depends primarily upon how many records contain a particular field name = value pair and how many of those records are in each file segment. Field name = value pairs that were placed in category u for Table C estimates do not take up any space in Table D.
Before you can calculate the index list space, you need to choose a value for the DRESERVE parameter, which is the percentage of space reserved for expansion of current record number lists. If a list grows into the DRESERVE section of the current page for lists, the next new list goes on a new page. If more space becomes available on the current page before a list grows into the DRESERVE section of the page, a new list can be started in the newly available space. New lists cannot start in the DRESERVE section of the Table D page.
The default value of DRESERVE is 15%.
Calculating I (the index list space)
Compute I, the amount of space required for index lists for each segment, according to the following rules:
If N, the file size multiplier, is greater than 1, consider the total number of records in the file to be divided evenly into segments.
- For each segment of the file, take each KEY and/or NUMERIC RANGE field name = value pair that occurs in more than one record in the file, and each ORDERED field name = value pair that occurs in a greater number of records than the setting of the field's IMMED parameter, and place it in one of the following categories:
|This category||Equals field name = value pairs that occur in...|
|A||More than one record but fewer than 2 percent of the records in the segment. For files with a page size of 6184 (6144 usable), field name = value pairs in this category occur in fewer than approximately 1000 records in the segment.|
|B||Two percent or more of the records in the segment. Their record numbers are stored on bit pattern pages.|
Fields that have both the KEY and NUMERIC RANGE, or KEY and ORDERED attributes have their values counted twice, as if there were two distinct fields. It is possible that different values of the same field might not be in the same category.
For example, if DEPT = PERSONNEL is contained in 5000 records of a segment, it is placed in category B, whereas DEPT = SECURITY might occur in only 100 records in the segment and, therefore, be placed in category A. If the distribution of values is not known, then assume that all values of a field occur equally in each segment.
Each pair placed in category A requires the following number of bytes:
T = 2 + (2 * (Number of Records Containing the Pair))
Let X be the total number of bytes available on a Table D page. X depends on the DRESERVE parameter, which defaults to 15% and represents the percentage of reserved space per page. The default value of X is 5222, calculated as follows:
X = 6144 * (1 - (DRESERVE / 100))
Let A be the total number of pages required by the category A pairs for the segment, where T is the sum of the bytes required by all category A pairs in the segment:
A = T / X
Let B be the total number of pages required by pairs in category B. Each field name = value pair in category B requires 1 page for the segment, so B is equal to the number of pairs in the category.
- Calculate the number of extra values per segment for NUMERIC RANGE fields. For each field:
- Determine the maximum number of significant digits the field will have. Include digits on both sides of the decimal point.
- Multiply by 10.
- Add 2.
If the field appears in fewer than 2% of the records, each extra value just calculated requires the following number of bytes:
T' = 2 + (2 * (Number of Records Containing the Field))
If the NUMERIC RANGE field appears in 2% or more of the segment's records, the number of pages required is:
B' = number of extra values
The extra space required for all NUMERIC RANGE fields is computed as follows. First, let:
T" = sum of all the values of T'
B" = sum of all the values of B'
Then, the total number of pages required is:
C = (T" / X) + B"
Thus, the amount of index list space, I, for each segment is:
I = A + B + C
The total number of pages required for index lists and bit patterns for the entire file is equal to the sum of the totals (I) for each segment, plus the number of existence bit pattern pages. Because there is one existence bit pattern page per file segment, the number of existence bit pattern pages is equal to N, the number of segments. The total number of pages for index lists and bit patterns can thus be represented by the following equation:
IT = A1 + B1 + C1 + ... + AN + BN + CN + N
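The per-segment rules can be collected into a short Python sketch. All names here are hypothetical. The text does not say where fractional pages are rounded, so the sketch leaves the results unrounded and assumes the caller rounds up the final total.

```python
PAGE_BYTES = 6144  # usable bytes on a Model 204 page


def usable_list_bytes(dreserve=15):
    """X: bytes available for lists on a Table D page (5222 at the default)."""
    return PAGE_BYTES * (100 - dreserve) // 100


def list_bytes(records):
    """T or T': bytes for one list that appears in `records` records."""
    return 2 + 2 * records


def numeric_range_extra_values(significant_digits):
    """Extra values per segment for one NUMERIC RANGE field."""
    return significant_digits * 10 + 2


def segment_index_pages(cat_a_record_counts, cat_b_pairs,
                        extra_bytes=0, extra_pages=0, dreserve=15):
    """I = A + B + C for one segment.

    cat_a_record_counts: records per category A pair
    cat_b_pairs: number of category B pairs (one page each)
    extra_bytes, extra_pages: T" and B" for NUMERIC RANGE fields
    """
    x = usable_list_bytes(dreserve)
    a = sum(list_bytes(r) for r in cat_a_record_counts) / x
    c = extra_bytes / x + extra_pages
    return a + cat_b_pairs + c


def total_index_list_pages(per_segment_pages):
    """IT: sum of I over all segments plus one existence page per segment."""
    return sum(per_segment_pages) + len(per_segment_pages)


print(usable_list_bytes())                 # 5222
print(numeric_range_extra_values(2))       # 22
```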
Calculating F (the number of pages for preallocated fields)
If any preallocated fields are defined in a file, one Table D page is used to store a record description of the arrangement of fields in the block of storage preallocated in each record. The record description uses 36 bytes of fixed overhead and 8 bytes for each preallocated field. The maximum number of preallocated fields on a 6144-byte record description page is, therefore, 763.
Let F be the number of Table D pages required for the record description. F is always either 0 or 1.
Calculating P (the number of procedures)
Procedures are stored in Table D. In most cases, the text of each procedure requires one page. A very long procedure might require more than one page. Let:
P = total number of procedures
Sizing the procedure dictionary
Procedure names and aliases are stored in a procedure dictionary in Table D. Like procedure text, the procedure dictionary associates a procedure name or alias with information about the location of the procedure's text, and with a class, if the procedure is secured.
The procedure dictionary is allocated in blocks of one or more contiguous pages. When Model 204 verifies a procedure name, it begins searching on a random page in the first block. If the name is not found on that page, the remaining pages in the same block are searched. If the name is still not found, Model 204 searches the pages in the second block, and so on.
Storing new procedure names
If Model 204 does not find the name (that is, if this is a new procedure name), it stores the new name in the first block in which it can find space. Model 204 allocates a new block when it cannot find space for a new name in any of the preceding blocks. Space used by deleted names is reused.
Choosing a PDSIZE
There are two possible paths you can take in choosing a PDSIZE:
- Have one large block containing many pages. Because name searches always begin with the first block, this increases the likelihood of finding a name on the first page read. However, as the pages fill up, Model 204 might allocate a new block when space still exists on the old block.
- Have a number of smaller blocks with fewer pages. Although it might take Model 204 longer to find the procedure name, there is less impact on Table D when a new block is allocated.
When choosing PDSIZE, take into account the percentage of procedure and alias names known or anticipated when you design the file. The fewer aliases your site uses, the smaller the PDSIZE you can use.
PDSTRPPG specifies the maximum number of procedure entries per procedure dictionary page. The actual number of procedure entries per page is a function of the length of the names and aliases. The size of an entry is:
L + 34 for a procedure
L + 7 for an alias
L is the length of the procedure or alias name.
Estimate S, the average entry size. Then compute PDSTRPPG as follows:
PDSTRPPG = 6144 / S
The default value of PDSTRPPG is 128. Its maximum is 256.
Computing PDSIZE (the size of the procedure dictionary)
The procedure dictionary is allocated in blocks of one or more contiguous pages. PDSIZE specifies the number of pages in a single block. If you know most of the procedure names when you create the file, use the following formula:
PDSIZE = 1.4 * P / PDSTRPPG
PDSIZE has a default value of 3.
If K is the number of blocks of pages, then (K * PDSIZE) is the total number of pages required for the procedure dictionary.
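A minimal sketch of the procedure dictionary arithmetic follows. The function names are hypothetical. The source formulas do not state a rounding direction, so the sketch assumes PDSTRPPG is rounded down (whole entries per page, capped at the stated maximum of 256) and PDSIZE is rounded up to whole pages.

```python
PAGE_BYTES = 6144
PDSTRPPG_MAX = 256


def entry_size(name, is_alias=False):
    """Bytes for one dictionary entry: L + 34 (procedure) or L + 7 (alias)."""
    return len(name) + (7 if is_alias else 34)


def pdstrppg(avg_entry_size):
    """PDSTRPPG = 6144 / S, rounded down and capped at 256 (assumption)."""
    return min(PDSTRPPG_MAX, int(PAGE_BYTES // avg_entry_size))


def pdsize(procedures, entries_per_page):
    """PDSIZE = 1.4 * P / PDSTRPPG, rounded up (assumption)."""
    return -(-14 * procedures // (10 * entries_per_page))


per_page = pdstrppg(48)          # 128 entries per page for a 48-byte average
print(per_page, pdsize(500, per_page))   # 128 6
```

With K blocks allocated, the dictionary occupies K * PDSIZE pages in total.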
Sizing the access control table (ACT)
The access control table (ACT) contains entries that map user classes and procedure classes into privileges. It is used for procedure security purposes. The ACT is allocated from Table D, one page at a time, as needed. No space is allocated until Model 204 encounters the first SECURE command. The maximum number of pages possible for the ACT is five.
Determining LET (the total length of procedure class entries)
The ACT is organized by user class in ascending order. For each user class, you need to determine:
NPCLASS = number of procedure class subentries
Then calculate LE, the length of the entry for each user class, as follows:
LE = 4 + (2 * NPCLASS)
Thus, if user class 05 has privilege definitions set for 8 different procedure classes, the length of its entry is 20 bytes.
Then, the total length of the user class entries is:
LET = LE1 + LE2 + ... + LEn
Additional space required for a SECURE command depends upon whether an entry already exists for the particular user class in question, and upon whether subentries exist for the procedure classes in question. If the entry already exists, 2 bytes are needed for each new procedure class mapped to that user class. If the subentries already exist for the procedure classes, no additional space is required.
Determining Q (the number of pages required for the ACT)
Q, the number of pages required for the ACT is always between 0 and 5 and is calculated by Model 204. To determine how many pages Model 204 will probably use for the ACT:
Q = LET / 6144
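The ACT estimate can be sketched as below. The names are hypothetical. Rounding Q up to whole pages and capping it at five are assumptions drawn from the statements that the ACT is allocated one page at a time and never exceeds five pages.

```python
import math

PAGE_BYTES = 6144
ACT_MAX_PAGES = 5


def user_class_entry_bytes(npclass):
    """LE = 4 + (2 * NPCLASS) for one user class."""
    return 4 + 2 * npclass


def act_pages(npclass_by_user_class):
    """Q: estimated ACT pages for the given user classes.

    npclass_by_user_class: one NPCLASS count per user class.
    """
    let = sum(user_class_entry_bytes(n) for n in npclass_by_user_class)
    return min(ACT_MAX_PAGES, math.ceil(let / PAGE_BYTES))


print(user_class_entry_bytes(8))   # 20, as in the user class 05 example
print(act_pages([8, 3, 12]))       # 1
```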
Reorganizing the ACT
If there is no room on an ACT page to add a new user class entry or subentry, Model 204 reorganizes the entire ACT. During this automatic reorganization, N + 1 pages are allocated from Table D, where N is the number of pages in the ACT before reorganization. The new pages need not be contiguous. Existing user class entries are redistributed across the new pages in an effort to leave some free space on each ACT page. After reorganizing, the original N pages are released.
Note: If the ACT reaches five pages and redistributing user class entries does not produce enough space for the new entry, the entry is not added. If the old entries cannot be redistributed successfully in five pages, the ACT is left in its original state and the new entry is not added.
Sizing the reserved area
Using reserved Table D pages
Model 204 keeps a specified number of Table D pages available, primarily for transaction back out use. When a page is successfully allocated from this area, the file is marked full; processing continues, and the following warning message is issued:
M204.2486 FILENAME: TABLED FULL. PAGE ALLOCATED FROM TABLED RESERVE AREA
Marking the file full prevents other users from starting requests that update Table D, making it more likely that all requests in progress complete normally. (Only nonupdate requests can examine data in files marked full. Users attempting to update files marked full are restarted.)
In a transaction back out file, the last half of the reserved section is reserved for use during transaction back out. If an ordinary transaction attempts to get a page from the second half of the reserved area, the allocation attempt fails with a Table D full error, which causes transaction back out to be initiated. During back out any free Table D page can be used.
For transaction back out files, the DELETE RECORDS and FILE RECORDS statements establish constraints that place the pages they delete during normal processing into the reserved area, temporarily enlarging the second half of the reserved area until the transaction commits.
When no space is available in Table D, including the reserved area, either the request is canceled or the user is restarted. The file is marked broken only if it has been updated and transaction back out is impossible or unsuccessful.
The DPGSRES parameter controls the size of the Table D reserved area. To compute DPGSRES, you first need to know the value of DEST, which is the estimate of the value of the total amount of space required for Table D, not including the reserved area space.
Calculating DEST (estimated Table D size)
DEST is the sum of the space computed for the Ordered Index pages, the index lists, the preallocated field record descriptions, the procedure texts, the procedure dictionary, and the ACT:
DEST = OIT + IT + F + P + (K * PDSIZE) + Q
Setting DPGSRES (the size of the reserved area)
For files containing only procedures, set DPGSRES to 0 to avoid wasting Table D space. For files that are not transaction back out files, set DPGSRES low to avoid wasting Table D space.
Unless you specify some other value, the CREATE FILE command sets DPGSRES to:
DPGSRES = min(DEST/50 + 2, 40)
That is, DPGSRES is either (DEST/50 + 2) or 40, whichever is smaller. Since DEST/50 + 2 = 40 when DEST = 1900:
If DEST < 1900, DPGSRES = DEST/50 + 2
If DEST >= 1900, DPGSRES = 40
The total amount of space required for Table D is the sum of the space computed for the Ordered Index pages, the index lists, the preallocated field record descriptions, the procedure texts, the procedure dictionary, the ACT, and the reserved area.
DSIZE = OIT + IT + F + P + (K * PDSIZE) + Q + DPGSRES
DSIZE = DEST + DPGSRES
You can change the value of DSIZE using the INCREASE and DECREASE commands. DSIZE cannot exceed 16,777,216. The default value of DSIZE is 15.
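The default reserve and the resulting DSIZE can be sketched as follows. The function names are hypothetical, and the use of integer division for DEST/50 is an assumption, since the text does not say how fractional results are handled.

```python
def default_dpgsres(dest):
    """DPGSRES = min(DEST/50 + 2, 40), using integer division (assumption)."""
    return min(dest // 50 + 2, 40)


def dsize(dest, dpgsres=None):
    """DSIZE = DEST + DPGSRES; uses the default reserve when none is given."""
    if dpgsres is None:
        dpgsres = default_dpgsres(dest)
    return dest + dpgsres


print(default_dpgsres(1000))   # 22
print(dsize(1000))             # 1022
print(dsize(1000, dpgsres=0))  # 1000, for example a procedure-only file
```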
Sizing Table E
The following parameters pertain to Table E sizing:
- ESIZE — The number of file pages in Table E.
- EHIGHPG — The highest active Table E page. The first page in Table E is page zero.
- EPGSUSED — The number of Table E pages currently in use.
Storing Large Object data
Each instance of a Large Object field occupies an integral number of Table E pages.
The rules of use, and sizing, are quite different depending on whether the FILEORG X'100' bit is set. For these differences, see the following sections.
ESIZE for FILEORG X'100' files
Set ESIZE as the number of Data pages.
To calculate the number of Data pages: Average the BLOB/CLOB length, divide by 6140 (the usable page size of 6144 less the 4-byte chain pointer), and round up. Then multiply by the total number of Large Object fields, and consider adding a percentage for growth based on your knowledge of the data and application.
- If you have MINLOBE set, ignore large objects smaller than this number.
- Be sure to take that (along with the large object header) into account when sizing Tables B and X.
For more detail on the large object architecture, see Table E for FILEORG X'100' files.
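As a hedged illustration of the ESIZE estimate for FILEORG X'100' files, the sketch below applies the steps described above. The function name and the growth_pct argument are hypothetical, and large objects smaller than MINLOBE are assumed to have been excluded from the inputs already.

```python
def esize_x100(avg_lob_bytes, lob_count, growth_pct=0):
    """Estimate ESIZE for a FILEORG X'100' file.

    avg_lob_bytes: average BLOB/CLOB length in bytes
    lob_count: total number of Large Object field occurrences
    growth_pct: optional whole-number percentage added for growth
    """
    usable = 6144 - 4  # page size less the 4-byte chain pointer
    pages_per_lob = -(-avg_lob_bytes // usable)       # round up
    data_pages = pages_per_lob * lob_count
    return -(-data_pages * (100 + growth_pct) // 100)  # round up


print(esize_x100(10000, 1000))                 # 2000
print(esize_x100(10000, 1000, growth_pct=20))  # 2400
```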
ESIZE for non FILEORG X'100' files
- A Large Object field with a null value (or 0 bytes of data) occupies no Table E pages.
- Large Object field data from 1 to 6144 bytes occupies one Table E page. If the data is from 1 to 6143 bytes, the page is not completely filled, so the remainder of the page is unused.
- Large Object data of 6145 bytes requires two Table E pages.
The pages used to store a Large Object value are always contiguous in Table E. If you specify the RESERVE option when the data is stored, then enough contiguous pages are allocated to hold the full RESERVE length, even if the actual size of the data initially stored is less than that.
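The page counts in the list above, together with the RESERVE behavior, amount to a round-up division by 6144. A small sketch, with a hypothetical function name:

```python
def table_e_pages(data_bytes, reserve_bytes=0):
    """Contiguous Table E pages for one Large Object value (non-X'100' files).

    If a RESERVE length was specified, space is allocated for the larger
    of the data length and the RESERVE length.
    """
    length = max(data_bytes, reserve_bytes)
    return -(-length // 6144)  # round up; 0 bytes needs 0 pages


print(table_e_pages(0))       # 0
print(table_e_pages(6144))    # 1
print(table_e_pages(6145))    # 2
print(table_e_pages(100, reserve_bytes=20000))  # 4
```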
If possible, when space to store Large Object data is required from Table E, then the space is allocated from the pages past EHIGHPG — even if there are free pages in Table E before the EHIGHPG point. In other words, data in Table E is initially stored in entry order. Eventually, when there is insufficient space left at the end of Table E, then space is allocated from the unused pages in Table E. Unused pages are a result of deleting Large Object data.
Note: Even if the number of free pages (ESIZE minus EPGSUSED) is sufficient, it might not be possible to obtain the required Table E space. The free pages must also be contiguous.
Set ESIZE as the number of Data pages, plus the number of Bitmap pages plus two.
- To calculate the number of Data pages: Average the BLOB/CLOB length, add 6144, and divide by 6144. Then, round down the result and multiply by the number of Large Object fields.
First calculation: (Avg.-BLOB-len + 6144) / 6144 = result
Second calculation: 1st Round up result
Data pages = round-up-result * No.-of-BLOBs
- To calculate the number of Bitmap pages: Add 17 to (Data pages / 49152) and add 1. Then, round up the result.
17 + (Data-pages / 49152) + 1 = 2nd result
Bitmap pages = 2nd round up result
- Calculate ESIZE.
ESIZE = Data pages + Bitmap pages + 2
Generally speaking, the cost of finding free space in Table E non X'100' files is very low during the initial phase, when EHIGHPG is still increasing, but more expensive later, particularly when Table E free pages are fragmented: for example, if the Large Object data stored in the file show a wide variation in size.
If the Large Object data stored in your database are volatile because of a high number of deletions and additions, Rocket Software recommends that you store the Large Object data in an individual file (or files), plus an indexed field to cross-reference the Large Object field to the data in other files (or make the file an X'100' file). This enables you to size, manage, and reorganize the Large Object data independently of your other files. This approach is particularly beneficial if you are new to using Large Object fields and find it difficult to accurately determine the Large Object data space requirement in advance.
For more detail on the large object architecture, see Table E for non FILEORG X'100' files.
Managing Large Object data
If a file was originally created with ESIZE=0, this can be changed only by recreating the file. Otherwise, you issue an INCREASE TABLEE or DECREASE TABLEE command to change the size of Table E, subject to the standard restrictions that apply to the INCREASE and DECREASE commands.
Data set allocation
After you have finished the preceding calculations, you can allocate data sets for the Model 204 file.
Minimum number of pages required
The minimum number of pages required for the file is equal to:
8 + ASIZE + BSIZE + CSIZE + DSIZE + ESIZE + XSIZE
You can allocate more disk space. When the file is created, any pages not assigned to the File Control Table (always eight pages) or Tables A through D are designated free space and can be used later to expand Tables B, D, and E.
Allocating disk space
Allocate disk space in either tracks or cylinders, without specifying a secondary allocation. The "Disk space requirements" table can help you to determine how many pages Model 204 stores on each track for your device type. The page size for all devices is 6184 bytes.
For example, a file that you calculate to need 1275 pages requires at least 183 tracks on a 3380 device.
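The track calculation is a round-up division of pages by pages per track. In the sketch below the function name is hypothetical, and the figure of 7 pages per 3380 track is an assumption that is consistent with the example above; take the actual value for your device from the "Disk space requirements" table.

```python
import math


def tracks_needed(pages, pages_per_track):
    """Minimum whole tracks required to hold the given number of pages."""
    return math.ceil(pages / pages_per_track)


# Assumed: 7 Model 204 pages (6184 bytes each) per 3380 track.
print(tracks_needed(1275, 7))  # 183
```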
Support for FBA devices
Model 204 also supports fixed-block-architecture devices (3370s) under the z/VM/SP and z/VSE operating systems; FBA devices require 13 blocks per page.
Guidelines for allocating data sets
The space can be allocated in one or more data sets on one or more disk packs as you see fit.
Keep the number of data sets small if main storage (core) is limited.
In a heavily used file, you can greatly improve efficiency by distributing the tables into several data sets on several volumes, each maintained on different channels and control units.
Allocating data sets
To allocate z/OS data sets, use the IBM utility IEFBR14. For example:
//JOB IEFBR14 DELETE AND CREATE
//STEP1 EXEC PGM=IEFBR14
//PEOPLE DD DSN=M204.FILE.PEOPLE,DISP=(NEW,CATLG),
// SPACE=(TRK,183),UNIT=3380
//
The choice of data set names is, of course, entirely yours, as is the decision whether or not to catalog.
If a large enough piece of contiguous space is available on the disk, a slight improvement can be made by allocating the data set on contiguous tracks. For example:
The ALLOCATE utility provided with Model 204 is used to preallocate database files, the CCATEMP file, the CCAGRP file, and the CCASERVR files. It can allocate one or more of these files, as specified in control statements, during one execution. For each file referenced in the ALLOCATE control statements, provide a DLBL and EXTENT with complete information. The utility opens each of these files as output data sets to make entries into the volume table of contents.
For variable format (z/OS and z/VSE) z/VM minidisks that have been initialized using the INITIAL parameter of the M204UTIL command, allocate data sets with the ALLOCATE parameter of the M204UTIL command. An example follows:
ACCESS 201 M
M204UTIL ALLOC M204 FILE PEOPLE M (P 183 TRK
The minidisk where the allocation is to be performed must be accessed before issuing M204UTIL ALLOCATE from z/VM. M204UTIL ALLOCATE does not catalog data sets. For further description of M204UTIL, see Defining the runtime environment (CCAIN).
ALLOCATE control statement
The ALLOCATE control statement format is as follows:
ALLOCATE FILE(filename1 filename2 ... filenameN)
The statement is free form and can begin in any column. You can have any number of ALLOCATE control statements in the input to the utility. Continuation from one input record to the next is indicated by a dash (minus sign) after the last parameter on the input record being continued. There is no limitation on the number of continuation statements.
The parameters filename1 through filenameN refer to the filenames on the DLBL statements in the job control stream. If a filename referenced in the ALLOCATE control does not have a corresponding DLBL statement in the JCL, an error message is written in the output audit trail.
Comment statements beginning with an asterisk in column 1 can be interspersed with the ALLOCATE control statements.
The ALLOCATE utility also runs in a mode compatible with earlier releases. If the control statements are omitted, the utility attempts to allocate a single file with a filename of NEWFILE. A DLBL and EXTENT for NEWFILE must be provided in the JCL used to run the utility.
The following sample ALLOCATE utility job stream shows that the file PEOPLE with 183 tracks of space beginning at relative track number 1000 is allocated:
// JOB ALLOCATE MODEL 204 FILE
// DLBL M204CL,'M204.CORE.IMAGE.LIBRARY'
// EXTENT,volser
// LIBDEF CL,SEARCH=M204CL
// DLBL PEOPLE,'M204.FILE.PEOPLE',99/365
// EXTENT SYS001,SYSWK1,,,1000,183
// EXEC ALLOCATE,SIZE=AUTO
ALLOCATE FILE(PEOPLE)
/*
/&
Space estimation example
To perform a simple space calculation, assume that a simple personnel file of 90,000 records has the characteristics listed in the Personnel file characteristics sample data table.
All fields are UPDATE IN PLACE.
SSN is given the CODED attribute so that numbers that start with a zero can be stored in coded form in the preallocated space. All other SSN values are stored as 4-byte binary. Because only a small number of values start with a zero, SSN does not have any effect on Table A space estimates.
|Comments on distribution of KEY and NUMERIC RANGE values|
IMMED 2 LRES 15
|9||Unique to each record.|
|2||55 possible values, evenly distributed (18-72).|
|5||20,000 possible values, evenly distributed.|
Values for Personnel Dept. occur only in the first 40,000 records. Values for Accounting Dept. occur only in the last 10,000 records. The other 8 values occur evenly in the remaining 5000 records in segment 1 and the remaining 35,000 records in segment 2.
Sample Table A calculations
|Field||# of Values||Space||Overhead||Total|
|Field||LEN + 2||ANY + UP||COD/FRV||OCC||LVL||FLOAT||ORD||UNIQ||NR||Total|
|N = 191|
- 1 LEN is the length of the field name.
- 2 ANY refers to the two bytes required from page 3-5. UP refers to the one byte required for UPDATE IN PLACE fields.
- 3 Because only a small number of values in SSN start with a zero, this field does not have any effect on Table A space estimates.
A = 6
D(AGE) = 2 + 3 = 5
D(SALARY) = 5 + 3 = 8
S = 5 + 8 = 13
T = A + B + S = 6 + 65 + 13 = 84
L = average length of character strings = (V + N)/T = (405 + 191)/84 = 7.0 = 7
ASTRPPG = 6144/L = 6144/7 = 877.7 = 877 (rounded down)
The following numbers are estimated as part of ASTRPPG:
|N||Total space consumed by field names including overhead.|
|A||Number of field names.|
|S||Number of extra NUMERIC RANGE fields.|
ATRPG = 1.1 * N/(6144 - (ASTRPPG * 2) - 2) = 1.1 * 191/(6144 - (877 * 2) - 2) = 1.1 * 191/4388 = 0.04 = 1 (rounded up)
ATRPG = 1.1 * (A + S)/ASTRPPG = 1.1 * (6 + 13)/877 = 0.02 = 1 (rounded up)
The following numbers are estimated as part of ASTRPPG:
|V||Total space consumed by FEW-VALUED fields including overhead|
|B||Number of FEW-VALUED fields|
FVFPG = 1.2 * V/(6144 - (ASTRPPG * 2) -2) = 1.2 * 405/(6144 - (877 * 2) -2) = 1.2 * 405/4388 = 0.11 = 1 (rounded up)
FVFPG = 1.2 * B/ASTRPPG = 1.2 * 65/877 = 0.08 = 1 (rounded up)
There are no MANY-VALUED, FRV, or CODED fields, but a minimum of one page must be allocated to each Table A section. Therefore:
MVFPG = 1
Sample Table B calculations
|Field||Bytes required per record|
|R = 48|
BRECPPG = 1.1 * (6144 - 4)/R = 1.1 * 6140/48 = 140.7 = 141 (rounded up)
BRESERVE is equal to R, above. Therefore:
BRESERVE = R = 48
BSIZE = 1.2 * (Total # of Records)/BRECPPG = 1.2 * 90000/141 = 765.9 = 766 (rounded up)
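The Table B sample figures can be checked with a few lines of Python. This sketch simply re-applies the two formulas shown above; the function names are hypothetical.

```python
import math


def brecppg(bytes_per_record):
    """BRECPPG = 1.1 * (6144 - 4) / R, rounded up."""
    return math.ceil(1.1 * (6144 - 4) / bytes_per_record)


def bsize(total_records, records_per_page):
    """BSIZE = 1.2 * (total records) / BRECPPG, rounded up."""
    return math.ceil(1.2 * total_records / records_per_page)


per_page = brecppg(48)
print(per_page, bsize(90000, per_page))  # 141 766
```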
Calculating the file size multiplier example
N = (# of Records in the file)/(8 * 6144) = 90000/49152 = 1.83 = 2 (rounded up)
Sample Table C calculations
|Field name||Vu pairs||Vn pairs||Vr entries|
|AGE (NUM RANGE)||55||22|
CSIZE = 1.2 * ((14 * Vu) + 7 * (N + 1) * (Vn + Vr)) / (6144 - 4)
= 1.2 * ((14 * 90000) + 7 * (3) * (20120 + 74)) / 6140
= 330 (rounded up)
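The file size multiplier and the Table C sample can be reproduced the same way. The sketch below uses hypothetical function names and the Vu, Vn, and Vr totals from the sample (90000, 20120, and 74).

```python
import math

RECORDS_PER_SEGMENT = 8 * 6144  # 49152


def file_size_multiplier(total_records):
    """N = (records in the file) / (8 * 6144), rounded up."""
    return math.ceil(total_records / RECORDS_PER_SEGMENT)


def csize(vu, vn, vr, n):
    """CSIZE = 1.2 * ((14 * Vu) + 7 * (N + 1) * (Vn + Vr)) / 6140, rounded up."""
    entry_bytes = 14 * vu + 7 * (n + 1) * (vn + vr)
    return math.ceil(1.2 * entry_bytes / (6144 - 4))


n = file_size_multiplier(90000)
print(n, csize(90000, 20120, 74, n))  # 2 330
```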
Sample Table D calculations
Calculating Ordered Index space
The calculations in this section use the following variables:
|NE||Number of distinct values stored in the field.|
|AB||Average number of records per value per segment.|
|OIB||Total Ordered Index B-tree entry lengths.|
|LOa||Leaf page overhead.|
|LP||Leaf node pages.|
|OI||Ordered entry B-tree pages for the field.|
|Field name||Values in:|
|Category A||Category B||Category C|
Calculating total Ordered Index B-tree entry lengths
AV = Average Value Length + 1 = 11 + 1 = 12
NE = 65500
ENa = Category A * (AV + 3) = 60000 * (12 + 3) = 900000
AB = 2
ENb = Category B * (AV + (2 * AB) + (2 * N)) = 5000 * (12 + (2 * 2) + (2 * 2)) = 100000
ENc = Category C * (AV + (5 * N)) = 500 * (12 + (5 * 2)) = 11000
OIB = ENa + ENb + ENc = 900000 + 100000 + 11000 = 1011000
Calculating Ordered Index B-tree overhead
The final values for LOe, LP, and OI are rounded up:
LOe = 6144 * (LRESERVE/100) = 6144 * (15/100) = 922
AE = OIB / NE = 1011000 / 65500 = 15
LOmin = 2 * (6144 / AE) = 2 * (6144 / 15) = 819
LOa = max(LOe, LOmin) = max(922, 819) = 922
LP = OIB / (6144 - 24 - LOa) = 1011000 / (6144 - 24 - 922) = 195
OI = LP * 1.01 = 195 * 1.01 = 197
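The Ordered Index estimate for one field can be sketched in Python as follows. OIB is taken as the sum of the three category lengths, as in the worksheet later in this topic. LOe, LP, and OI are rounded up as stated; treating AE and LOmin as truncated is an assumption chosen to match the figures in the example.

```python
import math

PAGE = 6144
N = 2             # number of segments
AV = 11 + 1       # average value length + 1
NE = 65500        # distinct values in the field
AB = 2            # average records per value per segment
LRESERVE = 15     # leaf page reserve percentage used in the example

cat_a, cat_b, cat_c = 60000, 5000, 500   # values per category

ENa = cat_a * (AV + 3)
ENb = cat_b * (AV + (2 * AB) + (2 * N))
ENc = cat_c * (AV + (5 * N))
OIB = ENa + ENb + ENc                    # total B-tree entry lengths

LOe = math.ceil(PAGE * LRESERVE / 100)   # expected leaf overhead, rounded up
AE = OIB // NE                           # average entry length (truncation assumed)
LOmin = (2 * PAGE) // AE                 # minimum leaf overhead (truncation assumed)
LOa = max(LOe, LOmin)
LP = math.ceil(OIB / (PAGE - 24 - LOa))  # leaf pages, rounded up
OI = math.ceil(LP * 1.01)                # B-tree pages for the field, rounded up

print(OIB, LOe, AE, LOmin, LOa, LP, OI)
```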
Calculating index list space
This example assumes 2 segments containing 45,000 records each.
|If a field name = value pair appears in...||It falls into category...|
|Fewer than 900 (0.02 * 45000) records||A|
|More than 900 records||B|
For each value, the number of category A bytes required (T') is calculated using the following equation:
T'= 2 + (2 * (Number of Records Containing the Pair))
The number of category B pages required for each field is equal to the number of distinct values of that field. For each NUMERIC RANGE value, the extra number of pages required equals ten times the number of significant digits, plus two.
The following calculations use these variables:
|T||Category A bytes|
|A||Total number of pages in Category A|
|B||Total number of pages (or values) in Category B|
|C||Total number of extra numeric range pages|
|T1 = 326216||B1 = 3||C1 = 74|
Calculations for segment 1
X = 6144 * (1 - (DRESERVE / 100)) = 6144 * 0.85 = 5222
A1 = T1 / X = 326216 / 5222 = 63 (Rounded up)
|T2 = 316200||B2 = 11||C2 = 74|
Calculations for segment 2
A2 = T2 / X = 316200 / 5222 = 60.5 = 61 (Rounded up)
Total index list space
IT = A1 + B1 + C1 + A2 + B2 + C2 + 2 = 63 + 3 + 74 + 61 + 11 + 74 + 2 = 288
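The per-segment index list calculation lends itself to a small loop. The sketch below assumes DRESERVE = 15 (implied by the 0.85 factor above) and represents each segment as a tuple (T, B, C); that representation is chosen here for illustration only.

```python
import math

DRESERVE = 15                            # percent, implied by the 0.85 factor
X = 6144 * (100 - DRESERVE) // 100       # usable bytes per Table D page

# One (T, B, C) tuple per segment: category A bytes, category B pages,
# and extra NUMERIC RANGE pages, as given in the example
segments = [(326216, 3, 74), (316200, 11, 74)]

IT = len(segments)                       # one additional page per segment
for T, B, C in segments:
    A = math.ceil(T / X)                 # category A pages, rounded up
    IT += A + B + C

print(X, IT)
```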
Determining F (space required for preallocated fields)
If you have defined any preallocated fields in a file, Model 204 uses one Table D page for the record description. Because two of the fields in this example are preallocated (have the OCCURS attribute):
F = 1
Calculating space required for procedures
The file holds approximately 50 procedures with 20-character names. This example does not use procedure security. The calculations in this section use the following variables:
|P||Number of procedures.|
|S||Average size of procedure entry.|
|K||Number of blocks of pages.|
|Q||Number of pages required for ACT.|
P = 50
S = Name length + Overhead = 20 + 34 = 54
PDSTRPPG = 6144 / S = 6144 / 54 = 113.7 = 113 (Rounded down)
PDSIZE = 1.4 * (P/PDSTRPPG) = 1.4 * (50/113) = 0.61 = 1 (Rounded up)
K = 1
Q = 0
Calculating space required for reserved area
DEST = OIT + IT + F + P + (K * PDSIZE) + Q = 197 + 288 + 1 + 50 + 1 + 0 = 537
DPGSRES = min(DEST/50 + 2, 40) = min(537/50 + 2, 40) = 13 (rounded up)
DSIZE = DEST + DPGSRES = 537 + 13 = 550
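Putting the Table D pieces together, the following Python sketch recomputes the procedure dictionary size, the reserved area, and DSIZE from the values of this example. It is illustrative only and follows the rounding shown in the text.

```python
import math

# Procedure dictionary
P = 50                      # number of procedures
S = 20 + 34                 # name length + overhead
PDSTRPPG = 6144 // S        # entries per page, rounded down
PDSIZE = math.ceil(1.4 * P / PDSTRPPG)
K = 1                       # blocks of procedure dictionary pages
Q = 0                       # no ACT pages: procedure security is not used

# Results of the earlier steps in this example
OIT = 197                   # Ordered Index pages
IT = 288                    # index list pages
F = 1                       # page for preallocated fields

DEST = OIT + IT + F + P + (K * PDSIZE) + Q
DPGSRES = min(math.ceil(DEST / 50 + 2), 40)
DSIZE = DEST + DPGSRES

print(PDSTRPPG, PDSIZE, DEST, DPGSRES, DSIZE)
```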
Sample Table E calculations
Sizing Table E
You can set the ESIZE parameter when the file is created to the default of 0, meaning that no Large Object data can be stored in the file. If you plan to have Large Object data in the file, you must set the ESIZE to a minimum value of 20.
When you initialize a file with ESIZE set to 20 or greater, the first 17 pages of Table E are used for Table E internal structures. Immediately after initialization the other Table E parameters are:
Each time you store another Large Object the data begins on the next available Table E page. There is no reuse capability for Table E. So, you must estimate in advance the size of the Large Object data and how many pages you will need.
Calculating Table E size
- The first page of Table E is reserved for the existence bitmap of page map page numbers.
- The next fifteen pages of Table E are reserved for the page map pages.
- The seventeenth page is the first bitmap page. Subsequent bitmap pages are allocated as needed and are therefore intermingled with the Large Object data pages.
Use the following steps and formulas to determine how many Table E pages you need:
- Calculate the pages-to-hold-data as:
For each Large Object field, add the Large Object field data bytes to 6139, then divide by 6140, discarding any remainder, to get the number of pages for that field. For example, a 7000-byte Large Object field requires two Table E pages. Sum the results over all Large Object fields to determine the total pages-to-hold-data.
Example: 5,000 Large Object fields with a length of 7000:
(7000 + 6139)/6140 = 2 pages (remainder discarded), multiplied by 5,000 fields = 10,000 pages-to-hold-data
- Calculate the pages-to-hold-bitmaps as:
17 + (pages-to-hold-data /49152) + 1
- Calculate the ESIZE setting you need as:
pages-to-hold-data + pages-to-hold-bitmaps + 2
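The three steps above can be sketched as a small Python function. The function name `esize_estimate` is hypothetical. The text does not say how the quotient pages-to-hold-data / 49152 is rounded; the sketch assumes it is rounded down, on the reading that the trailing + 1 covers a partially filled bitmap page. Adjust that line if your release documents a different rule.

```python
def esize_estimate(lob_lengths):
    """Return (pages_to_hold_data, pages_to_hold_bitmaps, esize).

    lob_lengths: byte length of each Large Object value to be stored.
    """
    # Adding 6139 before dividing by 6140 rounds each value up to whole pages
    data_pages = sum((length + 6139) // 6140 for length in lob_lengths)
    # Assumption: the quotient is rounded down before the + 1 is applied
    bitmap_pages = 17 + data_pages // 49152 + 1
    return data_pages, bitmap_pages, data_pages + bitmap_pages + 2


# The example above: 5,000 Large Object fields of 7000 bytes each
data_pages, bitmap_pages, esize = esize_estimate([7000] * 5000)
print(data_pages, bitmap_pages, esize)
```

Passing the individual lengths, rather than an average, keeps the per-value rounding that the first step describes.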
Calculating sample total file size
The total file size for this example is:
File-size = 8 + ASIZE + BSIZE + CSIZE + DSIZE = 8 + 3 + 766 + 330 + 550 = 1657 pages (237 tracks on a 3380)
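The total can be checked with a few lines of Python. ESIZE is taken as 0 here because the example stores no Large Object data. The figure of 7 pages per 3380 track is an assumption inferred from the 237-track result above, not something this section states.

```python
import math

ASIZE, BSIZE, CSIZE, DSIZE = 3, 766, 330, 550   # results of the example
ESIZE = 0                                        # no Large Object data

total_pages = 8 + ASIZE + BSIZE + CSIZE + DSIZE + ESIZE

PAGES_PER_TRACK = 7   # assumption for a 3380, inferred from the text
tracks = math.ceil(total_pages / PAGES_PER_TRACK)

print(total_pages, tracks)
```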
Space calculation worksheet
This worksheet lists all the equations used in this topic to calculate the number of pages needed for a Model 204 file.
1. Model 204 Usable Page Size constant = 6144
Calculate Table A size
Use the following variables in Equations 2 through 6:
|A||Number of field names.|
|B||Number of FEW-VALUED FRV or CODED values.|
|C||Number of MANY-VALUED FRV or CODED values.|
|D||Maximum number of digits in a NUMERIC RANGE field + 3.|
|S||Sum of all D's for all NUMERIC RANGE fields.|
|T||Total number of strings: A + B + S + C.|
|V||Space needed by FEW-VALUED FRV or CODED values and value overhead.|
|W||Space needed by MANY-VALUED FRV or CODED values and value overhead.|
|N||Space needed by field names and names overhead.|
- L represents the length of each string:
2. L = (V + N + W) / T
- The ASTRPPG parameter represents the character strings per Table A page:
3. ASTRPPG = 6144 / L
- The ATRPG parameter represents the number of attribute pages:
4. ATRPG = 1.1 * N / (6144 - (ASTRPPG * 2) - 2)
ATRPG = 1.1 * (A + S) / ASTRPPG
- The FVFPG parameter represents the number of FEW-VALUED pages:
5. FVFPG = 1.2 * V / (6144 - (ASTRPPG * 2) - 2)
FVFPG= 1.2 * (B / ASTRPPG)
- The MVFPG parameter represents the number of MANY-VALUED pages:
6. MVFPG = 1.2 * (W / (6144 - (ASTRPPG * 2) - 2) )
MVFPG = 1.2 * (C / ASTRPPG)
- The ASIZE parameter represents the size of Table A:
7. ASIZE = ATRPG + FVFPG + MVFPG (calculated by Model 204)
Calculate Table B size
Use the following variable in Equations 8 through 10:
|R||Average record size.|
- The BRECPPG parameter represents table records per page:
8. BRECPPG = 1.1 * 6140 / R
- The BSIZE parameter represents the size of Table B:
9. BSIZE = 1.2 * (Total-number-of-records / BRECPPG)
10. BRESERVE = Reserved Table B space
Calculate Table C size
- The N variable represents the file size multiplier:
11. N = Number-of-records-in-the-file / (8 * Page-size)
12. VU = total number of pairs in category U
13. VN = total number of pairs in category N
14. VR = total number of extra entries for all NUM RANGE retrieval fields
- The CSIZE parameter represents the size of Table C:
15. CSIZE = 1.2 * ((14 * VU) + 7 * (N+1)(VN + VR)) / 6140
Calculate Table D size
Estimate the size of the Ordered Index
- Use the following variables in Equations 16 through 24:
|Value||Equals...|
|NE||Total number of distinct values in the ORDERED field|
|N||Number of segments in the file|
16. AV = the estimated average length of ORDERED values + 1
17. ValA = number of values in Category A
18. ValB = number of values in Category B
19. ValC = number of values in Category C
- ENa = The total length of the Ordered Index entries placed in category A:
20. ENa = ValA * (AV + 3)
- ENb = The total length of the Ordered Index entries placed in category B:
21. ENb = ValB * (AV + (2 * AB) + (2 * N))
- ENc = The total length of the Ordered Index entries placed in category C:
22. ENc = ValC * (AV + (5 * N))
- The value of OIB represents the total length of all Ordered Index entries:
23. OIB = ENa + ENb + ENc
- The value of LOe represents the expected leaf page overhead:
24. LOe = 6144 * (LRESERVE / 100)
LOe = 6144 * ((100 - SPLITPCT) / 100)
- The value of AE represents the average number of bytes per Ordered Index entry:
25. AE = OIB / NE
- The value of LOmin represents the minimum leaf page overhead:
26. LOmin = 2 * 6144 / AE
- The value of LOa represents the leaf page overhead:
27. LOa = max(LOe, LOmin)
- The value of LP represents the number of leaf pages required:
28. LP = OIB / (6120 - LOa)
- The value of OI represents the size of the Ordered Index for each field:
29. OI = (LP * 1.01)
- OIT = Total size of the Ordered Index:
30. OIT = OI1 + OI2 + ... + OIn
Calculate index list space
31. DRESERVE = % of space reserved for Table D expansion
- The value of T represents the number of bytes required for category A fieldname = value pairs:
32. T = 2 + (2 * (no. of records with fieldname=value pair))
- The value of X represents the total number of bytes available per Table D page:
33. X = 6144 * (1 - (DRESERVE / 100))
- The value of A represents the total number of pages required by the category A pairs for the segment:
34. A = T / X
35. B = total number of pages required by pairs in category B
- The value of T' represents the extra bytes required for NUMERIC RANGE fields:
36. T' = 2 + (2 * (Number of Records Containing the Field))
37. B' = number of extra values
38. T" = sum of all values of T'
39. B" = sum of all values of B'
- The value of C represents the total extra pages required:
40. C = (T" / X) + B"
- The value of I represents the index list space for each segment:
41. I = A + B + C
- The value of IT represents the total index list space required:
42. IT = A1 + B1 + C1 + ... + An + Bn + Cn + N
Calculate space for preallocated fields
43. F = number of Table D pages for preallocated fields
Calculate the size of the procedure dictionary
44. P = total number of procedures
45. S = average size of procedure entry
- PDSTRPPG = the maximum number of procedure entries per procedure dictionary page:
46. PDSTRPPG = 6144 / S
47. PDSIZE = 1.4 * P / PDSTRPPG
Calculate the size of the ACT
- The value of LE represents the length of entries for each user class:
48. LE = 4 + (2 * NPCLASS)
- The value of LET represents the total length of the user class entries:
49. LET = LE1 + LE2 + ... + LEn
- The value of Q represents the number of pages required for the ACT:
50. Q = LET / 6144
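Because the sample file does not use procedure security, Q is 0 there and the ACT formulas are never exercised. The sketch below shows how Equations 48 through 50 combine. The function name `act_pages` and the NPCLASS values in the usage line are hypothetical, and rounding Q up to a whole page is an assumption.

```python
import math


def act_pages(npclass_per_user_class):
    """Return (LET, Q) for a list holding one NPCLASS value per user class."""
    # Equation 48 for each user class, summed as in Equation 49
    LET = sum(4 + (2 * npclass) for npclass in npclass_per_user_class)
    # Equation 50; rounding up to whole pages is an assumption
    Q = math.ceil(LET / 6144)
    return LET, Q


# Hypothetical file with three user classes
LET, Q = act_pages([10, 25, 40])
print(LET, Q)
```

With no user classes the function returns Q = 0, which matches the sample calculation earlier in this topic.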
Calculate the final size of Table D
51. DEST = OIT + IT + F + P + (K * PDSIZE) + Q
52. If DEST < 1900, DPGSRES = DEST/50 + 2
- Or, if DEST >= 1900, DPGSRES = 40
53. DSIZE = OIT + IT + F + P + (K * PDSIZE) + Q + DPGSRES
DSIZE = DEST + DPGSRES
Calculate the final size for Table E
Calculate ESIZE, EHIGHPG, and EPGSUSED as described above in Sizing Table E.
Calculate the total pages required
Total pages required = 8 + ASIZE + BSIZE + CSIZE + DSIZE + ESIZE + XSIZE
File description worksheet
Use the following sample worksheet when compiling a list of parameters to be set during file creation. Values for many of the parameters are computed from the formulas shown in this topic. Other parameters are discussed in their individual M204wiki pages.
File Name ____________________________________________
Record Security Field Name ___________________________
Sort/Hash Key Field Name _____________________________
|Parameter||Value||Parameter||Value|
|FILEORG||____________||CSIZE||____________|
|FOPT||____________||DRESERVE||____________|
|FRCVOPT||____________||DPGSRES||____________|
|ASTRPPG||____________||PDSIZE||____________|
|ATRPG||____________||PDSTRPPG||____________|
|FVFPG||____________||DSIZE||____________|
|MVFPG||____________||DAUTOINC||____________|
|OPENCTL||____________||BRECPPG||____________|
|PRIVDEF||____________||BRESERVE||____________|
|PRCLDEF||____________||BPGPMSTR||____________|
|SELLVL||____________||BPGPOVFL||____________|
|READLVL||____________||BEXTOVFL||____________|
|UPDTLVL||____________||BREUSE||____________|
|ADDLVL||____________||BSIZE||____________|
|ESIZE||____________||BAUTOINC||____________|
|BRLIMSZ||____________||XSIZE||____________|
|RECROPT||____________||XAUTOINC||____________|