DEFINE FIELD command: Difference between revisions
Line 306: | Line 306: | ||
==CHUNK attribute== | ==CHUNK attribute== | ||
The CHUNK field attribute is available in Model 204 Relase 7.5. It improves the efficiency of range finds using ordered index processing. | |||
<p>The CHUNK attribute enables more efficient searching on ordered index numeric range [[Field_Design_(File_Management)#ORDERED_and_NON-ORDERED_attributes|(ORDERED NUMERIC)]] fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size. </p> | <p>The CHUNK attribute enables more efficient searching on ordered index numeric range [[Field_Design_(File_Management)#ORDERED_and_NON-ORDERED_attributes|(ORDERED NUMERIC)]] fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size. </p> | ||
Revision as of 22:04, 21 August 2013
Summary
- Status
- New in Model 204 release 7.5:
- fields defined within field groups and the attributes that are available when FILEORG=X’100’ is used to define the file.
- CHUNK field attribute
- Privileges
- Any user
- Function
- Defines the names and attributes of the fields in a Model 204 file
Syntax
For a field
DEFINE [FIELD] name [(attribute [attribute]...)]
or
DEFINE [FIELD] name WITH [attribute [attribute]...]
For a field within a field group
DEFINE [FIELD] name [(attribute [attribute]...)] WITH FIELDGROUP [fieldgroupname | *]
or
DEFINE [FIELD] name WITH FIELDGROUP [fieldgroupname | *] [AND attribute [attribute]...]
Where:
name or fieldgroupname | (1 to 255 characters) must begin with a letter and can contain any alphanumeric characters (including space) except:
?? ?$ ?& @ (or an alternative delete character) # (or an alternative flush character) ; name or fieldgroupname must not start with a reserved word or character. An embedded reserved word or character must be part of a string preceded and followed by a single quotation mark. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
attribute | is one of the field characteristics listed in the table below. Note that each attribute has a default. Therefore, when selected, the attribute need be specified only if it differs from the default. If no attributes are specified, the field is defined with all the defaults. See Field Design (File Management) for more information on field attributes.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
WITH FIELDGROUP fieldgroupname | identifies the containing field group. When fieldgroupname is specified as an asterisk (*), it means that the field can be included in any field group as well as existing outside of any field group. |
Conflicting attributes, listed in the table below, cannot be specified together.
This attribute... | Cannot be specified with the attribute(s)... |
---|---|
BINARY, BLOB, CLOB, or CODED | LENGTH |
CHUNK | CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-TIMEUTC, CREATE-USER, UPDATE-TIME, UPDATE-TIMEUTC, UPDATE-USER |
CONCATENATION-OF | BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, DEFAULT-VALUE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, UPDATE AT END |
COUNT-OCCURRENCES-OF | BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, DBCS, DEFAULT-VALUE, INVISIBLE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END |
CREATE-TIME | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, STRING, UNIQUE, UPDATE AT END |
CREATE-TIMEUTC | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, FLOAT, INVISIBLE, NOT UNIQUE, NUMERIC RANGE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, STRING, UPDATE AT END |
CREATE-USER | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END |
DATETIME | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, CODED, CONCATENATION-OF, CREATE-USER, KEY, OCCURS, STRING, UPDATE AT END, UPDATE-USER |
DATETIME-GE, DATETIME-GT, DATETIME-LE, DATETIME-LT | CODED, CONCATENATION-OF, CREATE-USER, KEY, OCCURS, UPDATE AT END, UPDATE-USER |
DEFAULT-VALUE | REPEATABLE |
ESCAPE | SEPARATOR NONE |
EXACTLY-ONE | CONCATENATION-OF, INVISIBLE, LITERAL, OCCURS, STORE-NULL, UNIQUE |
FEW-VALUED | MANY-VALUED |
FIELDGROUP | FRV, INVISIBLE, NUMERIC RANGE, OCCURS |
FLOAT | BINARY, INVISIBLE, NUMERIC RANGE, and STRING |
FLOAT-GE, FLOAT-GT, FLOAT-LE, FLOAT-LT | Other range attributes |
FRV | NON-KEY |
IMMED | NON-ORDERED |
INVISIBLE | NON-KEY, NON-RANGE, and NON-ORDERED;
UPDATE IN PLACE or UPDATE AT END |
LENGTH-EQ | Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, LENGTH-GE, LENGTH-LE |
LENGTH-GE, LENGTH-LLE | Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, LENGTH-EQ |
LIKE | Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT |
LRESERVE | NON-ORDERED |
MANY-VALUED | FEW-VALUED |
MINLOBE | OCCURS |
NO-CONSTRAINTS | Any constraint attribute |
NO-DEFAULT-VALUE | DEFAULT-VALUE |
NON-CODED and NON-FRV | MANY-VALUED or FEW-VALUED |
NON-KEY, NON-ORDERED, and NON-RANGE | DEFERRABLE or NON-DEFERRABLE |
NRESERVE | NON-ORDERED |
NUMERIC RANGE | OCCURS n (if n > 1) |
OCCURS | INVISIBLE |
ORDERED | FRV |
ORDERED NUMERIC | NUMERIC RANGE |
PAD character | LENGTH |
SPLITPCT | NON-ORDERED |
STORE-DEFAULT | CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-USER, UPDATE-TIME, UPDATE-USER |
STORE-NULL | BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-USER, UPDATE-TIME, UPDATE-USER |
UNIQUE | DEFERRABLE |
UPDATE-TIME | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, REPEATABLE, UNIQUE, UPDATE AT END, UPDATE-TIMEUTC |
UPDATE-TIMEUTC | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, REPEATABLE, UNIQUE, UPDATE AT END |
UPDATE-USER | BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END |
Paired attributes, listed in the table below, must be specified together.
This attribute... | Must be specified with the attribute(s)... |
---|---|
CHUNK | ORDERED NUMERIC and INVISIBLE |
CONCATENATION-OF | AT-MOST-ONE or EXACTLY-ONE or OCCURS 1 |
COUNT-OCCURRENCES-OF | AT-MOST-ONE or EXACTLY-ONE or OCCURS 1 |
CREATE-TIME | AT-MOST-ONE |
CREATE-TIMEUTC | AT-MOST-ONE |
DEFAULT-VALUE | AT-MOST-ONE or EXACTLY-ONE |
ESCAPE | CONCATENATION-OF |
FEW-VALUED | FRV or CODED |
FLOAT | LENGTH |
MANY-VALUED | FRV or CODED |
MINLOBE | BINARY LARGE OBJECT or CHARACTER LARGE OBJECT |
OCCURS | CODED, BINARY, or LENGTH |
PAD | LENGTH |
SEPARATOR | CONCATENATION-OF |
STORE-DEFAULT | DEFAULT-VALUE, AT-MOST-ONE, or EXACTLY-ONE |
UNIQUE | ORDERED |
UPDATE-TIME, UPDATE-TIMEUTC | AT-MOST-ONE |
UNIQUE | ORDERED |
Syntax notes
Attributes must be separated by commas or by one or more blanks.
You can specify DEFINE FIELD only in file context.
Example
DEFINE FIELD CUSTID WITH KEY FRV RANGE DEFINE FIELD AGE WITH KEY RANGE OCCURS 1 LENGTH 3 DEFINE FIELD EMPLOYEE
Usage notes
The DEFINE FIELD command describes the names and attributes of the fields in a Model 204 file. The file must be initialized, using the INITIALIZE command, before fields can be defined.
Note: Sort key fields for sorted files, hash key fields, and record security fields can be defined only with the INITIALIZE command, not with the DEFINE FIELD command.
Once fields have been defined, a user with the requisite privileges can perform the following commands on them:
You can define three types of floating point fields through the FLOAT and LENGTH attributes. The table below shows the characteristics of the three types.
Type | Number of bytes | Maximum number of significant digits |
---|---|---|
Short precision | 4 | 6 |
Long precision | 8 | 15 |
Extended precision | 16 | 31 |
You supply a LENGTH attribute of 4, 8, or 16. The following example defines a long-precision floating-point field:
DEFINE FIELD SALARY WITH FLOAT LEN 8
Consider making a field a floating-point field if its value is always or often numerical. Very large numbers or numbers with many significant digits use much less space in floating-point form than they do in character form. Floating-point fields are valid only for files with the format of Release 8.0 or later or that have been converted by the TRANSFORM FLOAT command. Floating-point numbers in ORDERED NUMERIC trees are discussed as follows.
Defining a field group field
A field defined within a field group supports the same default attributes as fields defined outside or independent of a field group, except for REPEATABLE. A field defined within a field group defaults to EXACTLY-ONE. You can override the default with REPEATABLE. The frequency of occurrence of FieldName can be:
- EXACTLY-ONE, the default for a field defined within a DEFINE FIELDGROUP block
- REPEATABLE, the default for a field defined outside of a field group
- AT-MOST-ONE
ORDERED attribute
The ORDERED attribute is used to define fields for the Ordered Index feature. The ORDERED attribute can be followed by a tree type to specify the kind of ordering for the index. The tree type can be either CHARACTER (CHAR) or NUMERIC (NUM). If a tree type is not provided, a default tree type is used.
Syntax
The format for the DEFINE FIELD command with solely the ORDERED option and its related attributes is as follows:
DEFINE [FIELD] name WITH [ORDERED [CHARACTER | NUMERIC] [LRESERVE n] [NRESERVE n] [SPLITPCT n] [IMMED n]]
Where:
LRESERVE | defines the percentage of space to leave free on the leaf page during deferred updates or a REORGANIZE command. The default is 15 percent. The value must be between 0 and 99. |
---|---|
NRESERVE | defines the percentage of space to leave free on the node page during deferred updates or a REORGANIZE command. The default is 15 percent. The value must be between 0 and 99. |
SPLITPCT | defines the percentage of the node data to go to the left node when a split occurs during nondeferred User Language or IFAM requests. The default is 50 percent. The value must be between 1 and 100. |
IMMED | defines the number of immediate pointers per segment in the Ordered Index. The default is 1. The value must be between 0 and 255. |
Usage notes
The LRESERVE, NRESERVE, SPLITPCT, and IMMED parameters represent approximate values. The actual movement of data during an update might not conform exactly to the value of the attribute in question.
The default ordering for a field is NON-ORDERED. Ordered Index attributes do not apply for a NON-ORDERED field.
If a field is defined as ORDERED without specifying the type of tree to create, the default depends on whether the field is STRING, BINARY, or FLOAT. If STRING, an ORDERED CHARACTER tree is defined. If FLOAT or BINARY, an ORDERED NUMERIC tree is defined.
The UNIQUE attribute specifies that there can be only one occurrence of a unique field in the file. Multiple occurrences are not allowed.
For a detailed discussion of field attributes, see Field Design (File Management).
CHUNK attribute
The CHUNK field attribute is available in Model 204 Relase 7.5. It improves the efficiency of range finds using ordered index processing.
The CHUNK attribute enables more efficient searching on ordered index numeric range (ORDERED NUMERIC) fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size.
Syntax
The format for the DEFINE FIELD command with the CHUNK option and its related attributes is as follows.
Note that the "CHUNK" keyword can be abbreviated "CNK".
DEFINE FIELD chunkFieldName WITH INVISIBLE ORDERED NUMERIC {CHUNK|CNK} chunkSize FOR chunkTargetFieldName
Where:
chunkFieldName | A standard M204 field name. |
---|---|
chunkSize | A positive integer that is an integral multiple or divisor of all CHUNK fields previously defined for chunkTargetFieldName.
Chunk size also cannot be the same as the chunk size for any other CHUNK field defined for chunkTargetFieldName. If either chunk size rule is violated, the DEFINE command is rejected with an error message. |
chunkTargetFieldName | A previously defined ORDERED NUMERIC VISIBLE field (it must be defined before any CHUNK fields that reference it). The
DEFINE command is rejected with an error message if the chunk target is not ORDERED NUMERIC VISIBLE. |
Ordered index versus hash index processing: advantages and disadvantages
Ordered index processing provides the following advantages over hash index (Table C) processing:
- File management is easier, because you don't need to determine the best index size when building or reorganizing a file: if the index (CSIZE) changes, you can add pages as needed.
- For Each Value processing is easily and efficiently done.
- Management of invisible keys is easier because invisible key values can still be retrieved.
- Once the buffer pools are big enough to hold the higher level OI nodes, ordered index processing is often less I/O intensive that Table C processing.
However, until now, the most efficient way to search on a range of data has been to use the NUMERIC RANGE field attribute and the Table C index. But using NUMERIC RANGE fields poses these problems:
- They are difficult to update, often requiring dozens of invisible key updates for a NUMERIC RANGE field update.
- They can only occur once per record
- They are not optimal from a retrieval perspective, decomposing a numeric range find into a lot of individual, under the covers invisible field finds. In addition, this decomposition process currently disables shared (resident) QTBL processing.
Range finds on a date field using the ordered index
While one can do range finds currently using the ordered index, these finds often have surprisingly poor performance. To see why, consider the case of a date field (YYYYMMDD). Finding all the records that have a date time stamp between, for example, 20121211 and 20130205 requires traversing around 55 ordered index values and ORing together the records for each of those values. As you can imagine, this can get very expensive. As the range gets bigger the cost continues to go up and a large range would be exorbitantly expensive.
Range finds on a date field using the CHUNK attribute
The solution is to chunk the data using the DEFINE FIELD command and a new field attribute called CHUNK. CHUNK is followed by a number and then a FOR <fieldname>.
So, to illustrate:
DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMM WITH ORDERED NUMERIC INVISIBLE CHUNK 100 FOR YYYYMMDD
This requires a FILEORG X'100' file because FILEORG X'100' provides a nice structure for creating automatic fields and fields that are related to other fields. A CHUNK field is automatically maintained based on updates to the base (FOR) field. The stored value is the value of the base field rounded down to the CHUNK size. So, for example, if you stored a value of 20121225 into YYYYMMDD, the underlying code would automatically store the value 20121200 into YYYYMM. Similarly, if you stored 20121203 into YYYYMMDD, a 20121200 value would also be stored into YYYYMM.
When doing a range find, Model 204 decomposes the find into range finds based on the CHUNK field and the base field. So, for example, our find for values between 20121211 and 20130205 would be decomposed into a find for:
- YYYYMMDD between 20121211 and 20121299 (ignoring for the moment issues with non-integer values and demonstrating that CHUNK doesn't really know or care if the field is a date)
- OR YYYYMM between 20130100 and 20130100 (or more precisely, exactly equal to 20130100)
- OR YYYYMMDD between 20130200 and 20130205
This reduces the number of scanned ordered index entries by a factor of 2. Results would often be much better than that. For example, it probably would not be atypical for a range to land on month boundaries, resulting in improvements closer to a factor of 30. And, as the range got bigger and bigger, more and more of the range processing would occur on the CHUNK field, with additional improvement in processing.
Multiple CHUNK fields
To get even better performance, you could define multiple CHUNK fields for a base field:
DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMMD WITH INVISIBLE ORDERED NUMERIC CHUNK 10 FOR YYYYMMDD DEFINE FIELD YYYYMM WITH INVISIBLE ORDERED NUMERIC CHUNK 100 FOR YYYYMMDD DEFINE FIELD YYYY WITH INVISIBLE ORDERED NUMERIC CHUNK 10000 FOR YYYYMMDD
The chunk size for a CHUNK field must be a greater-than-1 integral multiple of all smaller chunk sizes for the same base field. The following chunking is not accepted because 100 is not an integral multiple of 7:
INVALID:
DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMMD WITH INVISIBLE ORDERED NUMERIC CHUNK 7 FOR YYYYMMDD DEFINE FIELD YYYYMM WITH INVISIBLE ORDERED NUMERIC CHUNK 100 FOR YYYYMMDD DEFINE FIELD YYYY WITH INVISIBLE ORDERED NUMERIC CHUNK 10000 FOR YYYYMMDD
This restriction makes decomposition of a range find on a base field relatively straightforward.
Note that you could reference the CHUNK fields directly in non-updating statements, as shown in this example:
fd to %recset yyyy eq %year end find
In any case, the number and size of the chunk fields allows a DBA to trade off the overhead of doing updates against the efficiency of doing range finds.
Usage notes
- CHUNK fields are an AUTOMATIC field type. The CHUNK type is mutually exclusive with all other AUTOMATIC field types (CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-TIMEUTC, CREATE-USER, UPDATE-TIME, UPDATE-TIMEUTC, UPDATE-USER).
- CHUNK fields must be ORDERED NUMERIC INVISIBLE.
- CHUNK fields can only be defined for ORDERED NUMERIC VISIBLE fields.
- Any given CHUNK target field can have at most 20 CHUNK fields defined for it.
File Restrictions When Using CHUNK fields
- CHUNK fields can only be defined in a FILEORG X'100' file.
- Once a file has CHUNK fields defined, it cannot be opened with a Model 204 release prior to 7.5.
- A CHUNK field cannot be defined in a non-emtpy file.
Commands for Chunk Fields
The following other commands can use chunk fields:
- DISPLAY FIELD
- REDEFINE FIELD. You cannot REDEFINE a non-chunk field to a chunk field or vice versa. See REDEFINE FIELD for additional rules.
Date/time stamp field
Specifying the date/time stamp field name
The system parameter DTSFN specifies the date/time stamp field name. This field name is the same for all files participating in DTS processing in a run. Though this is a system-level parameter, the feature is a file-level feature.
The date/time stamp field name you specify is not edited for validity.
The DTSFN value can be specified in CUST source code so you do not have to change all your jobs to specify the parameter. By default, Model 204 does not provide a date/time stamp field name.
Specifying a date/time stamp field definition
A date/time stamp field is defined as is any other Model 204 field but has the following field attribute restrictions:
-
A date/time field cannot be INVISIBLE or UNIQUE.
-
If the date/time stamp field is preallocated, OCCURS cannot be greater than 1.
Disregarding these restrictions causes the following error:
M204.2728: DATE TIME STAMP FIELD MAY NOT BE INVISIBLE, UNIQUE, NOR HAVE MULTIPLE OCCURRENCES
Rocket Software recommends using the ORDERED CHARACTER attribute. By default Model 204 generates a character string date/time stamp field value.
Defining the date/time stamp field value format
Your custom code formats the data entered into the field. The format for the default date/time stamp string is:
CCYYMMDDHHMMSSMMM
Note: The DTSFN field can have differing definitions in different Model 204 databases.