DEFINE FIELD command: Difference between revisions

From m204wiki
Jump to navigation Jump to search
Line 306: Line 306:


==CHUNK attribute==
==CHUNK attribute==
The CHUNK field attribute is available in Model 204 Relase 7.5.  It improves the efficiency of range finds using ordered index processing.
<p>The CHUNK attribute enables more efficient searching on ordered index numeric range [[Field_Design_(File_Management)#ORDERED_and_NON-ORDERED_attributes|(ORDERED NUMERIC)]] fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size.  </p>
<p>The CHUNK attribute enables more efficient searching on ordered index numeric range [[Field_Design_(File_Management)#ORDERED_and_NON-ORDERED_attributes|(ORDERED NUMERIC)]] fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size.  </p>



Revision as of 22:04, 21 August 2013

Summary

Status
New in Model 204 release 7.5:
  • fields defined within field groups and the attributes that are available when FILEORG=X’100’ is used to define the file.
  • CHUNK field attribute
Privileges
Any user
Function
Defines the names and attributes of the fields in a Model 204 file

Syntax

For a field

DEFINE [FIELD] name [(attribute [attribute]...)]

or

DEFINE [FIELD] name WITH [attribute [attribute]...]


For a field within a field group

DEFINE [FIELD] name [(attribute [attribute]...)] WITH FIELDGROUP [fieldgroupname | *]

or

DEFINE [FIELD] name WITH FIELDGROUP [fieldgroupname | *] [AND attribute [attribute]...]


Where:

name or fieldgroupname (1 to 255 characters) must begin with a letter and can contain any alphanumeric characters (including space) except:

??

?$

?&

@ (or an alternative delete character)

# (or an alternative flush character)

 ;

name or fieldgroupname must not start with a reserved word or character. An embedded reserved word or character must be part of a string preceded and followed by a single quotation mark.

attribute is one of the field characteristics listed in the table below. Note that each attribute has a default. Therefore, when selected, the attribute need be specified only if it differs from the default. If no attributes are specified, the field is defined with all the defaults. See Field Design (File Management) for more information on field attributes.


Field attributes
Attribute Abbreviation Default
AT-MOST-ONE

REPEATABLE

ONE

REPT

REPT (except in 1NF files)
BINARY BIN Not a binary field
BINARY-LARGE-OBJECT BLOB Not binary large object
CHARACTER-LARGE-OBJECT CLOB Not character large object
CODED

NON-CODED

COD

NCOD

NON-CODED
DATETIME DT Data format YYYYMMDDHHMMSSXXXXXX
(the nearest millionth of a second)
DEFERRABLE

NON-DEFERRABLE

DEF

NDEF

DEFERRABLE
FEW-VALUED

MANY-VALUED

FV

MV

MANY-VALUED
FLOAT FL Not a floating point field
FRV (for-each-value)

NON-FRV

--

NFRV

NON-FRV

KEY

NON-KEY

--

NKEY

NON-KEY

LENGTH m LEN No preallocated length
LEVEL k LVL No level
LRESERVE LRES 15
NRESERVE NRES 15
NUMERIC RANGE

NON-RANGE

RANGE, NR

NNR

NON-RANGE
OCCURS n OCC No preallocated occurrences
ORDERED

NON-ORDERED

ORD

NON-ORD

Non-ordered field
ORDERED CHARACTER ORD CHAR --
ORDERED NUMERIC ORD NUM --
SPLITPCT SPLT 50
IMMED IMM 1
PAD character -- PAD X'00'
STRING

BINARY

STR

BIN

STRING
UPDATE IN PLACE

UPDATE AT END

UP

UE

UPDATE IN PLACE
VISIBLE

INVISIBLE

VIS

INV

VISIBLE
UNIQUE

NON-UNIQUE

UNIQ

NUNIQ

NON-UNIQUE
WITH FIELDGROUP fieldgroupname identifies the containing field group. When fieldgroupname is specified as an asterisk (*), it means that the field can be included in any field group as well as existing outside of any field group.

Conflicting attributes, listed in the table below, cannot be specified together.

Conflicting field attributes
This attribute... Cannot be specified with the attribute(s)...
BINARY, BLOB, CLOB, or CODED LENGTH
CHUNK CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-TIMEUTC, CREATE-USER, UPDATE-TIME, UPDATE-TIMEUTC, UPDATE-USER
CONCATENATION-OF BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, DEFAULT-VALUE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, UPDATE AT END
COUNT-OCCURRENCES-OF BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, DBCS, DEFAULT-VALUE, INVISIBLE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END
CREATE-TIME BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, STRING, UNIQUE, UPDATE AT END
CREATE-TIMEUTC BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, FLOAT, INVISIBLE, NOT UNIQUE, NUMERIC RANGE, OCCURS > 1, REPEATABLE, STORE-DEFAULT, STORE-NULL, STRING, UPDATE AT END
CREATE-USER BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END
DATETIME BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, CODED, CONCATENATION-OF, CREATE-USER, KEY, OCCURS, STRING, UPDATE AT END, UPDATE-USER
DATETIME-GE, DATETIME-GT, DATETIME-LE, DATETIME-LT CODED, CONCATENATION-OF, CREATE-USER, KEY, OCCURS, UPDATE AT END, UPDATE-USER
DEFAULT-VALUE REPEATABLE
ESCAPE SEPARATOR NONE
EXACTLY-ONE CONCATENATION-OF, INVISIBLE, LITERAL, OCCURS, STORE-NULL, UNIQUE
FEW-VALUED MANY-VALUED
FIELDGROUP FRV, INVISIBLE, NUMERIC RANGE, OCCURS
FLOAT BINARY, INVISIBLE, NUMERIC RANGE, and STRING
FLOAT-GE, FLOAT-GT, FLOAT-LE, FLOAT-LT Other range attributes
FRV NON-KEY
IMMED NON-ORDERED
INVISIBLE NON-KEY, NON-RANGE, and NON-ORDERED;

UPDATE IN PLACE or UPDATE AT END

LENGTH-EQ Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, LENGTH-GE, LENGTH-LE
LENGTH-GE, LENGTH-LLE Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, LENGTH-EQ
LIKE Any attribute that is automatically maintained by the system; BINARY LARGE OBJECT, CHARACTER LARGE OBJECT
LRESERVE NON-ORDERED
MANY-VALUED FEW-VALUED
MINLOBE OCCURS
NO-CONSTRAINTS Any constraint attribute
NO-DEFAULT-VALUE DEFAULT-VALUE
NON-CODED and NON-FRV MANY-VALUED or FEW-VALUED
NON-KEY, NON-ORDERED, and NON-RANGE DEFERRABLE or NON-DEFERRABLE
NRESERVE NON-ORDERED
NUMERIC RANGE OCCURS n (if n > 1)
OCCURS INVISIBLE
ORDERED FRV
ORDERED NUMERIC NUMERIC RANGE
PAD character LENGTH
SPLITPCT NON-ORDERED
STORE-DEFAULT CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-USER, UPDATE-TIME, UPDATE-USER
STORE-NULL BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-USER, UPDATE-TIME, UPDATE-USER
UNIQUE DEFERRABLE
UPDATE-TIME BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, REPEATABLE, UNIQUE, UPDATE AT END, UPDATE-TIMEUTC
UPDATE-TIMEUTC BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DBCS, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, REPEATABLE, UNIQUE, UPDATE AT END
UPDATE-USER BINARY, BINARY LARGE OBJECT, CHARACTER LARGE OBJECT, DATETIME, FLOAT, INVISIBLE, NUMERIC RANGE, OCCURS > 1, STORE-DEFAULT, STORE-NULL, UNIQUE, UPDATE AT END

Paired attributes, listed in the table below, must be specified together.

Paired field attributes
This attribute... Must be specified with the attribute(s)...
CHUNK ORDERED NUMERIC and INVISIBLE
CONCATENATION-OF AT-MOST-ONE or EXACTLY-ONE or OCCURS 1
COUNT-OCCURRENCES-OF AT-MOST-ONE or EXACTLY-ONE or OCCURS 1
CREATE-TIME AT-MOST-ONE
CREATE-TIMEUTC AT-MOST-ONE
DEFAULT-VALUE AT-MOST-ONE or EXACTLY-ONE
ESCAPE CONCATENATION-OF
FEW-VALUED FRV or CODED
FLOAT LENGTH
MANY-VALUED FRV or CODED
MINLOBE BINARY LARGE OBJECT or CHARACTER LARGE OBJECT
OCCURS CODED, BINARY, or LENGTH
PAD LENGTH
SEPARATOR CONCATENATION-OF
STORE-DEFAULT DEFAULT-VALUE, AT-MOST-ONE, or EXACTLY-ONE
UNIQUE ORDERED
UPDATE-TIME, UPDATE-TIMEUTC AT-MOST-ONE
UNIQUE ORDERED

Syntax notes

Attributes must be separated by commas or by one or more blanks.

You can specify DEFINE FIELD only in file context.

Example

DEFINE FIELD CUSTID WITH KEY FRV RANGE DEFINE FIELD AGE WITH KEY RANGE OCCURS 1 LENGTH 3 DEFINE FIELD EMPLOYEE

Usage notes

The DEFINE FIELD command describes the names and attributes of the fields in a Model 204 file. The file must be initialized, using the INITIALIZE command, before fields can be defined.

Note: Sort key fields for sorted files, hash key fields, and record security fields can be defined only with the INITIALIZE command, not with the DEFINE FIELD command.

Once fields have been defined, a user with the requisite privileges can perform the following commands on them:

You can define three types of floating point fields through the FLOAT and LENGTH attributes. The table below shows the characteristics of the three types.

Types of floating point fields
Type Number of bytes Maximum number of significant digits
Short precision 4 6
Long precision 8 15
Extended precision 16 31

You supply a LENGTH attribute of 4, 8, or 16. The following example defines a long-precision floating-point field:

DEFINE FIELD SALARY WITH FLOAT LEN 8

Consider making a field a floating-point field if its value is always or often numerical. Very large numbers or numbers with many significant digits use much less space in floating-point form than they do in character form. Floating-point fields are valid only for files with the format of Release 8.0 or later or that have been converted by the TRANSFORM FLOAT command. Floating-point numbers in ORDERED NUMERIC trees are discussed as follows.

Defining a field group field

A field defined within a field group supports the same default attributes as fields defined outside or independent of a field group, except for REPEATABLE. A field defined within a field group defaults to EXACTLY-ONE. You can override the default with REPEATABLE. The frequency of occurrence of FieldName can be:

  • EXACTLY-ONE, the default for a field defined within a DEFINE FIELDGROUP block
  • REPEATABLE, the default for a field defined outside of a field group
  • AT-MOST-ONE

ORDERED attribute

The ORDERED attribute is used to define fields for the Ordered Index feature. The ORDERED attribute can be followed by a tree type to specify the kind of ordering for the index. The tree type can be either CHARACTER (CHAR) or NUMERIC (NUM). If a tree type is not provided, a default tree type is used.

Syntax

The format for the DEFINE FIELD command with solely the ORDERED option and its related attributes is as follows:

DEFINE [FIELD] name WITH [ORDERED [CHARACTER | NUMERIC] [LRESERVE n] [NRESERVE n] [SPLITPCT n] [IMMED n]]

Where:

LRESERVE defines the percentage of space to leave free on the leaf page during deferred updates or a REORGANIZE command. The default is 15 percent. The value must be between 0 and 99.
NRESERVE defines the percentage of space to leave free on the node page during deferred updates or a REORGANIZE command. The default is 15 percent. The value must be between 0 and 99.
SPLITPCT defines the percentage of the node data to go to the left node when a split occurs during nondeferred User Language or IFAM requests. The default is 50 percent. The value must be between 1 and 100.
IMMED defines the number of immediate pointers per segment in the Ordered Index. The default is 1. The value must be between 0 and 255.

Usage notes

The LRESERVE, NRESERVE, SPLITPCT, and IMMED parameters represent approximate values. The actual movement of data during an update might not conform exactly to the value of the attribute in question.

The default ordering for a field is NON-ORDERED. Ordered Index attributes do not apply for a NON-ORDERED field.

If a field is defined as ORDERED without specifying the type of tree to create, the default depends on whether the field is STRING, BINARY, or FLOAT. If STRING, an ORDERED CHARACTER tree is defined. If FLOAT or BINARY, an ORDERED NUMERIC tree is defined.

The UNIQUE attribute specifies that there can be only one occurrence of a unique field in the file. Multiple occurrences are not allowed.

For a detailed discussion of field attributes, see Field Design (File Management).

CHUNK attribute

The CHUNK field attribute is available in Model 204 Relase 7.5. It improves the efficiency of range finds using ordered index processing.

The CHUNK attribute enables more efficient searching on ordered index numeric range (ORDERED NUMERIC) fields by specifying data "chunks." CHUNK is used to define a related field containing data from the original base field rounded down to the CHUNK size.

Syntax

The format for the DEFINE FIELD command with the CHUNK option and its related attributes is as follows.

Note that the "CHUNK" keyword can be abbreviated "CNK".

DEFINE FIELD chunkFieldName WITH INVISIBLE ORDERED NUMERIC {CHUNK|CNK} chunkSize FOR chunkTargetFieldName

Where:

chunkFieldName A standard M204 field name.
chunkSize A positive integer that is an integral multiple or divisor of all CHUNK fields previously defined for chunkTargetFieldName.

Chunk size also cannot be the same as the chunk size for any other CHUNK field defined for chunkTargetFieldName. If either chunk size

rule is violated, the DEFINE command is rejected with an error message.
chunkTargetFieldName A previously defined ORDERED NUMERIC VISIBLE field (it must be defined before any CHUNK fields that reference it). The

DEFINE command is rejected with an error message if the chunk target is not ORDERED NUMERIC VISIBLE.

Ordered index versus hash index processing: advantages and disadvantages

Ordered index processing provides the following advantages over hash index (Table C) processing:

  • File management is easier, because you don't need to determine the best index size when building or reorganizing a file: if the index (CSIZE) changes, you can add pages as needed.
  • For Each Value processing is easily and efficiently done.
  • Management of invisible keys is easier because invisible key values can still be retrieved.
  • Once the buffer pools are big enough to hold the higher level OI nodes, ordered index processing is often less I/O intensive that Table C processing.

However, until now, the most efficient way to search on a range of data has been to use the NUMERIC RANGE field attribute and the Table C index. But using NUMERIC RANGE fields poses these problems:

  • They are difficult to update, often requiring dozens of invisible key updates for a NUMERIC RANGE field update.
  • They can only occur once per record
  • They are not optimal from a retrieval perspective, decomposing a numeric range find into a lot of individual, under the covers invisible field finds. In addition, this decomposition process currently disables shared (resident) QTBL processing.

Range finds on a date field using the ordered index

While one can do range finds currently using the ordered index, these finds often have surprisingly poor performance. To see why, consider the case of a date field (YYYYMMDD). Finding all the records that have a date time stamp between, for example, 20121211 and 20130205 requires traversing around 55 ordered index values and ORing together the records for each of those values. As you can imagine, this can get very expensive. As the range gets bigger the cost continues to go up and a large range would be exorbitantly expensive.

Range finds on a date field using the CHUNK attribute

The solution is to chunk the data using the DEFINE FIELD command and a new field attribute called CHUNK. CHUNK is followed by a number and then a FOR <fieldname>.

So, to illustrate:

DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMM WITH ORDERED NUMERIC INVISIBLE CHUNK 100 FOR YYYYMMDD

This requires a FILEORG X'100' file because FILEORG X'100' provides a nice structure for creating automatic fields and fields that are related to other fields. A CHUNK field is automatically maintained based on updates to the base (FOR) field. The stored value is the value of the base field rounded down to the CHUNK size. So, for example, if you stored a value of 20121225 into YYYYMMDD, the underlying code would automatically store the value 20121200 into YYYYMM. Similarly, if you stored 20121203 into YYYYMMDD, a 20121200 value would also be stored into YYYYMM.

When doing a range find, Model 204 decomposes the find into range finds based on the CHUNK field and the base field. So, for example, our find for values between 20121211 and 20130205 would be decomposed into a find for:

  • YYYYMMDD between 20121211 and 20121299 (ignoring for the moment issues with non-integer values and demonstrating that CHUNK doesn't really know or care if the field is a date)
  • OR YYYYMM between 20130100 and 20130100 (or more precisely, exactly equal to 20130100)
  • OR YYYYMMDD between 20130200 and 20130205

This reduces the number of scanned ordered index entries by a factor of 2. Results would often be much better than that. For example, it probably would not be atypical for a range to land on month boundaries, resulting in improvements closer to a factor of 30. And, as the range got bigger and bigger, more and more of the range processing would occur on the CHUNK field, with additional improvement in processing.

Multiple CHUNK fields

To get even better performance, you could define multiple CHUNK fields for a base field:

DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMMD WITH INVISIBLE ORDERED NUMERIC CHUNK 10 FOR YYYYMMDD DEFINE FIELD YYYYMM WITH INVISIBLE ORDERED NUMERIC CHUNK 100 FOR YYYYMMDD DEFINE FIELD YYYY WITH INVISIBLE ORDERED NUMERIC CHUNK 10000 FOR YYYYMMDD

The chunk size for a CHUNK field must be a greater-than-1 integral multiple of all smaller chunk sizes for the same base field. The following chunking is not accepted because 100 is not an integral multiple of 7:

INVALID:

DEFINE FIELD YYYYMMDD WITH ORDERED NUMERIC DEFINE FIELD YYYYMMD WITH INVISIBLE ORDERED NUMERIC CHUNK 7 FOR YYYYMMDD DEFINE FIELD YYYYMM WITH INVISIBLE ORDERED NUMERIC CHUNK 100 FOR YYYYMMDD DEFINE FIELD YYYY WITH INVISIBLE ORDERED NUMERIC CHUNK 10000 FOR YYYYMMDD

This restriction makes decomposition of a range find on a base field relatively straightforward.

Note that you could reference the CHUNK fields directly in non-updating statements, as shown in this example:

fd to %recset yyyy eq %year end find

In any case, the number and size of the chunk fields allows a DBA to trade off the overhead of doing updates against the efficiency of doing range finds.

Usage notes

  • CHUNK fields are an AUTOMATIC field type. The CHUNK type is mutually exclusive with all other AUTOMATIC field types (CONCATENATION-OF, COUNT-OCCURRENCES-OF, CREATE-TIME, CREATE-TIMEUTC, CREATE-USER, UPDATE-TIME, UPDATE-TIMEUTC, UPDATE-USER).
  • CHUNK fields must be ORDERED NUMERIC INVISIBLE.
  • CHUNK fields can only be defined for ORDERED NUMERIC VISIBLE fields.
  • Any given CHUNK target field can have at most 20 CHUNK fields defined for it.

File Restrictions When Using CHUNK fields

  • CHUNK fields can only be defined in a FILEORG X'100' file.
  • Once a file has CHUNK fields defined, it cannot be opened with a Model 204 release prior to 7.5.
  • A CHUNK field cannot be defined in a non-emtpy file.

Commands for Chunk Fields

The following other commands can use chunk fields:

  • DISPLAY FIELD
  • REDEFINE FIELD. You cannot REDEFINE a non-chunk field to a chunk field or vice versa. See REDEFINE FIELD for additional rules.

Date/time stamp field

Specifying the date/time stamp field name

The system parameter DTSFN specifies the date/time stamp field name. This field name is the same for all files participating in DTS processing in a run. Though this is a system-level parameter, the feature is a file-level feature.

The date/time stamp field name you specify is not edited for validity.

The DTSFN value can be specified in CUST source code so you do not have to change all your jobs to specify the parameter. By default, Model 204 does not provide a date/time stamp field name.

Specifying a date/time stamp field definition

A date/time stamp field is defined as is any other Model 204 field but has the following field attribute restrictions:

  • A date/time field cannot be INVISIBLE or UNIQUE.

  • If the date/time stamp field is preallocated, OCCURS cannot be greater than 1.

Disregarding these restrictions causes the following error:

M204.2728: DATE TIME STAMP FIELD MAY NOT BE INVISIBLE, UNIQUE, NOR HAVE MULTIPLE OCCURRENCES

Rocket Software recommends using the ORDERED CHARACTER attribute. By default Model 204 generates a character string date/time stamp field value.

Defining the date/time stamp field value format

Your custom code formats the data entered into the field. The format for the default date/time stamp string is:

CCYYMMDDHHMMSSMMM

Note: The DTSFN field can have differing definitions in different Model 204 databases.