Fast/Unload datetime processing considerations
This page presents date processing issues, including usage of Fast/Unload past the year 1999, an explanation of its processing of dates, and any rules and restrictions you must follow to achieve correct results using date values with Fast/Unload.
Fast/Unload uses dates in the following ways:
- To examine the CPU clock (as returned by the STCK hardware instruction) to determine the current date, in case Fast/Unload is under a rental or trial agreement
- As arguments to various #functions, and returned values from them
Please note that in addition to the above date processing performed by Fast/Unload, it also unloads Model 204 files and allows manipulation of other values which might contain two-digit year date values. The customer must ensure that any application using that data has an algorithm or rule for unambiguously determining the correct century for the values.
For example, the UAI statement with the SORT clause allows you to sort by a Model 204 field; if you are sorting by a two-digit year date field, you need to supply information to enable the sort program to determine the century. You can do this using the FORMAT keyword in the UAI SORT items, as described in UNLOAD ALL INFORMATION or UAI.
For headers on pages or rows that occur on printed pages or displayed screens, Rocket Software products generally use a full four-digit year format, although they may display dates with two-digit years in circumstances where the proper century can be inferred from the context.
You must examine all uses of date values in your applications to ensure that each of your applications produces correct results. Furthermore, both the operating system and Model 204 must correctly process and transmit dates beyond 1999 in order for Fast/Unload to operate properly.
Most Fast/Unload date processing involves the use of datetime #functions. Occasionally, we refer to the "#DATExxx" functions; this is meant to also include #TIME and the #Nxxx2DATE functions.
In operational terms, there are two classes of datetime #functions:
- #Functions using a numeric value to represent a datetime,
where 0 represents 12:00 AM, 1 January 1900; for example,
#DATE2NM and #NM2DATE (number of milliseconds since
the start of 1900).
These #functions perform non-strict matching of date strings to date formats; for example, a leading blank is allowed for the HH token.
- Other #functions that only manipulate strings and associated
datetime formats;
for example, #DATECHG (add number of days to given date).
These #functions perform strict matching of date strings to date formats; for example, a leading blank is not allowed for the HH token. These #functions generally produce the same results as the corresponding SOUL $DATExxx functions, with additional enhancements.
See Strict and non-strict format matching for a discussion of strict and non-strict format matching, including a technique for accomplishing strict date checking using the non-strict #functions.
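As a rough illustration of the numeric representation described above, the following Python sketch counts days and milliseconds from the start of 1900. The names `date_to_nd`, `date_to_nm`, and `nm_to_date` are hypothetical stand-ins chosen for this sketch; they are not Fast/Unload #functions, and the sketch does not model Fast/Unload format strings.

```python
from datetime import datetime, timedelta

# Origin of the numeric representation: 0 is 12:00 AM, 1 January 1900.
EPOCH = datetime(1900, 1, 1)


def date_to_nd(dt):
    """Whole days since 1 January 1900 (the idea behind an ND value)."""
    return (dt - EPOCH).days


def date_to_nm(dt):
    """Milliseconds since 1 January 1900 (the idea behind an NM value)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def nm_to_date(nm):
    """Inverse of date_to_nm: turn a millisecond count back into a datetime."""
    return EPOCH + timedelta(milliseconds=nm)


if __name__ == "__main__":
    stamp = datetime(1997, 2, 15, 21, 33)
    nm = date_to_nm(stamp)
    # Because the value is a plain number, adding one day is simple arithmetic.
    print(nm_to_date(nm + 24 * 60 * 60 * 1000))
```

Because the representation is a plain number, differences between datetimes can be added or subtracted directly, with no format string involved.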
Notes:
- All #DATExxx functions that can have argument errors
(that is, all #functions except #DATEFMT) accept an optional "return
code" argument.
If an argument error occurs and the return code argument is absent,
Fast/Unload terminates; if the return code argument is present, an error will
set the return code to a non-zero number and the result of the #function
is the MISSING value.
The User Language $DATExxx and $SIR_DATExxx functions take a different approach to error handling; each uses a special return value (or class of values) to indicate an argument error.
- The default format for #DATE is "YYYY-MM-DD"; the default for $DATE and $SIR_DATE is "YY-MM-DD".
The rest of this page contains a discussion of datetime formats, valid datetime strings, processing of two-digit year values, and datetime error handling. It also contains example datetime formats and corresponding example datetime strings. Finally, there is a list of benefits of Fast/Unload datetime processing.
Datetime formats
The representation of a date is determined by a datetime format. This value is a character string, composed of the concatenation of tokens (for example, "YYYY" for a four-digit year, and "MI" for minutes) and separator characters (for example, "/" in "MM/DD/YY" for two-digit month, day, and year separated by slashes).
These datetime format strings are used in many products in addition to Fast/Unload. The products using datetime format strings are:
- Fast/Unload
- Janus Open Client
- Janus Open Server
- Janus Specialty Data Store
- Janus Web Server
- SirDBA
- Sirius Functions
- Sir2000 Field Migration Facility
- Sir2000 User Language Tools
The rules for these datetime format strings are consistent throughout all these products, though certain uses of these strings might impose extra restrictions. For example, a leading blank is allowed for the HH, DD, and MM parts of a date argument using a non-strict date #function, such as #DATE2NS, but is not allowed for the strict date #functions.
There are certain rules applied to determine if a format is valid. The basic rules are:
- If a format string contains a numeric datetime token (that is, "ND", "NM", or "NS"), then the format string must consist of only that one token. Numeric datetime tokens are supported only in format strings for the Sir2000 Field Migration Facility.
- You must specify at least one time, weekday, or date token.
- Except for "weekday", you cannot specify redundant information.
  More specifically:
    - Except for "I", no token can be specified twice.
    - At most one year format (contains Y) can be specified.
    - At most one month format (contains MON, Mon, or MM) can be specified.
    - At most one day format (DD or Day) can be specified.
    - At most one weekday format (WKD, Wkd, WKDAY, or Wkday) can be specified.
    - If AM is specified, then PM cannot be specified.
    - At most one fractions-of-a-second format (contains X) can be specified.
    - If DDD is specified, then neither a day nor a month format can be specified.
- If ZYY is specified in a format string, no other token that denotes a variable-length value may be used.
- If a format string contains other tokens that denote variable length values, then an * token may only appear as the last character of the format string.
- The DAY token may not be immediately followed by another token whose value may be numeric, regardless of whether the following token represents a variable-length value. Thus, DAY may not be followed by *, I, YY, YYYY, CYY, MM, HH, MI, SS, X, XX, or XXX; DAY may not be followed by a decimal digit separator, and DAY may not be followed by a quote followed by a decimal digit.
- When a pair of format strings are used for transforming date values,
for example for #DATECNV or processing of updates to SIRFIELD RELATEd fields,
additional rules apply to the pattern matching tokens:
- If one of the format strings includes one or more "I" tokens, then the other format string must contain the same number of "I" tokens. Note that the placement of "I" tokens within the format strings is not restricted. The "I" tokens are processed left to right, with each character from the input string that corresponds to the nth "I" token in the input format being copied unchanged to the character position in the output string that corresponds to the nth "I" token in the output format.
- If one of the format strings contains an asterisk ( * ) token, then the other format string must also contain an asterisk token. All of the characters from the input string that correspond to the asterisk token in the input format, if any, are copied unaltered to the output string, beginning in the position that corresponds to the asterisk token in the output format.
SIRFIELD is part of the Sir2000 Field Migration Facility.
- The maximum length of a format string is 100 characters.
Note: A common mistake is to use "MM" for minutes; it should be "MI".
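A few of the rules above can be sketched as a small checker. This is only an illustration in Python, not the Fast/Unload implementation: the longest-match tokenization, the function names `tokenize` and `check_format`, and the wording of the messages are all assumptions of this sketch, and it covers only some of the rules (it omits the numeric tokens and the DAY, ZYY, and * placement rules).

```python
# Tokens from the token table on this page, tried longest first (an assumption
# of this sketch) so that "MONTH" wins over "MON" and "XXX" over "X".
# The numeric tokens NM, NS, and ND are deliberately left out.
TOKENS = sorted(
    ["YYYY", "YY", "CYY", "ZYY", "MONTH", "Month", "MON", "Mon", "MM", "BM",
     "DDD", "DD", "BD", "DAY", "WKDAY", "Wkday", "WKD", "Wkd", "HH", "BH",
     "MI", "SS", "XXX", "XX", "X", "AM", "PM", "I", "*"],
    key=len, reverse=True)
SEPARATORS = set(" '/:-\\.,_()+|=&@#0123456789")
PATTERN_TOKENS = {"I", "*"}
GROUPS = {
    "year": {"YYYY", "YY", "CYY", "ZYY"},
    "month": {"MONTH", "Month", "MON", "Mon", "MM", "BM"},
    "day": {"DD", "BD", "DAY"},
    "weekday": {"WKDAY", "Wkday", "WKD", "Wkd"},
    "fraction": {"X", "XX", "XXX"},
}


def tokenize(fmt):
    """Split a format string into tokens, skipping separators."""
    tokens, i = [], 0
    while i < len(fmt):
        if fmt[i] == '"':                      # quoting character
            if i + 1 == len(fmt):
                raise ValueError("quote at end of format")
            i += 2                             # quoted char acts as a separator
            continue
        match = next((t for t in TOKENS if fmt.startswith(t, i)), None)
        if match:
            tokens.append(match)
            i += len(match)
        elif fmt[i] in SEPARATORS:
            i += 1
        else:
            raise ValueError(f"unrecognized character {fmt[i]!r} at offset {i}")
    return tokens


def check_format(fmt):
    """Return a list of rule violations (empty if none were found)."""
    if len(fmt) > 100:
        return ["format longer than 100 characters"]
    try:
        tokens = tokenize(fmt)
    except ValueError as err:
        return [str(err)]
    problems = []
    if not any(t not in PATTERN_TOKENS for t in tokens):
        problems.append("no time, weekday, or date token")
    for tok in sorted(set(tokens)):
        if tok != "I" and tokens.count(tok) > 1:
            problems.append(f"token {tok} specified more than once")
    for name, members in GROUPS.items():
        if sum(t in members for t in tokens) > 1:
            problems.append(f"more than one {name} format")
    if "AM" in tokens and "PM" in tokens:
        problems.append("both AM and PM specified")
    day_or_month = GROUPS["day"] | GROUPS["month"]
    if "DDD" in tokens and any(t in day_or_month for t in tokens):
        problems.append("DDD combined with a day or month format")
    return problems


if __name__ == "__main__":
    for fmt in ("MM/DD/YY", "YYYY-YY", "DDD/MM", "YYIIII"):
        print(fmt, check_format(fmt))
```

Note that a checker like this cannot catch "HH:MM" written where "HH:MI" was intended, because an hour followed by a month is a valid combination of tokens.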
The valid tokens in a date format are shown in the following list. In general, the output format rule for a token is shown. For some of the #functions, the input format rule for a token is the same as the output format rule; this is the definition of "strict date format matching." However, non-strict #functions sometimes allow a string to match a token on input that would not be produced by that token on output.
All of the tokens that match alphabetic strings (for example, "MON") match any case for non-strict matching. All other tokens that have differing strict and non-strict matching rules are listed under "Special date format rules" in the index at the back of the manual, and usage notes for them are contained in Datetime and format examples. Each input datetime format argument in the description of a #function specifies whether the use of the format observes strict or non-strict format matching. See Strict and non-strict format matching.
Token | Description |
---|---|
NM | numeric datetime value containing the number of milliseconds (1/1000 of a second) since January 1, 1900 at 12:00 AM. (This token is allowed only in the Sir2000 Field Migration Facility.) |
NS | numeric datetime value containing the number of seconds since January 1, 1900 at 12:00 AM. (This token is allowed only in the Sir2000 Field Migration Facility.) |
ND | numeric date value containing the number of days since January 1, 1900. (This token is allowed only in the Sir2000 Field Migration Facility.) |
* | Ignore entire variable-length substring matching pattern, if any, when only retrieving a date value. Substitute with null string when only creating a date value. When copying date values, copy entire variable-length substring matching pattern, if any, from input value to location identified by * token in output string. See Datetime and format examples. |
I | Ignore corresponding input character when only retrieving a date value. Store a blank in corresponding output character when only creating a date value. When copying date values, copy each character matching an I token from the input value to the location in the output string identified by the corresponding I token in the output format. See Datetime and format examples. |
" | Following character is "quoted", that is, it acts as a separator character. See Datetime and format examples. |
YYYY | Four-digit year |
YY | Two-digit year |
CYY | Year minus 1900 (three digits, including any leading zero). See Datetime and format examples. |
ZYY | Year minus 1900, two-digit or three-digit year number, excluding any leading zero (variable length data). Non-strict #functions allow a three-digit number with leading zero on input, but any number less than 100 always produces a two-digit number on output. See Datetime and format examples. |
MONTH | Full-month name (uppercase variable length). Non-strict #functions allow any mixture of uppercase and lowercase on input, but all uppercase is always produced on output. |
Month | Full-month name (mixed-case variable length). Non-strict #functions allow any mixture of uppercase and lowercase on input, but an initial uppercase letter followed by all lowercase is always produced on output. |
MON | Three-character month abbreviation (uppercase). Non-strict #functions allow any mixture of upper and lowercase on input, but all uppercase is always produced on output. |
Mon | Three-character month abbreviation (mixed case). Non-strict #functions allow any mixture of upper and lower case on input, but initial upper case letter followed by all lowercase is always produced on output. |
MM | Two-digit month number. Non-strict #functions allow a two-character number with leading blank on input, but two decimal digits are always produced on output. See Datetime and format examples. |
BM | Two-character month number; if less than 10, first character is blank. Non-strict #functions allow a two-digit number with leading zero on input, but any number less than 10 always produces a blank followed by a decimal digit on output. See Datetime and format examples. |
DDD | Three-digit Julian day number |
DD | Two-digit day number. Non-strict #functions allow a two-character number with leading blank on input, but two decimal digits are always produced on output. See Datetime and format examples. |
BD | Two-character day number; if less than 10, first character is blank. Non-strict #functions allow a two-digit number with leading zero on input, but any number less than 10 always produces a blank followed by a decimal digit on output. See Datetime and format examples. |
DAY | One-digit or two-digit day number (variable length data). Non-strict #functions allow a two-digit number with leading zero on input, but any number less than 10 always produces a one-digit number on output. See Datetime and format examples. |
WKDAY | Full day-of-week name (uppercase variable length). Non-strict #functions allow any mixture of uppercase and lowercase on input, but all uppercase is always produced on output. |
Wkday | Full day-of-week name (mixed-case variable length). Non-strict #functions allow any mixture of uppercase and lowercase on input, but initial upper case letter followed by all lowercase is always produced on output. |
WKD | Three-character day-of-week abbreviation (uppercase). Non-strict #functions allow any mixture of uppercase and lowercase on input, but all uppercase is always produced on output. |
Wkd | Three-character day-of-week abbreviation (mixed case). Non-strict #functions allow any mixture of uppercase and lowercase on input, but initial upper case letter followed by all lowercase is always produced on output. |
HH | Two-digit hour number. Non-strict #functions allow a two-character number with leading blank on input, but two decimal digits are always produced on output. See Datetime and format examples. |
BH | Two-character hour number; if less than 10, first character is blank. Non-strict #functions allow a two-digit number with leading zero on input, but any number less than 10 always produces a blank followed by a decimal digit on output. See Datetime and format examples. |
MI | Two-digit minute number |
SS | Two-digit second number |
X | Tenths of a second |
XX | Hundredths of a second |
XXX | Thousandths of a second (milliseconds) |
AM | AM/PM indicator |
PM | AM/PM indicator |
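The copying behavior of the "I" token when a pair of formats is used can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the hypothetical helper `copy_i_chars` handles only formats in which every format character lines up with exactly one string character (fixed-width tokens and separators), and it treats every letter "I" as an "I" token, so it must not be used with formats containing "MI".

```python
def copy_i_chars(in_fmt, in_str, out_fmt, out_str):
    """Copy the character under the nth "I" of in_fmt to the nth "I" of out_fmt.

    out_str is the output string already built for the date tokens, with any
    placeholder characters in the "I" positions.
    """
    in_positions = [i for i, ch in enumerate(in_fmt) if ch == "I"]
    out_positions = [i for i, ch in enumerate(out_fmt) if ch == "I"]
    if len(in_positions) != len(out_positions):
        raise ValueError('both formats must contain the same number of "I" tokens')
    result = list(out_str)
    # Process the "I" tokens left to right, pairing them by ordinal position.
    for src, dst in zip(in_positions, out_positions):
        result[dst] = in_str[src]
    return "".join(result)


if __name__ == "__main__":
    # The year has already been converted; only the "I" characters are copied.
    print(copy_i_chars("YYIIII", "92ABCD", "IIII-YYYY", "    -1992"))
```

The placement of the "I" tokens differs between the two formats in this usage example, which the rule permits as long as the counts agree.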
The valid separators in a date format are:
- blank (" ")
- apostrophe ("'")
- slash ("/")
- colon (":")
- hyphen ("-")
- backslash ("\")
- period (".")
- comma (",")
- underscore ("_")
- left parenthesis ("(")
- right parenthesis (")")
- plus ("+")
- vertical bar ("|")
- equals ("=")
- ampersand ("&")
- at sign ("@")
- sharp ("#")
- the decimal digits ("0" - "9").
In addition, any character may be a separator character if preceded by the quoting character (").
See Datetime and format examples for examples which include use of various separator characters.
Valid datetimes
For a datetime string to be valid it must meet the following criteria:
- Its length must be less than 128 characters.
- It must be compatible with its corresponding format string.
- It must represent a valid date and/or time.
For example, at most 23:59:59.999 for a time, 01-12 for a month, and 01-31
or less (depending on the month) for a day. February 29 is valid only
in leap years; a century year is a leap year only if it is divisible
by 400 (2000 is a leap year, but 1800, 1900, and 2100 are not).
Note: Weekdays are not checked for consistency against the date;
for example, both Saturday, 02/15/97 and Friday, 02/15/97 are valid.
- It must be within the date range allowed for the corresponding format. A datetime string used with a CYY or ZYY format can only represent dates from 1900 to 2899, inclusive. A datetime string used with a YY format can only represent dates in a range of 100 or fewer years, as determined by CENTSPAN and SPANSIZE. The valid range of dates for all other formats is from 1 January 1753 through 31 December 9999.
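The calendar part of these criteria can be sketched in Python as follows. The function names are hypothetical, and the sketch checks only the year range that applies to formats other than YY, CYY, and ZYY; it does not model format matching.

```python
def is_leap_year(year):
    """Divisible by 4, except century years, which must be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year, month):
    if month == 2:
        return 29 if is_leap_year(year) else 28
    return 30 if month in (4, 6, 9, 11) else 31


def is_valid_date(year, month, day):
    """True if the date exists and lies in 1 January 1753 to 31 December 9999."""
    if not 1753 <= year <= 9999:
        return False
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(year, month)


if __name__ == "__main__":
    print(is_valid_date(1996, 2, 29))   # 1996 is a leap year
    print(is_valid_date(1900, 2, 29))   # 1900 is not
```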
Processing dates with two-digit year values
A date field with only two digits for the year value is capable of representing a range of up to one hundred years. When we compare a pair of two-digit year values we are accustomed to thinking of the century as fixed, so that all dates are either "19xx" or "20xx". However, a date field with two-digit year values can actually represent dates from two different centuries, provided that the range of dates does not exceed 100 years.
CENTSPAN
CENTSPAN provides a mechanism for unambiguously converting dates with two-digit year values into dates with four-digit year values. The CENTSPAN mechanism allows two-digit year values to span two centuries without confusion. CENTSPAN identifies the four-digit year value that is the start of a range of years represented by the two-digit year values.
CENTSPAN may be specified as an absolute unsigned four-digit value between 1753 and 9999, or it may be specified as a relative signed value between -99 and +99, inclusive. A relative CENTSPAN value is dynamically converted to an effective absolute value before it is used to perform a YY to YYYY conversion. The effective CENTSPAN value is formed by adding the relative CENTSPAN to the current four-digit year value at the time the relative value is converted.
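[Figure: CENTSPAN example. With CENTSPAN set to 1947, the two-digit years 47 through 99 map to 1947 through 1999, and 00 through 46 map to 2000 through 2046.]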
A simple algorithm is used to convert a two-digit year value (YY) to a four-digit year value, using a four-digit absolute or effective CENTSPAN value (HHLL). If the two-digit year value is less than the low-order two digits of the CENTSPAN value, then the resulting century is one greater than the high-order two digits of the CENTSPAN value. Otherwise the resulting century is the same as the high-order two digits of the CENTSPAN value.
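The conversion just described can be written out as a short Python sketch. The function names are hypothetical, and the caller supplies the current year explicitly so that the relative case is easy to follow.

```python
def effective_centspan(centspan, current_year):
    """Turn an absolute or relative CENTSPAN into an absolute four-digit year."""
    if 1753 <= centspan <= 9999:
        return centspan                     # already absolute
    if -99 <= centspan <= 99:
        return current_year + centspan      # relative to the current year
    raise ValueError("invalid CENTSPAN value")


def yy_to_yyyy(yy, centspan):
    """Convert a two-digit year using an absolute (or effective) CENTSPAN."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year must be 0 through 99")
    high, low = divmod(centspan, 100)       # HHLL -> HH and LL
    # Years below LL belong to the century after HH; the rest stay in HH.
    century = high + 1 if yy < low else high
    return century * 100 + yy


if __name__ == "__main__":
    print(yy_to_yyyy(47, 1947))                             # start of the window
    print(yy_to_yyyy(46, 1947))                             # end of the window
    print(yy_to_yyyy(92, effective_centspan(-50, 1998)))    # relative CENTSPAN
```

With an absolute CENTSPAN of 1947, "47" becomes 1947 and "46" becomes 2046; with a relative CENTSPAN of -50 evaluated in 1998, the effective value is 1948 and "92" becomes 1992.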
Using all one hundred available years for mapping two-digit year values can cause significant confusion and result in data integrity errors: dates just above and just below the 100-year window are mapped to the other end of the window. From the previous example, the date "47" will be interpreted as 1947, when it could conceivably have been 2047. Similarly, the date "46" will be interpreted as 2046, when it might have been 1946.
[Figure: full 100-year window, in which dates just outside the window are mapped to its opposite end]
If CENTSPAN is set to a value that is too high, dates that are just prior to CENTSPAN will appear to occur 100 years hence. If CENTSPAN is set to a value that is too low, dates that fall just after CENTSPAN+99 will appear to have occurred 100 years earlier. A full one-hundred-year window also cannot detect attempts to represent more than one hundred years of values with a two-digit year.
SPANSIZE
There is a method to protect from the ambiguities that can occur at each end of the 100-year window defined by CENTSPAN. SPANSIZE is used to restrict the size of the window used for mapping two-digit year values. The effect is to create two guard bands, one just below the date window and one just above. An attempt to represent a date value that lands in a guard band produces an error.
Each guard band contains 100-SPANSIZE years; hence a SPANSIZE of 100 removes the protection. SPANSIZE is a value that you can customize in your load module; see CENTSPAN and SPANSIZE. If you do not customize it, the value of SPANSIZE is 90, which provides two ten-year guard bands: one just below the CENTSPAN setting and one starting at CENTSPAN+90. Note that in Fast/Unload version 3.0, SPANSIZE is 100 (and it cannot be customized).
From our previous example:
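[Figure: SPANSIZE example. With CENTSPAN set to 1947 and SPANSIZE set to 90, the valid window is 1947 through 2036, and the guard bands cover 1937 through 1946 and 2037 through 2046.]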
An attempt to represent the values "37" through "46" will be rejected. This protects the range 1937 through 1946 as well as the range 2037 through 2046. Note that an intended value of 2047, expressed as "47", will be accepted and interpreted as 1947. In general, a smaller SPANSIZE provides higher assurance of correct mappings. However, any setting of SPANSIZE less than 100 will probably detect the case where a range greater than one hundred years is being used.
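The guard-band check can be sketched in Python as a small extension of the CENTSPAN conversion. The function name is hypothetical, and raising an exception is simply this sketch's way of modeling "produces an error".

```python
def yy_to_yyyy_guarded(yy, centspan, spansize=90):
    """Convert YY using an absolute CENTSPAN, rejecting guard-band years."""
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year must be 0 through 99")
    high, low = divmod(centspan, 100)
    century = high + 1 if yy < low else high
    year = century * 100 + yy
    # Only the first `spansize` years of the 100-year window are accepted;
    # the remaining 100 - spansize two-digit values fall in the guard bands.
    if year - centspan >= spansize:
        raise ValueError(f"{yy:02d} falls in a guard band")
    return year


if __name__ == "__main__":
    for yy in (47, 36, 37, 46):
        try:
            print(yy, yy_to_yyyy_guarded(yy, 1947))
        except ValueError as err:
            print(yy, "rejected:", err)
```

With CENTSPAN 1947 and SPANSIZE 90, "47" and "36" are accepted (as 1947 and 2036), while "37" through "46" are rejected. Passing a SPANSIZE of 100 accepts every two-digit value, which corresponds to removing the protection.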
Strict and non-strict format matching
As mentioned in Datetime formats, for some of the #functions, the input format rule for a token is the same as the output format rule; this is the definition of "strict date format matching". However, non-strict #functions sometimes allow a string to match a token on input that would not be produced by that token on output. The types of strict matching are as follows:
Token | Strict and non-strict matching |
---|---|
Alpha tokens | For alphabetic tokens (for example, Month), a strict match requires the input value to be the correct case. For example, the "MON" token is strictly matched by "JAN" but not by "Jan", and the reverse is true for the "Mon" token. For non-strict matching, the alphabetic tokens are matched by any combination of uppercase and lowercase input. |
HH, MM, DD | For these tokens, a strict match requires a leading zero for values less than 10. For non-strict matching, a value less than 10 can also be represented by a leading blank followed by a single numeric digit. |
BH, BM, BD | For these tokens, a strict match requires a leading blank for values less than 10. For non-strict matching, a value less than 10 can also be represented by a leading zero followed by a numeric digit. |
DAY | For this token, a strict match requires a single digit for values less than 10. For non-strict matching, a value less than 10 can also be represented by a leading zero followed by a numeric digit. |
ZYY | For this token, a strict match requires two digits for values less than 100. For non-strict matching, a value less than 100 can also be represented by a leading zero followed by two numeric digits. |
If you want to check a datetime string using strict rules, you can use the following technique with the non-strict date #functions:
    IF date EQ '' OR date NE #NM2DATE(-
       #DATE2NM(date, fmt), -
       fmt) THEN
       error handling
    END IF
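The same round-trip idea can be modeled in Python for readers who want to see why it works: parse leniently, format the result back with the output rules, and accept the input only if the two strings are identical. The sketch is limited to a single format, MM/DD/YYYY, and its function names are hypothetical; it is not a description of how the #functions are implemented.

```python
from datetime import date


def parse_nonstrict(text):
    """Parse MM/DD/YYYY, also accepting a leading blank in MM and DD.

    Returns a datetime.date, or None if the text does not match.
    """
    if len(text) != 10 or text[2] != "/" or text[5] != "/":
        return None

    def two_digits(field):
        # Non-strict rule: a leading blank may stand in for a leading zero.
        return "0" + field[1] if field[0] == " " else field

    mm, dd, yyyy = two_digits(text[0:2]), two_digits(text[3:5]), text[6:10]
    if not all(f.isascii() and f.isdigit() for f in (mm, dd, yyyy)):
        return None
    try:
        return date(int(yyyy), int(mm), int(dd))
    except ValueError:          # for example, month 13 or 30 February
        return None


def format_strict(d):
    """Produce the output form: always two digits for month and day."""
    return f"{d.month:02d}/{d.day:02d}/{d.year:04d}"


def is_strict_match(text):
    """Round trip: the input is strict only if it equals its own output form."""
    parsed = parse_nonstrict(text)
    return parsed is not None and text == format_strict(parsed)


if __name__ == "__main__":
    print(is_strict_match("02/07/1998"))   # already in output form
    print(is_strict_match(" 2/ 7/1998"))   # parses, but is not in output form
```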
Datetime and format examples
There is an extensive set of format tokens, as shown in Datetime formats. These tokens and the various separator characters can be combined in almost limitless ways, giving rise to an extremely large set of datetime formats. This section provides examples of some common datetime formats, and also tries to explain the use of some of the format tokens which might not be obvious. It also has examples for formats whose usage with Fast/Unload differs from their usage with other Model 204 products. These are noted in the examples and are indexed at the back of this manual under the heading "Special date format rules". Each example format is explained and also presented with some matching datetimes; again, bear in mind that these tokens can be combined in very many ways and only a very few are shown here. It is assumed that these examples are invoked sometime between the years 1998-2040, as the basis for relative CENTSPAN calculations.
- YYMMDD
  This is the common 6-digit date format, which supports sort order if all dates are within a single century. The following FUEL fragment prints the value "OK":

      %X = #DATE2ND('960229', 'YYMMDD')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- YYYYMMDD
  This is the common 8-digit date format, which supports sort order with dates in 2 centuries. The following FUEL fragment prints the value 19921212:

      %N = #DATE2ND('921212', 'YYMMDD')
      %N = #ND2DATE(%N, 'YYYYMMDD')
      REPORT %N

- MM/DD/YY
  This is the U.S. 6-digit date format for display. The following FUEL fragment prints the value "OK":

      %X = #DATE2ND('12/14/94', 'MM/DD/YY')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- DD.MM.YY
  This is a European 6-digit date format for display. The following FUEL fragment prints the value "OK":

      %X = #DATE2ND('14.12.94', 'DD.MM.YY')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- Wkday, DAY Month YYYY "A"T HH:MI
  This is a format which could be used for report headers. The following FUEL fragment prints a value like "Saturday, 7 February 1998 AT 21:33":

      %N = #DATE -
         ('Wkday, DAY Month YYYY "A"T HH:MI')
      REPORT %N

- YYIIII
  This is a format which could be used for data which contains a 2-digit year prefixing other information, such as a sequence number. The following FUEL fragment prints the value "02":

      %D = #DATE2ND('92ABCD', 'YYIIII')
      %D = %D + 10*365.25 + .8
      %N = #ND2DATE(%D, 'YY')
      REPORT %N

- YY*
  This is a format which could be used for data which contains a 2-digit year prefixing other information, such as a sequence number, when the other information is variable length. The following FUEL fragment prints the values "OK" and "OK":

      %X = #DATE2ND('92', 'YY*')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF
      %X = #DATE2ND('1992ABC', 'YYYY*')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- CYYDDD
  This is a compact 6-digit date format with explicit century information, from 1900 through and including 2899. The following FUEL fragment prints the value "OK":

      %X = #DATE2ND('097031', 'CYYDDD')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- ZYYMMDD
  This is a compact 6- or 7-digit date format with explicit century information, from 1900 through and including 2899, that can often be used with "old" YYMMDD date values in the 1900's. The following FUEL fragment prints the values "OK" and "OK":

      * Check 1 Dec, 1997:
      %X = #DATE2ND('971201', 'ZYYMMDD')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF
      * Check 1 Dec, 2000:
      %X = #DATE2ND('1001201', 'ZYYMMDD')
      IF %X > -9.E12 THEN
         REPORT 'OK'
      END IF

- YY0000
  Decimal digits can be used as separator characters. The following FUEL fragment prints the value "1992NA":

      %N = #DATE2ND('92000', 'YY000')
      %N = #ND2DATE(%N, 'YYYY"N"A')
      REPORT %N
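The relationship between the CYY and ZYY tokens used in the examples above can be illustrated with a short Python sketch. The helper names are hypothetical; the sketch models only the "year minus 1900" arithmetic and the leading-zero difference between the two tokens, not Fast/Unload format matching in general.

```python
def _offset(year):
    """Year minus 1900, restricted to the range the CYY and ZYY tokens allow."""
    if not 1900 <= year <= 2899:
        raise ValueError("CYY and ZYY can only represent 1900 through 2899")
    return year - 1900


def to_cyy(year):
    """CYY output: always three digits, including any leading zero."""
    return f"{_offset(year):03d}"


def to_zyy(year):
    """ZYY output: two digits below 100, otherwise three digits."""
    return f"{_offset(year):02d}"


def parse_zyymmdd(text):
    """Split a 6- or 7-digit ZYYMMDD value into (year, month, day)."""
    if len(text) not in (6, 7) or not (text.isascii() and text.isdigit()):
        raise ValueError("expected 6 or 7 decimal digits")
    # The last four digits are MMDD; whatever precedes them is the year offset.
    return 1900 + int(text[:-4]), int(text[-4:-2]), int(text[-2:])


if __name__ == "__main__":
    print(to_cyy(1997), to_zyy(1997), to_zyy(2000))
    print(parse_zyymmdd("971201"), parse_zyymmdd("1001201"))
```

This is why "old" YYMMDD values from the 1900s can often be read with ZYYMMDD: a two-digit year offset such as 97 has the same digits as the two-digit year.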
Datetime error handling
Due to an invalid argument value to a datetime #function, any of the following errors can occur:
- invalid datetime format specification
- datetime string not matching format
- datetime out of range for the format
- invalid CENTSPAN value
- datetime out of range for CENTSPAN/SPANSIZE combination
One way to detect these errors is to check for the appropriate error return value:
- #Functions using a numeric value to represent a datetime, and #TIME and #DATE, have error return values of -9.E12 or a null string for numeric or string result #functions, respectively.
- #Functions (other than #TIME and #DATE) that only manipulate strings and associated datetime formats have error return values of a variable number of asterisks (or, in the case of #DATEDIF, the value 99,999,999).
Most of the standard #DATExxx functions have an optional output "return code" argument (see Run-time errors during standard #function calls). If you specify, for example, an invalid CENTSPAN argument and you specify the return code argument, you can test the return code for CENTSPAN errors. If you specify an invalid CENTSPAN argument and you do not specify the return code argument, the Fast/Unload run terminates with an error message indicating the type of error and the line number being executed; the argument values are dumped as well.
#DATExxx functions CENTSPAN argument
Many of the #DATExxx functions accept an optional argument containing a CENTSPAN value to be used for the call. The default value of any CENTSPAN argument is -50. You can customize the default value of CENTSPAN in your load module; see CENTSPAN and SPANSIZE. Note that in version 3.0 of Fast/Unload, the default CENTSPAN argument cannot be customized. The default value should be adequate in most cases; if you have carefully determined that it should be different in some application, code the value on the relevant #function invocations.
For a different approach, see the description of the CENTSPLT and DEFCENT parameters (for example, the CENTSPLT parameter) and $function arguments.
Note that the CENTSPAN argument may not be specified as an entity whose value is MISSING. For most #function numeric arguments, the MISSING value is allowed if a value of zero for the argument is allowed. Zero is allowed for CENTSPAN, but since it is an unusual CENTSPAN value, the MISSING value may not be supplied.
Benefits of Fast/Unload datetime processing
Following is a list of benefits offered by Fast/Unload datetime processing. To provide concrete comparisons, there are some references to some SOUL date $functions.
Feature | Benefit |
---|---|
SPANSIZE | The SPANSIZE processing provides a very strong mechanism for detecting 2-digit year processing errors that would otherwise go unnoticed. |
Relative CENTSPAN | The relative CENTSPAN specification (for example, "-50") allows you to maintain a flexible "rolling" window for 2-digit year processing. |
Default CENTSPAN | One significant advantage of a relative CENTSPAN is that it allows a reasonable default value (-50) that requires no parameter changes in batch and online jobs. |
Format tokens | There is a very large set of tokens in the Fast/Unload datetime formats. For example, there are 4 different tokens representing the day of the week, and time of day can be represented. Standard User Language date formats do not have any day-of-week or time-of-day tokens, and other standard User Language token variations (for example, CYY versus ZYY) are selected by a complex argument setting. |
Pattern match tokens | The Fast/Unload datetime formats can contain single-character ("I") or variable-length ("*") match-any tokens. For example, you can specify that a string has an embedded year, and process that year as a date. |
Format-free representations | Non-string datetime values allow you to pass around dates simply as numbers, without the complexities of carrying the corresponding string format (you only need to establish the scale to operate on a value). |
Operating on numeric representations | Numeric date values can be operated on directly with FUEL; in particular, you can add datetime differences with ordinary arithmetic (for example, "+") rather than calling the #DATECHG function and providing a format. |
Time | All Fast/Unload datetime #functions allow any reference to a "date" to include time of day. The only standard User Language datetime $function which provides a time of day is $TIME, the current time of day, in one fixed format. |
#DATE formats | #DATE allows you to specify any format to return the current date and time; $DATE has only a few numeric codes for a few formats. |
Error control args | Fast/Unload provides error handling control that allows you to identify the specific cause of any datetime error. |
Error values of numeric date #functions | The #functions that use non-string datetime values provide very uniform error return values: -9.E12 or a null string for numeric or string result #functions, respectively. |