SirDBA: Difference between revisions

From m204wiki
Jump to navigation Jump to search
Line 485: Line 485:
</ul></td></tr>
</ul></td></tr>


<tr><th>Wkday, DAY Month YYYY "A"T HH:MI</th>
<tr><th nowrap>Wkday, DAY Month YYYY "A"T HH:MI</th>
<td>This is a format that would be quite unlikely in a file, but the value <code>Friday, 7 February 1998 AT 21:33</code> matches this format.
<td>This is a format that would be quite unlikely in a file, but the value <code>Friday, 7 February 1998 AT 21:33</code> matches this format.
<p>Notes:</p>
<p>Notes:</p>

Revision as of 21:47, 29 January 2018

Overview

SirDBA maps the fields in Model 204 files and groups onto standard data types, and it stores the mappings in a database file. When a field is selected from the SirDBA field display, you are launched into SirPro's string search facility to hunt for procedures that reference the specified field name.

SirDBA is designed to assist in database conversions, and for conversions involving a large volume of changes to User Language code, when the changes are centered around Model 204 field names.

For Year 2000 conversion, SirDBA goes to extra lengths to determine when a field contains date data. When it does, SirDBA stores a datatype of DATETIME and a format specification of the datetime field.

Because SirDBA is an adjunct product to SirPro, programmers should familiarize themselves with the features listed in the SorPro topics. Using SirDBA and SirPro, a SOUL development team has the tools for targetted large scale procedure updates centered around references to Mode 204 fields.

SirDBA is part of the RKTools family of products. As with all RKTools products, SirDBA formerly required the installation of the Sirius Mods, with a version equal to or higher than the SirDBA version. As of Model 204 V7.5, however, the functions and facilities necessary to run RKTools are built into the Model 204 kernel.

SirDBA setup

The installation process for SirDBA is described in RKTools installation.

In addition, because SirDBA is an adjunct product to SirPro, the SirPro application must be installed in the same Mode 204 region(s).

For information about product error messages, see MSIR. messages.

For information about Model 204 and RKTools product changes, see the Release notes.

System administration

Field mapping

If Model 204 supported strong data typing at the field level, SirDBA could simply allow programmers to work from a list of field characteristics. Instead, most Mode 204 fields are nominally formatted as character strings, but they may contain anything the application places in them.

SirDBA's mapping facility processes the actual file contents to determine the type of data each field contains. The mapping facility requires the user to select the data file or group, and to specify a few characteristics of the mapping job. SirDBA then analyzes a sample of the file's records to determine what's actually in the fields, and it stores an entry for the file/field combination in its internal database. These database entries are used by the SirDBA screens to let you select fields to work on.

To run SirDBA and map the fields in a Mode 204 data file, you simply:

  1. Open the target database file.
  2. Specify the number of records to scan and any specific date formats known to be in the file.

    The SirDBA mapping facility will typically take a few minutes to run.

  3. Select fields from the SirDBA display lists (which transfers them into a SirPro edit session for the procedures that contain references to the selected field).

Opening files

When you first enter SirDBA, no files will have been mapped. Rather than display an empty selection list, the first user must specify the first file to be mapped. You do so using the Open a file or group screen. This screen is also presented when a file is added to the SirDBA database and SirDBA cannot open the file with high enough default privileges.

The screen contains these fields:

Context Specify FILE or GROUP for the target Mode 204 entity.
File/Group name Name of the Mode 204 data file or data group whose fields are to be scanned.
Password If a password is not required, SirDBA will usually open the file without sending the user to this screen.
Current Privileges Displays the hexadecimal privilege switches, if the target file is already open by the user without sufficient privileges for SirDBA to complete the mapping.
Required Privileges Hexadecimal privilege level required for SirDBA to perform field mappings. Currently hardcoded to 0041.
PF keys:
PF1 Help.
PF3 Quit / Return to previous screen.

Specifying field mapping criteria

Field mapping requires the user to specify the number of records to scan, a "fudge" factor for estimating field lengths, and some date formats. The SIRDBA Model 204 Field Analyzer screen for specifying these settings is presented once the target file is open.

These are the input and display options on this screen:

Map tables for FILE/GROUP name Displays the name of the current target data file.
Records to scan The number of records SirDBA will analyze to determine the data formats and lengths for each field. Analyzed records are selected evenly throughout the file. So, for instance, if the target file contains 60,000 records, and the user specifies that 1000 should be analyzed, every 60th record is examined (using Mode 204 internal record numbers).

Enough records should be analyzed to guarantee that each field in the file is represented in the SirDBA database. For instance, if a file contains a million records, and some fields exist only on a record type that occurs 1000 times in the file, you will need to sample more than every thousandth record in order to capture the contents of that record type.

The number of records may be set from 1 to 99999. There is no penalty for oversampling, except that it will take SirDBA longer to complete the analysis.

CHAR fudge factor SIRDBA calculates the length of each column when building tables by keeping track of the average and longest lengths for every observed occurrence of the field. It also calculates the standard deviation for the set of observations.

The default column length is set to the observed mean plus two standard deviation units, which should be an adequate length for 99% to 100% of occurrences of the field.

Note: The algorithm using +2 standard deviations works even when fields are a fixed length, as there will be no variance from the mean, and the standard deviation will be 0. The "fudge factor" of 2 standard deviations is almost always the preferred value.

Date/time formats to scan for in file While scanning records to build the field mapping, SirDBA will map a field to a datetime column if the contents of the field map consistently to a supported datetime format. SirDBA defaults to search for the following format fields:

YYYYMMDD YYYYDDD YYMMDD YYDDD YYYY MONTH DAY YY MONTH DAY DD MON YYYY MON DD YYYY YYIMMIDDIHHIMMISS YYYYMMDDHHMMSS

You can override the defaults and specify as many as 12 date formats for SirDBA to look for. Date/Time formats must be 31 characters or shorter; see Datetime formats for a description of the possible formats.

PF keys
PF1 Help.
PF3 Quit / Return to previous screeen.
PF9 Quit / Reset the system defaults (the default number of records to scan, CHAR fudge factor = 2, and the 10 default date formats).
PF12 Submit the request to map the file.

Displaying the mapped field

The SIRDBA Field Mappings by File screen shows files and fields that have been mapped onto data types. This is also the main screen that displays when SirDBA is entered and any file has already been mapped.

From here, if you place an S in the selection field in front of any line, you transfer into a SirPro edit session for procedures that contain references to that field.

To add a new file to SirDBA, enter ADD filename on the command line at the top left of the screen, or press PF 12.

The input/display options on this screen are:

File/Group Name of the file or group in which SirDBA found the field.
Field Name Name of the field. Although the entire field name is stored in the SirDBA database, only the leftmost 25 characters of the field name are displayed.
Type Type of data SirDBA found in the field when scanning the file or group. Data types are:
CHAR Character data. The data length is shown in parentheses.
DATETIME Date/Time data.
FLOAT Numeric data with a decimal.
INT Integer numeric data.

Since SirDBA scans only the number of records you tell it to, it can make bad guesses about data types if you scan too few records and certain field values are clustered near the end of the file.

Format Formats for DATETIME fields are displayed.

Format is only shown when the data type is DATETIME. See Datetime formats for a description of the possible formats.

Prefix commands

S Selects the field and transfers the user to SirPro to begin editing procedures that use it.

Command line commands

The following commands may be entered on the command line:

ADD [FILE | GROUP] fgname ADD is used to invoke the AUTOBUILD feature to add field mappings for a designated file or group. Groups must be permanent groups.
QUIT, STOP, END, or LOG Perform the same function as PF3, which exits SirDBA.

PF keys

PF1 Help.
PF2 Sort the displayed list by Field Name.
PF3 Quit.
PF4 Sort the displayed list by File/Grp name, then Field Name.
PF6 Sort the displayed list by Type of field.
PF7 Scroll the display up.
PF8 Scroll the display down.
PF9 Repeat the last command
PF10 Refresh the list.
PF12 Map the fields in a file to SirDBA field types.

Datetime processing considerations

This section presents date processing issues, including usage of SirDBA past the year 1999, an explanation of its processing of dates, and any rules and restrictions you must follow to achieve correct results using date values with SirDBA.

SirDBA uses dates in the following ways:

  • To examine the CPU clock (as returned by the STCK hardware instruction) to determine the current date, in case SirDBA is under a rental or trial agreement
  • As values to check for field type mapping
  • To display the current date, as returned by the TIME SVC, modified by the SYSDATE parameter or the the Sir2000 User Language Tools APPDATE clock, as page headers in various end-user displays

The rest of section contains a discussion of datetime formats, valid datetime strings, and processing of two-digit year values. It also contains example datetime formats and corresponding example datetime strings.

Datetime formats

The representation of a date is determined by a datetime format. This value is a character string, composed of the concatenation of tokens (for example, "YYYY" for a four-digit year, and "MI" for minutes) and separator characters (for example, "/" in "MM/DD/YY" for two-digit month, day, and year separated by slashes).

These datetime format strings are used in several Rocket Model 204 add-on products in addition to SirDBA. The additional products using datetime format strings are:

The rules for these datetime format strings are consistent throughout all these products, though certain uses of these strings might impose extra restrictions. For example, a leading blank may match an the HH, DD, and MM token in SirDBA, but it may not in some cases in other products.

Basic rules

There are certain rules applied to determine if a format is valid. The basic rules are:

  1. If a format string contains a numeric datetime token (that is, ND, NM, or Ns), then the format string must consist of only one token. Numeric datetime tokens are only supported in format strings for the Sir2000 Field Migration Facility.
  2. You must specify at least one time, weekday, or date token.
  3. Except for "weekday", you can't specify redundant information. More specifically this means:
    • Except for I, no token can be specified twice.
    • At most one year format (contains Y) can be specified.
    • At most one month format (contains MON, Mon, or MM) can be specified.
    • At most one day format (DD or Day) can be specified.
    • At most one weekday format (WKD, Wkd, WKDAY, or Wkday) can be specified.
    • If AM is specified, then PM cannot be specified.
    • At most one fractions-of-a-second format (contains X) can be specified.
    • If DDD is specified, then neither a day nor month format can be.
  4. If ZYY is specified in a format string, no other token that denotes a variable-length value may be used.
  5. If a format string contains other tokens that denote variable length values, then an * token may only appear as the last character of the format string.
  6. The DAY token may not be immediately followed by another token whose value may be numeric, regardless of whether the following token represents a variable length value. Thus, DAY may not be followed by *, I, YY, YYYY, CYY, MM, HH, MI, SS, X, XX, or XXX. DAY may not be followed by a decimal digit separator, and DAY may not be followed by a quote followed by a decimal digit.

    Note: A common mistake is to use MM for minutes; it should be MI.

  7. The maximum length of a format string in SirDBA is 31 characters; in most other Sirius products it is 100 characters.

Valid tokens

The valid tokens in a date format are shown in the following list. All of the tokens that match alpabetic strings (for example, MON) match any case (for example, "jan" or "JAN" or "Jan").

NM Numeric datetime value containing the number of milliseconds (1/1000 of a second) since January 1, 1900 at 12:00 AM. (This token is allowed only in the Sir2000 Field Migration Facility.)
NS Numeric datetime value containing the number of seconds since January 1, 1900 at 12:00 AM. (This token is allowed only in the Sir2000 Field Migration Facility.)
ND Numeric date value containing the number of days since January 1, 1900. (This token is allowed only in the Sir2000 Field Migration Facility.)
* Ignore entire variable-length substring matching pattern, if any. See Datetime and format examples.
I Ignore corresponding input character. See Datetime and format examples.
" Following character is "quoted", that is, it acts as a separator character. See Datetime and format examples.
YYYY Four-digit year
YY Two-digit year
CYY Year minus 1900 (three digits, including any leading zero). See Datetime and format examples.
ZYY Year minus 1900, two-digit or three-digit year number (variable length data). See Datetime and format examples.
MONTH Full-month name (uppercase variable length). For SirDBA, this is the same as Month.
Month Full-month name (mixed-case variable length). For SirDBA, this is the same as MONTH.
MON Three-character month abbreviation (uppercase). For SirDBA, this is the same as Mon.
Mon Three-character month abbreviation (mixed case). For SirDBA, this is the same as MON.
MM Two-digit month number. For SirDBA, this is the same as BM (leading blank is allowed). See Datetime and format examples.
BM Two-character month number; for SirDBA, this is the same as MM. See Datetime and format examples.
DDD Three-digit Julian day number
DD Two-digit day number. For SirDBA, this is the same as BD (leading blank is allowed). See Datetime and format examples.
BD Two-character day number; For SirDBA, this is the same as DD. See Datetime and format examples.
DAY One-digit or two-digit day number (variable length data). See Datetime and format examples.
WKDAY Full day of week name (uppercase variable length). For SirDBA, this is the same as Wkday.
Wkday Full day of week name (mixed case variable length). For SirDBA, this is the same as WKDAY.
WKD Three-character day of week abbreviation (uppercase). For SirDBA, this is the same as Wkd.
Wkd Three-character day of week abbreviation (mixed case). For SirDBA, this is the same as WKD.
HH Two-digit hour number. For SirDBA, this is the same as BH (leading blank is allowed). See Datetime and format examples.
BH Two-character hour number; For SirDBA, this is the same as HH. See Datetime and format examples.
MI Two-digit minute number
SS Two-digit second number
X Tenths of a second
XX Hundredths of a second
XXX Thousandths of a second (milliseconds)
AM AM/PM indicator
PM AM/PM indicator

Valid separators

The valid separators in a date format are:

  • blank (" ")
  • apostrophe (')
  • slash (/)
  • colon (:)
  • hyphen (-)
  • back slash (\)
  • period (.)
  • comma (,)
  • underscore (_)
  • left parenthesis (()
  • right parenthesis ())
  • plus (+)
  • vertical bar (|)
  • equals (=)
  • ampersand (&)
  • at sign (@)
  • sharp (#)
  • the decimal digits (0 - 9)

In addition, any character may be a separator character if preceeded by the quoting character (").

See Datetime and format examples for examples that include use of various separator characters.

Valid datetime strings

For a datetime string to be valid, it must meet the following criteria:

  • Its length must be less than 128 characters.
  • It must be compatible with its corresponding format string.
  • It must represent a valid date and/or time.

    For example, at most 23:59:59.999 for a time, 01-12 for a month, 01-31 or less (depending on the month) for a day, February 29 is only valid in leap years (only centuries divisible by 4 are leap years: 2000 is but neither 1800, 1900, nor 2100 are).

    Note: weekdays are not checked for consistency against the date; for example, both Saturday, 02/15/97 and Friday, 02/15/97 are valid.

  • It must be within the date range allowed for the corresponding format.

    A datetime string used with a CYY or ZYY format can only represent dates from 1900 to 2899, inclusive. The valid range of dates for all other formats is from 1 January 1753 thru 31 December 9999.

Datetime and format examples

There is an extensive set of format tokens, as shown in Datetime formats. These tokens and the various separator characters can be combined in almost limitless possibility, giving rise to an extremely large set of datetime formats.

The following table provides examples of some common datetime formats, and it also tries to explain uses of some of the format tokens that might not be obvious. Each example format description includes matching datetimes. Again, these tokens can be combined in many ways and only a very few are shown here.

Format Comment
YYMMDD This is a common 6-digit date format that supports sort order if all dates are within a single century. The value 960229 matches this format.
YYYYMMDD This is a common 8-digit date format that supports sort order with dates in two centuries. The value 19991212 matches this format.
MM/DD/YY This is the U.S. 6-digit date format for display. The value 12/14/97 matches this format.

Notes:

  • The leading zero corresponding to an MM token may be given as a blank, thus allowing " 7/15/98".
  • The BM token can be used instead of MM.
DD.MM.YY This is a European 6-digit date format for display. The value 14.12.97 matches this format.

Notes:

  • The leading zero corresponding to a DD token may be given as a blank, thus allowing " 7.04.89".
  • The BD token can be used instead of DD.
Wkday, DAY Month YYYY "A"T HH:MI This is a format that would be quite unlikely in a file, but the value Friday, 7 February 1998 AT 21:33 matches this format.

Notes:

  • If a format contains AM or PM, then the time (HH:MI) must be between 00:01 and 12:00 and must be accompanied by either AM or PM.
  • The day number (string matching DAY) may have a leading zero.
  • The leading zero corresponding to an HH token may be given as a blank, thus allowing:

    Friday, 31 February 1998 AT 8:33

  • The BH token can be used instead of HH.
YYIIII This is a format which could be used for a field that contains a 2-digit year prefixing other information, such as a sequence number. The value 92A123 matches this format.
YY* This is a format that could be used for a field that contains a 2-digit year prefixing other information, such as a sequence number, when the other information is variable length. The values 92 and 92ABC123 match this format.

Note: At most one occurrence of the * token may appear in a datetime format.

CYYDDD This is a compact 6-digit date format with explicit century information, from 1900 through and including 2899. The value 097031 (representing 31 January 1997) matches this format.
ZYYMMDD This is a compact 6- or 7-digit date format with explicit century information, from 1900 through and including 2899, which can often be used with "old" YYMMDD date values in the 1900s. The values 970501 (representing 1 May 1997) and 1000501 (representing 1 May 2000) match this format.

Note: The century and year number (string matching ZYY) may have a leading zero.

YY0000 Decimal digits can be used as separator characters. The value 980000 matches this format.

Note: Numeric separators, unlike alphabetic separators, do not need to be preceeded by a quote character (").