Model 204 language support

From m204wiki
Revision as of 18:15, 8 March 2016 by ELowell (deleted $CHKPAT description; linked to $Chkpat page instead)

Model 204 contains a language support feature for customers who sort and display Model 204 data using single-byte character sets other than U.S. English or Japanese double-byte character set (DBCS).

This feature is included in the SOUL (User Language), HLI, and SQL interfaces. This topic describes the facilities that perform this language-specific processing for Model 204 data display, sequencing, and collating.

Overview

Language support in computer data storage means being able to receive, store, and redisplay differing character sets and devising algorithms to handle the correct sorting procedures. Worldwide use of the computer to store, transmit, share, and compare data exposed the need to:

  • Analyze the character sets used by written languages to determine which characters are shared and which characters are unique to a character set.
  • Respect the collating sequence or order of precedence rules used by a written language.

Language support terminology

A character set is a set of symbols or marks used in a writing system, such as the letters of an alphabet. Character sets differ in the number of characters, the specific characters included, and their collating sequence.

Once a character set is identified, the next task is handling the collating sequence. Collating sequence is the sequence in which characters are ordered for sorting, merging, and comparing. Specifically, it is the order assigned to the characters of a character set (in computers, for example, ASCII, U.S. English, and EBCDIC) used for sequencing purposes. Usage determines the correct collating sequence for each writing system. Commonplace examples of collating sequences are those used in telephone directories and dictionaries.

Language support documentation

For a thorough discussion of the decisions surrounding language support, consult the following documents:

  • Canadian Alphanumeric Ordering Standard for Character Sets of CSA Standard CAN/CSA-Z243.4, Canadian Standards Assoc., Rexdale (Toronto), Ontario, Canada, 1992.
  • The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1, The Unicode Consortium, Addison-Wesley, Reading, MA, 1991.

Adding other languages

Adding a language to those already developed for Model 204 language processing is a cooperative venture between a customer and Rocket Software. If you are interested, please consult your sales representative.

A note about User Language and SOUL

Model 204 versions 7.5 and higher provide a significantly enhanced, object-oriented version of User Language called SOUL. All existing User Language programs continue to work under SOUL, so User Language can be considered a subset of SOUL, though the name "User Language" is now deprecated. In this topic, the name "User Language" has been replaced with "SOUL."

Collating sequence support

The language support feature in Model 204 currently sorts using the expected collating sequence for U.S. English and provides limited support for Japanese.

NLANG, the language support module

Model 204 modules are linked with a set of language support tables in the NLANG module that define written languages. A Model 204 supported language consists of translation tables and flag tables containing information about:

  • Alphabetic characters, lowercase to uppercase
  • Alphabetic characters, uppercase to lowercase
  • LANGSORT tables
  • Pattern matcher
  • Characters you can enter at the keyboard
  • Characters you can display on the terminal
  • ASCII to EBCDIC translation

Supported languages

After installing Model 204, you can select one of five variations of the internal language table. The LANGUSER and LANGFILE parameter settings you select set the terminal and print capabilities. The NLANG module contains internal language tables for the following languages:

  • Cyrillic
  • French Canadian
  • Japanese
  • Turkish
  • US English

The internal language table provides the same input and output translation tables, uppercase or lowercase translation, and $ALPHA support as the coordinating Model 204 parameters, LANGUSER and LANGFILE, but does not determine collating sequences for sorting or B-tree indexes.

Language support parameters

After installation, set the LANGFILE and LANGUSER parameters to the values that support your applications' language requirements. The values of the LANGFILE and LANGUSER parameters determine which internal language table in NLANG Model 204 consults for collating sequence, character storage, and uppercase or lowercase translation.

IBM code pages

IBM assigns a code page number to each of various character sets. Each IBM code page maps a particular set of character shapes to corresponding binary codes. Model 204 depends on the binary code definition in the IBM code page to handle language support.

The following table lists the Model 204-supported character sets and designated IBM code pages.

Character sets supported in Model 204

Written language   Parameter value    Refers to IBM code page
Cyrillic           CYRILLIC           880
French Canadian    FRENCHC            037
Japanese           JAPAN              290
Turkish            TURKISH            1026
US English         US (the default)   1047

LANGFILE: Choosing a character set definition for a file

Class:

FPARMS

Default:

US, meaning U.S. English

Setting:

During file creation or resettable by file manager

Meaning:

Use the LANGFILE parameter to specify the language for file processing operations such as the ordering of data and processing LIKE and LANGLIKE patterns. The LANGFILE parameter determines the valid character set in a file.

The value of LANGFILE must be one of the following, listed in this table.

Valid character sets

Written language   Model 204 LANGFILE value
Cyrillic           CYRILLIC
French Canadian    FRENCHC
Japanese           JAPAN
Turkish            TURKISH
US English         US (the default)

Note: You cannot specify a LANGFILE parameter setting other than US for sorted files (FILEORG X'01' setting).
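As a rough illustration of setting LANGFILE during file creation, the following command sequence is a minimal sketch. The file name NAMES is hypothetical, the file's data set is assumed to be already allocated to the run, and every other file parameter is left at its default; a real file would normally specify more parameters.

```soul
CREATE FILE NAMES
PARAMETER LANGFILE=TURKISH
END
OPEN NAMES
INITIALIZE
VIEW LANGFILE
```

The final VIEW command simply displays the LANGFILE value now in effect for the open file.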

LANGUSER: Setting the language definition of a user thread

Class:

USER

Default:

US, meaning U.S. English

Setting:

On the user's parameter line, resettable

Meaning:

Use the LANGUSER parameter to specify the language that is in use by the thread's I/O device. Different terminals in the same Model 204 run can use different languages. HLI or SQL threads can use different languages from each other and from SOUL or terminal threads.

The value of LANGUSER must be one listed in the following table:

Valid languages for a thread's I/O device

Written language   Model 204 LANGUSER value
Cyrillic           CYRILLIC
French Canadian    FRENCHC
Japanese           JAPAN
Turkish            TURKISH
US English         US (the default)
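Because LANGUSER is resettable, a user can switch the language of the current thread at command level. The following sketch assumes a terminal that displays French Canadian characters; FRENCHC is one of the values in the table above.

```soul
RESET LANGUSER=FRENCHC
VIEW LANGUSER
```

Other threads in the same run keep their own LANGUSER values.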

Data Management Language enhancements

This section describes language support enhancements to Data Management Language.

SQL Server

SQL Server ordering operations for an SQL table use the collating sequence specified by the file's LANGFILE parameter. Model 204 SQL does not permit joins across files that do not have the same LANGFILE parameter settings.

SQL language support requires an additional four bytes in QTBL per compiled query. You can set QTBL on the User 0 line or with the UTABLE command.

Statements that support language-specific ordering

The following SOUL statements and their corresponding HLI calls provide language-specific ordering:

  • FIND (various inequality operators such as index and direct search)
  • FOR EACH RECORD IN ORDER BY (including FROM and TO clauses)
  • FOR EACH VALUE IN ORDER
  • FOR EACH VALUE (in group context)
  • SORT RECORDS/RECORD KEYS
  • SORT VALUES
  • Pattern matcher (LIKE or LANGLIKE clause) range specifications

Note: Sorted file operations (with FILEORG = X'01') are not supported.

Pattern matching using the LANGLIKE operator

The SOUL operator LANGLIKE supports parsing and evaluation of patterns according to the tables provided with the LANGUSER and LANGFILE parameters.

The LANGLIKE syntax is the same as LIKE syntax. See the topic on value loops for more details.

  • The LIKE operator employs U.S. English for parsing the pattern and the value of LANGFILE for evaluating the pattern.
  • The LANGLIKE operator uses the value of LANGUSER for parsing the pattern and the value of LANGFILE for evaluating the pattern.

The parsing language, LANGUSER, is used for checking the syntax of the pattern and for determining the value of:

  • Special pattern escape character
  • Hexadecimal character
  • Alphabetic character

The evaluation language, LANGFILE, is used to match the pattern against the data. In particular, if a range of characters is defined in the pattern, then the collating sequence is determined by the evaluation language, LANGFILE.

Syntax

The format of the FIND statement used to perform pattern matching is:

FIND [ALL] RECORDS {FOR WHICH | WITH} fieldname IS [NOT] LANGLIKE 'pattern'

Where

The LANGLIKE keyword indicates that pattern is the set of characters to match, using LANGUSER and LANGFILE as previously described.

The pattern argument must be enclosed in quotation marks. The characters that you can use in a pattern and the methods of optimizing a pattern retrieval are described in the Record loops topic.
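As an illustration of this syntax, the following sketch prints the NAME value of each record whose NAME begins with a character in the range A through Z. The file name DATA and the field name NAME are borrowed from the other examples in this topic and are assumptions about your file. The pattern is parsed using the thread's LANGUSER value, and the range is evaluated in the collating sequence of the file's LANGFILE value.

```soul
BEGIN
* Select records whose NAME starts with a character that
* collates between A and Z in the file's LANGFILE language.
FD1: IN DATA FIND ALL RECORDS FOR WHICH NAME IS LANGLIKE '(A-Z)*'
     END FIND
FOR EACH RECORD IN FD1
   PRINT NAME
END FOR
END
```

With LANGFILE set to a language whose alphabet places additional letters between A and Z, this request can select records that the same range would not match under U.S. English ordering.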

SOUL $functions for language support

The $functions in the following table include language-specific processing capabilities.

$Functions for language-specific processing

$function   Description
$Alpha      Verifies that a string is composed of only characters that are valid in the specified or default language.
$Alphnum    Verifies that a string is composed of only characters and digits 0 through 9, which are valid in the specified or default language.
$ChkPat     Verifies the syntax of a pattern.
$LANGSPC    Returns a string containing the language-specific hexadecimal value of a special character on a particular terminal.
$LANGSRT    Transforms a string into a language-specific sequence value.
$LANGUST    Restores a transformed string back to its original value.
$LIKE       Controls parsing and evaluation languages used in pattern matching.
$LOWCASE    Translates an uppercase or mixed-case string into a lowercase string.
$UPCASE     Translates a lowercase or mixed-case string into an uppercase string.

$LANGSPC

The $LANGSPC function returns a string containing the hexadecimal value of the specified character in the specified language. You can use $LANGSPC to scan user input for a special character in a language-independent manner. A print-out or display of the returned value will be the character representation based on the language argument.

You can also use the $LANGSPC function to ensure that any special character that has a different hexadecimal code value is displayed correctly.

Syntax

$LANGSPC('charname'[,language])

Where

The charname argument is a string containing one of the following values:

Valid charname   US character   Description
AT               @              At sign
BACKSLSH         \              Backslash
DOLLAR           $              Dollar sign
DQUOTE           "              Double quotation mark
EXCLAMAT         !              Exclamation point
NOT              ¬              Not sign
RBRACE           ]              Closing square brace or right square brace
SHARP            #              Number sign or pound sign
VERTICAL         |              Vertical bar

The optional language argument specifies which language to use to obtain the desired hexadecimal code for the specified character. The request is canceled with an error message if the name is not found in NLANG. The language argument is handled as follows:

  • When you omit the language argument, Model 204 performs the validation in U.S. English, even if the value of the LANGUSER parameter is not US, and lowercase characters are not recognized.
  • An asterisk enclosed in quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • You can enter the name of a valid language enclosed in quotation marks or a %variable containing a valid language. If the value you enter is not supported, the request is canceled with an error message. See LANGUSER for valid values.

Example

In the following example, the %PATH variable, supplied by the user from the terminal, is searched for the backslash character in the code table designated by the user's LANGUSER value:

    %BACKSLASH IS STRING LEN 1
    %BACKSLASH = $LANGSPC('BACKSLSH','*')
    %DIR = $SUBSTR(%PATH,$INDEX(%PATH,%BACKSLASH)+1)
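The three ways of supplying the language argument described above can be sketched side by side as follows. The variable names %AT and %LANG are hypothetical, and TURKISH is just one of the valid LANGUSER values.

```soul
BEGIN
%LANG IS STRING LEN 8
%AT IS STRING LEN 1
* Language argument omitted: U.S. English is used.
%AT = $LANGSPC('AT')
* Asterisk: use the thread's current LANGUSER value.
%AT = $LANGSPC('AT','*')
* Explicit language, here supplied in a %variable.
%LANG = 'TURKISH'
%AT = $LANGSPC('AT',%LANG)
END
```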

$LANGSRT

The $LANGSRT function translates a given string according to the specified language into a language-neutral hexadecimal string against which you can sort. A print-out or display of the returned value will be the character representation based on the language argument.

You can use the $LANGSRT function to determine whether one string sorts before or after another in a language-specific collating sequence. First apply the $LANGSRT function to both strings, and then compare the results using the SOUL comparison operators, such as greater-than (GT) and less-than-or-equal-to (LE).

Syntax

$LANGSRT('string'[,language])

Where

The string argument is a literal enclosed in quotation marks or a %variable containing the original data to be translated into collating sequence.

The optional language argument is the name of one of the defined languages, which specifies which collating sequence to use. The language argument is handled as follows:

  • When you omit the language argument, Model 204 performs the validation in U.S. English, even if the value of the LANGUSER parameter is not US, and lowercase characters are not recognized.
  • An asterisk enclosed in quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • You can enter the name of a valid language enclosed in quotation marks or a %variable containing a valid language. If you enter a value that is not supported, the request is canceled with an error message. See LANGUSER for valid values.

Note: The $LANGSRT function returns the string unchanged when the language is U.S. English.

Example

The following procedure stores the value of NAME from each record into array %STR.
The $LANGSRT function translates each value of NAME into a language-specific collating sequence and stores the value into the array %SORTSTR.
The procedure then calls a user written subroutine, MYSORT, that sorts the %SORTSTR array in ascending order.
At this point the procedure invokes the $LANGUST function to translate the collating string back to its original form and prints the names in language-specific order.

    BEGIN
    DECLARE SUBROUTINE MYSORT (STRING LEN 20 ARRAY(*))
    %STR STRING LEN 20 ARRAY (20) NO FS
    %SORTSTR STRING LEN 20 ARRAY (20) NO FS
    *
    FD1: IN DATA FIND ALL RECORDS
         END FIND
    %I = 1
    FOR EACH RECORD IN FD1
       %STR(%I) = NAME
       %I = %I + 1
    END FOR
    FOR %J FROM 1 TO %I-1
       %SORTSTR(%J) = $LANGSRT(%STR(%J),'TURKISH')
    END FOR
    *
    * SORT NAMES
    *
    CALL MYSORT(%SORTSTR)
    *
    FOR %J FROM 1 TO %I-1
       %STR(%J) = $LANGUST(%SORTSTR(%J),'TURKISH')
       PRINT %STR(%J)
    END FOR
    END
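For the simpler case of comparing just two strings, a minimal sketch might look like the following. The variable names and the sample values are hypothetical; the point is that the comparison is made on the transformed values returned by $LANGSRT, not on the original strings.

```soul
BEGIN
%A IS STRING LEN 20
%B IS STRING LEN 20
%A = 'CAN'
%B = 'DAL'
* Compare the transformed values, not the original strings.
IF $LANGSRT(%A,'TURKISH') GT $LANGSRT(%B,'TURKISH') THEN
   PRINT %A WITH ' SORTS AFTER ' WITH %B
ELSE
   PRINT %A WITH ' DOES NOT SORT AFTER ' WITH %B
END IF
END
```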

$LANGUST

The $LANGUST function translates back to its original form a string previously translated by $LANGSRT processing, which is useful for applications that maintain sorted arrays of data and need to display the values.

Syntax

$LANGUST('string'[,language])

Where

The string argument is a literal enclosed in quotation marks or a %variable containing the data in collating sequence to be translated back to its original form.

The optional language argument is the name of one of the defined languages, specifying which collating sequence to use. The language argument is handled as follows:

  • You can enter the name of a valid language enclosed in quotation marks or a %variable containing a valid language. If the value you enter is not supported, the request is canceled with an error message. See LANGUSER for valid values.
  • An asterisk enclosed in quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • When you omit the language argument, Model 204 performs the validation in U.S. English, even if the value of the LANGUSER parameter is not US, and lowercase characters are not recognized.

Example

If your site maintains more than one type of terminal and keyboard that store and display the same character set, individual characters might be assigned differing hexadecimal codes on different keyboards. You can translate the character equivalents back and forth as follows:

$LANGUST($LANGSRT(string,source-language), target-language)

A character without an equivalent converts to its base character. A special character without an equivalent converts to a space.
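As a sketch of this technique, the following request reads a line from a terminal assumed to use the French Canadian table and converts it to its U.S. English equivalent. The variable names and the choice of FRENCHC as the source language and US as the target language are assumptions made for illustration.

```soul
BEGIN
%IN IS STRING LEN 40
%OUT IS STRING LEN 40
%IN = $READ('ENTER TEXT')
* Transform with the source language, restore with the target language.
%OUT = $LANGUST($LANGSRT(%IN,'FRENCHC'),'US')
PRINT %OUT
END
```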

$LIKE

The $LIKE function provides user control over the parsing and evaluation languages used in pattern matching. It has two language arguments: one to assign the parsing language, LANGUSER, and one to assign the evaluation language, LANGFILE.

The LANGLIKE operator and $LIKE function in expressions coordinate to provide consistency between the FIND statement and the IF statement, and avoid complicating the interpretation of the evaluation language parameter.

Syntax

$LIKE(string,pattern[,parse-lang][,eval-lang])

Where

The string argument represents the characters to verify. It must be one of the following:

  • A literal enclosed in quotation marks.
  • A %variable.
  • A field name without quotation marks. In this case, the function call must be embedded in a FOR EACH RECORD loop where the current value of the field is verified.

The required pattern argument is the string of characters to verify, which you can specify as a literal enclosed in quotation marks or as a %variable.

The optional parse-lang argument specifies the language to use for parsing. The parse-lang argument is handled as follows:

  • Omitting this argument instructs Model 204 to use U.S. English parsing rules, even if the value of the LANGUSER parameter is not US.
  • An asterisk enclosed in quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • You can enter the literal name of a valid language enclosed in quotation marks. If you enter a name that is not supported, the request is canceled with an error message. See LANGUSER for valid values.

The optional eval-lang argument specifies the language to use for evaluation. Its requirements are identical to the parse-lang argument.

Example

In the following example, we are matching the value of field NAME against the pattern (A-Z)*@ using US as the parsing language and TURKISH as the evaluation language.

  • The parsing language, LANGUSER, determines the special characters that can be used in a pattern. The pattern is checked for syntax against these characters.
  • The evaluation language, LANGFILE, is used when the pattern is matched against the data. It determines the collating sequence and the definition of alphabetic characters.

In this example, the evaluation language is Turkish; therefore, all character matching is done against the Turkish alphabet, and the range operation, (A-Z), uses the collating sequence of the Turkish language.

    BEGIN
    FD1: IN DATA FIND ALL RECORDS
         END FIND
    %PAT='(A-Z)*@'
    FOR EACH RECORD IN FD1
       %RC = $LIKE(NAME,%PAT,'US','TURKISH')
       IF %RC EQ 0 THEN
          PRINT 'STRING: ' WITH NAME WITH 'DOES NOT MATCH PATTERN: - ' WITH %PAT
       END IF
    END FOR
    END

$LOWCASE

The $LOWCASE function translates an uppercase or mixed-case string into a lowercase string. The translation affects only characters with uppercase and lowercase pairs, for example, A to a through Z to z in U.S. English. These are not strictly keyboard pairs. If the first character in the string is alphabetic, the character is converted to uppercase.

Syntax

$LOWCASE(string[,language])

Where

The string argument represents the characters to translate. It must be one of the following:

  • A literal enclosed in quotation marks.
  • A %variable.
  • A field name without quotation marks. In this case, the function call must be embedded in a FOR EACH RECORD loop where the current value of the field is translated.

The optional language argument specifies the language to use, which is handled as follows:

  • Omitting the language argument instructs Model 204 to perform the translation in U.S. English, even if the value of the LANGUSER parameter is not US.
  • An asterisk enclosed by quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • You can enter a literal name of a valid language enclosed in quotation marks. If the name you enter is not supported, the request is canceled with an error message. See LANGUSER for valid values.

Example

The following example returns the string 'Name and address' in U.S. English:

$LOWCASE('NAME AND ADDRESS')

The following example returns the string 'Çà et là' in French Canadian:

$LOWCASE('ÇÀ ET LÀ','FRENCHC')

$UPCASE

The $UPCASE function translates a lowercase or mixed-case string into an uppercase-only string. The translation affects only the lowercase letters of uppercase and lowercase character pairs in the specified language.

Syntax

$UPCASE(string[,language])

Where

The string argument represents the characters to translate. It must be one of the following:

  • A literal enclosed in quotation marks.
  • A %variable.
  • A field name without quotation marks. In this case, the function call must be embedded in a FOR EACH RECORD loop where the current value of the field is translated.

The optional language argument specifies the language to use. The language argument is handled as follows:

  • Omitting this argument instructs Model 204 to perform the translation in U.S. English, even if the value of the LANGUSER parameter is not US.
  • An asterisk enclosed in quotation marks ('*') instructs Model 204 to use the value of the LANGUSER parameter.
  • You can enter the literal name of a valid language enclosed in quotation marks. If the name you enter is not supported, the request is canceled with an error message. See LANGUSER for valid values.

Examples

The following examples return uppercase strings for mixed case entries.

Function code...                      Returns...           Language
$UPCASE('Name and address')           'NAME AND ADDRESS'   U.S. English
$UPCASE('Île d'Orléans','FRENCHC')    'ÎLE D'ORLÉANS'      French Canadian
$UPCASE('Île d'Orléans')              'ILE D'ORLeANS'      U.S. English

Note: In U.S. English no accented characters have case translation.
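A common use of $UPCASE is to normalize terminal input before comparing it with a constant. The following sketch uses the asterisk form so that the translation follows the thread's LANGUSER value. The prompt text, the variable name, and the expected reply YES are hypothetical.

```soul
BEGIN
%ANSWER IS STRING LEN 20
%ANSWER = $READ('CONTINUE?')
* Translate with the thread's language table before comparing.
IF $UPCASE(%ANSWER,'*') EQ 'YES' THEN
   PRINT 'CONTINUING'
END IF
END
```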

Terminal interface requirements

Output validation on 3270 full-screen threads uses the list of displayable characters in the thread's language table. The language table is the one selected by the LANGUSER parameter or, if LANGUSER is not set, the default language, US.

If no such list is supplied, then no output validation is performed, regardless of the setting of the FSTRMOPT parameter.

If there is a list of displayable characters, then output validation is performed when the FSTRMOPT parameter setting allows it; that is, when the X'01' bit is off.

The *UPPER and *LOWER commands, which set case translation, use the case translation rules specified in the thread's language table. If no case translation rules are specified, then no case translation is performed, regardless of the *UPPER or *LOWER command setting.

Using the JAPAN language table

The JAPAN language table is designed to handle Katakana terminal display and to provide upward compatibility with DBCS support in previous releases of Model 204. In particular, case translation and 3270 output validation are disabled.

Using DBCSENV for uppercase translation

When the DBCSENV parameter is set to a nonzero value, the LANGUSER parameter is automatically set to JAPAN. See the Model 204 DBCS Support Summary for the use and setting of this parameter.

Uppercase translation depends on the DBCSENV parameter. In the non-DBCS environment, when an *UPPER command is in effect, Model 204 converts data received from the user to uppercase for:

  • Full-screen editor commands
  • Screen input items not specified as mixed case
  • Line mode input (for example, command and $READ input)

Extended text lines for full-screen editor

Full-screen editor users in the Fujitsu environment can now input extended text lines of up to 255 display positions. If the storage requirement of such a line exceeds 255, the line is truncated cleanly. The screen is resent to the user with the truncated line highlighted, and an error message is displayed in the full-screen editor's message window.

Control characters

For a list of the control characters found in IBM computers and the sequence in which they are sorted, see Control characters.

Special characters

For a list of the special characters found in text, such as punctuation marks, diacritic (or accent) marks, currency symbols, arithmetic and mathematical marks, building blocks for screen forms, and Optical Character Recognition characters, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.

Latin Alphabet, Diacritics, Ligatures, and Numerals

For a list of the characters used to build words in U.S. English, French Canadian, and other written languages utilizing the Latin and extended Latin character set, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.

Language support topics

The Model 204 language support documentation consists of the pages listed below. This list is also available as a "See also" link from each of the pages.