Model 204 language support

From m204wiki
Revision as of 23:24, 6 December 2016 by ELowell

Model 204 contains a language support feature for customers who sort and display Model 204 data using single-byte character sets other than U.S. English, or using the Japanese double-byte character set (DBCS).

This feature is included in the SOUL (User Language), HLI, and SQL interfaces. This topic describes the facilities that perform this language-specific processing for Model 204 data display, sequencing, and collating.

Overview

Language support in computer data storage means being able to receive, store, and redisplay data in differing character sets, and applying algorithms that sort that data correctly. Worldwide use of computers to store, transmit, share, and compare data has exposed the need to:

  • Analyze the character sets used by written languages to determine which characters are shared and which characters are unique to a character set.
  • Respect the collating sequence or order of precedence rules used by a written language.

Language support terminology

A character set is a set of symbols or marks used in a writing system, such as the letters of an alphabet. Character sets differ in the number of characters, the specific characters included, and their collating sequence.

Once a character set is identified, the next task is handling its collating sequence. A collating sequence is the order in which characters are arranged for sorting, merging, and comparing. Specifically, it is the order assigned to the characters of a character set (in computers, for example, ASCII, U.S. English, or EBCDIC) for sequencing purposes. Usage determines the correct collating sequence for each writing system. Common everyday examples of collating sequences are the orderings used in telephone directories and dictionaries.
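To make the idea concrete, the following Python sketch (an illustration only, not part of Model 204) sorts the same strings by their raw byte values in ASCII and in EBCDIC. It uses Python's built-in `cp037` codec for IBM code page 037. The two orders differ because each code assigns different binary values to the same characters.

```python
# Sort the same words by their encoded byte values under two character codes.
words = ["42", "apple", "Apple"]

# ASCII: digits (0x30-0x39) < uppercase (0x41-0x5A) < lowercase (0x61-0x7A)
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))

# EBCDIC code page 037: lowercase (0x81-0xA9) < uppercase (0xC1-0xE9) < digits (0xF0-0xF9)
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)   # ['42', 'Apple', 'apple']
print(ebcdic_order)  # ['apple', 'Apple', '42']
```

Neither result is a "language" order. Both are binary orders, which is why a language-aware collating sequence has to be defined separately.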

Language support documentation

For a thorough discussion of the decisions surrounding language support, consult the following documents:

  • Canadian Alphanumeric Ordering Standard for Character Sets of CSA Standard CAN/CSA-Z243.4, Canadian Standards Assoc., Rexdale (Toronto), Ontario, Canada, 1992.
  • The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1, The Unicode Consortium, Addison-Wesley, Reading, MA, 1991.

Adding other languages

Adding a language to those already developed for Model 204 language processing is a cooperative venture between a customer and Rocket Software. If you are interested, please consult your sales representative.

A note about User Language and SOUL

Model 204 versions 7.5 and higher provide a significantly enhanced, object-oriented version of User Language called SOUL. All existing User Language programs continue to work under SOUL, so User Language can be considered a subset of SOUL, though the name "User Language" is now deprecated. In this topic, the name "User Language" has been replaced with "SOUL."

Collating sequence support

The language support feature in Model 204 currently sorts using the expected collating sequence for U.S. English and provides limited support for Japanese.

NLANG, the language support module

Model 204 modules are linked with a set of language support tables in the NLANG module that define written languages. A Model 204 supported language consists of translation tables and flag tables containing information about:

  • Alphabetic characters, lowercase to uppercase
  • Alphabetic characters, uppercase to lowercase
  • LANGSORT tables
  • Pattern matcher
  • Characters you can enter at the keyboard
  • Characters you can display on the terminal
  • ASCII to EBCDIC translation
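Case translation is one reason these tables must be language-specific. In Turkish, the lowercase dotted i uppercases to İ, and the dotless ı uppercases to I, so a U.S. English table would give the wrong result. The Python sketch below models only the two case-translation tables, and it works at the character level. The `LanguageTable` class and its fields are hypothetical. They do not reflect the actual layout of NLANG, whose tables operate on EBCDIC byte values.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageTable:
    """Hypothetical model of two NLANG-style tables: lowercase->uppercase and its reverse."""
    name: str
    to_upper: dict = field(default_factory=dict)

    def upcase(self, text: str) -> str:
        # Characters without an entry in the table pass through unchanged.
        return "".join(self.to_upper.get(ch, ch) for ch in text)

    def lowcase(self, text: str) -> str:
        # Derive the uppercase->lowercase table by inverting the forward table.
        to_lower = {upper: lower for lower, upper in self.to_upper.items()}
        return "".join(to_lower.get(ch, ch) for ch in text)


BASIC_UPPER = {ch: ch.upper() for ch in "abcdefghijklmnopqrstuvwxyz"}
US = LanguageTable("US", BASIC_UPPER)
# Turkish overrides i -> İ and adds dotless ı -> I plus other Turkish letters.
TURKISH = LanguageTable("TURKISH", {**BASIC_UPPER, "i": "İ", "ı": "I", "ç": "Ç",
                                    "ğ": "Ğ", "ö": "Ö", "ş": "Ş", "ü": "Ü"})

print(US.upcase("istanbul"))       # ISTANBUL
print(TURKISH.upcase("istanbul"))  # İSTANBUL
print(TURKISH.lowcase("IRMAK"))    # ırmak
```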

Supported languages

After installing Model 204, you can select one of five variations of the internal language table. The LANGUSER and LANGFILE parameter settings you select set the terminal and print capabilities. The NLANG module contains internal language tables for the following languages:

  • Cyrillic
  • French Canadian
  • Japanese
  • Turkish
  • US English

The internal language table provides the same input and output translation tables, uppercase or lowercase translation, and $ALPHA support as the corresponding Model 204 parameters, LANGUSER and LANGFILE, but does not determine collating sequences for sorting or B-tree indexes.

Language support parameters

After installation, set the LANGFILE and LANGUSER parameters to the options that support your applications' language requirements. The values of the LANGFILE and LANGUSER parameters determine which internal language table in NLANG Model 204 consults for collating sequence, character storage, and uppercase or lowercase translation.

IBM code pages

IBM assigns a code page number to correspond to various sets of characters. Each IBM code page assigns a particular set of character shapes to a corresponding binary code. Model 204 depends on the binary code definition in the IBM code page to handle language support.

The following table lists the Model 204-supported character sets and designated IBM code pages.

Character sets supported in Model 204

Written language | Parameter value | IBM code page
Cyrillic | CYRILLIC | 880
French Canadian | FRENCHC | 037
Japanese | JAPAN | 290
Turkish | TURKISH | 1026
US English | US (the default) | 1047
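Python's standard library ships codecs for two of these IBM code pages, `cp037` and `cp1026`. That is enough to show why the choice of code page matters: Turkish letters such as Ğ and Ş exist in code page 1026 but not in code page 037. This sketch is illustrative only. The standard library has no codecs for code pages 290, 880, or 1047, so those entries are `None`. `CODE_PAGES` and `can_encode` are hypothetical names.

```python
# Hypothetical map: Model 204 parameter value -> (IBM code page, Python codec or None).
CODE_PAGES = {
    "CYRILLIC": (880, None),
    "FRENCHC": (37, "cp037"),
    "JAPAN": (290, None),
    "TURKISH": (1026, "cp1026"),
    "US": (1047, None),
}


def can_encode(text: str, language: str) -> bool:
    """Return True if every character of text exists in the language's code page."""
    _, codec = CODE_PAGES[language]
    if codec is None:
        raise LookupError(f"no Python codec available for {language}")
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode("ĞŞİ", "TURKISH"))  # True
print(can_encode("ĞŞİ", "FRENCHC"))  # False
print(can_encode("Ça", "FRENCHC"))   # True
```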

LANGFILE: Choosing a character set definition for a file

Class

FPARMS

Default

US, meaning U.S. English

Setting

During file creation or resettable by file manager

Meaning

Use the LANGFILE parameter to specify the language for file processing operations, such as the ordering of data and the processing of LIKE and LANGLIKE patterns. The LANGFILE parameter determines the valid character set for a file.

The value of LANGFILE must be one of those listed in the following table.

Valid character sets

Written language | Model 204 LANGFILE value
Cyrillic | CYRILLIC
French Canadian | FRENCHC
Japanese | JAPAN
Turkish | TURKISH
US English | US (the default)

Note: You cannot specify a LANGFILE parameter setting other than US for sorted files (FILEORG X'01' setting).
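The LANGFILE rules above can be expressed as a small validation routine. This Python sketch is hypothetical: Model 204 performs the check itself, and `check_langfile` is not a Model 204 function. It accepts only the five valid LANGFILE values. It also rejects any value other than US when the sorted-file bit, X'01', is set in FILEORG.

```python
VALID_LANGUAGES = {"CYRILLIC", "FRENCHC", "JAPAN", "TURKISH", "US"}
SORTED_FILE_BIT = 0x01  # FILEORG X'01': sorted file organization


def check_langfile(langfile: str = "US", fileorg: int = 0x00) -> str:
    """Hypothetical validator: return the LANGFILE value or raise ValueError."""
    if langfile not in VALID_LANGUAGES:
        raise ValueError(f"invalid LANGFILE value: {langfile!r}")
    if fileorg & SORTED_FILE_BIT and langfile != "US":
        raise ValueError("sorted files (FILEORG X'01') support only LANGFILE=US")
    return langfile


print(check_langfile("TURKISH", 0x00))  # TURKISH
print(check_langfile("US", 0x01))       # US
```

Calling `check_langfile("TURKISH", 0x01)` raises `ValueError`, which mirrors the restriction on sorted files.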

LANGUSER: Setting the language definition of a user thread

Class

USER

Default

US, meaning U.S. English

Setting

On the user's parameter line, resettable

Meaning

Use the LANGUSER parameter to specify the language that is in use by the thread's I/O device. Different terminals in the same Model 204 run can use different languages. HLI or SQL threads can use different languages from each other and from SOUL or terminal threads.

The value of LANGUSER must be one listed in the following table:

Valid languages for a thread's I/O device

Written language | Model 204 LANGUSER value
Cyrillic | CYRILLIC
French Canadian | FRENCHC
Japanese | JAPAN
Turkish | TURKISH
US English | US (the default)

Data Management Language enhancements

This section describes language support enhancements to Data Management Language.

SQL Server

SQL Server ordering operations for an SQL table use the collating sequence specified by the file's LANGFILE parameter. Model 204 SQL does not permit joins across files that do not have the same LANGFILE parameter settings.

SQL language support requires an additional four bytes in QTBL for each compiled query. You can increase the size of QTBL (the LQTBL parameter) on the User 0 parameter line or with the UTABLE command.
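As a worked example of the sizing rule, 25 compiled SQL queries would need 25 × 4 = 100 additional bytes of QTBL. The hypothetical helper below performs only that arithmetic. It does not estimate any other QTBL usage.

```python
LANG_SUPPORT_BYTES_PER_QUERY = 4  # hypothetical name: extra QTBL bytes per compiled SQL query


def extra_qtbl_bytes(compiled_queries: int) -> int:
    """Additional QTBL space needed for SQL language support."""
    return compiled_queries * LANG_SUPPORT_BYTES_PER_QUERY


print(extra_qtbl_bytes(25))  # 100
```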

Statements that support language-specific ordering

The following SOUL statements and their corresponding HLI calls provide language-specific ordering:

  • FIND (with inequality operators, for both index and direct searches)
  • FOR EACH RECORD IN ORDER BY (including FROM and TO clauses)
  • FOR EACH VALUE IN ORDER
  • FOR EACH VALUE (in group context)
  • SORT RECORDS/RECORD KEYS
  • SORT VALUES
  • Pattern matcher (LIKE or LANGLIKE clause) range specifications

Note: Sorted file operations (with FILEORG = X'01') are not supported.
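Accented letters show the difference between binary and language-specific ordering most clearly. In the Python sketch below, a plain code-point sort places "école" after "zèbre", because é (U+00E9) has a higher value than z. A language-style key sorts each accented letter with its base letter instead. This is a simplified, single-level approximation. The CAN/CSA Z243.4 standard cited above defines a more detailed multi-level ordering, and Model 204 uses its own LANGSORT tables rather than Unicode normalization.

```python
import unicodedata


def primary_key(word: str) -> str:
    """Strip diacritics so accented letters sort with their base letter."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zèbre", "école", "eclair"]

# Binary order: compare by Unicode code point, so é sorts after z.
binary_order = sorted(words)
# Language-style order: compare base letters first, then the original string to break ties.
language_order = sorted(words, key=lambda w: (primary_key(w), w))

print(binary_order)    # ['eclair', 'zèbre', 'école']
print(language_order)  # ['eclair', 'école', 'zèbre']
```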

Pattern matching using the LANGLIKE operator

The SOUL operator LANGLIKE supports parsing and evaluation of patterns according to the tables provided with the LANGUSER and LANGFILE parameters.

The LANGLIKE syntax is the same as LIKE syntax. See the topic on value loops for more details.

  • The LIKE operator employs U.S. English for parsing the pattern and the value of LANGFILE for evaluating the pattern.
  • The LANGLIKE operator uses the value of LANGUSER for parsing the pattern and the value of LANGFILE for evaluating the pattern.

The parsing language, LANGUSER, is used for checking the syntax of the pattern and for determining the value of:

  • Special pattern escape character
  • Hexadecimal character
  • Alphabetic character

The evaluation language, LANGFILE, is used to match the pattern against the data. In particular, if a range of characters is defined in the pattern, then the collating sequence is determined by the evaluation language, LANGFILE.
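The split between parsing and evaluation languages matters most for ranges. The hypothetical Python sketch below does not implement Model 204 pattern syntax. It only checks whether a character falls within a range such as a through f under two different collating sequences, which stand in for two different LANGFILE settings.

```python
import unicodedata


def binary_weight(ch: str) -> str:
    # Binary order: characters compare by code point alone.
    return ch


def accent_blind_weight(ch: str) -> str:
    # Language-style order: an accented letter weighs the same as its base letter.
    return unicodedata.normalize("NFD", ch)[0]


def in_range(ch: str, low: str, high: str, weight) -> bool:
    """True if ch lies between low and high under the given collating sequence."""
    return weight(low) <= weight(ch) <= weight(high)


print(in_range("é", "a", "f", binary_weight))        # False: U+00E9 is above "f"
print(in_range("é", "a", "f", accent_blind_weight))  # True: "é" weighs as "e"
```

The same pattern text, parsed the same way, can therefore match different data depending on the evaluation language.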

Syntax

The format of the FIND statement used to perform pattern matching is:

FIND [ALL] RECORDS {FOR WHICH | WITH} fieldname IS [NOT] LANGLIKE 'pattern'

where:

The LANGLIKE keyword indicates that pattern is the set of characters to match, using LANGUSER and LANGFILE as previously described.

The pattern argument must be enclosed in single quotation marks. The characters that you can use in a pattern and the methods of optimizing a pattern retrieval are described in the Record loops topic.

SOUL $functions for language support

The $functions in the following table include language-specific processing capabilities.

$Functions for language-specific processing

$function | Description
$Alpha | Verifies that a string consists only of alphabetic characters that are valid in the specified or default language.
$Alphnum | Verifies that a string consists only of alphabetic characters that are valid in the specified or default language and the digits 0 through 9.
$ChkPat | Verifies the syntax of a pattern.
$LangSpc | Returns a string containing the language-specific hexadecimal value of a special character on a particular terminal.
$LangSrt | Transforms a string into a language-specific sequence value.
$LangUst | Restores a transformed string to its original value.
$LIKE | Controls the parsing and evaluation languages used in pattern matching.
$Lowcase | Translates an uppercase or mixed-case string into a lowercase string.
$Upcase | Translates a lowercase or mixed-case string into an uppercase string.
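$LangSrt and $LangUst work as a pair. One produces a value whose binary order matches the language's collating sequence, and the other reverses the transformation. The following Python sketch shows one way such a reversible transformation could work. It assumes a small hypothetical collating sequence in which each accented letter follows its base letter. It is not the encoding that Model 204 actually uses, and `lang_srt` and `lang_ust` are illustrative names.

```python
# Hypothetical collating sequence: accented letters placed right after their base letter.
SEQUENCE = "aàâbcçdeéèêfghiîïjklmnoôpqrstuùûüvwxyz"
WEIGHT = {ch: i + 1 for i, ch in enumerate(SEQUENCE)}  # one byte per character, 1-based
CHAR = {w: ch for ch, w in WEIGHT.items()}


def lang_srt(text: str) -> bytes:
    """Transform text into bytes whose binary order follows SEQUENCE.

    Raises KeyError for characters that are not in SEQUENCE.
    """
    return bytes(WEIGHT[ch] for ch in text)


def lang_ust(key: bytes) -> str:
    """Restore the original text from a value produced by lang_srt."""
    return "".join(CHAR[b] for b in key)


words = ["zèbre", "école", "eclair"]
print(sorted(words, key=lang_srt))             # ['eclair', 'école', 'zèbre']
print(lang_ust(lang_srt("école")) == "école")  # True
```

Because every character maps to a distinct byte, the transformation loses no information and the original string can always be restored.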

Terminal interface requirements

Output validation on 3270 full-screen threads uses the list of displayable characters that is specified in the thread's language table, specified by LANGUSER or by the default language, US.

If no such list is supplied, then no output validation is performed, regardless of the setting of the FSTRMOPT parameter.

If there is a list of displayable characters, then output validation is performed when the FSTRMOPT parameter setting allows it; that is, when the X'01' bit is off.

The *UPPER and *LOWER commands, which set case translation, use the case translation rules specified in the thread's language table. If no case translation rules are specified, then no case translation is performed, regardless of the *UPPER or *LOWER command setting.
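For each thread, these rules reduce to two decisions: whether to validate output and whether to translate case. The Python sketch below models them with hypothetical names. `displayable` stands for the language table's list of displayable characters, or `None` if the table supplies no list. `case_rules` stands for the table's case translation rules, or `None` if it has none.

```python
from typing import Optional

FSTRMOPT_NO_VALIDATION = 0x01  # when this bit is on, output validation is suppressed


def validates_output(displayable: Optional[set], fstrmopt: int) -> bool:
    """Validate only if the table lists displayable characters and the X'01' bit is off."""
    return displayable is not None and not (fstrmopt & FSTRMOPT_NO_VALIDATION)


def apply_upper(text: str, case_rules: Optional[dict], upper_on: bool) -> str:
    """Apply *UPPER translation only when it is in effect and the table has case rules."""
    if not upper_on or case_rules is None:
        return text
    return "".join(case_rules.get(ch, ch) for ch in text)


print(validates_output({"A", "B"}, 0x00))  # True
print(validates_output({"A", "B"}, 0x01))  # False: X'01' bit is on
print(validates_output(None, 0x00))        # False: no list, no validation
print(apply_upper("abc", None, True))      # abc: no rules, no translation
```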

Using the JAPAN language table

The JAPAN language table is designed to handle Katakana terminal display and to provide upward compatibility with DBCS support in previous releases of Model 204. In particular, case translation and 3270 output validation are disabled.

Using DBCSENV for uppercase translation

When the DBCSENV parameter is set to a nonzero value, the LANGUSER parameter is automatically set to JAPAN. See the Model 204 DBCS Support Summary for the use and setting of this parameter.

Uppercase translation depends on the DBCSENV parameter. In the non-DBCS environment, when an *UPPER command is in effect, Model 204 converts data received from the user to uppercase for:

  • Full-screen editor commands
  • Screen input items not specified as mixed case
  • Line mode input (for example, command and $READ input)
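The following Python sketch summarizes these rules: the automatic LANGUSER override, and the non-DBCS uppercasing cases listed above. The function names and the input-source labels ("editor_command", "screen_item", "line_mode") are hypothetical. Uppercasing behavior when DBCSENV is nonzero is described in the DBCS Support Summary and is not modeled here.

```python
def effective_languser(dbcsenv: int, languser: str) -> str:
    """A nonzero DBCSENV forces LANGUSER to JAPAN."""
    return "JAPAN" if dbcsenv != 0 else languser


def uppercases_input(upper_on: bool, source: str, mixed_case_item: bool = False) -> bool:
    """Non-DBCS environment only: is input from this source translated to uppercase?"""
    if not upper_on:
        return False
    if source == "screen_item":
        # Screen input items are uppercased unless they are specified as mixed case.
        return not mixed_case_item
    return source in ("editor_command", "line_mode")


print(effective_languser(1, "US"))                  # JAPAN
print(uppercases_input(True, "line_mode"))          # True
print(uppercases_input(True, "screen_item", True))  # False
```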

Extended text lines for full-screen editor

Full-screen editor users in the Fujitsu environment can now input extended text lines of up to 255 display positions. If the storage requirement of such a line exceeds 255, the line is truncated cleanly. The screen is resent to the user with the truncated line highlighted, and an error message is displayed in the full-screen editor's message window.

Control characters

For a list of the control characters found in IBM computers and the sequence in which they are sorted, see Control characters.

Special characters

For a list of the special characters found in text, such as punctuation marks, diacritic (or accent) marks, currency symbols, arithmetic and mathematical marks, building blocks for screen forms, and Optical Character Recognition characters, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.

Latin Alphabet, Diacritics, Ligatures, and Numerals

For a list of the characters used to build words in U.S. English, French Canadian, and other written languages that use the Latin and extended Latin character sets, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.

Language support topics

The Model 204 language support documentation consists of the pages listed below. This list is also available as a "See also" link from each of the pages.