Model 204 language support
Revision as of 01:10, 9 March 2016
Model 204 contains a language support feature for customers who sort and display Model 204 data using single-byte character sets other than U.S. English, or using the Japanese double-byte character set (DBCS).
This feature is included in the SOUL (User Language), HLI, and SQL interfaces. This topic describes the facilities that perform this language-specific processing for Model 204 data display, sequencing, and collating.
Overview
Language support in computer data storage means the ability to receive, store, and redisplay differing character sets, together with algorithms that sort those characters correctly. Worldwide use of computers to store, transmit, share, and compare data exposed the need to:
- Analyze the character sets used by written languages to determine which characters are shared and which characters are unique to a character set.
- Respect the collating sequence or order of precedence rules used by a written language.
Language support terminology
A character set is a set of symbols or marks, such as the letters of an alphabet, used in a writing system. Character sets differ in the number of characters, the specific characters included, and their collating sequence.
Once a character set is identified, the next task is handling its collating sequence: the sequence in which characters are ordered for sorting, merging, and comparing. Specifically, it is the order assigned to the characters of a character set (in computers, for example, ASCII, U.S. English, and EBCDIC) for sequencing purposes. Usage determines the correct collating sequence for each writing system; telephone directories and dictionaries are everyday examples of collating sequences.
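To make the idea concrete, the following Python sketch sorts the same three strings by the raw binary values of their ASCII and EBCDIC encodings. It assumes Python's standard `cp037` codec (IBM code page 037) as a stand-in for an EBCDIC system; it is an illustration, not Model 204 code, and the helper name `byte_order` is hypothetical.

```python
def byte_order(words: list[str], codec: str) -> list[str]:
    """Sort words by the byte values of their encoding in `codec`."""
    return sorted(words, key=lambda w: w.encode(codec))

words = ["abc", "ABC", "123"]
ascii_order = byte_order(words, "ascii")   # ASCII: digits < uppercase < lowercase
ebcdic_order = byte_order(words, "cp037")  # EBCDIC 037: lowercase < uppercase < digits

print(ascii_order)   # ['123', 'ABC', 'abc']
print(ebcdic_order)  # ['abc', 'ABC', '123']
```

Byte-value order reflects only how a code page happens to number its characters, so a language-specific collating sequence has to be defined separately from those binary values.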
Language support documentation
For a thorough discussion of the decisions surrounding language support, consult the following documents:
- Canadian Alphanumeric Ordering Standard for Character Sets of CSA Standard CAN/CSA-Z243.4, Canadian Standards Assoc., Rexdale (Toronto), Ontario, Canada, 1992.
- The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1, The Unicode Consortium, Addison-Wesley, Reading, MA, 1991.
Adding other languages
Adding a language to those already developed for Model 204 language processing is a cooperative venture between a customer and Rocket Software. If you are interested, please consult your sales representative.
A note about User Language and SOUL
Model 204 versions 7.5 and higher provide a significantly enhanced, object-oriented version of User Language called SOUL. All existing User Language programs will continue to work under SOUL, so User Language can be considered to be a subset of SOUL, though the name "User Language" is now deprecated. In this topic, the name "User Language" has been replaced with "SOUL."
Collating sequence support
The language support feature in Model 204 currently sorts using the expected collating sequence for U.S. English, with limited support for Japanese.
NLANG, the language support module
Model 204 modules are linked with a set of language support tables in the NLANG module that define written languages. A Model 204 supported language consists of translation tables and flag tables containing information about:
- Alphabetic characters, lowercase to uppercase
- Alphabetic characters, uppercase to lowercase
- LANGSORT tables
- Pattern matcher
- Characters you can enter at the keyboard
- Characters you can display on the terminal
- ASCII to EBCDIC translation
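The actual NLANG table layout is not described in this topic. As a rough mental model only, the following Python dataclass groups the kinds of information listed above into one object; the class and field names (`LanguageTable`, `to_upper`, `sort_rank`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class LanguageTable:
    """Hypothetical model of the information one NLANG language definition holds.

    Field names are invented for illustration; the real NLANG layout differs.
    """
    name: str
    to_upper: dict[str, str] = field(default_factory=dict)    # lowercase -> uppercase
    to_lower: dict[str, str] = field(default_factory=dict)    # uppercase -> lowercase
    sort_rank: dict[str, int] = field(default_factory=dict)   # stand-in for LANGSORT
    alphabetic: frozenset[str] = frozenset()                  # letters for pattern matching
    enterable: frozenset[str] = frozenset()                   # keyboard input characters
    displayable: frozenset[str] = frozenset()                 # terminal output characters
    ascii_to_ebcdic: dict[int, int] = field(default_factory=dict)  # byte translation

    def upcase(self, s: str) -> str:
        # Characters with no rule in the table pass through unchanged.
        return "".join(self.to_upper.get(c, c) for c in s)

    def lowcase(self, s: str) -> str:
        return "".join(self.to_lower.get(c, c) for c in s)

# A tiny, partial U.S. English table built from the 26 Latin letters.
lower = "abcdefghijklmnopqrstuvwxyz"
upper = lower.upper()
us = LanguageTable(
    name="US",
    to_upper=dict(zip(lower, upper)),
    to_lower=dict(zip(upper, lower)),
    alphabetic=frozenset(lower + upper),
)
print(us.upcase("abc-1"))  # ABC-1
```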
Supported languages
After installing Model 204, you can select one of five variations of the internal language table. The LANGUSER and LANGFILE parameter settings you select set the terminal and print capabilities. The NLANG module contains internal language tables for the following languages:
- Cyrillic
- French Canadian
- Japanese
- Turkish
- US English
The internal language table provides the same input and output translation tables, uppercase or lowercase translation, and $Alpha support as the corresponding Model 204 parameters, LANGUSER and LANGFILE, but does not determine collating sequences for sorting or B-tree indexes.
Language support parameters
After installation, set the LANGFILE and LANGUSER parameters to the values that match your applications' language requirements. The values of LANGFILE and LANGUSER determine which NLANG internal language table Model 204 consults for collating sequence, character storage, and uppercase or lowercase translation.
IBM code pages
IBM assigns a code page number to correspond to various sets of characters. Each IBM code page assigns a particular set of character shapes to a corresponding binary code. Model 204 depends on the binary code definition in the IBM code page to handle language support.
The following table lists the Model 204-supported character sets and designated IBM code pages.
Written language | Parameter value | Refers to IBM code page |
---|---|---|
Cyrillic | CYRILLIC | 880 |
French Canadian | FRENCHC | 037 |
Japanese | JAPAN | 290 |
Turkish | TURKISH | 1026 |
US English | US (the default) | 1047 |
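The dependence on IBM code pages can be seen with Python's standard codecs. Of the code pages in the table, Python ships codecs for 037 and 1026 only, so this hedged sketch uses those two: a character shared by both pages has the same code, while a Turkish-only letter has no code in page 037. The helper name `ebcdic_hex` is hypothetical.

```python
from typing import Optional

def ebcdic_hex(ch: str, codec: str) -> Optional[str]:
    """Return ch's binary code in `codec` as hex, or None if the code page lacks it."""
    try:
        return ch.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return None

for ch in ["A", "\u0131"]:              # 'A' and the Turkish dotless 'ı' (U+0131)
    for codec in ["cp037", "cp1026"]:   # the French Canadian and Turkish code pages above
        print(repr(ch), codec, ebcdic_hex(ch, codec))
```

'A' is X'C1' in both code pages, while the dotless 'ı' is reported as None for cp037 because code page 037 has no binary code for it.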
LANGFILE: Choosing a character set definition for a file
Class:
FPARMS
Default:
US, meaning U.S. English
Setting:
During file creation, or resettable by the file manager
Meaning:
Use the LANGFILE parameter to specify the language for file processing operations such as ordering data and evaluating LIKE and LANGLIKE patterns. The LANGFILE parameter determines the valid character set in a file.
The value of LANGFILE must be one of those listed in the following table:
Written language | Model 204 LANGFILE value |
---|---|
Cyrillic | CYRILLIC |
French Canadian | FRENCHC |
Japanese | JAPAN |
Turkish | TURKISH |
US English | US (the default) |
Note: You cannot specify a LANGFILE parameter setting other than US for sorted files (FILEORG X'01' setting).
LANGUSER: Setting the language definition of a user thread
Class:
USER
Default:
US, meaning U.S. English
Setting:
On the user's parameter line, resettable
Meaning:
Use the LANGUSER parameter to specify the language that is in use by the thread's I/O device. Different terminals in the same Model 204 run can use different languages. HLI or SQL threads can use different languages from each other and from SOUL or terminal threads.
The value of LANGUSER must be one listed in the following table:
Written language | Model 204 LANGUSER value |
---|---|
Cyrillic | CYRILLIC |
French Canadian | FRENCHC |
Japanese | JAPAN |
Turkish | TURKISH |
US English | US (the default) |
Data Management Language enhancements
This section describes language support enhancements to Data Management Language.
SQL Server
SQL Server ordering operations for an SQL table use the collating sequence specified by the file's LANGFILE parameter. Model 204 SQL does not permit joins across files that do not have the same LANGFILE parameter settings.
SQL language support requires an additional four bytes in QTBL per compiled query. You can set QTBL on the User 0 line or with the UTABLE command.
Statements that support language-specific ordering
The following SOUL statements and their corresponding HLI calls provide language-specific ordering:
- FIND (various inequality operators such as index and direct search)
- FOR EACH RECORD IN ORDER BY (including FROM and TO clauses)
- FOR EACH VALUE IN ORDER
- FOR EACH VALUE (in group context)
- SORT RECORDS/RECORD KEYS
- SORT VALUES
- Pattern matcher (LIKE or LANGLIKE clause) range specifications
Note: Sorted file operations (with FILEORG = X'01') are not supported.
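The effect of language-specific ordering can be sketched in Python. The example compares plain EBCDIC byte order with a simplified French-style key that sorts accented letters with their base letters. Accent folding through `unicodedata` is a swapped-in technique chosen for brevity: it is not the Model 204 LANGSORT algorithm, and the full French Canadian rules in CAN/CSA-Z243.4 are more detailed. The key plays a role similar in spirit to the language-specific sequence value that $LangSrt returns; the function names are hypothetical.

```python
import unicodedata

def fold_accents(s: str) -> str:
    """Remove combining marks after canonical decomposition: 'école' -> 'ecole'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def language_key(s: str) -> tuple[str, str]:
    # Primary level: letters without accents, case-insensitive.
    # Secondary level: the original string, so 'ecole' and 'école' still
    # get a deterministic relative order.
    return (fold_accents(s).lower(), s)

words = ["zèbre", "école", "abc"]
byte_sorted = sorted(words, key=lambda w: w.encode("cp037"))  # EBCDIC byte order
lang_sorted = sorted(words, key=language_key)                 # accent-folded order

print(byte_sorted)  # ['école', 'abc', 'zèbre']
print(lang_sorted)  # ['abc', 'école', 'zèbre']
```

In code page 037, 'é' (X'51') has a lower binary value than 'a' (X'81'), so byte order puts "école" first; the language key places it with the other words beginning with e.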
Pattern matching using the LANGLIKE operator
The SOUL operator LANGLIKE supports parsing and evaluation of patterns according to the tables provided with the LANGUSER and LANGFILE parameters.
The LANGLIKE syntax is the same as LIKE syntax. See the topic on value loops for more details.
- The LIKE operator employs U.S. English for parsing the pattern and the value of LANGFILE for evaluating the pattern.
- The LANGLIKE operator uses the value of LANGUSER for parsing the pattern and the value of LANGFILE for evaluating the pattern.
The parsing language, LANGUSER, is used for checking the syntax of the pattern and for determining the value of:
- Special pattern escape character
- Hexadecimal character
- Alphabetic character
The evaluation language, LANGFILE, is used to match the pattern against the data. In particular, if a range of characters is defined in the pattern, then the collating sequence is determined by the evaluation language, LANGFILE.
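The division of labor between the two languages can be summarized in a short Python sketch. `pattern_languages` restates the rules above; `in_range` shows why the evaluation language matters for a pattern range such as c through d, using a Turkish letter order string in place of a real LANGFILE table. Both function names and the `TURKISH_ORDER` string are invented for illustration.

```python
def pattern_languages(operator: str, languser: str, langfile: str) -> tuple[str, str]:
    """Return (parsing language, evaluation language) for a pattern operator."""
    if operator == "LIKE":
        return ("US", langfile)       # LIKE always parses the pattern as U.S. English
    if operator == "LANGLIKE":
        return (languser, langfile)   # LANGLIKE parses with the thread's LANGUSER
    raise ValueError(f"unknown operator: {operator}")

# Lowercase Turkish alphabet in collating order (hypothetical stand-in table).
TURKISH_ORDER = "abcçdefgğhıijklmnoöprsştuüvyz"

def in_range(ch: str, low: str, high: str, collation: str) -> bool:
    """True if ch lies between low and high in the given collating sequence.

    Raises KeyError if a character is not in `collation`.
    """
    rank = {c: i for i, c in enumerate(collation)}
    return rank[low] <= rank[ch] <= rank[high]

print(pattern_languages("LANGLIKE", "TURKISH", "FRENCHC"))  # ('TURKISH', 'FRENCHC')
print(in_range("ç", "c", "d", TURKISH_ORDER))               # True
print("c" <= "ç" <= "d")                                    # False in code point order
```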
Syntax
The format of the FIND statement used to perform pattern matching is:
FIND [ALL] RECORDS {FOR WHICH | WITH} fieldname IS [NOT] LANGLIKE 'pattern'
Where
The LANGLIKE keyword indicates that pattern is the set of characters to match, using LANGUSER and LANGFILE as previously described.
The pattern argument must be enclosed in quotation marks. The characters that you can use in a pattern and the methods of optimizing a pattern retrieval are described in the Record loops topic.
SOUL $functions for language support
The $functions in the following table include language-specific processing capabilities.
$function | Description |
---|---|
$Alpha | Verifies that a string is composed of only characters that are valid in the specified or default language. |
$Alphnum | Verifies that a string is composed only of alphabetic characters that are valid in the specified or default language and the digits 0 through 9. |
$ChkPat | Verifies the syntax of a pattern. |
$LangSpc | Returns a string containing the language-specific hexadecimal value of a special character on a particular terminal. |
$LangSrt | Transforms a string into a language-specific sequence value. |
$LangUst | Restores a transformed string back to its original value. |
$LIKE | Controls parsing and evaluation languages used in pattern matching. |
$Lowcase | Translates an uppercase or mixed-case string into a lowercase string. |
$Upcase | Translates a lowercase or mixed-case string into an uppercase string. |
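Case translation is language-specific because some languages pair letters differently. In Turkish, dotted i uppercases to İ and dotless ı uppercases to I, which generic Unicode case mapping (including Python's `str.upper`, which is not locale-aware) does not do. The sketch below layers hypothetical Turkish overrides on top of Python's built-in case mapping; it illustrates why case translation rules belong in a language table and is not the implementation of $Upcase or $Lowcase. The function names are hypothetical, and the `lang` values merely mirror LANGUSER values.

```python
# Language-specific overrides applied before Python's generic Unicode rules.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}   # I -> ı, İ -> i

def lang_upcase(s: str, lang: str = "US") -> str:
    """Uppercase s, applying Turkish dotted/dotless i rules when lang is TURKISH."""
    if lang == "TURKISH":
        s = "".join(TURKISH_UPPER.get(c, c) for c in s)
    return s.upper()

def lang_lowcase(s: str, lang: str = "US") -> str:
    """Lowercase s, applying Turkish dotted/dotless i rules when lang is TURKISH."""
    if lang == "TURKISH":
        s = "".join(TURKISH_LOWER.get(c, c) for c in s)
    return s.lower()

print(lang_upcase("istanbul"))             # ISTANBUL
print(lang_upcase("istanbul", "TURKISH"))  # İSTANBUL
print(lang_lowcase("ISPARTA", "TURKISH"))  # ısparta
```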
Terminal interface requirements
Output validation on 3270 full-screen threads uses the list of displayable characters in the thread's language table, which is selected by LANGUSER (or is the default language, US).
If no such list is supplied, then no output validation is performed, regardless of the setting of the FSTRMOPT parameter.
If there is a list of displayable characters, then output validation is performed when the FSTRMOPT parameter setting allows it; that is, when the X'01' bit is off.
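The decision described above fits in a few lines of Python. The function and parameter names are hypothetical; `displayable` is None when the language table supplies no list, and only the X'01' bit of FSTRMOPT is examined, as the text describes. What Model 204 then does with characters that fail validation is not covered here.

```python
from typing import AbstractSet, Optional

def output_validation_enabled(displayable: Optional[AbstractSet[str]], fstrmopt: int) -> bool:
    """Decide whether 3270 output validation applies to a full-screen thread."""
    if displayable is None:
        return False                 # no displayable list: never validate
    return (fstrmopt & 0x01) == 0    # validate only when the X'01' bit is off

def undisplayable(text: str, displayable: AbstractSet[str]) -> list[str]:
    """Characters in text that the language table does not list as displayable."""
    return [c for c in text if c not in displayable]
```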
The *UPPER and *LOWER commands, which set case translation, use the case translation rules specified in the thread's language table. If no case translation rules are specified, then no case translation is performed, regardless of the *UPPER or *LOWER command setting.
Using the JAPAN language table
The JAPAN language table is designed to handle Katakana terminal display and to provide upward compatibility with DBCS support in previous releases of Model 204. In particular, case translation and 3270 output validation are disabled.
Using DBCSENV for uppercase translation
When the DBCSENV parameter is set to a nonzero value, the LANGUSER parameter is automatically set to JAPAN. See the Model 204 DBCS Support Summary for the use and setting of this parameter.
Uppercase translation depends on the DBCSENV parameter. In the non-DBCS environment, when an *UPPER command is in effect, Model 204 converts data received from the user to uppercase for:
- Full-screen editor commands
- Screen input items not specified as mixed case
- Line mode input (for example, command and $READ input)
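The conditions above, together with the earlier rule that a language table without case translation rules disables translation, can be written as a small Python predicate. It is a hypothetical sketch that models only the non-DBCS environment (DBCSENV = 0); DBCS behavior is documented in the Model 204 DBCS Support Summary and is not modeled. `InputKind` and `uppercases_input` are invented names.

```python
from enum import Enum

class InputKind(Enum):
    EDITOR_COMMAND = "full-screen editor command"
    SCREEN_ITEM = "screen input item"
    LINE_MODE = "line mode input, such as commands and $READ responses"

def uppercases_input(upper_in_effect: bool, table_has_case_rules: bool,
                     kind: InputKind, mixed_case_item: bool = False) -> bool:
    """Non-DBCS environment only: is this input converted to uppercase?"""
    if not upper_in_effect:
        return False          # *UPPER is not in effect
    if not table_has_case_rules:
        return False          # the thread's language table defines no case translation
    if kind is InputKind.SCREEN_ITEM and mixed_case_item:
        return False          # screen items specified as mixed case are left alone
    return True               # editor commands, other screen items, line mode input
```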
Extended text lines for full-screen editor
Full-screen editor users in the Fujitsu environment can input extended text lines of up to 255 display positions. If the storage requirement of such a line exceeds 255 bytes, the line is truncated cleanly. The screen is resent to the user with the truncated line highlighted, and an error message is displayed in the full-screen editor's message window.
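One plausible reading of "truncated cleanly" is that the line is cut at a character boundary, so no multi-byte character is split. The Python sketch below assumes that reading and takes per-character storage sizes as input; it does not model any shift codes a Fujitsu terminal data stream may add, and the function name `truncate_cleanly` is hypothetical.

```python
def truncate_cleanly(chars: list[tuple[str, int]], limit: int = 255) -> tuple[str, bool]:
    """Keep whole characters while their total storage fits in `limit` bytes.

    chars: (character, storage size in bytes) pairs, supplied by the caller.
    Returns (kept_text, was_truncated).
    """
    kept: list[str] = []
    used = 0
    for ch, size in chars:
        if used + size > limit:
            return "".join(kept), True   # the next character would not fit whole
        kept.append(ch)
        used += size
    return "".join(kept), False

# 130 two-byte characters need 260 bytes; 127 of them (254 bytes) fit.
text, cut = truncate_cleanly([("X", 2)] * 130)
print(len(text), cut)  # 127 True
```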
Control characters
For a list of the control characters found in IBM computers and the sequence in which they are sorted, see Control characters.
Special characters
For a list of the special characters found in text, such as punctuation marks, diacritic (or accent) marks, currency symbols, arithmetic and mathematical marks, building blocks for screen forms, and Optical Character Recognition characters, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.
Latin Alphabet, Diacritics, Ligatures, and Numerals
For a list of the characters used to build words in U.S. English, French Canadian, and other written languages utilizing the Latin and extended Latin character sets, see the Unicode code charts or The Unicode Standard: Worldwide Character Encoding, Version 1.0, Volume 1.
Language support topics
The Model 204 language support documentation consists of the pages listed below. This list is also available as a "See also" link from each of the pages.