Intrinsic classes

Definition of Intrinsic classes

In “pure” object-oriented languages, all datatypes are objects, that is, all datatypes are extension classes of some base object class. This means that even things that one might not immediately think of as objects, such as character strings or numbers, are, in fact, considered to be objects.

There are two aspects to the assertion that strings or numbers are objects:

  • They are internally managed in exactly the same way as any other objects. That is, a string or numeric variable is actually a reference to an object that contains the string or numeric value, rather than itself directly containing the string or numeric value.
  • They are syntactically identical to other objects. That is, the syntax for manipulating strings or numbers is the same as the syntax for manipulating other objects. More specifically, methods are applied to strings or numbers using the exact same syntax as when methods are applied to other objects.

Since the first aspect is concerned with the internal management of strings or numbers, it is largely irrelevant from the perspective of a programmer using a “pure” object-oriented language. In fact, many languages that profess to be “pure” actually “cheat” internally and special-case string and numeric data. They do so because string and numeric data is so important, and so heavily used in all programming languages, that insisting on not treating it specially (internally) is pedantry that has a cost in performance. To see why, consider a statement that adds two numbers together (the language for the example is, of course, SOUL):

%x is float
%y is float
...
%x = %y + 13

Insisting that numbers are no different from any other object means that this statement must get the value referenced by %y, add it to the number referenced by the literal 13, create a new Float instance and set %x to reference that new instance. Clearly, this would be less efficient than taking the number in %y, adding 13 to it, and setting %x to the result. Fortunately, the way this is actually done is purely an under-the-covers implementation issue, so one can always pretend that “pure” object-oriented processing is being used. This is so because numbers and strings are an example of a special class of objects called immutable objects. Immutable objects, as the name suggests, cannot be changed once they are created.

To illustrate the difference between immutable objects and mutable objects, consider two variables:

%count is float
%account is object bankAccount

If %account were set to a new object instance, it would not be surprising if, after it's set, the object changed. For example, it wouldn't be surprising if the account balance for %account changed, even if the change was done via a different variable:

print %account:balance
%myAccount = %account
%myAccount:addToBalance(23)
print %account:balance

In this example, it is expected that the balance displayed in the second Print statement will be 23 greater than that displayed by the first. On the other hand, given the following:

print %count
%myCount = %count
%myCount:addToValue(23)
print %count

It would be surprising for some operation on %myCount to have changed the value in %count. In general, for variables that contain numeric or string values, one would only expect the value to change when a new value is assigned to it. In fact, one would not expect to see methods like AddToValue that modify the method object for a numeric datatype.

More typical for a numeric datatype would be a method that manipulates the value and produces a new value (object):

%myCount = %myCount:addToValue(23)

Even pure object-oriented languages, though, have special syntax for the common operation of addition:

%myCount = %myCount + 23

The absence of methods that modify a value is what makes a class immutable, and in “pure” object-oriented languages, basic string and numeric datatypes are always immutable.
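
The distinction can be sketched in SOUL using only simple assignment. This is a minimal illustration, not a definitive example: the variable names and the starting value 10 are assumptions chosen for the sketch.

```
begin
%count is float
%myCount is float
* Start from an arbitrary value
%count = 10
* Copy the value, then derive a new value for the copy
%myCount = %count
%myCount = %myCount + 23
* Only %myCount received a new value; %count still holds what it was assigned
printText {%count} {%myCount}
end
```

Because no Float method modifies its method object, the only way %myCount can change is by assignment, and nothing done to %myCount affects %count.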

Intrinsic methods in SOUL

SOUL is a legacy programming language for which object-oriented capabilities are a relatively recent addition. So, one might expect that strings and numbers are not maintained internally as objects, since their existence pre-dates objects. This is in fact the case, though, as was shown, this is largely irrelevant from a SOUL programmer's perspective. Even if object-oriented capabilities had been present in SOUL from its inception, strings and numbers might have been special-cased for efficiency anyway.

Float    See List of Float methods
String   See List of String methods
Unicode  See List of Unicode methods

The benefits of object-oriented syntax in SOUL

Beyond the internal representation of strings and numbers, a distinction between SOUL and pure object-oriented languages was that the object:method syntax was not used to manipulate strings and numbers; instead, strings and numbers were manipulated via $functions. This was changed to allow the object:method syntax against string and numeric variables and constants.

The following code fragment illustrates how the same operation can be accomplished with traditional $functions and with object-oriented syntax:

%name is string len 32
...
print $len(%name)
print %name:length

While this might not seem very significant, it provides considerable value:

  • It allows SOUL to be used as a “pure” object-oriented language. This might be especially appealing to programmers who were trained in a pure object-oriented language.
  • It provides the benefit that expressions can generally be read in the natural left-to-right manner rather than the inside-to-outside manner required to understand an expression coded with $functions.
  • It provides capabilities to methods that operate on strings or numbers that are not available with $functions. These include support for named parameters and the ability to take objects as input parameters and to produce objects as results. While it would have been possible to extend $functions to have this same functionality (much as Sirius Mods provided callable $function support), it makes more sense to provide it using true object-oriented syntax. Given that this has now been done, it is unlikely that these capabilities will ever be added to $functions.
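
The left-to-right reading described in the second point can be seen by nesting two operations. The following is a sketch only: the variable names and the value are invented for illustration, and it assumes the standard $Substr and $Len functions together with the intrinsic Left and Length methods.

```
begin
%name is string len 32
%n is float
%name = 'Model 204'
* $function form: read from the inside out ($substr first, then $len)
%n = $len($substr(%name, 1, 4))
printText {%n}
* Method form: read left to right (Left first, then Length)
%n = %name:left(4):length
printText {%n}
end
```

Both assignments take the four leftmost characters of %name and then compute the length of that result; only the reading order of the expression differs.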

Two generic intrinsic classes: string and numeric

SOUL traditionally allowed declarations of many different datatypes. These included Strings of a specified length with a possible DP value, Fixed numerics (also with a possible DP), and Float numerics. In addition, Sirius Mods provided support for Longstring datatypes. And images provide support for a variety of additional datatypes, including packed and zoned formats.

Essentially, however, there are two categories of datatypes in SOUL: string and numeric. In SOUL one can easily assign values from one datatype to another, even between numeric and string types. This is intuitive and useful. Typically, in the “real world” one doesn't distinguish between the string representation of a number and its numeric value used for calculation. Similarly, except for performance purposes and, perhaps, value limits, one is typically not concerned about how a value is stored internally.

This is not the case in some “pure” object-oriented languages where the paradigm of strong-datatyping is extended to numeric and string datatypes. This means that if one wants to display the value of a numeric variable, one must explicitly convert it to a string (since what one displays is a string). Similarly, if one wishes to do a calculation using a value in a string (perhaps read from an external data source), one must explicitly convert the string to a number. It is asserted here that strong-datatyping for strings and numbers is a mistake, allowing some vision of purity to prevent programmers from doing something completely natural (treating numbers as strings, and strings as numbers) and something that is done hundreds of times in any program of any size. This view is reinforced by the fact that the general industry trend is away from strong-datatyping for strings and numbers and toward implicit conversion between these types.

It is worth pointing out, however, that loose-datatyping between strings and numbers does not imply loose-datatyping between strings and numbers and other classes. That is, there are generally no implicit conversions of strings or numbers to or from non-string, non-numeric objects.

Because loose-datatyping and the existing SOUL datatypes work quite well in facilitating rapid development of reasonably tight and efficient code, support for methods against string and numeric types does not add any new datatypes to SOUL. And, because of loose-datatyping, the numeric and string methods can be applied to any of the standard SOUL string and numeric datatypes. Because they apply to these intrinsic SOUL datatypes, these methods are called intrinsic methods.

Intrinsic methods can be further classified into these subsets:

Float    Methods that perform numeric manipulation on values in Model 204's Float format.
Fixed    Methods that perform numeric manipulation on values in Model 204's Fixed format (with an assumed DP of 0).
String   Methods that perform string manipulation, with Longstring capability assumed.
Unicode  Methods that perform Unicode string manipulation, return Unicode results, or are based on the Unicode tables.

The Float intrinsic class can really be thought of as a generic numeric class, as it is a convenient way of representing both integer and decimal data. This is different from most other programming languages where a Float datatype suffers from inconvenient behavior, especially for decimal numbers.

The Float class is so convenient that all numeric parameters to system methods and $functions are treated as Float parameters. Also, because of the convenience of the Float class, there are currently no methods in the Fixed class. As such, the intrinsic methods can be thought of broadly as belonging to two intrinsic classes: the Numeric class and the String class. These classes behave largely as if their inputs were SOUL Float and Longstring datatypes, respectively. That is, for all intents and purposes, the following are true:

  • Any non-Float values, including Unicode, are converted to Float values before being processed by a Numeric method. This includes the conversion of any non-numeric strings into 0 that occurs in other SOUL contexts that take a numeric input.
  • For the purposes of truncation and the With operation, all String method string inputs or outputs behave as Longstrings.
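
The first of these rules can be sketched as follows. This is an illustration under stated assumptions: the variable name and values are invented, and SquareRoot is used only because it is a numeric intrinsic method that appears elsewhere on this page.

```
begin
%text is string len 10
* A string holding a numeric value is converted to Float before the method runs
%text = '16'
printText {%text:squareRoot}
* A non-numeric string is converted to 0 before the method runs
%text = 'sixteen'
printText {%text:squareRoot}
end
```

In the second case the method operates on 0 rather than rejecting the string, which matches the conversion SOUL applies in other contexts that take a numeric input.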

Strings and numbers as method objects

The term "method object" (or "intrinsic method object") is used for the value or variable to which an intrinsic method is applied, even though the value or variable isn't really an object. This is justified because, as noted above, strings and numbers can be considered immutable objects, regardless of their history or internal representation.

For example, in the following case, %str is the intrinsic method object for the Length method:

%str is string len 32
...
print %str:Length

Intrinsic methods can be applied to constants, in addition to variables. For example, the following assigns the length of the string literal Whatever to %x:

%x = 'Whatever':length

Intrinsic methods can take the output from another method, intrinsic or not, as their input. For example, the following uses the Right method (which gets the rightmost characters of a string) against the output of a Stringlist Item method:

%list is object stringlist
...
%value = %list:item(%i):right(8)

Since it is unnecessary to explicitly specify the Item method for a Stringlist, the above can also be written as:

%list is object stringlist
...
%value = %list(%i):right(8)

Intrinsic methods can also be applied to SOUL Image and Screen items:

image michigan
   romulus is string len 32
   ...
end image
...
%value = %michigan:romulus:right(8)

Intrinsic methods can also be applied to the output from a $function:

%seconds = $time:right(2)

Intrinsic methods can even be applied to the results of expressions:

... %value = (%tweedledum with %tweedledee):right(8)

Automatic "implicit" conversion of intrinsic values

As with other methods, the colon that separates the intrinsic method object from the method name can be optionally preceded or followed by spaces. This could be done to enhance readability, or even to split a long line. The following four statements are all equivalent:

%value = %list:item(%i):right(8)
%value = %list: item(%i): right(8)
%value = %list :item(%i) :right(8)
%value = %list : item(%i) : -
   right(8)

As with most other uses of intrinsic variables or values, if the method object is of a different type than the datatype on which the method operates, the input value is automatically converted into the target datatype. For example, the expression 7/3 clearly produces a numeric value, but the Left method operates on strings. So, in the following statement:

%i = (7/3):left(4)

The result of the division is converted into a string and then passed to Left, which produces 2.33 as its result.

Because of this automatic conversion, the specific class (String, Float, Unicode) of an intrinsic method cannot be determined from the datatype of its method object. This means that the class must be determined only from the name of the method, which means that method names must be unique among all intrinsic classes (that is, among String, Float, and Unicode). For example, there may not be a Length method in both the String and Float intrinsic classes. In the case of Length, the method is an intrinsic String method, there is no comparable Float method, and UnicodeLength is the comparable Unicode method.
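
For instance, applying Length to a numeric variable still invokes the String method, with the number converted to a string first. A minimal sketch, with an invented variable name and value:

```
begin
%n is float
%n = 1234
* Length is a String method, so %n is converted to a string before it is applied
printText {%n:length}
end
```

The method is selected by its name alone; the Float datatype of %n plays no part in choosing it.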

Note: Because some EBCDIC characters are not translatable to Unicode, and vice versa, automatic conversions involving Unicode values may cause request cancellation. The explicit intrinsic conversion methods (like EbcdicToUnicode and EbcdicToAscii and their counterparts UnicodeToEbcdic and AsciiToEbcdic), however, have a parameter (as of Sirius Mods version 7.6) that lets you decode or encode untranslatable characters.

Intrinsic method syntax: special cases

SOUL is a legacy programming language that supports some unusual syntax. Some of this syntax causes problems for the use of intrinsic methods, as is described for the following three cases:

Intrinsic methods against database field names

Field names have very loose naming rules, and names can contain colons and spaces. Because of this, a field name must be contained inside of parentheses to be used as a method object. For example, the field big fat greek field is used as the input to the intrinsic Substring method:

for each record in %recordset
   ...
   %value = (big fat greek field):substring(2, 3)
   ...
end for

Intrinsic methods against percent variables and images that have the same name

SOUL allows one to declare images and percent variables with the same name in the same scope. That is, inside a method, you can declare an Image called holland and a %variable called %holland. References to items in the image and to the percent variable both start with %holland:

image holland
   ...
   factory is float
   ...
end image
...
%holland is string len 10
...
%a = %holland:factory
...
%a = %holland

Historically, this has not been a problem because a percent variable would never be followed by a colon. However, with intrinsic methods, this is no longer the case. If the Holland image contained an item called Length, it would be impossible to tell whether %holland:length referred to the Length item in the Holland Image or the intrinsic Length method applied to %holland.

Because of this ambiguity, intrinsic methods cannot be used against percent variables without a blank between the variable name and the colon if there is an Image with the same name as the percent variable. This is true regardless of whether or not there is an actual conflict between the method name and an image item name.

So, using the above example, to apply the length method to %holland, you must do one of the following:

%a = %holland :length
%a = %holland : length

Specifying %holland: length would not work. A space after the variable name indicates that the reference is not to the Image, because Image item references do not allow any blanks between the Image name, colon, and Image item name.

You can also simply wrap the variable name in parentheses:

%a = (%holland):length

Of course, the best solution is to avoid using the same name for images and percent variables, as this is generally somewhat confusing, anyway.

Intrinsic methods in a Print, Audit, or Trace statement

The SOUL Print, Audit, and Trace statements all use somewhat unusual syntax. While initially these statements appear to operate on expressions, just like an assignment, this is not really the case. For example, the following statement gets a compilation error:

print 3*4

Because of this, one has to be careful using intrinsic methods in a Print, Audit, or Trace statement. The general recommendation is to use PrintText, AuditText, or TraceText, as described below.

There are several specific syntax problems with Print, Audit, and Trace:

  • Because blanks are treated specially in these statements, you may not put a blank before the colon in an intrinsic method invocation. That is, the following is incorrect:

    print %x :length

    But the following two statements are allowed:

    print %x:length
    print %x: length

  • String and literal constants are treated specially by these statements, so you cannot issue methods against constants in a Print, Audit, or Trace statement. That is, the following are incorrect:

    print 'Foobar':length
    print 22:squareRoot

  • Although you might be tempted to use parentheses to get around some of these issues, leading parentheses are not allowed in a Print, Audit, or Trace statement token. That is, the following is incorrect:

    print (fieldname):length

    Coupled with the fact that applying intrinsic methods to fields generally requires the use of parentheses, this means that you cannot display the result of an intrinsic method applied to a field with the Print, Audit, and Trace statements.

Fortunately, the newer PrintText, AuditText, and TraceText statements (see Targeted Text statements) are direct analogs of Print, Audit, and Trace, respectively, but they use a more consistent syntax. Specifically, the newer statements treat everything as literal text except what is enclosed within the expression start and end characters (which default to curly braces: {...}); the enclosed text is treated as a standard SOUL expression. This means that the syntax used for the variable parts of PrintText, AuditText, and TraceText statements is identical to the syntax allowed on the right side of a variable assignment.

So, the following is valid:

printText {3*4}

And this is valid:

printText {%x :length}

These are valid statements:

printText {'Foobar':length}
printText {22:squareRoot}

And this is valid:

printText {(fieldname):length}

The Text statement also provides functionality comparable to the PrintText, AuditText, and TraceText statements, and it is especially useful for displaying multiple lines of data.

So it is recommended that you discontinue the use of the Print, Audit, and Trace statements in favor of PrintText, AuditText, and TraceText.

See also