Intrinsic classes

From m204wiki

Revision as of 03:30, 18 December 2010


Definition of Intrinsic Classes

In “pure” object-oriented languages, all datatypes are objects; that is, all datatypes are extension classes of some base object class. This means that even things one might not immediately think of as objects, such as character strings or numbers, are in fact considered to be objects. There are two aspects to the assertion that strings and numbers are objects:

  • They are internally managed in exactly the same way as any other objects. That is, a string or numeric variable is actually a reference to an object that contains the string or numeric value, rather than itself directly containing the string or numeric value.
  • They are syntactically identical to other objects. That is, the syntax for manipulating strings or numbers is the same as the syntax for manipulating other objects. More specifically, methods are applied to strings or numbers using the exact same syntax as when methods are applied to other objects.

Since the first aspect is concerned with the internal management of strings and numbers, it is largely irrelevant from the perspective of a programmer using a “pure” object-oriented language. In fact, many languages that profess to be “pure” actually “cheat” internally by special-casing string and numeric data. They do so because string and numeric data are so important, and so heavily used in all programming languages, that refusing to treat them specially (internally) is pedantry with a cost in performance. To see why, consider a statement that adds two numbers together (the language for the example is, of course, User Language):

      %x is float
      %y is float
       ...
      %x = %y + 13

Insisting that numbers are no different from any other object means that this statement must get the value referenced by %y, add it to the number referenced by the literal “13”, create a new Float instance, and set %x to reference that new instance. Clearly, this would be less efficient than taking the number in %y, adding 13 to it, and setting %x to the result. Fortunately, which of these approaches is actually used is purely an under-the-covers implementation issue, so one can always pretend that “pure” object-oriented processing is being used.

This is possible because numbers and strings are examples of a special class of objects called immutable objects. Immutable objects, as the name suggests, cannot be changed once they are created. To illustrate the difference between immutable and mutable objects, consider two variables:

     %count        is float
     %account      is object bankAccount

If %account were set to a new object instance, it would not be surprising if the object changed after it was set. For example, it wouldn't be surprising if the account balance for %account changed, even if the change were made via a different variable:

     print %account:balance
     %myAccount = %account
     %myAccount:addToBalance(23)
     print %account:balance

In this example, it is expected that the balance displayed in the second Print statement will be 23 greater than that displayed by the first. On the other hand, given the following:

     print %count
     %myCount = %count
     %myCount:addToValue(23)
     print %count

It would be surprising for some operation on %myCount to have changed the value in %count. In general, for variables that contain numeric or string values, one would only expect the value to change when a new value is assigned to it. In fact, one would not expect to see methods like AddToValue that modify the method object for a numeric datatype. More typical for a numeric datatype would be a method that manipulates the value and produces a new value (object):

     %myCount =       %myCount:addToValue(23)

Even pure object-oriented languages, however, provide special syntax for the common operation of addition:

     %myCount =       %myCount + 23

The absence of methods that modify a value is what makes a class immutable, and in “pure” object-oriented languages, basic string and numeric datatypes are always immutable.

Intrinsic methods in User Language

User Language is a legacy programming language to which object-oriented capabilities are a relatively recent addition. So, one might expect that strings and numbers are not maintained internally as objects, since their existence predates objects. This is in fact the case, though, as shown above, this is largely irrelevant from a User Language programmer's perspective. And even if object-oriented capabilities had been present in User Language from its inception, strings and numbers might have been special-cased for efficiency anyway.

The benefits of object-oriented syntax in User Language

Beyond the internal representation of strings and numbers, a distinction between User Language and pure object-oriented languages was that the object:method syntax was not used to manipulate strings and numbers; instead, strings and numbers were manipulated via $functions. Sirius Mods 7.2 changed this to allow the object:method syntax against string and numeric variables and constants. The following code fragment illustrates how the same operation can be accomplished via traditional $functions and via object-oriented syntax:

    %name         is string len 32
      ...
    print $len(%name)
    print %name:length

While this might not seem very significant, it provides considerable value:

  • It allows User Language to be used as a “pure” object-oriented language. This might be especially appealing to programmers who were trained in a pure object-oriented language.
  • It provides the benefit that expressions can generally be read in the natural left-to-right manner rather than the inside-to-outside manner required to understand an expression coded with $functions.
  • It gives methods that operate on strings or numbers capabilities that are not available with $functions. These include support for named parameters and the ability to take objects as input parameters and to produce objects as results. While it would have been possible to extend $functions to have this same functionality (much as Sirius Mods provided callable $function support), it makes more sense to provide it using true object-oriented syntax. Now that this has been done, it is unlikely that these capabilities will ever be added to $functions.

Two generic intrinsic classes: string and numeric

User Language traditionally allowed declarations of many different datatypes. These included Strings of a specified length with a possible DP value, Fixed numerics (also with a possible DP), and Float numerics. In addition, Sirius Mods provided support for Longstring datatypes. And images provide support for a variety of additional datatypes, including packed and zoned formats. Essentially, however, there are two categories of datatypes in User Language: string and numeric. In User Language one can easily assign values from one datatype to another, even between numeric and string types. This is intuitive and useful. Typically, in the “real world” one doesn't distinguish between the string representation of a number and its numeric value used for calculation. Similarly, except for performance purposes and, perhaps, value limits, one is typically not concerned about how a value is stored internally.
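As a minimal sketch of this loose datatyping, the fragment below copies a value from a string variable into a Fixed variable with two decimal places, does arithmetic on it, and assigns the result back to the string. No explicit conversion call is needed in either direction. The variable names are illustrative only, and the printed value noted in the comment is the expected result, not captured output.

    %price      is string len 10
    %amount     is fixed dp 2
    %price  = '19.99'
    * The string value is converted to a number on assignment
    %amount = %price
    %amount = %amount * 2
    * The numeric value is converted back to a string on assignment
    %price  = %amount
    * Expected to print 39.98
    print %price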

This is not the case in some “pure” object-oriented languages, where the paradigm of strong-datatyping is extended to numeric and string datatypes. In such languages, to display the value of a numeric variable, one must explicitly convert it to a string (since what one displays is a string). Similarly, to do a calculation using a value in a string (perhaps read from an external data source), one must explicitly convert the string to a number. It is asserted here that strong-datatyping for strings and numbers is a mistake. It allows some vision of purity to prevent programmers from doing something that is completely natural and is done hundreds of times in any program of any size: treating numbers as strings, and strings as numbers. This view is reinforced by the general industry trend away from strong-datatyping for strings and numbers and toward implicit conversion between these types.

It is worth pointing out, however, that loose-datatyping between strings and numbers does not imply loose-datatyping between strings and numbers and other classes. That is, there are generally no implicit conversions of strings or numbers to or from non-string, non-numeric objects.

Because loose-datatyping and the existing User Language datatypes work quite well in facilitating rapid development of reasonably tight and efficient code, the Janus SOAP ULI support for methods against string and numeric types does not add any new datatypes to User Language. And, because of loose-datatyping, the numeric and string methods can be applied to any of the standard User Language string and numeric datatypes. Because they apply to these intrinsic User Language datatypes, these methods are called intrinsic methods.

Intrinsic methods can be further classified into these subsets:

Float
Methods that perform numeric manipulation on values in Model 204's Float format.
Fixed
Methods that perform numeric manipulation on values in Model 204's Fixed format (with an assumed DP of 0).
String
Methods that perform string manipulation, with Longstring capability assumed.
Unicode
Methods that perform Unicode string manipulation, return Unicode results, or are based on the Unicode tables.

The Float intrinsic class can really be thought of as a generic numeric class, as it is a convenient way of representing both integer and decimal data. This is different from most other programming languages, where a floating-point datatype often behaves inconveniently, especially for decimal numbers.

The Float class is so convenient that all numeric parameters to system methods and $functions are treated as Float parameters. Also, because of the convenience of the Float class, there are currently no methods in the Fixed class. As such, the intrinsic methods can be thought of broadly as belonging to two intrinsic classes: the Numeric class and the String class. These classes behave largely as if their inputs were User Language Float and Longstring datatypes, respectively. That is, for all intents and purposes, the following are true:

  • Any non-Float values, including Unicode, are converted to Float values before being processed by a Numeric method. This includes the conversion of any non-numeric strings into 0 that occurs in other User Language contexts that take a numeric input.
  • For the purposes of truncation and the With operation, all String method string inputs or outputs behave as longstrings.
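The first of these points can be sketched with ordinary arithmetic. The bullet says Numeric methods convert their inputs the same way other numeric contexts do. In the fragment below, a numeric string converts to its numeric value and a non-numeric string converts to 0. The variable names are illustrative, and the values in the comments are the expected results described by the rule above.

    %s      is string len 10
    %x      is float
    %s = '12.5'
    * A numeric string is converted to its Float value: %x becomes 13.5
    %x = %s + 1
    %s = 'abc'
    * A non-numeric string is converted to 0: %x becomes 1
    %x = %s + 1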

Automatic “implicit” conversion of intrinsic values

As with other methods, the colon that separates the method object from the method name can optionally be preceded or followed by spaces. This can be done to enhance readability, or even to split a long line (using a trailing hyphen as the continuation character, as in the last statement below). The following four statements are all equivalent:

    %value     = %list:item(%i):right(8)
    %value     = %list: item(%i): right(8)
    %value     = %list :item(%i) :right(8)
    %value     = %list : item(%i) : -
                         right(8)

As with most other uses of intrinsic variables or values, if the method object is of a different type than the datatype on which the method operates, the input value is automatically converted into the target datatype. For example, the expression 7/3 clearly produces a numeric value, but the Left method operates on strings. So, in the statement:

     %i   = (7/3):left(4)

the result of the division is converted into a string and then passed to Left, which produces 2.33 as its result. Because of this automatic conversion, the specific class (String, Float, Unicode) of an intrinsic method cannot be determined from the datatype of its method object. The class must therefore be determined from the name of the method alone, so method names must be unique among all intrinsic classes (that is, among String, Float, and Unicode). For example, there cannot be a Length method in both the String and Float intrinsic classes. Length is an intrinsic String method; there is no comparable Float method, and UnicodeLength is the comparable Unicode method.
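The following fragment is a small sketch of that consequence. Length is a String method, so applying it to a Float variable first converts the numeric value to a string, and the length of that string is returned. The variable name is illustrative, and the value in the comment is the expected result.

    %n      is float
    %n = 12345
    * %n is converted to the string '12345' before Length is applied,
    * so this is expected to print 5
    print %n:length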

Note: Because some EBCDIC characters are not translatable to Unicode, and vice versa, automatic conversions involving Unicode values may cause request cancellation. The explicit intrinsic conversion methods (like EbcdicToUnicode and EbcdicToAscii and their counterparts UnicodeToEbcdic and AsciiToEbcdic), however, have a parameter (as of Sirius Mods version 7.6) that lets you decode or encode untranslatable characters.