Collections

From m204wiki

It is common in programming to have groups of things that all have some common characteristic and are used fairly interchangeably. For example, one might have a number of items in an order, or a parent might have a number of children. Traditionally, this type of processing is dealt with using arrays. Traditional User Language arrays are enhanced by SOUL in Model 204 V7.5 to include support for arrays of structures and objects.

Arrays, however, have some limitations:

  • They are static. That is, array instances and their size are set at compile-time.
  • There are relatively few things you can do with an array. Typically, you are limited to setting and retrieving the value of an array item.

Neither of these limitations applies to objects, so another way to view the problem with arrays is that they are not objects. As a remedy, SOUL has a special kind of object that is very much like an array but has all the advantages of an object. These objects are called collections.

Many other object-oriented languages also have a concept of collections or collection objects but with a slight difference: object-oriented languages that have collection objects usually implement them as collections of generic objects, that is objects with no compile-time class (or more accurately, a very generic class). While this provides much flexibility, it also defeats one of the advantages of traditional arrays — compile-time knowledge of the array element datatypes — so it sacrifices compile-time error checking and more efficient compiled code.

The SOUL implementation of typed collections provides the best of both worlds: collection objects along with compile-time declaration of the type of the collection elements.

Collection object declaration syntax

    {variable} [Is] [Collection] -
       collectionType Of [collectionType Of] type -
       [ Global [(globalName)] ]

variable The name of the object variable that refers to a collection of the indicated type. If outside a class declaration block and structure, the variable must begin with a percent sign. If inside a structure declaration, the variable must not begin with a percent sign. If inside a class declaration block, the variable cannot start with a percent sign and must be preceded by the word Variable.
collectionType The name of one of the system collection classes (Arraylist, NamedArraylist, FloatNamedArraylist, UnicodeNamedArraylist). Since collections are always system classes, these names may be preceded with System:.
type Any basic SOUL datatype, including String (with a length and optional DP), Fixed (with an optional DP), Float, Longstring, Unicode, and Object (followed by the object class).

Arrays and structures are not supported as collection items, but collections can themselves be collection items. If type is a collection, you specify both the collection and item datatypes. For example: Arraylist of string len 16

globalName The optional global name if the collection variable is not a class or structure variable. If Global is specified without globalName, the name of the variable without a percent sign is used as the global name.

Collection object declarations do not contain parameters for size or for number of items. The number of items in a collection is completely dynamic, so no arbitrary limit needs to be set on their size (although SOUL will not support collections with more than 2**31, or about 2 billion, items).
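
As a minimal sketch of this dynamic sizing, the following request declares an Arraylist with no size and grows it one item at a time. It uses only the New constructor, the Add method, and the Count property that appear elsewhere in this article; the variable name and values are illustrative.

```soul
b
* No size is declared; the collection grows as items are added
%scores is arraylist of float
%scores = new
%scores:add(71)
%scores:add(88.5)
%scores:add(93)
print 'Number of items: ' %scores:count
end
```

After the three Add calls, Count should report three items.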

Collections have one VTBL/STBL slot to hold a single collection element, and all other collection items are kept in CCATEMP.

Note: For a collection of objects, it is the object references that are kept in CCATEMP, not the actual objects. So even though only one object reference in the collection will be in VTBL, many of the objects referenced in the collection may be in VTBL/STBL. The collection item in VTBL/STBL is the last referenced item, and an immediately subsequent reference to that same item will use the item directly from VTBL/STBL rather than loading it from CCATEMP. For this reason, consecutive references to the same collection item are quite efficient.

Here are sample collection object declarations:

    %scores is collection arraylist of float
    %costs is arraylist of fixed dp 2
    %basket is floatNamedArraylist of object order
    %staff is collection unicodeNamedArraylist of string len 64
    %value is namedArraylist of namedArraylist of longstring

Coding considerations for collections

Although collections vary by collection type, they are more alike than they are different. The remainder of this article contains topics that address features and methods common to all the collection types. The characteristics and methods of each individual collection type are described in separate articles; links to those groups of articles are contained in the "See also" section below.

Operating on collection items

All collections have at least the two standard properties described below.

Count Number of items in collection.
Item A specific item in the collection.

The Item property always takes at least one parameter indicating which item is being referenced, though the type of the parameter can vary with the collection class.

Some collection classes allow references to items that have not been added to the collection with some other method; other collection classes do not, and cause a request cancellation on such a reference.

The Item property can be both set and retrieved in all collections.

The following example illustrates a simple use of the Count and Item properties:

    %scores is collection arraylist of float
    ...
    %scores = new
    ...
    for %i from 1 to %scores:count
       %scores:item(%i) = %scores:item(%i) + 10
    end for

The Item property name may be left off:

    for %i from 1 to %scores:count
       %scores(%i) = %scores(%i) + 10
    end for

If the %scores Arraylist is just one of the members to be updated in the %allScores collection of Arraylists, you might have consecutive omissions of the Item name (%allScores(%i)(%j)):

    for %i from 1 to %allScores:count
       for %j from 1 to %allScores(%i):count
          %allScores(%i)(%j) = %allScores(%i)(%j) + 10
       end for
    end for

Explicitly specifying the method name Item is not required because:

  • It is very convenient to omit it, since the Item property is so heavily used.
  • The Item property always has at least one parameter, so a collection name followed by a parenthesis can be taken to clearly imply the Item property.
  • It facilitates conversion of existing arrays to collections.

Collections also have methods that depend on the collection types. The methods associated with the SOUL collection types (Arraylist, NamedArraylist, FloatNamedArraylist, UnicodeNamedArraylist) are described in subsequent sections.

In the descriptions of the methods, the term item means a value or variable of the same type as specified on the collection declaration, or a value or variable convertible to that type. For example, if a collection is declared as:

%stooges is arraylist of object stooge

An Add method applied to %stooges can only add an object of class stooge:

    %moe is object stooge
    ...
    %stooges:add(%moe)

On the other hand, if a collection is declared as

%value is Arraylist of string len 16

you can add anything that can be converted to a string:

    %long is longstring
    ...
    %value:add('A string')
    %value:add(22)
    %value:add(%long)

Note: In the example above, as is usual with longstrings, a truncation of the longstring on addition would cause a request cancellation error.

Comparing and assigning new collection variables

Collection variables, being a special variety of object variables, can be assigned to each other and compared, just like other object variables:

    %alist is arraylist of longstring
    %blist is arraylist of longstring
    ...
    %blist = %alist
    ...
    if %alist eq %blist then

For assignment or comparison to be allowed, however, both the collection type (Arraylist, for example) and the collection item datatype must be identical. As just two examples, you cannot assign the contents of an Arraylist variable to a NamedArraylist variable, and you cannot assign the contents of an arraylist of longstring to an arraylist of float.

Note:

When assigning to a collection variable and using the syntax of the New function that explicitly indicates the class for collections, both the collection and item datatypes must be specified just as on the collection variable declaration:

    %alist is collection floatNamedArraylist of longstring
    %alist = %(floatNamedArraylist of longstring):new

Printing a collection

Formerly, the standard way to view the entire contents of a collection was to loop through the list items and display each one using a SOUL Print statement (or Audit or Trace). For a NamedArraylist, for example, you used a method for the item subscript name and a method for the item content:

    %nal is namedArraylist of float
    ...
    %nal = new
    %nal('Chicago') = 22
    %nal('New York') = -999
    %nal('Los Angeles') = 3.1415926
    %nal('Philadelphia') = 1099
    for %i from 1 to %nal:count
       print %nal:nameByNumber(%i) and %nal:itemByNumber(%i)
    end for

This is the result:

    Chicago 22
    Los Angeles 3.1415926
    New York -999
    Philadelphia 1099

Now, the Print method for any collection does the work of the loop in the preceding example, and more. Supplied for debugging purposes, Print (or the essentially identical Audit or Trace method) would produce the following output using the example collection above (that is: %nal:print):

    1: Chicago: 22
    2: Los Angeles: 3.1415926
    3: New York: -999
    4: Philadelphia: 1099

Notice that Print outputs all the collection items (or, optionally, a range of items), and it also includes:

  • The ordinal, or position, number for each item
  • A separator string after the item position number and also after the item name (if a named collection)

Print also has optional parameters that let you specify:

  • The lengths for the item name and number
  • A label string to precede each output line
  • The number of items to display

Note:

The Print method applies a ToString method (by default) to each item value (and always to each item name), to produce its result. Applying Print to a collection whose item types are not system classes will work only if at least one of the following is true:

  • The user class contains a ToString method.
  • The Print method includes an appropriate "method parameter," as described below.

General syntax of Print (Audit or Trace) for a collection

    %coll:Print (method, numWidth, nameWidth, -
                 separator, start, maxItems, label)

All parameters are optional and all except method have required names (which match the names used in the syntax above). The parameters are described briefly below and in greater detail in the individual method descriptions for the appropriate collection type.

method The method applied to collection items to produce the printed output.

The method must take no parameters and produce an intrinsic (Float, String, Fixed, Unicode) value. It may be a system or user-written method, a class variable or property, a local method, or a method variable. The default is the ToString method.
numWidth The number of bytes for the item number in the output. If 0, the default, the item number is not printed.
nameWidth The number of bytes for the item name (ignored if an Arraylist). If -1, the default, the entire name is fit exactly. If 0, the item name is not printed.
separator A string that follows the item number and that repeats after the item name. The default is a colon. A blank follows each instance of separator.
start The number of the collection item from which to start the output display. By default, the display begins from item one.
maxItems The maximum number of collection items to print. By default, all items are displayed.
label A string, null by default, marking the beginning of each item's line of output.

Examples using width or local method arguments

For the NamedArraylist in the first example in "Printing a collection", but issuing %nal:print(numWidth=3, nameWidth=14), this is the result:

    1: Chicago       : 22
    2: Los Angeles   : 3.1415926
    3: New York      : -999
    4: Philadelphia  : 1099

If you issue %nal:print(numWidth=3, nameWidth=7), the result is:

    1: Chicago: 22
    2: Los Ang: 3.1415926
    3: New Yor: -999
    4: Philade: 1099

You can define a local function to display your output:

    local function (string):quote is longstring
       return '''' with %this with ''''
    end function

Now you issue:

%nal:print(quote)

And you get this result:

    1: Chicago: '22'
    2: Los Angeles: '3.1415926'
    3: New York: '-999'
    4: Philadelphia: '1099'

If you named your local method toString instead of quote, it would not need to be specified on the Print method. This is shown in the following example.

Examples using class or local ToString

In the following request, the method parameter used with the Arraylist Print method is a class variable:

    b
    class python
       public
          variable surname is string len 30
          variable givenName is string len 30
          variable routine is string len 30
          constructor new(%surname is string len 30, -
                          %givenName is string len 30, -
                          %routine is string len 30)
       end public
       constructor new(%surname is string len 30, -
                       %givenName is string len 30, -
                       %routine is string len 30)
          %this:surname = %surname
          %this:givenName = %givenName
          %this:routine = %routine
       end constructor
    end class
    %pythons is arraylist of object python
    %pythons = list(new('Cleese', 'John', 'Dead Parrot'), -
                    new('Palin', 'Michael', 'Lumberjack'), -
                    new('Idle', 'Eric', 'Nudge nudge'), -
                    new('Chapman', 'Graham', 'Throat Wobbler Mangrove'), -
                    new('Jones', 'Terry', 'Mouse Organ'))
    %pythons:print(surname)
    end

The request result is:

    1: Cleese
    2: Palin
    3: Idle
    4: Chapman
    5: Jones

If you create the following ToString method in the class:

    function toString is longstring
       return 'surname=' with %this:surname with ', ' with -
              'givenName=' with %this:givenName with ', ' with -
              'routine=' with %this:routine
    end function

And you issue this Print method call, which implicitly invokes your toString method:

%pythons:print(start=2, maxItems=3)

The result is:

    2: surname=Palin, givenName=Michael, routine=Lumberjack
    3: surname=Idle, givenName=Eric, routine=Nudge nudge
    4: surname=Chapman, givenName=Graham, routine=Throat Wobbler Mangrove

Examples of subscript display format for named collections

When printing the name subscript for a NamedArraylist, the subscript is left as is. For a FloatNamedArraylist, the Float subscripts are displayed as strings. For a UnicodeNamedArraylist, the Unicode subscripts are translated from Unicode to EBCDIC (character-entity-encoding any non-translatable characters), as in the following:

    %transcendental is unicodeNamedArraylist of float
    ...
    %transcendental = new
    %transcendental('π':U) = 3.1415926
    %transcendental:print

The Print result shows the encoded form of the Unicode item name:

    1: &#x03C0;: 3.1415926

Note: If you specified %transcendental:print(namewidth=4), for example, the item name is truncated:

1: &#x: 3.1415926

Finding collection maxima and minima, and sorting

In addition to Count and Item methods, all collections also have Maximum and Minimum methods. These are methods that let you find the collection item that returns the highest or lowest value, respectively, for the attribute you want to evaluate. That attribute must be in the form of a function you specify that is defined to operate on the type of items in the collection and to return a simple string or numeric value.

Arraylist and NamedArraylist collections also have sorting methods that are similar to Maximum and Minimum but which sort a collection by the attribute function you specify.

What distinguishes the maximum/minimum and sorting methods is:

  • The use of a function you specify as a parameter to apply to the collection items
  • The variety of types of function parameter you can specify

The following series of examples introduces the maximum/minimum and sorting methods and displays many of the types of function parameter you can implement.

Finding a maximum using a system method parameter

In the following request, the Maximum method first applies the Stringlist Count function to each item in an Arraylist of Stringlist items. The Count function is specified as a parameter to the Maximum method. Maximum then returns the position in the Arraylist of the Stringlist that has the most items. The List function simplifies the construction of the lists.

    b
    %I is float
    %list1 is object stringlist
    %list2 is object stringlist
    %list3 is object stringlist
    %list1 = List('the', 'quick', 'brown')
    %list2 = List('fox', 'jumped', 'over', 'the')
    %list3 = List('lazy', 'dog', 'yesterday', 'two', 'times')
    %arrayl is collection arraylist of object stringlist
    %arrayl = List(%list1, %list2, %list3)
    print 'The longest list is item ' %arrayl:maximum(count)
    end

The result is:

The longest list is item 3

Finding a minimum using a class Variable parameter

The function you apply to the collection items is not restricted to system class methods. More precisely, the function parameter is a method value: it can be the name of a system method, as above, or a user method or local method. Or the function parameter can be a Function variable, including a class Variable or Property. Any of these types of method value are valid as long as they a) operate on the item type in the collection, and b) return an intrinsic (number, string, unicode) value.

In the following example, the Minimum method applies a class variable to determine the Arraylist item with the minimum value. The List function simplifies the construction of the Arraylist.

    b
    class python
       public
          variable firstname is string len 16
          variable surname is string len 16
          variable birthdate is float
          constructor newpy (%sname is string len 16, -
                             %name is string len 16, %bd is float)
       end public
       constructor newpy (%sname is string len 16, -
                          %name is string len 16, %bd is float)
          %this:firstname = %name
          %this:surname = %sname
          %this:birthdate = %bd
       end constructor
    end class
    %lp is arraylist of object python
    %lp = list(newpy('Gilliam', 'Terry', '19401122'), -
               newpy('Cleese', 'John', '19391027'), -
               newpy('Idle', 'Eric', '19430329'), -
               newpy('Palin', 'Michael', '19430505'), -
               newpy('Chapman', 'Graham', '19410108'), -
               newpy('Jones', 'Terry', '19420201') )
    print 'Item ' %lp:minimum(birthdate) ' has eldest Python'
    end

The result is:

Item 2 has eldest Python

Sorting an Arraylist using one sort criterion

The Arraylist of python objects above can be readily sorted by birthdate as well. The Arraylist Sort and SortNew methods take as input the sorting criteria, a combination of a sorting order direction (Ascending or Descending) and its sort key parameter (a function just like that in Maximum/Minimum which operates on the collection items). This direction and sort key combination is also known as a SortOrder.

For example, to sort the Arraylist of python objects in ascending order by birthdate, you can use the Sort subroutine:

call %lp:sort(ascending(birthdate))

Or you can use the SortNew function:

%lp = %lp:sortnew(ascending(birthdate))

If you provide the python class with a method to print a python object, you can then loop through the sorted Arraylist returned by SortNew:

    class python
       ...
       function myprint is longstring
          return %this:firstname with ' ' with -
                 %this:surname with ' ' with -
                 '(born: ' with %this:birthdate with ')'
       end function
    end class
    ...
    %i is float
    %lp = %lp:sortnew(ascending(birthdate))
    for %i from 1 to %lp:count
       print %lp(%i):myprint
    end for

This is the sorted result:

    John Cleese (born: 19391027)
    Terry Gilliam (born: 19401122)
    Graham Chapman (born: 19410108)
    Terry Jones (born: 19420201)
    Eric Idle (born: 19430329)
    Michael Palin (born: 19430505)

Note: Of the collection classes, only the Arraylist class contains a Sort subroutine; all the collection classes have a SortNew function.

Finding minima using a method variable parameter

The function-like method value parameter in Maximum, Minimum, and sorting may be a method variable. %meth in the following simple example is assigned in turn to two local functions:

    b
    %meth is function (string):func is longstring
    Local function (string):leftmost is longstring
       return %this:left(1)
    end function
    Local function (string):rightmost is longstring
       return %this:right(1)
    end function
    %l is arraylist of string len 30
    %l = list('Hickory', 'Dickory', 'Doc')
    %meth = rightmost
    print %l:minimum(%meth)
    %meth = leftmost
    print %l:minimum(%meth)
    end

This request prints the number of the item that has the rightmost character that is alphabetically the earliest, then the number of the item that has the leftmost character that is the earliest (and closer to the beginning of the list):

    3
    2

Note: Specifying the local functions themselves as the Minimum method parameter in the preceding example would also produce the same result. For example, the first print %l:minimum(%meth) call in the example is equivalent to print %l:minimum(rightmost).

However, explicitly specifying right(1) (the method and argument that local function rightmost invokes) as the Minimum parameter does not work. Specifying a method that itself requires a parameter as the Minimum (or Maximum or sort) parameter is a syntax violation and compilation error. The parameter for Minimum is a method value, not a SOUL expression. You can use a local function as the Minimum parameter, as in the example above, to apply a method that requires an argument.

Using the This function as the Maximum parameter

Finding the maximum or minimum and sorting are likely to be very common operations on collections of SOUL intrinsic values. In these cases, you want the function parameter for the maximum/minimum and sorting methods to be an identity function, as in the following:

    b
    %l is arraylist of float
    %l = list(9, 11, 4, -5, 17, 3, 4, 6)
    local function (float):thisVal is float
       return %this
    end function
    printText Item {%l:maximum(thisVal)} has the maximum value
    end

To simplify such requests, a special method value provides the identity function for intrinsic classes. The method value (named This) simply returns the value of the method object. It is valid only for intrinsic classes.

Using This, the previous request becomes:

    b
    %l is arraylist of float
    %l = list(9, 11, 4, -5, 17, 3, 4, 6)
    printText Item {%l:maximum(this)} has the maximum value
    end

Since This is the default method value for the maximum/minimum/sorting function parameter, %l:maximum(this) above can be replaced by:

%l:maximum

And sort(descending(this)), for example, can be replaced by:

%l:sort(descending)
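
As a small illustrative sketch of that shorthand, the request below sorts the same list of Float values in place and then prints each item. It relies only on the Sort subroutine, the default This method value, and the Count and Item properties described in this article.

```soul
b
%l is arraylist of float
%i is float
%l = list(9, 11, 4, -5, 17, 3, 4, 6)
* Descending with no argument uses the default method value, This
%l:sort(descending)
for %i from 1 to %l:count
   print %l(%i)
end for
end
```

With these values, the items should print from largest to smallest: 17, 11, 9, 6, 4, 4, 3, and -5, one per line.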

Sorting an Arraylist using two sort criteria

The following request sorts an Arraylist by two sort criteria. A SortOrder object is explicitly defined to contain the sort criteria. The List function simplifies the construction of the Arraylist.

    b
    class polis
       public
          variable city is string len 16
          variable dept is string len 10
          variable cost is float
          constructor newp (%city is string len 16, -
                            %dp is string len 10, %cst is float)
          function myprint is longstring
       end public
       constructor newp (%city is string len 16, -
                         %dp is string len 10, %cst is float)
          %this:city = %city
          %this:dept = %dp
          %this:cost = %cst
       end constructor
       function myprint is longstring
          return %this:city with ' (' with -
                 %this:dept with '): ' with %this:cost
       end function
    end class
    %lp is arraylist of object polis
    %lp = list(newp('Gotham', 'DPW', 33125), -
               newp('Chatham', 'Fire', 21940), -
               newp('Wareham', 'Fire', 8444), -
               newp('Wareham', 'DPW', 5938), -
               newp('Chatham', 'DPW', 11651), -
               newp('Gotham', 'Fire', 41246))
    %so is object sortorder for object polis
    %so = list(ascending(city), ascending(dept))
    %lp:sort(%so)
    %i is float
    for %i from 1 to %lp:count
       print %lp(%i):myprint
    end for
    end

The result is:

    Chatham (DPW): 11651
    Chatham (Fire): 21940
    Gotham (DPW): 33125
    Gotham (Fire): 41246
    Wareham (DPW): 5938
    Wareham (Fire): 8444

Searching a collection

A variety of methods are common to all the collection classes for the purpose of searching a collection for the item(s) that satisfy one or more specified conditions.

The searching methods

The searching methods (all functions, listed below) have the same, or nearly the same syntax. They take two parameters:

  • An object that specifies the search conditions (a SelectionCriterion object).
  • A parameter (Start) that specifies where in the collection to begin the search. One method, SubsetNew, does not accept this parameter.

The searching methods are:

FindNextItem
Searching "forward" in the collection, finds the next item that matches a criterion, and returns that item.
FindPreviousItem
Searching "backward" in the collection, finds the next item that matches a criterion, and returns that item.
FindNextItemNumber
Searching "forward," finds the next item that matches a criterion, and returns that item number.
FindPreviousItemNumber
Searching "backward," finds the next item that matches a criterion, and returns that item number.
SubsetNew
Returns a new collection that contains all the items in the input collection that match the criterion.

The FindNextItem and FindPreviousItem methods also throw an ItemNotFound exception if no item matches the SelectionCriterion.

SelectionCriterion objects

A SelectionCriterion object, which might consist of multiple components, describes a single selection criterion. For example, the Ge method in that class uses two parameters to form a "greater than or equal to" comparison criterion to apply to the collection items. So, for SelectionCriterion object %sel, which selects items whose absolute value is greater than or equal to 1000, you might have:

%sel = ge(absolute, 1000)

A simple search, starting from the eighth item in the %payoff Arraylist, might be:

%item = %payoff:findNextItem(%sel, start=7)

The parameters of the SelectionCriterion Ge method above provide the operands for the comparison operator Ge. In this case the intrinsic Float Absolute function is applied to an item value. In general, this must be a function that operates on the type of the items in the collection, and it may be a local method or method variable or a class member (variable, property).

The value that results from applying the Absolute method above is compared to the second Ge parameter, 1000. This 1000 may be any SOUL intrinsic expression, such as a string or numeric literal.

In the fragment that follows, the function in the SelectionCriterion is a local method, and the searching method, FindPreviousItemNumber, searches backward starting with the tenth item in the collection to find the item number of the first item that satisfies the criterion:

    %flt is arraylist of float
    %sel is object selectionCriterion for float
    local function (float):myMod is float
       return %this:mod(7)
    end function
    %sel = LT(myMod, 1)
    %num = %flt:findPreviousItemNumber(%sel, start=11)

The local method myMod above, which calls the Mod method, is necessary in this case because the SelectionCriterion function parameter may not itself specify a parameter. The function parameter is a method value, not a SOUL expression.

The preceding example also shows a SelectionCriterion object declaration, which must suit the item type to which the criterion will be applied, as described in Declaring a SelectionCriterion object variable.

In the following example, the function parameter is the very useful identity function, This, which returns the value of the item to which it is applied. The searching method SubsetNew returns a collection of all the items in the collection that satisfy either of the criteria (< 0, > 999) that comprise the Or criterion:

    %sel = OR(LT(this, 0), GT(this, 999))
    %arraylist = %flt:subsetNew(%sel)
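
A fuller sketch of a SubsetNew search, under stated assumptions, might look like the following. The variable %out and the sample values are hypothetical; the SelectionCriterion declaration, the Or, Lt, and Gt criteria, the This method value, and the collection Print method are those described in this article.

```soul
b
%flt is arraylist of float
%out is arraylist of float
%sel is object selectionCriterion for float
%flt = list(-20, 15, 1200, 400, -3, 1000)
* Select items that are negative or greater than 999
%sel = or(lt(this, 0), gt(this, 999))
* SubsetNew takes no Start parameter; it examines every item
%out = %flt:subsetNew(%sel)
%out:print
end
```

With these values, the new collection should contain -20, 1200, -3, and 1000, and the input collection %flt is left unchanged.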

Using the searching methods

  • The main benefit of these searching methods is the ease of coding provided by their simplicity and flexibility. However, the Find and Subset operations on collections of objects will necessarily be considerably more expensive than the comparable operations on Stringlists.

    For example, a level of indirection between object references and objects makes the processing much more complicated than that for Stringlists. However, because the cost of locates or subsets is likely to be a small fraction of the cost of most applications, switching to objects for these applications offers the benefits of cleaner code without a major expense.

  • The FindNextItem and FindPreviousItem methods throw an ItemNotFound exception if no item matches the SelectionCriterion, but the FindNextItemNumber and FindPreviousItemNumber methods do not throw an exception in that case. The following are the suggested guidelines for using these methods:
    • For simply checking if an item in a collection matches a SelectionCriterion, use FindNextItemNumber or FindPreviousItemNumber.
    • For looping over a collection, use FindNextItemNumber or FindPreviousItemNumber with an If test.
    • For extracting a single item that you are very sure must be in the collection, use FindNextItem or FindPreviousItem. If you are wrong about the presence of the item, the exception is thrown and the request is cancelled.
    • For conditionally extracting a single item from a collection, use FindNextItem or FindPreviousItem with a Try/Catch clause.

    As a general footnote, a Try/Catch clause is actually more efficient than an If test. The Try does not produce any compiled code, and the Catch is only evaluated if there is an exception. Try/Catch therefore executes no code other than what you are trying, whereas If also has to execute the conditional test.
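
The guidelines above can be sketched as follows. This is an illustration rather than a definitive pattern: the variable names and values are hypothetical, and it assumes that FindNextItemNumber returns 0 when no item matches and that Start names the item after which the search begins (consistent with the start=7 example earlier). It also assumes the SOUL Try/Catch and Repeat While statements.

```soul
b
%payoff is arraylist of float
%sel is object selectionCriterion for float
%item is float
%num is float
%payoff = list(250, -1500, 3000, 40)
%sel = ge(absolute, 1000)

* Conditional extraction: catch ItemNotFound if nothing matches
try
   %item = %payoff:findNextItem(%sel)
   print 'First match: ' %item
catch itemNotFound
   print 'No item matched'
end try

* Looping over all matches by item number
* (assumes 0 is returned when no further item matches)
%num = %payoff:findNextItemNumber(%sel)
repeat while %num gt 0
   print 'Item ' %num ' matches: ' %payoff(%num)
   %num = %payoff:findNextItemNumber(%sel, start=%num)
end repeat
end
```

The Try/Catch form keeps the not-found handling out of the normal path, while the item-number loop avoids exceptions entirely, which suits a search that is expected to run out of matches.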

See also