Collections
It is common in programming to have groups of things that all have some common characteristic and are used fairly interchangeably. For example, one might have a number of items in an order, or a parent might have a number of children. Traditionally, this type of processing is dealt with using arrays. Traditional User Language arrays are enhanced by SOUL in Model 204 V7.5 to include support for arrays of structures and objects.
Arrays, however, have some limitations:
- They are static. That is, array instances and their size are set at compile-time.
- There are relatively few things you can do with an array. Typically, you are limited to setting and retrieving the value of an array item.
Since neither of these problems applies to objects, another way to view the problem with arrays is that they are not objects. As a remedy, SOUL has a special kind of object that is very much like an array but has all the advantages of an object. These objects are called collections.
Many other object-oriented languages also have a concept of collections or collection objects but with a slight difference: object-oriented languages that have collection objects usually implement them as collections of generic objects, that is objects with no compile-time class (or more accurately, a very generic class). While this provides much flexibility, it also defeats one of the advantages of traditional arrays — compile-time knowledge of the array element datatypes — so it sacrifices compile-time error checking and more efficient compiled code.
The SOUL implementation of typed collections provides the best of both worlds: collection objects along with compile-time declaration of the type of the collection elements.
Collection object declaration syntax
    {variable} [Is] [Collection] -
       collectionType Of [collectionType Of] type -
       [ Global [(globalName)] ]

Parameter | Description |
---|---|
variable | The name of the object variable that refers to a collection of the indicated type. If outside a class declaration block and structure, the variable must begin with a percent sign. If inside a structure declaration, the variable must not begin with a percent sign. If inside a class declaration block, the variable cannot start with a percent sign and must be preceded by the word Variable. |
collectionType | The name of one of the system collection classes (Arraylist, NamedArraylist, FloatNamedArraylist, UnicodeNamedArraylist). Since collections are always system classes, these names may be preceded with System:. |
type | Any basic SOUL datatype, including String (with a length and optional DP), Fixed (with an optional DP), Float, Longstring, Unicode, and Object (followed by the object class). Arrays and structures are not supported as collection items, but collections can themselves be collection items. If type is a collection, you specify both the collection and item datatypes, as in the last of the sample declarations below. |
globalName | The optional global name if the collection variable is not a class or structure variable. If Global is specified without globalName, the name of the variable without a percent sign is used as the global name. |
Collection object declarations do not contain parameters for size or for number of items. The number of items in a collection is completely dynamic, so no arbitrary limit needs to be set on their size (although SOUL will not support collections with more than 2**31, or about 2 billion, items).
Collections have one VTBL/STBL slot to hold a single collection element, and all other collection items are kept in CCATEMP.
Note: For a collection of objects, it is the object references that are kept in CCATEMP, not the actual objects. So even though only one object reference in the collection will be in VTBL, many of the objects referenced in the collection may be in VTBL/STBL. The collection item in VTBL/STBL is the last referenced item, and an immediately subsequent reference to that same item will use the item directly from VTBL/STBL rather than loading it from CCATEMP. For this reason, consecutive references to the same collection item are quite efficient.
Here are sample collection object declarations:
    %scores is collection arraylist of float
    %costs is arraylist of fixed dp 2
    %basket is floatNamedArraylist of object order
    %staff is collection unicodeNamedArraylist of string len 64
    %value is namedArraylist of namedArraylist of longstring
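Because a declaration carries no size, a collection simply grows as items are added. The following is a minimal sketch of that behavior, not a definitive pattern; it uses only the New constructor, the Add method, and the Count property mentioned in this article, and the values added are arbitrary:

```soul
b
%scores is collection arraylist of float
%i is float
%scores = new
* No size was declared: each Add extends the collection by one item
for %i from 1 to 5
   %scores:add(%i * 10)
end for
print 'Items: ' with %scores:count
end
```

After the loop, Count reflects the five items added.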
Coding considerations for collections
Although collections vary by collection type, they are more alike than they are different. The remainder of this article contains topics that address features and methods common to all the collection types. The characteristics and methods of each individual collection type are described in separate articles; links to those groups of articles are contained in the "See also" section below.
Operating on collection items
All collections have at least the two standard properties described below.
Property | Description |
---|---|
Count | Number of items in collection. |
Item | A specific item in the collection. The Item property always takes at least one parameter indicating which item is being referenced, though the type of the parameter can vary with the collection class. Some collection classes allow references to items that have not been added to the collection with some other method; other collection classes do not, and cause a request cancellation on such a reference. The Item property can be both set and retrieved in all collections. |
The following example illustrates a simple use of the Count and Item properties:
    %scores is collection arraylist of float
    ...
    %scores = new
    ...
    for %i from 1 to %scores:count
       %scores:item(%i) = %scores:item(%i) + 10
    end for
The Item property name may be left off:
    for %i from 1 to %scores:count
       %scores(%i) = %scores(%i) + 10
    end for

If the %scores Arraylist is just one of the members to be updated in the %allScores collection of Arraylists, you might have consecutive omissions of the Item name (%allScores(%i)(%j)):

    for %i from 1 to %allScores:count
       for %j from 1 to %allScores(%i):count
          %allScores(%i)(%j) = %allScores(%i)(%j) + 10
       end for
    end for

Explicitly specifying the method name Item is not required because:
- It is very convenient to omit it, since the Item property is so heavily used.
- The Item property always has at least one parameter, so a collection name followed by a parenthesis can be taken to clearly imply the Item property.
- It facilitates conversion of existing arrays to collections.
Collections also have methods that depend on the collection types. The methods associated with the SOUL collection types (Arraylist, NamedArraylist, FloatNamedArraylist, UnicodeNamedArraylist) are described in subsequent sections.
In the descriptions of the methods, the term item means a value or variable of the same type as specified on the collection declaration, or a value or variable convertible to that type. For example, if a collection is declared as:
    %stooges is arraylist of object stooge

An Add method applied to %stooges can only add an object of class stooge:

    %moe is object stooge
    ...
    %stooges:add(%moe)
On the other hand, if a collection is declared as
%value is Arraylist of string len 16
you can add anything that can be converted to a string:
    %long is longstring
    ...
    %value:add('A string')
    %value:add(22)
    %value:add(%long)
Note: In the example above, as is usual with longstrings, a truncation of the longstring on addition would cause a request cancellation error.
Comparing and assigning new collection variables
Collection variables, being a special variety of object variables, can be assigned to each other and compared, just like other object variables:
    %alist is arraylist of longstring
    %blist is arraylist of longstring
    ...
    %blist = %alist
    ...
    if %alist eq %blist then

For assignment or comparison to be allowed, however, both the collection type (Arraylist, for example) and the collection item datatype must be identical. As just two examples, you cannot assign the contents of an Arraylist variable to a NamedArraylist variable, and you cannot assign the contents of an arraylist of longstring to an arraylist of float.

Note: When assigning to a collection variable and using the syntax of the New function that explicitly indicates the class for collections, both the collection and item datatypes must be specified just as on the collection variable declaration:

    %alist is collection floatNamedArraylist of longstring
    %alist = %(floatNamedArraylist of longstring):new
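Since collection variables are object variables, assignment is understood here to copy the reference rather than the items, and Eq to test whether two variables refer to the same collection. The sketch below illustrates this under that assumption, using only the Add method and Count property described in this article:

```soul
b
%alist is arraylist of longstring
%blist is arraylist of longstring
%alist = new
%alist:add('first')
* Assignment copies the reference, not the items
%blist = %alist
%blist:add('second')
* Both variables refer to the same Arraylist, so both counts agree
print %alist:count and %blist:count
if %alist eq %blist then
   print 'Same collection'
end if
end
```

If independent contents are wanted, the items have to be copied into a separate collection rather than assigned.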
Printing a collection
Formerly, the standard way to view the entire contents of a collection was to loop through the list items and display each one using a SOUL Print statement (or Audit or Trace). For a NamedArraylist, for example, you used a method for the item subscript name and a method for the item content:
    %nal is namedArraylist of float
    ...
    %nal = new
    %nal('Chicago') = 22
    %nal('New York') = -999
    %nal('Los Angeles') = 3.1415926
    %nal('Philadelphia') = 1099
    for %i from 1 to %nal:count
       print %nal:nameByNumber(%i) and %nal:itemByNumber(%i)
    end for
This is the result:
    Chicago 22
    Los Angeles 3.1415926
    New York -999
    Philadelphia 1099

Now, the Print method for any collection does the work of the loop in the preceding example, and more. Supplied for debugging purposes, Print (or the essentially identical Audit or Trace method) would produce the following output using the example collection above (that is: %nal:print):

    1: Chicago: 22
    2: Los Angeles: 3.1415926
    3: New York: -999
    4: Philadelphia: 1099
Notice that Print outputs all the collection items (or, optionally, a range of items), and it also includes:
- The ordinal, or position, number for each item
- A separator string after the item position number and also after the item name (if a named collection)
Print also has optional parameters that let you specify:
- The lengths for the item name and number
- A label string to precede each output line
- The number of items to display
Note: The Print method applies a ToString method (by default) to each item value (and always to each item name), to produce its result. Applying Print to a collection whose item types are not system classes will work only if at least one of the following is true:
- The user class contains a ToString method.
- The Print method includes an appropriate "method parameter," as described below.
General syntax of Print (Audit or Trace) for a collection
    %coll:Print (method, numWidth, nameWidth, -
                 separator, start, maxItems, label)
All parameters are optional and all except method have required names (which match the names used in the syntax above). The parameters are described briefly below and in greater detail in the individual method descriptions for the appropriate collection type.
Parameter | Description |
---|---|
method | The method applied to collection items to produce the printed output. The method must take no parameters and produce an intrinsic (Float, String, Fixed, Unicode) value. It may be a system or user-written method, a class variable or property, a local method, or a method variable. The default is the ToString method. |
numWidth | The number of bytes for the item number in the output. If -1, the default, the item number is fit exactly. If 0, the item number is not printed. |
nameWidth | The number of bytes for the item name (ignored if an Arraylist). If -1, the default, the entire name is fit exactly. If 0, the item name is not printed. |
separator | A string that follows the item number and that repeats after the item name. The default is a colon. A blank follows each instance of separator. |
start | The number of the collection item from which to start the output display. By default, the display begins from item one. |
maxItems | The maximum number of collection items to print. By default, all items are displayed. |
label | A string, null by default, marking the beginning of each item's line of output. |
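As a hedged illustration of the name-required parameters in the table above, the following sketch prints a two-item range with a label. The label text is arbitrary, and the exact output layout is not shown here because it depends on the defaults in effect for the remaining parameters:

```soul
b
%nal is namedArraylist of float
%nal = new
%nal('Chicago') = 22
%nal('New York') = -999
%nal('Los Angeles') = 3.1415926
%nal('Philadelphia') = 1099
* method is omitted, so the default ToString is applied to each item;
* start, maxItems, and label are specified by name
%nal:print(start=2, maxItems=2, label='City ')
end
```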
Examples using width or local method arguments
For the NamedArraylist in the first example in "Printing a collection", but issuing %nal:print(numWidth=3, nameWidth=14), this is the result:

    1: Chicago : 22
    2: Los Angeles : 3.1415926
    3: New York : -999
    4: Philadelphia : 1099

If you issue %nal:print(numWidth=3, nameWidth=7), the result is:

    1: Chicago: 22
    2: Los Ang: 3.1415926
    3: New Yor: -999
    4: Philade: 1099
You can define a local function to display your output:
    local function (string):quote is longstring
       return '''' with %this with ''''
    end function
Now you issue:
%nal:print(quote)
And you get this result:
    1: Chicago: '22'
    2: Los Angeles: '3.1415926'
    3: New York: '-999'
    4: Philadelphia: '1099'

If you named your local method toString instead of quote, it would not need to be specified on the Print method. This is shown in the following example.
Examples using class or local ToString
In the following request, the method parameter used with the Arraylist Print method is a class variable:
    b
    class python
       public
          variable surname is string len 30
          variable givenName is string len 30
          variable routine is string len 30
          constructor new(%surname is string len 30, -
                          %givenName is string len 30, -
                          %routine is string len 30)
       end public
       constructor new(%surname is string len 30, -
                       %givenName is string len 30, -
                       %routine is string len 30)
          %this:surname = %surname
          %this:givenName = %givenName
          %this:routine = %routine
       end constructor
    end class
    %pythons is arraylist of object python
    %pythons = list(new('Cleese', 'John', 'Dead Parrot'), -
                    new('Palin', 'Michael', 'Lumberjack'), -
                    new('Idle', 'Eric', 'Nudge nudge'), -
                    new('Chapman', 'Graham', 'Throat Wobbler Mangrove'), -
                    new('Jones', 'Terry', 'Mouse Organ'))
    %pythons:print(surname)
    end
The request result is:
    1: Cleese
    2: Palin
    3: Idle
    4: Chapman
    5: Jones

If you create the following ToString method in the class:

    function toString is longstring
       return 'surname=' with %this:surname with ', ' with -
              'givenName=' with %this:givenName with ', ' with -
              'routine=' with %this:routine
    end function

And you issue this Print method call, which implicitly invokes your toString method:

    %pythons:print(start=2, maxItems=3)

The result is:

    2: surname=Palin, givenName=Michael, routine=Lumberjack
    3: surname=Idle, givenName=Eric, routine=Nudge nudge
    4: surname=Chapman, givenName=Graham, routine=Throat Wobbler Mangrove
Examples of subscript display format for named collections
When printing the name subscript for a NamedArraylist, the subscript is left as is. For a FloatNamedArraylist, the Float subscripts are displayed as strings. For a UnicodeNamedArraylist, the Unicode subscripts are translated from Unicode to EBCDIC (character-entity-encoding any non-translatable characters), as in the following:

    %transcendental is unicodeNamedArraylist of float
    ...
    %transcendental = new
    %transcendental('&pi;':U) = 3.1415926
    %transcendental:print

The Print result shows the encoded form of the Unicode item name:

    1: &#x03C0;: 3.1415926

Note: If you specified %transcendental:print(nameWidth=4), for example, the item name is truncated:

    1: &#x: 3.1415926
Finding collection maxima and minima, and sorting
In addition to Count and Item methods, all collections also have Maximum and Minimum methods. These are methods that let you find the collection item that returns the highest or lowest value, respectively, for the attribute you want to evaluate. That attribute must be in the form of a function you specify that is defined to operate on the type of items in the collection and to return a simple string or numeric value.
Arraylist and NamedArraylist collections also have sorting methods that are similar to Maximum and Minimum but which sort a collection by the attribute function you specify.
What distinguishes the maximum/minimum and sorting methods is:
- The use of a function you specify as a parameter to apply to the collection items
- The variety of types of function parameter you can specify
The following series of examples introduces the maximum/minimum and sorting methods and displays many of the types of function parameter you can implement.
Finding a maximum using a system method parameter
In the following request, the Maximum method first applies the Stringlist Count function to each item in an Arraylist of Stringlist items. The Count function is specified as a parameter to the Maximum method. Maximum then returns the position in the Arraylist of the Stringlist that has the most items. The List function simplifies the construction of the lists.
    b
    %I is float
    %list1 is object stringlist
    %list2 is object stringlist
    %list3 is object stringlist
    %list1 = List('the', 'quick', 'brown')
    %list2 = List('fox', 'jumped', 'over', 'the')
    %list3 = List('lazy', 'dog', 'yesterday', 'two', 'times')
    %arrayl is collection arraylist of object stringlist
    %arrayl = List(%list1, %list2, %list3)
    print 'The longest list is item ' %arrayl:maximum(count)
    end
The result is:
The longest list is item 3
Finding a minimum using a class Variable parameter
The function you apply to the collection items is not restricted to system class methods. More precisely, the function parameter is a method value: it can be the name of a system method, as above, or a user method or local method. Or the function parameter can be a Function variable, including a class Variable or Property. Any of these types of method value are valid as long as they a) operate on the item type in the collection, and b) return an intrinsic (number, string, unicode) value.
In the following example, the Minimum method applies a class variable to determine the Arraylist item with the minimum value. The List function simplifies the construction of the Arraylist.
    b
    class python
       public
          variable firstname is string len 16
          variable surname is string len 16
          variable birthdate is float
          constructor newpy (%sname is string len 16, -
                             %name is string len 16, %bd is float)
       end public
       constructor newpy (%sname is string len 16, -
                          %name is string len 16, %bd is float)
          %this:firstname = %name
          %this:surname = %sname
          %this:birthdate = %bd
       end constructor
    end class
    %lp is arraylist of object python
    %lp = list(newpy('Gilliam', 'Terry', '19401122'), -
               newpy('Cleese', 'John', '19391027'), -
               newpy('Idle', 'Eric', '19430329'), -
               newpy('Palin', 'Michael', '19430505'), -
               newpy('Chapman', 'Graham', '19410108'), -
               newpy('Jones', 'Terry', '19420201') )
    print 'Item ' %lp:minimum(birthdate) ' has eldest Python'
    end
The result is:
Item 2 has eldest Python
Sorting an Arraylist using one sort criterion
The Arraylist of python objects above can be readily sorted by birthdate as well. The Arraylist Sort and SortNew methods take as input the sorting criteria, a combination of a sorting order direction (Ascending or Descending) and its sort key parameter (a function just like that in Maximum/Minimum which operates on the collection items). This direction and sort key combination is also known as a SortOrder.

For example, to sort the Arraylist of python objects in ascending order by birthdate, you can use the Sort subroutine:
call %lp:sort(ascending(birthdate))
Or you can use the SortNew function:
%lp = %lp:sortnew(ascending(birthdate))
You can also provide the python class with a method to print a python object, then loop through the sorted Arraylist returned by SortNew:

    class python
       ...
       function myprint is longstring
          return %this:firstname with ' ' with -
                 %this:surname with ' ' with -
                 '(born: ' with %this:birthdate with ')'
       end function
    end class
    ...
    %i is float
    %lp = %lp:sortnew(ascending(birthdate))
    for %i from 1 to %lp:count
       print %lp(%i):myprint
    end for

This is the sorted result:

    John Cleese (born: 19391027)
    Terry Gilliam (born: 19401122)
    Graham Chapman (born: 19410108)
    Terry Jones (born: 19420201)
    Eric Idle (born: 19430329)
    Michael Palin (born: 19430505)
Note: Of the collection classes, only the Arraylist class contains a Sort subroutine; all the collection classes have a SortNew function.
Finding minima using a method variable parameter
The function-like method value parameter in Maximum, Minimum, and sorting may be a method variable. %meth in the following simple example is assigned in turn to two local functions:

    b
    %meth is function (string):func is longstring
    Local function (string):leftmost is longstring
       return %this:left(1)
    end function
    Local function (string):rightmost is longstring
       return %this:right(1)
    end function
    %l is arraylist of string len 30
    %l = list('Hickory', 'Dickory', 'Doc')
    %meth = rightmost
    print %l:minimum(%meth)
    %meth = leftmost
    print %l:minimum(%meth)
    end

This request prints the number of the item whose rightmost character is alphabetically the earliest, then the number of the item whose leftmost character is the earliest (and, where items tie, the one closer to the beginning of the list):

    3
    2

Note: Specifying the local functions themselves as the Minimum method parameter in the preceding example would produce the same result. For example, the first print %l:minimum(%meth) call in the example is equivalent to print %l:minimum(rightmost).

However, explicitly specifying right(1) (the method and argument that local function rightmost invokes) as the Minimum parameter does not work. Specifying a method that itself requires a parameter as the Minimum (or Maximum or sort) parameter is a syntax violation and compilation error. The parameter for Minimum is a method value, not a SOUL expression. You can use a local function as the Minimum parameter, as in the example above, to apply a method that requires an argument.
Using the This function as the Maximum parameter
Finding the maximum or minimum and sorting are likely to be very common operations on collections of SOUL intrinsic values. In these cases, you want the function parameter for the maximum/minimum and sorting methods to be an identity function, as in the following:

    b
    %l is arraylist of float
    %l = list(9, 11, 4, -5, 17, 3, 4, 6)
    local function (float):thisVal is float
       return %this
    end function
    printText Item {%l:maximum(thisVal)} has the maximum value
    end
To simplify such requests, a special method value provides the identity function for intrinsic classes. The method value (named This) simply returns the value of the method object. It is valid only for intrinsic classes.
Using This, the previous request becomes:
    b
    %l is arraylist of float
    %l = list(9, 11, 4, -5, 17, 3, 4, 6)
    printText Item {%l:maximum(this)} has the maximum value
    end

Since This is the default method value for the maximum/minimum/sorting function parameter, %l:maximum(this) above can be replaced by:

    %l:maximum

And sort(descending(this)), for example, can be replaced by:
%l:sort(descending)
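Putting the shorthand to work, the following sketch sorts the same list of Float values in place, in descending order, and prints the items in their new order. It relies only on the Sort subroutine, the default This method value, and the Count and Item properties described above:

```soul
b
%l is arraylist of float
%i is float
%l = list(9, 11, 4, -5, 17, 3, 4, 6)
* descending with no argument uses the default method value, This
%l:sort(descending)
for %i from 1 to %l:count
   print %l(%i)
end for
end
```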
Sorting an Arraylist using two sort criteria
The following request sorts an Arraylist by two sort criteria. A SortOrder object is explicitly defined to contain the sort criteria. The List function simplifies the construction of the Arraylist.
    b
    class polis
       public
          variable city is string len 16
          variable dept is string len 10
          variable cost is float
          constructor newp (%city is string len 16, -
                            %dp is string len 10, %cst is float)
          function myprint is longstring
       end public
       constructor newp (%city is string len 16, -
                         %dp is string len 10, %cst is float)
          %this:city = %city
          %this:dept = %dp
          %this:cost = %cst
       end constructor
       function myprint is longstring
          return %this:city with ' (' with -
                 %this:dept with '): ' with %this:cost
       end function
    end class
    %lp is arraylist of object polis
    %lp = list(newp('Gotham', 'DPW', 33125), -
               newp('Chatham', 'Fire', 21940), -
               newp('Wareham', 'Fire', 8444), -
               newp('Wareham', 'DPW', 5938), -
               newp('Chatham', 'DPW', 11651), -
               newp('Gotham', 'Fire', 41246))
    %so is object sortorder for object polis
    %so = list(ascending(city), ascending(dept))
    %lp:sort(%so)
    %i is float
    for %i from 1 to %lp:count
       print %lp(%i):myprint
    end for
    end
The result is:
    Chatham (DPW): 11651
    Chatham (Fire): 21940
    Gotham (DPW): 33125
    Gotham (Fire): 41246
    Wareham (DPW): 5938
    Wareham (Fire): 8444
Searching a collection
A variety of methods are common to all the collection classes for the purpose of searching a collection for the item(s) that satisfy one or more specified conditions.
The searching methods
The searching methods (all functions, listed below) have the same, or nearly the same syntax. They take two parameters:
- An object that specifies the search conditions (a SelectionCriterion object).
- A parameter (Start) that specifies where in the collection to begin the search. One method, SubsetNew, does not accept this parameter.
The searching methods are:
- FindNextItem: Searching "forward" in the collection, finds the next item that matches a criterion, and returns that item.
- FindPreviousItem: Searching "backward" in the collection, finds the next item that matches a criterion, and returns that item.
- FindNextItemNumber: Searching "forward," finds the next item that matches a criterion, and returns that item number.
- FindPreviousItemNumber: Searching "backward," finds the next item that matches a criterion, and returns that item number.
- SubsetNew: Returns a new collection that contains all the items in the input collection that match the criterion.
The FindNextItem and FindPreviousItem methods also throw an ItemNotFound exception if no item matches the SelectionCriterion.
SelectionCriterion objects
A SelectionCriterion object, which might consist of multiple components, describes a single selection criterion. For example, the Ge method in that class uses two parameters to form a "greater than or equal to" comparison criterion to apply to the collection items. So, for SelectionCriterion object %sel, which selects items whose absolute value is greater than or equal to 1000, you might have:
%sel = ge(absolute, 1000)
A simple search, starting from the eighth item in the %payoff Arraylist, might be:
%item = %payoff:findNextItem(%sel, start=7)
The parameters of the SelectionCriterion Ge method above provide the operands for the comparison operator Ge. In this case the intrinsic Float Absolute function is applied to an item value. In general, this must be a function that operates on the type of the items in the collection, and it may be a local method or method variable or a class member (variable, property).
The value that results from applying the Absolute method above is compared to the second Ge parameter, 1000. This 1000 may be any SOUL intrinsic expression, such as a string or numeric literal.
In the fragment that follows, the function in the SelectionCriterion is a local method, and the searching method, FindPreviousItemNumber, searches backward starting with the tenth item in the collection to find the item number of the first item that satisfies the criterion:
    %flt is arraylist of float
    %sel is object selectionCriterion for float
    local function (float):myMod is float
       return %this:mod(7)
    end function
    %sel = LT(myMod, 1)
    %num = %flt:findPreviousItemNumber(%sel, start=11)

The local method myMod above, which calls the Mod method, is necessary in this case because the SelectionCriterion function parameter may not itself specify a parameter. The function parameter is a method value, not a SOUL expression.
The preceding example also shows a SelectionCriterion object declaration, which must suit the item type to which the criterion will be applied, as described in Declaring a SelectionCriterion object variable.
In the following example, the function parameter is the very useful identity function, This, which returns the value of the item to which it is applied. The searching method SubsetNew returns a collection of all the items in the collection that satisfy either of the criteria (< 0, > 999) that comprise the Or criterion:
    %sel = OR(LT(this, 0), GT(this, 999))
    %arraylist = %flt:subsetNew(%sel)
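A complete request built around an Or criterion might look like the following sketch. The variable name %sub and the sample values are hypothetical; the methods used (List, Or, Lt, Gt, This, SubsetNew, and Print) are those described in this article:

```soul
b
%flt is arraylist of float
%sub is arraylist of float
%sel is object selectionCriterion for float
%flt = list(-3, 42, 1000, 7, 5000)
* Select items that are negative or greater than 999
%sel = or(lt(this, 0), gt(this, 999))
* SubsetNew takes no Start parameter; it examines the whole collection
%sub = %flt:subsetNew(%sel)
%sub:print
end
```

With these values, the subset should contain -3, 1000, and 5000, the only items that satisfy either condition.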
Using the searching methods
- The main benefit of these searching methods is the ease of coding provided by their simplicity and flexibility. However, the Find and Subset operations on collections of objects will necessarily be considerably more expensive than the comparable operations on Stringlists. For example, a level of indirection between object references and objects makes the processing much more complicated than that for Stringlists. However, because the cost of locates or subsets is likely to be a small fraction of the cost of most applications, switching to objects for these applications offers the benefits of cleaner code without a major expense.
- The FindNextItem and FindPreviousItem methods throw an ItemNotFound exception if no item matches the SelectionCriterion, but the FindNextItemNumber and FindPreviousItemNumber methods do not throw an exception in that case.
The following are the suggested guidelines for using these methods:
- For simply checking if an item in a collection matches a SelectionCriterion, use FindNextItemNumber or FindPreviousItemNumber.
- For looping over a collection, use FindNextItemNumber or FindPreviousItemNumber with an If test.
- For extracting a single item that you are very sure must be in the collection, use FindNextItem or FindPreviousItem. If you are wrong about the presence of the item, the exception is thrown and the request is cancelled.
- For conditionally extracting a single item from a collection, use FindNextItem or FindPreviousItem with a Try/Catch clause.
As a general footnote, a Try/Catch clause is actually more efficient than an If test. The Try does not produce any compiled code, and the Catch is only evaluated if there is an exception. Try/Catch therefore executes no code other than what you are trying, whereas If must also execute the conditional test.
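The guidelines above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive pattern: it assumes that FindNextItemNumber returns 0 when no item matches, and that the Start value names the item after which the forward search begins (consistent with the start=7 example earlier). The sample values are hypothetical.

```soul
b
%flt is arraylist of float
%sel is object selectionCriterion for float
%n is float
%item is float
%flt = list(9, 11, 4, -5, 17, 3, 4, 6)

* Looping: test the returned item number (assumed 0 when nothing matches)
%sel = lt(this, 10)
%n = %flt:findNextItemNumber(%sel)
repeat while %n
   print 'Item ' with %n with ' is ' with %flt(%n)
   %n = %flt:findNextItemNumber(%sel, start=%n)
end repeat

* Conditional extraction: catch ItemNotFound instead of testing first
%sel = gt(this, 999)
try
   %item = %flt:findNextItem(%sel)
   print 'Found ' with %item
catch itemNotFound
   print 'No item exceeds 999'
end try
end
```

Here no item exceeds 999, so FindNextItem throws ItemNotFound and the Catch block runs instead of the request being cancelled.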