Longstrings: Difference between revisions

From m204wiki
m (minor cleanup)
Line 69: Line 69:
Like <var>Strings</var>, a <var>Longstring</var> variable can be used in <var class="product">SOUL</var> expressions, as operands or as input to $functions. <var>Longstring</var> variables can also be used as input to intrinsic methods (as can any other string or numeric datatype).
Like <var>Strings</var>, a <var>Longstring</var> variable can be used in <var class="product">SOUL</var> expressions, as operands or as input to $functions. <var>Longstring</var> variables can also be used as input to intrinsic methods (as can any other string or numeric datatype).


One important point to keep in mind is that <var class="product">Model 204's</var> expression processing behavior is not changed at all unless <var>Longstring</var> variables or $functions are used, and then only changed in the statements where they are actually used.  So the effect of any use of <var>Longstring</var> variables or $functions is limited to the statements that use them.
One important point to keep in mind is that <var class="product">Model 204</var>'s expression processing behavior is not changed at all unless <var>Longstring</var> variables or $functions are used, and then only changed in the statements where they are actually used.  So the effect of any use of <var>Longstring</var> variables or $functions is limited to the statements that use them.


===Concatenation - the With operator===
===Concatenation: the With operator===


<var class="product">SOUL</var> expressions can have embedded sub-expressions or simply expressions. For example, in
<var class="product">SOUL</var> expressions can have embedded sub-expressions or simply expressions. For example, in
<p class="code">%x = %a with %b with %c
<p class="code">%x = %a with %b with %c
</p>
</p>
the expression <code>%a with %b</code> is evaluated and an intermediate result is produced.  This intermediate result is then used as the first operand in a <var>With</var> operation <code>with %c</code>.  With no <var>Longstrings</var> involved, string expressions are silently truncated at 255 bytes, including when producing an intermediate result. So, in the above example, if <code>%a</code> and <code>%b</code> were each 200 bytes long, the intermediate result of <code>%a with %b</code> would be truncated at the 55th byte of <code>%b</code> and the <code>with %c</code> would simply drop <code>%c</code> since the intermediate result that was the first operand of the <code>with %c</code> would already be 255 bytes long. In this case, <code>%x</code> would end up containing all of <code>%a</code>, the first 55 bytes of <code>%b</code> and none of <code>%c</code>. Fortunately, the results would be the same even if the expression were written
the expression <code>%a with %b</code> is evaluated and an intermediate result is produced.  This intermediate result is then used as the first operand in a <var>With</var> operation <code>with %c</code>.  With no <var>Longstrings</var> involved, string expressions are silently truncated at 255 bytes, including when producing an intermediate result. So, in the above example, if <code>%a</code> and <code>%b</code> were each 200 bytes long, the intermediate result of <code>%a with %b</code> would be truncated at the 55th byte of <code>%b</code>, and the <code>with %c</code> would simply drop <code>%c</code>, since the intermediate result that was the first operand of the <code>with %c</code> would already be 255 bytes long. In this case, <code>%x</code> would end up containing all of <code>%a</code>, the first 55 bytes of <code>%b</code>, and none of <code>%c</code>. Fortunately, the results would be the same even if the expression were written as follows:
<p class="code">%x = %a with (%b with %c)
<p class="code">%x = %a with (%b with %c)
</p>
</p>
though it's worth working this out mentally to develop a good feel for how intermediate expression results are processed in <var class="product">SOUL</var>.
It is still worth working this out mentally to develop a good feel for how intermediate expression results are processed in <var class="product">SOUL</var>.
   
   
In any case, the <var>With</var> operation behaves differently in the presence of <var>Longstrings</var>. Specifically, if either operand of a <var>With</var> operation is a <var>Longstring</var>, the intermediate result of the operation is also a <var>Longstring</var>. If, in the above example, <code>%a</code> was a <var>Longstring</var> and <code>%b</code> and <code>%c</code> were regular <var>Strings</var>, the result of <code>%a with %b</code> would be a 400-byte <var>Longstring</var>. When this 400-byte intermediate result <var>Longstring</var> is then concatenated using the <var>With</var> operation on <code>%c</code>, the result will be a <var>Longstring</var> of length 400 plus the length of <code>%c</code>.  If the target of this expression, <code>%x</code>, was a regular <var>String</var>, this would cause a request-cancelling truncation error.
In any case, the <var>With</var> operation behaves differently in the presence of <var>Longstrings</var>. Specifically, if either operand of a <var>With</var> operation is a <var>Longstring</var>, the intermediate result of the operation is also a <var>Longstring</var>. If, in the above example, <code>%a</code> is a <var>Longstring</var> and <code>%b</code> and <code>%c</code> are regular <var>Strings</var>, the result of <code>%a with %b</code> is a 400-byte <var>Longstring</var>. When this 400-byte intermediate result <var>Longstring</var> is then concatenated using the <var>With</var> operation on <code>%c</code>, the result is a <var>Longstring</var> of length 400 plus the length of <code>%c</code>.  If the target of this expression, <code>%x</code>, is a regular <var>String</var>, this causes a request-cancelling truncation error.
   
   
In addition, if the target of a <var>With</var> operation is a <var>Longstring</var>, the <var>With</var> operation produces a <var>Longstring</var> result, even if none of the operands are themselves <var>Longstrings</var>. For example, if
In addition, if the target of a <var>With</var> operation is a <var>Longstring</var>, the <var>With</var> operation produces a <var>Longstring</var> result, even if none of the operands are themselves <var>Longstrings</var>. For example, if <code>%x</code> is a <var>Longstring</var>, and <code>%a</code> and <code>%b</code> are <code>String Len 255</code>, each with 200 bytes of data:
<code>%x</code> is a <var>Longstring</var>, and <code>%a</code> and <code>%b</code> are <code>String Len 255</code>, each with 200 bytes of data:
<p class="code">%x = %a with %b
<p class="code">%x = %a with %b
</p>
</p>
<code>%x</code> will be 400 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code>. If either of the operands of such a <var>With</var> clause is itself an expression, that expression is treated as if its target were also a <var>Longstring</var>. For example, if in
<code>%x</code> will be 400 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code>. If either of the operands of such a <var>With</var> clause is itself an expression, that expression is treated as if its target were also a <var>Longstring</var>. For example, if in
<p class="code">%x = %a with (%b with %c)
<p class="code">%x = %a with (%b with %c)
</p>
</p>
<code>%x</code> is a <var>Longstring</var>, and <code>%a</code>, <code>%b</code>, and <code>%c</code> are <code>String Len 255</code>, each with 200 bytes of data, <code>%x</code> will end up being 600 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code> with all of <code>%c</code>. This works the same way if the assignment is written as either of the following:
<code>%x</code> is a <var>Longstring</var>, and <code>%a</code>, <code>%b</code>, and <code>%c</code> are <code>String Len 255</code>, each with 200 bytes of data, <code>%x</code> will end up being 600 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code> with all of <code>%c</code>. This works the same way if the assignment is written as either of the following:
<p class="code">%x = (%a with %b) with %c
<p class="code">%x = (%a with %b) with %c
%x = %a with %b with %c
%x = %a with %b with %c
Line 102: Line 101:
<p class="code">%short = (%long with '123') with '456'
<p class="code">%short = (%long with '123') with '456'
</p>
</p>
The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a <var>Longstring</var>, albeit one with less than 255 bytes of data.  The cancellation can be avoided with the use of the <var>$str</var> function, as in the following:
The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a <var>Longstring</var>, albeit one with less than 255 bytes of data.  The cancellation can be avoided with the use of the <var>$Str</var> function, as in the following:
<p class="code">%short = $str((%long with '123') with '456')
<p class="code">%short = $str((%long with '123') with '456')
</p>
</p>
Though, again, this is simply carrying on the dubious <var class="product">SOUL</var> programming practice of truncation by assignment.
Though, again, this is simply carrying on the dubious <var class="product">SOUL</var> programming practice of truncation by assignment.
   
   
Note that the "upgrading"; of <var>With</var> operations to <var>Longstring</var> <var>With</var> operations is not induced by a <var>Longstring</var> variable or expression inside a $function call.  For example, <code>%long</code> is a <var>Longstring</var> with 30 bytes of data, and <code>%short</code> is <code>String Len 10</code>:
Note that the "upgrading" of <var>With</var> operations to <var>Longstring</var> <var>With</var> operations is not induced by a <var>Longstring</var> variable or expression inside a $function call.  For example, <code>%long</code> is a <var>Longstring</var> with 30 bytes of data, and <code>%short</code> is <code>String Len 10</code>:
<p class="code">%short = '*' with $substr(%long, 1, 20)
<p class="code">%short = '*' with $substr(%long, 1, 20)
</p>
</p>
<code>%short</code> ends up containing an asterisk followed by the first 9 bytes of <code>%long</code>. The assignment is made with silent truncation, because the result of a non-longstring-capable $function is always treated as a regular <var>String</var> for the purposes of assignment and <var>With</var> processing.
<code>%short</code> ends up containing an asterisk followed by the first 9 bytes of <code>%long</code>. The assignment is made with silent truncation, because the result of a non-longstring-capable $function is always treated as a regular <var>String</var> for the purposes of assignment and <var>With</var> processing.


===Numeric conversion===  
===Numeric conversion===  
In a context where a <var>Longstring</var> is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the <var>Longstring</var> variable is longer than 255 bytes, even if most or all of these bytes are leading zeros.  For example, <code>%long</code> is a <var>Longstring</var> with 300 zeros followed by a one:
In a context where a <var>Longstring</var> is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the <var>Longstring</var> variable is longer than 255 bytes, even if most or all of these bytes are leading zeros.  For example, <code>%long</code> is a <var>Longstring</var> with 300 zeros followed by a one:
<p class="code">%a = %long + 1
<p class="code">%a = %long + 1
Line 119: Line 117:
The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits in them. <var>Longstring</var> data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as <var>String</var> data.
The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits in them. <var>Longstring</var> data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as <var>String</var> data.
   
   
If the result of a numeric operation on a <var>Longstring</var> is then used in a <var>With</var> operation, the <var>With</var> operation is not upgraded to a <var>Longstring</var> <var>With</var> operation, because the intermediate result of the numeric operation is not a <var>Longstring</var> but a numeric, which is then automatically converted to a <var>String</var> intermediate result.  For example, <code>%long</code> is a <var>Longstring</var> containing <code>99</code> and <code>%short</code> is <code>String Len 2</code>:
If the result of a numeric operation on a <var>Longstring</var> is then used in a <var>With</var> operation, the <var>With</var> operation is not upgraded to a <var>Longstring</var> <var>With</var> operation, because the intermediate result of the numeric operation is not a <var>Longstring</var> but a numeric, which is then automatically converted to a <var>String</var> intermediate result.  For example, <code>%long</code> is a <var>Longstring</var> containing <code>99</code>, and <code>%short</code> is <code>String Len 2</code>:
<p class="code">%short = %long + 1
<p class="code">%short = %long + 1
</p>
</p>
The result is not a request cancellation; instead, a <code>M204.0552: VARIABLE TOO SMALL FOR RESULT</code> message is issued, and an asterisk ( * ) is assigned to <code>%short</code>. Similarly, with these definitions and values
The result is not a request cancellation; instead, a <code>M204.0552: VARIABLE TOO SMALL FOR RESULT</code> message is issued, and an asterisk ( * ) is assigned to <code>%short</code>. Similarly, with these definitions and values:
<p class="code">%short = (%long + 1) with '*'
<p class="code">%short = (%long + 1) with '*'
</p>
</p>
results in a <code>10</code> being assigned to <code>%short</code> with no warnings, exactly the behavior if <code>%long</code> were a <code>String Len 255</code>.
The result is a <code>10</code> being assigned to <code>%short</code> with no warnings, exactly the behavior if <code>%long</code> were a <code>String Len 255</code>.
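The numeric conversion rules above can be restated as a small Python model. This is only a sketch of the behavior described in this section, not SOUL code: the names <code>Value</code>, <code>RequestCancelled</code>, <code>to_number</code>, <code>number_to_string_var</code>, <code>string_with</code>, and <code>string_to_string_var</code> are hypothetical, and only integers are modeled.

```python
from dataclasses import dataclass

STRING_MAX = 255  # maximum length of a regular String


class RequestCancelled(Exception):
    """Stands in for a request-cancelling Longstring truncation error."""


@dataclass
class Value:
    data: str
    is_longstring: bool = False


def to_number(value: Value) -> int:
    """Model automatic conversion of a string value to a number."""
    if value.is_longstring and len(value.data) > STRING_MAX:
        raise RequestCancelled("Longstring truncation")
    try:
        return int(value.data)
    except ValueError:
        return 0  # invalid numeric data is converted to zero


def number_to_string_var(number: int, declared_len: int) -> str:
    """Model assigning a numeric result to a String variable."""
    text = str(number)
    # When the result does not fit, M204.0552 is issued and '*' is assigned.
    return text if len(text) <= declared_len else "*"


def string_with(left: str, right: str) -> str:
    """Model a regular String With: silent truncation at 255 bytes."""
    return (left + right)[:STRING_MAX]


def string_to_string_var(text: str, declared_len: int) -> str:
    """Model assigning a regular String result: silent truncation."""
    return text[:declared_len]


# %a = %long + 1, where %long holds 300 zeros followed by a one
try:
    to_number(Value("0" * 300 + "1", is_longstring=True))
    cancelled = False
except RequestCancelled:
    cancelled = True

long_value = Value("99", is_longstring=True)

# %short = %long + 1, with %short a String Len 2
short_numeric = number_to_string_var(to_number(long_value) + 1, 2)

# %short = (%long + 1) with '*'
short_with = string_to_string_var(
    string_with(str(to_number(long_value) + 1), "*"), 2
)
```

In this model the numeric result is converted to a regular String before the <var>With</var> is applied, which is why the second assignment truncates silently instead of cancelling.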


<div id="longstrInvIndex"></div>
<div id="longstrInvIndex"></div>
Line 138: Line 136:


===Comparisons===
===Comparisons===
Comparison operations such as <var>Eq</var>, <var>Lt</var>, <var>Le</var>, <var>></var>, <var><</var>, etc. will perform <var>Longstring</var> comparisons if either of the operands is a <var>Longstring</var>, that is, comparison operations involving <var>Longstring</var> operands behave pretty much as expected.
Comparison operations such as <var>Eq</var>, <var>Lt</var>, <var>Le</var>, <var>></var>, <var><</var>, etc. will perform <var>Longstring</var> comparisons if either of the operands is a <var>Longstring</var>, that is, comparison operations involving <var>Longstring</var> operands behave pretty much as expected.


Line 206: Line 203:
   
   
==Changing Longstring truncation behavior==
==Changing Longstring truncation behavior==
While it is sometimes convenient that <var class="product">Model 204</var> silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect <var class="product">[[User Language]]</var> programs.  Because of this history and the higher chance of unintentional truncation from a <var>Longstring</var> source, the default behavior for <var>Longstrings</var> is that any truncation on assignment from a <var>Longstring</var>, <var>Longstring</var> $function, or <var>Longstring</var> <var>With</var> operation causes request cancellation.  This behavior should facilitate &ldquo;cleaner&rdquo; and more robust code - where truncation is intended, it is explicitly indicated (for example, with <var>[[$Lstr_Substr]]</var>, <var>[[$Lstr_Left]]</var>, or <var>[[$Str]]</var>).
While it is sometimes convenient that <var class="product">Model 204</var> silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect <var class="product">[[User Language]]</var> programs.  Because of this history and the higher chance of unintentional truncation from a <var>Longstring</var> source, the default behavior for <var>Longstrings</var> is that any truncation on assignment from a <var>Longstring</var>, <var>Longstring</var> $function, or <var>Longstring</var> <var>With</var> operation causes request cancellation.  This behavior should facilitate "cleaner" and more robust code &mdash; where truncation is intended, it is explicitly indicated (for example, with <var>[[$Lstr_Substr]]</var>, <var>[[$Lstr_Left]]</var>, or <var>[[$Str]]</var>).
   
   
Nevertheless, since this cancellation-on-truncation behavior is inconsistent with <var class="product">Model 204's</var> behavior for strings, it might be viewed as undesirable.  If you want to prevent request cancellation on truncation of a <var>Longstring</var> source in an Online, you can <var>MSGCTL</var> the error message for <var>Longstring</var> truncation to <var>NOCAN</var>.
Nevertheless, since this cancellation-on-truncation behavior is inconsistent with <var class="product">Model 204's</var> behavior for strings, it might be viewed as undesirable.  If you want to prevent request cancellation on truncation of a <var>Longstring</var> source in an Online, you can <var>MSGCTL</var> the error message for <var>Longstring</var> truncation to <var>NOCAN</var>.
Line 224: Line 221:
   
   
==Longstrings and the Print, Html, and Text statements==
==Longstrings and the Print, Html, and Text statements==
Using <var>Longstring</var> expressions in the <var>Print</var> statement works largely &ldquo;as expected&rdquo;: given the constraints of <var>LOBUFF</var>, <var>OUTCCC</var>, and <var>OUTMRL</var>, and other output target specific parameters - the values of <var>Longstrings</var> are simply displayed to the output target.  One minor exception to this is that the <var>To</var> clause on the <var>Print</var> statement is not supported for <var>Longstrings</var>.
Using <var>Longstring</var> expressions in the <var>Print</var> statement works largely "as expected": given the constraints of <var>LOBUFF</var>, <var>OUTCCC</var>, and <var>OUTMRL</var>, and other output target specific parameters &mdash; the values of <var>Longstrings</var> are simply displayed to the output target.  One minor exception to this is that the <var>To</var> clause on the <var>Print</var> statement is not supported for <var>Longstrings</var>.
   
   
It should also be kept in mind that the <var>With</var> keyword in <var>Print</var> statements is not the <var>With</var> concatenation operator, although the result is usually the same as if it were.  Specifically, the <var>With</var> keyword results in the part before the <var>With</var> being printed, followed by the part after.  This means that if two regular <var>String</var> variables, each with 255 bytes of data in them, are printed as follows:
It should also be kept in mind that the <var>With</var> keyword in <var>Print</var> statements is not the <var>With</var> concatenation operator, although the result is usually the same as if it were.  Specifically, the <var>With</var> keyword results in the part before the <var>With</var> being printed, followed by the part after.  This means that if two regular <var>String</var> variables, each with 255 bytes of data in them, are printed as follows:
Line 296: Line 293:
Of course, variables that need to hold more than 255 bytes of data must be declared as <var>Longstrings</var>, and any data beyond 255 bytes gets stored in CCATEMP.  This means manipulation of very long <var>Longstring</var> variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization.  In addition, very long <var>Longstring</var> values means large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in <var>Longstrings</var> in applications; quite the contrary.  <var>Longstrings</var> are designed for applications that require long values, and the performance of <var>Longstring</var> manipulation, even for very long values, will generally be pretty good.
Of course, variables that need to hold more than 255 bytes of data must be declared as <var>Longstrings</var>, and any data beyond 255 bytes gets stored in CCATEMP.  This means manipulation of very long <var>Longstring</var> variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization.  In addition, very long <var>Longstring</var> values means large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in <var>Longstrings</var> in applications; quite the contrary.  <var>Longstrings</var> are designed for applications that require long values, and the performance of <var>Longstring</var> manipulation, even for very long values, will generally be pretty good.
   
   
Nevertheless, it is a good idea to avoid unnecessary, very long, <var>Longstring</var> operations - unnecessary because the application does not require it, or because the operation has already been performed once.  Regarding the latter, if a very long <var>Longstring</var> operation occurs in a loop, it would be better to move the operation outside the loop if possible, or to only do it conditionally if it's really required and hasn't already been performed in a previous iteration of the loop.
Nevertheless, it is a good idea to avoid unnecessary, very long, <var>Longstring</var> operations &mdash; unnecessary because the application does not require it, or because the operation has already been performed once.  Regarding the latter, if a very long <var>Longstring</var> operation occurs in a loop, it would be better to move the operation outside the loop if possible, or to only do it conditionally if it's really required and hasn't already been performed in a previous iteration of the loop.
   
   
There is relatively little space overhead for the part of a <var>Longstring</var> that resides in CCATEMP - 6124 of the 6144 bytes on each CCATEMP page actually hold data.  So the first 255 bytes of a 60,000 byte long <var>Longstring</var> value are stored in STBL, and the remaining 60,000-255 bytes are stored on (60,000-255)/6124, or 10 CCATEMP pages. Intermediate results will also use some CCATEMP space, though this usage will typically be short-lived - the space being released as soon as the statement completes.  So, for example, if <code>%a</code> and <code>%b</code> are <var>Longstring</var> variables each with 90,000 bytes of data, and <code>%c</code> is a <var>Longstring</var> variable, the following statement will temporarily require an extra 180,000 bytes of space (255 of them in STBL) to hold the result of the <code>%a with %b</code> operation:
There is relatively little space overhead for the part of a <var>Longstring</var> that resides in CCATEMP: 6124 of the 6144 bytes on each CCATEMP page actually hold data.  So the first 255 bytes of a 60,000 byte long <var>Longstring</var> value are stored in STBL, and the remaining <code>60,000-255</code> bytes are stored on <code>(60,000-255)/6124</code>, or 10 CCATEMP pages. Intermediate results will also use some CCATEMP space, though this usage will typically be short-lived &mdash; the space being released as soon as the statement completes.  So, for example, if <code>%a</code> and <code>%b</code> are <var>Longstring</var> variables each with 90,000 bytes of data, and <code>%c</code> is a <var>Longstring</var> variable, the following statement will temporarily require an extra 180,000 bytes of space (255 of them in STBL) to hold the result of the <code>%a with %b</code> operation:
<p class="code">%c = $lstr_substr(%a with %b, 60000, 60000)
<p class="code">%c = $lstr_substr(%a with %b, 60000, 60000)
</p>
</p>
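The page arithmetic for a 60,000-byte value can be checked with a short Python calculation. This is a sketch under the figures quoted above (255 bytes in STBL, 6124 data bytes per 6144-byte CCATEMP page); the function name <code>ccatemp_pages</code> is hypothetical, and rounding a partial page up to a whole page is an assumption that matches the "10 CCATEMP pages" result.

```python
import math

STBL_BYTES = 255        # leading bytes of a Longstring value held in STBL
PAGE_BYTES = 6144       # size of a CCATEMP page
PAGE_DATA_BYTES = 6124  # bytes per page that actually hold Longstring data


def ccatemp_pages(length: int) -> int:
    """Estimate the CCATEMP pages needed for a Longstring of `length` bytes."""
    overflow = max(0, length - STBL_BYTES)  # bytes that do not fit in STBL
    return math.ceil(overflow / PAGE_DATA_BYTES)  # assumed: round up


pages_for_60000 = ccatemp_pages(60_000)  # (60,000 - 255) / 6124, rounded up
overhead_fraction = (PAGE_BYTES - PAGE_DATA_BYTES) / PAGE_BYTES
```

Under these figures, the per-page overhead is 20 of 6144 bytes, well under one percent.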

Revision as of 20:28, 27 March 2015

As of Model 204 version 7.5, Longstrings appear as a native Model 204 datatype and are defined in the same way as other variable datatypes:

%name is longstring

Longstring variables are largely interchangeable with String variables, with the exception that a Longstring can have a length up to 2**31-1 bytes, while String variables have a maximum length of 255 bytes. The Variables Are statement and the VTYPE parameter do not allow Longstring to be set as a default type, so all Longstring variables must be explicitly declared as such. Longstring variables can be defined as Common and as subroutine parameters, but there is currently no support for Static Longstring variables. Longstrings may be specified in an Initial clause.

Like other %variables, a Longstring cannot be declared as Global on its declaration. However, a Longstring %variable can be dynamically bound to a global Longstring with the $Lstr_global function, and it can be dynamically bound to a session global Longstring with the $Lstr_session function.

The value of a global or session Longstring can also be retrieved with $Lstr_global_get or $Lstr_session_get, and it can be updated with $Lstr_global_set or $Lstr_session_set.

Longstrings can also be declared as arrays:

%heaps is longstring array(10)

The Longstring datatype is not supported inside images. However, image items with length greater than 255 are now supported:

image foo
   bar is string len 300
end image

While such image items can't have arbitrary lengths up to 2**31-1 like other Longstring variables, they exhibit the same behavior as other Longstring variables in request cancellation in the case of truncation, and in upgrading With operations to Longstring With operations.

While it might be tempting to redefine many or all String Len 255 variables as Longstring, there are a few subtle issues discussed in this chapter that might result in problems should this be done. This is not to say that many such variables shouldn't be converted to Longstring, but it might not be as simple as a one-line editing change.

Truncation

One key difference between a Longstring and a regular String is the default behavior of Longstring truncation: any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation. Two examples of the application of this rule follow:

  • An assignment to a String variable from a Longstring results in request cancellation if the value of the Longstring exceeds the declared String length. This cancellation can happen even if the Longstring is less than 255 bytes long. If, say, variable %short were defined as String Len 55, and a Longstring variable called %long contained 60 bytes of data, an assignment like the following results in request cancellation:

    %short = %long

    Yet, you can successfully use an intermediate assignment to a String Len 255 variable (called %medium in the following example) followed by the assignment of that variable to %short:

    %medium = %long
    %short = %medium

    As a result, %short receives the first 55 bytes of the value originally held in %long, and the last five bytes are silently truncated.

    Of course, since a regular String can never be longer than 255 bytes, any assignment from a Longstring longer than 255 bytes to a regular String will result in request cancellation. There are several ways around this problem, but the simplest is to use the $Str function to silently truncate a Longstring at 255 bytes or whatever is required for assignment to its target. Effectively, the $str function tells Model 204 to treat the Longstring as it would a regular String for truncation purposes, and the assignment succeeds:

    %short = $str(%long)

  • Although the Longstring datatype is not supported inside images, you can assign from a Longstring to an image item. However, assigning to an image item a Longstring variable that has a value that ends with one or more of the target image item's Pad character (which defaults to the space character) where the target image item is not NoStrip results in an implicit truncation — the trailing pad characters are effectively removed. Since implicit truncation of a Longstring value on assignment is not allowed, this results in request cancellation. For example, the following request, which prints the result They're different, shows the image item truncation for an assignment from a String:

    begin
    %str is string len 8
    image foo
       x is string len 8
    end image
    prepare image foo
    %str = 'Blank '
    %foo:x = %str
    if %foo:x ne %str then
       print 'They''re different'
    end if
    end

    If %str is declared as a Longstring above, however, the request is cancelled by a Longstring truncation error. But if %str is declared as a Longstring, and if %foo:x = %str is replaced by %foo:x = $str(%str), the request succeeds.
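The assignment rules in the first example above can be restated as a small Python model. This is only a sketch of the behavior described in this section, not SOUL code: the names Value, RequestCancelled, assign_to_string, and str_function are hypothetical, and image-item pad stripping is not modeled.

```python
from dataclasses import dataclass

STRING_MAX = 255  # maximum length of a regular String


class RequestCancelled(Exception):
    """Stands in for a request-cancelling Longstring truncation error."""


@dataclass
class Value:
    data: str
    is_longstring: bool  # True for a Longstring source, False for a String


def assign_to_string(source: Value, declared_len: int) -> Value:
    """Model assignment to a String variable of length `declared_len`."""
    if source.is_longstring and len(source.data) > declared_len:
        raise RequestCancelled("Longstring truncation")
    # A regular String source is truncated silently.
    return Value(source.data[:declared_len], is_longstring=False)


def str_function(source: Value) -> Value:
    """Model $str: treat the value as a regular String (at most 255 bytes)."""
    return Value(source.data[:STRING_MAX], is_longstring=False)


long_value = Value("x" * 60, is_longstring=True)  # %long holds 60 bytes

try:
    assign_to_string(long_value, 55)               # %short = %long
    cancelled = False
except RequestCancelled:
    cancelled = True

medium = assign_to_string(long_value, 255)         # %medium = %long
short = assign_to_string(medium, 55)               # %short = %medium
via_str = assign_to_string(str_function(long_value), 55)  # %short = $str(%long)
```

In this model, the direct assignment is cancelled, while both the intermediate String variable and $str leave %short holding the first 55 bytes.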

Using $str to correct for this Longstring truncation behavior is not always appropriate, though. The use of $str might be viewed as a continuation of the dubious Model 204 programming practice of truncation by assignment, so it might be avoided or at least used as a last resort as a matter of policy. In fact, converting many String variables to Longstring might be viewed as a way of detecting possible unintentional truncation in existing applications, although there are some subtle issues one should be aware of before embarking on such an enterprise.

For additional discussion of these truncation issues, see Changing Longstring truncation behavior.

Longstrings in expressions

Like Strings, a Longstring variable can be used in SOUL expressions, as operands or as input to $functions. Longstring variables can also be used as input to intrinsic methods (as can any other string or numeric datatype).

One important point to keep in mind is that Model 204's expression processing behavior is not changed at all unless Longstring variables or $functions are used, and then only changed in the statements where they are actually used. So the effect of any use of Longstring variables or $functions is limited to the statements that use them.

Concatenation: the With operator

SOUL expressions can have embedded sub-expressions or simply expressions. For example, in

%x = %a with %b with %c

the expression %a with %b is evaluated and an intermediate result is produced. This intermediate result is then used as the first operand in a With operation with %c. With no Longstrings involved, string expressions are silently truncated at 255 bytes, including when producing an intermediate result. So, in the above example, if %a and %b were each 200 bytes long, the intermediate result of %a with %b would be truncated at the 55th byte of %b, and the with %c would simply drop %c, since the intermediate result that was the first operand of the with %c would already be 255 bytes long. In this case, %x would end up containing all of %a, the first 55 bytes of %b, and none of %c. Fortunately, the results would be the same even if the expression were written as follows:

%x = %a with (%b with %c)

It is still worth working this out mentally to develop a good feel for how intermediate expression results are processed in SOUL.

In any case, the With operation behaves differently in the presence of Longstrings. Specifically, if either operand of a With operation is a Longstring, the intermediate result of the operation is also a Longstring. If, in the above example, %a is a Longstring and %b and %c are regular Strings, the result of %a with %b is a 400-byte Longstring. When this 400-byte intermediate result Longstring is then concatenated using the With operation on %c, the result is a Longstring of length 400 plus the length of %c. If the target of this expression, %x, is a regular String, this causes a request-cancelling truncation error.

In addition, if the target of a With operation is a Longstring, the With operation produces a Longstring result, even if none of the operands are themselves Longstrings. For example, if %x is a Longstring, and %a and %b are String Len 255, each with 200 bytes of data:

%x = %a with %b

%x will be 400 bytes long, containing all of %a concatenated with all of %b. If either of the operands of such a With clause is itself an expression, that expression is treated as if its target were also a Longstring. For example, if in

%x = %a with (%b with %c)

%x is a Longstring, and %a, %b, and %c are String Len 255, each with 200 bytes of data, %x will end up being 600 bytes long, containing all of %a concatenated with all of %b with all of %c. This works the same way if the assignment is written as either of the following:

%x = (%a with %b) with %c
%x = %a with %b with %c
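The effect of a Longstring target can be sketched as a complete request. The variable contents are illustrative assumptions (the intrinsic Left method with Pad is used only to produce 200-byte values); the expected length of 600 bytes comes from the description above.

```soul
begin
%a is string len 255
%b is string len 255
%c is string len 255
%x is longstring

%a = '':left(200, pad='a')
%b = '':left(200, pad='b')
%c = '':left(200, pad='c')

* The Longstring target upgrades every With in the expression,
* so no intermediate result is truncated at 255 bytes.
%x = %a with (%b with %c)
print %x:length

* The ungrouped form is described above as giving the same result.
%x = %a with %b with %c
print %x:length
end
```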

Expression processing is the same for string literals, so if %x is a Longstring, and %a is a String Len 255 with 255 bytes of data, the following assigns 258 bytes to %x:

%x = %a with '...'

Another way of looking at this is that in the presence of Longstring variables, whether as the target or as one of the operands, all concatenation operations are "upgraded" to be Longstring concatenations. One side-effect of this is that if an operand of a concatenation is a Longstring, Longstring truncation rules apply to the ultimate target of the assignment. For example, %long is a Longstring containing 'Testing...', and %short is a String Len 12:

%short = (%long with '123') with '456'

The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a Longstring, albeit one with fewer than 255 bytes of data. The cancellation can be avoided with the use of the $Str function, as in the following:

%short = $str((%long with '123') with '456')

Though, again, this is simply carrying on the dubious SOUL programming practice of truncation by assignment.
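For comparison, the following sketch puts the $Str form next to an explicit truncation. The second assignment is an assumption about how one might rewrite the statement using $Lstr_Left; note that $Lstr_Left would also pad the value if it were shorter than the requested length, which $Str does not do.

```soul
begin
%long is longstring
%short is string len 12

%long = 'Testing...'

* Truncation by assignment: $str makes the 16-byte result a regular
* String, so it is silently cut to fit the 12 bytes of %short.
%short = $str((%long with '123') with '456')
print %short

* Explicit truncation: the result is exactly 12 bytes, so it fits in
* %short and no truncation error can occur on the assignment.
%short = $lstr_left((%long with '123') with '456', 12)
print %short
end
```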

Note that the "upgrading" of With operations to Longstring With operations is not induced by a Longstring variable or expression inside a $function call. For example, %long is a Longstring with 30 bytes of data, and %short is String Len 10:

%short = '*' with $substr(%long, 1, 20)

%short ends up containing an asterisk followed by the first 9 bytes of %long. The assignment is made with silent truncation, because the result of a non-longstring-capable $function is always treated as a regular String for the purposes of assignment and With processing.

Numeric conversion

In a context where a Longstring is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the Longstring variable is longer than 255 bytes, even if most or all of these bytes are leading zeros. For example, %long is a Longstring with 300 zeros followed by a one:

%a = %long + 1

The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits in them. Longstring data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as String data.

If the result of a numeric operation on a Longstring is then used in a With operation, the With operation is not upgraded to a Longstring With operation, because the intermediate result of the numeric operation is not a Longstring but a numeric, which is then automatically converted to a String intermediate result. For example, %long is a Longstring containing 99, and %short is String Len 2:

%short = %long + 1

The result is not a request cancellation; instead, a M204.0552: VARIABLE TOO SMALL FOR RESULT message is issued, and an asterisk ( * ) is assigned to %short. Similarly, with these definitions and values:

%short = (%long + 1) with '*'

The result is a 10 being assigned to %short with no warnings, exactly the behavior if %long were a String Len 255.

Longstrings not allowed as index %variable in For statement

One case of automatic conversion to numeric where String and Longstring behaviors differ is index loop control variables. For example, the following loop is valid if %s is a String, but it results in a compilation error if %s is a Longstring:

for %s from 1 to 2
   print %s
end for
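One possible workaround, offered here as an assumption rather than something stated in the original text, is to drive the loop with a numeric index and copy its value into the Longstring inside the loop body:

```soul
begin
%i is float
%s is longstring

* %i is the loop index; %s only receives a copy of its value.
for %i from 1 to 2
   %s = %i
   print %s
end for
end
```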

Comparisons

Comparison operations such as Eq, Lt, Le, >, <, etc. perform Longstring comparisons if either of the operands is a Longstring; that is, comparison operations involving Longstring operands behave pretty much as expected.
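As a small illustration (the values are made up for this sketch, and the intrinsic Left method is used only to build a long value), two Longstrings that differ only beyond the 255th byte compare as different:

```soul
begin
%a is longstring
%b is longstring

* %a holds 300 bytes; %b is the same value with one extra byte appended.
%a = '':left(300, pad='x')
%b = %a with 'y'

* The comparison takes the full values into account, not just the
* first 255 bytes, so the two values are reported as different.
if %a eq %b then
   print 'equal'
else
   print 'different'
end if
end
```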

Longstrings and $functions

Longstrings can be used as inputs to $functions. As mentioned before, if a Longstring expression is assigned to a regular String, a request-cancelling truncation error will occur if the target String variable is not big enough to hold the source Longstring. Request-cancelling truncation errors also occur if a Longstring that is longer than 255 bytes is passed to a non-Longstring-capable $function. For example:

print $substr(%long, 1, 50)

would result in request cancellation if %long was longer than 255 bytes. One way around this would be to use the $str function to tell SOUL to treat the Longstring as a String in this case as in:

print $substr($str(%long), 1, 50)

though a better approach in this case would be to use the Longstring-capable sub-stringing function, $Lstr_Substr, as in:

print $lstr_substr(%long, 1, 50)

The Longstring-capable $functions in this manual typically start with "$lstr", end in "_lstr" (such as $ListInf_Lstr), or belong to a family of $functions (such as the $Regex family) that are completely Longstring-capable. Longstring-capable $functions specific to other Sirius products (like the Janus Web Server and Janus Sockets $functions) typically do not use an "lstr" prefix or suffix, but they are identified in their documentation as Longstring-capable.

In addition to their ability to process more than 255-byte long strings, Longstring-capable $functions have some special characteristics pertaining to expression handling:

  • A Longstring-capable $function that returns a string result (as opposed to one that returns a numeric result such as $Lstr_Index) is treated as a Longstring expression for the purposes of truncation and for the upgrading of With operations to Longstring With operations. For example, if %short is String Len 5 and %junk contains Some text:

    %short = $lstr_substr(%junk, 1, 7)

    would result in a request-cancelling truncation error. This is true whether %junk was a Longstring or a regular String, though the latter illustrates the point that regular String variables (or expressions) can be used as input to Longstring-capable $functions. If %junk contained 300 bytes of data:

    %out = $lstr_substr(%junk, 1, 255) with '*'

    would result in a request-cancelling truncation error if %out were a regular String variable, and would result in 256 bytes, the last byte being an asterisk, being assigned to %out if %out were a Longstring.

  • All string arguments to Longstring-capable $functions are treated as Longstring targets for the purpose of upgrading With operations to Longstring With operations. For example, since $Lstr_Right is Longstring-capable, With in its string argument is upgraded to Longstring. So, if %medium is a string containing 252 or more characters, then:

    $lstr_right(%medium with '****', 256)

    returns the right-most 252 bytes of %medium, concatenated with four asterisks.

    Note: This behavior does not imply that Longstring-capable $functions will always accept strings longer than 255 bytes as their arguments. For example, $Lstr_Index will not accept strings longer than 255 bytes as its second argument (the string being searched for), and $Lstr_Right and $Lstr_Left won't accept any strings longer than a single byte for their third argument (the pad character). This $function-specific behavior does not affect the treatment of the $function results or arguments as Longstring data for expression handling purposes.

Longstrings and complex subroutines

Complex subroutine parameters, both Input and Output (or InOut, which means the same thing as Output) can be defined as Longstring, as in either of the following:

subroutine chop(%x is longstring input)
subroutine chop(%x is longstring output)

In addition, Longstring variables and expressions can be passed as parameters to complex subroutines. For Output parameters, Longstring issues are fairly straightforward. There are two restrictions:

  • You cannot pass a Longstring as a parameter to a subroutine that defines the parameter as String Output.
  • You cannot pass a regular String as a parameter to a subroutine that defines the parameter as Longstring Output.

For Input parameters, things are somewhat more complex, because:

  • Mismatches in String and Longstring datatypes are allowed between passed value and declared parameter.
  • Input parameters can actually receive the results of expressions as their inputs.

While for Input parameters, Strings and Longstrings may be passed interchangeably as Longstring and String parameters, subroutine declaration statements (Declare Subroutine) must exactly match the parameter types on the actual subroutine definitions. That is, given a declaration like this:

declare subroutine tender(longstring)

One cannot later specify the subroutine as

subroutine tender(%mercy is string len 255)

If a Longstring parameter is passed to a subroutine with the parameter defined as String Input, the request is cancelled if the Longstring value is longer than the length of the String Input parameter (as always, this will happen even if the Longstring value is shorter than 255 bytes). This mimics the behavior of an assignment of a Longstring variable to a regular String variable.

If a Longstring array is passed to a subroutine with the parameter defined as a String Array, the request is cancelled if any element of the Longstring array is longer than 255 bytes, whether or not that element is ever referenced in the complex subroutine. Outside the functionality issues raised by this limitation, it also suggests an inefficiency in passing a Longstring array to a String parameter: the inefficiency of scanning the array for values longer than 255 bytes. Because of both the functionality and efficiency issues, it is probably best to avoid passing a Longstring array to a String array parameter if at all possible.

Because a String variable or a literal can always fit into a Longstring parameter, there are no truncation or other issues associated with passing String variables and literals as parameters defined as Longstring.

If a call to a complex subroutine contains a With operation for a Longstring parameter, that With operation is “upgraded” to a Longstring With operation, whether or not any of the operands are themselves Longstrings, exactly as if the target of a With operation were a Longstring variable. As everywhere else, a With operation involving a Longstring in a subroutine call will also be upgraded to a Longstring With operation, meaning that no truncation will occur at 255 bytes, and that if the result is longer than the length of the target String parameter, the request will be cancelled.
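The upgrade of a With operation in a subroutine call can be sketched as follows. The subroutine name showLength and the variable contents are hypothetical, chosen only for this illustration.

```soul
begin
* A complex subroutine with a Longstring Input parameter.
subroutine showLength(%text is longstring input)
   print %text:length
end subroutine

%a is string len 255
%b is string len 255
%a = '':left(200, pad='a')
%b = '':left(200, pad='b')

* Both operands are regular Strings, but the parameter is a Longstring,
* so the With is upgraded and all 400 bytes should reach the subroutine.
call showLength(%a with %b)
end
```

If the parameter were instead declared as String Len 255 Input, the same call would, per the rules above, use a regular With operation and silently truncate the value at 255 bytes.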

Changing Longstring truncation behavior

While it is sometimes convenient that Model 204 silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect User Language programs. Because of this history and the higher chance of unintentional truncation from a Longstring source, the default behavior for Longstrings is that any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation. This behavior should facilitate "cleaner" and more robust code — where truncation is intended, it is explicitly indicated (for example, with $Lstr_Substr, $Lstr_Left, or $Str).

Nevertheless, since this cancellation-on-truncation behavior is inconsistent with Model 204's behavior for regular Strings, it might be viewed as undesirable. If you want to prevent request cancellation on truncation of a Longstring source in an Online, you can MSGCTL the error message for Longstring truncation to NOCAN.

The three messages that you might need to MSGCTL are MSIR.0680, MSIR.0681, and MSIR.0682.

  • MSIR.0680 is issued if the SIRFACT system parameter X'01' bit is set, or if the Model 204 DEBUGUL user parameter is set to a non-zero value.
  • MSIR.0681 is issued for requests entered at command level rather than run from a procedure.
  • MSIR.0682 is issued otherwise.

Issuing MSGCTL for these messages to NOCAN might prevent request cancellation from the occasional Longstring truncation, but if silent truncation of Longstrings is heavily used as a programming “technique” inside a request, the user running the request will quickly be restarted with a “TOO MANY ERRORS” message. To prevent this, MSGCTL the indicated messages to NOCOUNT.

Even then, a large number of these messages might be viewed as being annoying, at best, if the intent is to simply ignore silent truncation of Longstrings. In that case, MSGCTL the indicated messages to NOTERM and maybe even NOAUDIT (if this latter is available). Even then, there will be a little Model 204 processing overhead in producing the messages that are everywhere suppressed, so it would still generally be more efficient to truncate Longstrings explicitly using $Str, $Lstr_Substr or $Lstr_Left.
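As a rough sketch, the commands might look like the following. The message numbers and option keywords are those named above, but the exact command form shown here is an assumption and should be checked against the MSGCTL command documentation for your release.

```text
MSGCTL MSIR.0680 NOCAN
MSGCTL MSIR.0681 NOCAN
MSGCTL MSIR.0682 NOCAN
```

The same pattern would apply when NOCOUNT, NOTERM, or NOAUDIT is wanted for these messages.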

If you use the default Longstring behavior, at least in the development and test environments, you should find it will rapidly catch potential problems and so produce more bug-free code. The request cancellation due to Longstring truncation should therefore be a benefit. In those places that “truncation by assignment” is used in the code, if you change any of the types in the source expression and discover request cancellation, you will probably decide it is better to use an explicit truncation construct, rather than to retain this dubious coding practice.

If there is concern about request cancellation in a production region, you can MSGCTL the indicated messages to NOTERM in production. However, such a switch allows a production request to continue after an unanticipated Longstring truncation, so it could result in data corruption or a more subtle error later in the request that will cause request cancellation anyway, but be more difficult to diagnose.

Longstrings and the Print, Html, and Text statements

Using Longstring expressions in the Print statement works largely "as expected": within the constraints of LOBUFF, OUTCCC, OUTMRL, and other output-target-specific parameters, the values of Longstrings are simply displayed to the output target. One minor exception to this is that the To clause on the Print statement is not supported for Longstrings.

It should also be kept in mind that the With keyword in Print statements is not the With concatenation operator, although the result is usually the same as if it were. Specifically, the With keyword results in the part before the With being printed, followed by the part after. This means that if two regular String variables, each with 255 bytes of data in them, are printed as follows:

print %a with %b

510 bytes of data would be printed, which is different from the With operator in an assignment like the following, which will result in %c simply containing the contents of %a, because the With operation results in truncation at 255 bytes:

%c = %a with %b

This difference between the With keyword in the Print statement and the With operator in expressions predates Longstrings and is, in fact, more significant with regular Strings than with Longstrings.

The HTML and Text statements allow variable values or expression results to be embedded inside the expression start and end characters (defaults: { and }). As with the Print statement, this works pretty much “as expected” for Longstrings: the contents of the Longstring variable or the result of a Longstring expression will be displayed in their entirety within display parameter constraints. The only Longstring related issue for HTML statement expressions is that if an expression is not a Longstring variable, a Longstring $function, or a With operation involving one or more of these, the expression is assumed to be a regular String expression that undergoes silent truncation at 255 bytes. For example, if %a and %b were regular String variables both containing 200 bytes of data, the following would truncate the concatenation of %a and %b at 255 bytes:

text data The result is {%a with %b}

To get around this, one can force the With operation to be upgraded to a Longstring With operation, using:

text data The result is {$lstr(%a with %b)}

However, the use of With operations in Html statements is generally silly, since the same result can be obtained by simply entering each operand in the With expression as a separate expression as in:

text data The result is {%a}{%b}

Longstrings and methods

In addition to their use as local variables and as inputs to or outputs from $functions and complex subroutines, Longstrings can, of course, also be used in the object-oriented constructs made available by SOUL. These uses include:

  • As structure or class members.
  • As input parameters to both user-defined and system methods.
  • As the result (output value) of SOUL and system methods.

In fact, all system methods are Longstring-capable, so they behave, for the purposes of truncation and upgrading of With operations, the same as Longstring-capable $functions. Therefore any With operation whose result is an input to a system method causes the With operation to be upgraded to a Longstring With operation. Similarly, any implicit truncation of the result of a system method results in request cancellation.

User-defined methods, on the other hand, can declare their inputs and output as Longstrings or Strings of a specific length. Longstring inputs and results exhibit the same truncation and With operation behavior as string inputs to system methods. For example, consider the following function declaration in some class:

function encode(%in is longstring) is longstring

If the method is invoked as follows:

%x = %foo:encode(%a with %b)

the %a with %b is upgraded to a Longstring With operation, because its target (the %in parameter in the Encode function) is a Longstring. Similarly, if %x is a standard String variable with some specific length, the request will be cancelled if the result of the Encode method is longer than %x's declared length.

String inputs and output, on the other hand, will behave like standard String variables for the purposes of truncation and With operation behavior. For example, consider the following function declaration in some class:

function stubby(%in is string len 4) is string len 2

If the method is invoked as follows:

%x = %foo:stubby(%a with %b)

the %a with %b is not upgraded to a Longstring With operation because its target (the %in parameter in the Stubby function) is not a Longstring. Of course, if either %a or %b is a Longstring, then the With operation will be a Longstring With operation, anyway. If neither %a nor %b is a Longstring and %a contains foo and %b contains bar, the result of the With operation would be foobar which would be silently truncated to foob when assigned to the input parameter %in.

Similarly, if %x is a String Len 1, and the Stubby method returns ok, the ok would be silently truncated to o when assigned to %x. In fact, if the Stubby method had the following statement:

return 'Not OK'

the return value would be silently truncated to No before being assigned to the target variable, even if the target variable for the Stubby invocation was longer than two bytes. On the other hand, if the Stubby method had the following statement:

return %schooner

and %schooner was a Longstring with a value longer than two bytes, the request would be cancelled because of Longstring truncation, even if the target variable for the Stubby invocation was, itself, a Longstring.

Finally, support for intrinsic methods was introduced. As for all other system method inputs, intrinsic String system methods all behave as if their method string was a Longstring. For example, in:

%x = (%a with %b):right(40, pad='*')

the %a with %b would be upgraded to a Longstring With, even if neither %a nor %b were a Longstring.

For intrinsic and other methods, the fact that all string inputs are treated as Longstrings does not mean that the method will necessarily accept arbitrarily long values. In fact, it's quite possible for a parameter to be restricted to being a single character. For example the intrinsic String Right method has a named parameter called Pad that cannot be longer than one byte:

%y = %x:right(50, pad=%pad)

In this example, if %pad had a value longer than a single byte, the request would be cancelled. This is so even though the parameter behaves like a Longstring parameter for the purposes of expression handling.

Longstring performance

The first 255 bytes of Longstrings are always kept in STBL, so the code path for manipulating a Longstring variable with a value that is shorter than 256 bytes is usually identical to, or only slightly longer than, the code path for manipulating a regular String variable. A Longstring variable always has 257 bytes of STBL allocated for it at compile time, and it requires somewhat more VTBL space than a regular String variable. Longstring arrays require 257 bytes of STBL per element and some VTBL space per element. This is unlike regular String arrays, which require no per-element VTBL space.

Because of the minor code path issues and the table space issues just mentioned, it is probably not a good idea to use Longstring variables in contexts where the values are never expected to exceed 255 bytes, unless performance is not a major concern, or unless the extra error detection for Longstring truncation is desired.

Of course, variables that need to hold more than 255 bytes of data must be declared as Longstrings, and any data beyond 255 bytes gets stored in CCATEMP. This means manipulation of very long Longstring variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization. In addition, very long Longstring values mean that large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in Longstrings in applications; quite the contrary. Longstrings are designed for applications that require long values, and the performance of Longstring manipulation, even for very long values, will generally be pretty good.

Nevertheless, it is a good idea to avoid unnecessary, very long, Longstring operations — unnecessary because the application does not require it, or because the operation has already been performed once. Regarding the latter, if a very long Longstring operation occurs in a loop, it would be better to move the operation outside the loop if possible, or to only do it conditionally if it's really required and hasn't already been performed in a previous iteration of the loop.

There is relatively little space overhead for the part of a Longstring that resides in CCATEMP: 6124 of the 6144 bytes on each CCATEMP page actually hold data. So the first 255 bytes of a 60,000-byte Longstring value are stored in STBL, and the remaining 60,000-255 = 59,745 bytes are stored on 59,745/6124 (about 9.76, rounded up to 10) CCATEMP pages. Intermediate results will also use some CCATEMP space, though this usage will typically be short-lived, with the space being released as soon as the statement completes. So, for example, if %a and %b are Longstring variables each with 60,000 bytes of data, and %c is a Longstring variable, the following statement will temporarily require an extra 120,000 bytes of space (255 of them in STBL) to hold the result of the %a with %b operation:

%c = $lstr_substr(%a with %b, 60000, 60000)

Because concatenation of one string to another is such a common operation, assignment of the concatenation of a Longstring variable and another string to the first Longstring variable is highly optimized. For example, if %long is a Longstring with 50,000 bytes of data:

%long = %long with '!'

would simply tack an exclamation mark on the end of %long rather than copying all of %long and an exclamation mark and then assigning that string to %long. Note, however, that this optimization is only performed if a single string is being concatenated with the current value of the target variable. That is, in the following:

%long = %long with '+' with $time

an intermediate Longstring containing the concatenation of %long and a plus sign will be created. That intermediate Longstring will then be concatenated with the current time (as returned by $time) and then assigned to %long. This means that the current contents of %long end up being copied twice in such a case which, if %long contains 50,000 bytes, is 100,000 bytes worth of data movement which will be quite expensive, by any standard. Fortunately, it is easy to “help out” the compiler to make this operation more efficient:

%long = %long with ('+' with $time)

In this case, the concatenation of the plus sign and the current time are assigned to an intermediate work Longstring. Then, because this intermediate value is simply being concatenated with %long and then assigned back to %long, the concatenation optimization results in the intermediate work Longstring simply being tacked on to the end of %long, requiring almost no data movement, at all. Even in cases where the concatenation can't be optimized to an append operation, it is usually a good idea to isolate concatenations involving relatively small values from a preceding one involving a (potentially) very long one.

For example, if a Longstring with a potentially large value is being bracketed by the date and time, using a greater-than and less-than symbol as separators, the following:

%long = $date with '>' with %long with ('<' with $time)

will be more efficient than

%long = $date with '>' with %long with '<' with $time
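The append optimization matters most when a Longstring is built up in a loop. The following is a minimal sketch under assumed values (the loop count and the appended data are arbitrary); it simply applies the grouping advice above to each iteration.

```soul
begin
%long is longstring
%i is float

for %i from 1 to 1000
* Grouping the short operands leaves a single string to append to %long,
* which is the form described above as optimized to an append.
   %long = %long with ('+' with $time)
end for

print %long:length
end
```

Written without the parentheses, each iteration would first build an intermediate copy of the ever-growing %long, so the amount of data movement would rise with every pass through the loop.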