Longstrings: Difference between revisions
m (typo) |
|||
(8 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
<!-- <var>Longstrings</var> --> | <!-- <var>Longstrings</var> --> | ||
As of <var class="product">Model 204</var> version 7.5, <var>Longstrings</var> appear as a native <var class="product">Model 204</var> datatype and are defined in the same way as other variable datatypes: | |||
<p class="code">%name is longstring | <p class="code">%name is longstring | ||
</p> | </p> | ||
<var>Longstring</var> | <var>Longstring</var> variables are largely interchangeable with <var>String</var> variables, with the exception that a <var>Longstring</var> can have a length up to 2**31-1 bytes, while <var>String</var> variables have a maximum length of 255 bytes. The <var>Variables Are</var> statement and the <var>VTYPE</var> parameter do not allow <var>Longstring</var> to be set as a default type, so all <var>Longstring</var> variables must be explicitly declared as such. <var>Longstring</var> variables can be defined as <var>Common</var> and as subroutine parameters, but there is currently no support for <var>Static</var> <var>Longstring</var> variables. <var>Longstrings</var> may be specified in an <var>Initial</var> clause. | ||
Like other %variables, a <var>Longstring</var> cannot be declared as < | Like other %variables, a <var>Longstring</var> cannot be declared as <var>Global</var> on its declaration. However, a <var>Longstring</var> %variable can be dynamically bound to a global <var>Longstring</var> with the <var>[[$Lstr_Global_and_$Lstr_Session|$Lstr_global]]</var> function, and it can be dynamically bound to a session global <var>Longstring</var> with the <var>[[$Lstr_Global_and_$Lstr_Session|$Lstr_session]]</var> function. | ||
The value of a global or session <var> | The value of a global or session <var>Longstring</var> can also be retrieved with <var>[[$Lstr_Global_Get_and_$Lstr_Session_Get|$Lstr_global_get]]</var> or <var>[[$Lstr_Global_Get_and_$Lstr_Session_Get|$Lstr_session_get]]</var>, and it can be updated with <var>[[$Lstr_Global_Set_and_$Lstr_Session_Set|$Lstr_global_set]]</var> or <var>[[$Lstr_Global_Set_and_$Lstr_Session_Set|$Lstr_session_set]]</var>. | ||
<var>Longstrings</var> can also be declared as arrays: | <var>Longstrings</var> can also be declared as arrays: | ||
Line 13: | Line 13: | ||
</p> | </p> | ||
The <var>Longstring</var> datatype is not supported inside images. However, | The <var>Longstring</var> datatype is not supported inside images. However, image items with length greater than 255 are now supported: | ||
<p class="code">image foo | <p class="code">image foo | ||
bar is string len 300 | bar is string len 300 | ||
Line 19: | Line 19: | ||
</p> | </p> | ||
While such image items can't have arbitrary lengths up to 2**31-1 like other <var>Longstring</var> variables, they exhibit the same behavior as other <var>Longstring</var> variables in request cancellation in the case of truncation, and in upgrading <var>With</var> operations to <var>Longstring</var> < | While such image items can't have arbitrary lengths up to 2**31-1 like other <var>Longstring</var> variables, they exhibit the same behavior as other <var>Longstring</var> variables in request cancellation in the case of truncation, and in upgrading <var>With</var> operations to <var>Longstring</var> <var>With</var> operations. | ||
While it might be tempting to redefine many or all <code>String Len 255</code> variables as <var>Longstring</var>, there are a few subtle issues discussed in this chapter that might result in problems should this be done. This is not to say that many such variables shouldn't be converted to <var>Longstring</var>, but it might not be as simple as a one-line editing change. | While it might be tempting to redefine many or all <code>String Len 255</code> variables as <var>Longstring</var>, there are a few subtle issues discussed in this chapter that might result in problems should this be done. This is not to say that many such variables shouldn't be converted to <var>Longstring</var>, but it might not be as simple as a one-line editing change. | ||
==Truncation== | ==Truncation== | ||
One key difference between a <var>Longstring</var> and a regular <var>String</var> is the default behavior of <var>Longstring</var> truncation: <b>any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation</b>. Two examples of the application of this rule follow: | One key difference between a <var>Longstring</var> and a regular <var>String</var> is the default behavior of <var>Longstring</var> truncation: <b>any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation</b>. Two examples of the application of this rule follow: | ||
Line 44: | Line 44: | ||
<p class="code">begin | <p class="code">begin | ||
%str is string len 8 | %str is string len 8 | ||
image foo | image foo | ||
x is string len 8 | x is string len 8 | ||
end image | end image | ||
prepare image foo | prepare image foo | ||
%str = 'Blank ' | %str = 'Blank ' | ||
%foo:x = %str | %foo:x = %str | ||
if %foo:x ne %str then | if %foo:x ne %str then | ||
print 'They<nowiki>''</nowiki>re different' | print 'They<nowiki>''</nowiki>re different' | ||
Line 65: | Line 65: | ||
For additional discussion of these truncation issues, see [[Longstrings#Changing_Longstring_truncation_behavior|Changing Longstring truncation behavior]]. | For additional discussion of these truncation issues, see [[Longstrings#Changing_Longstring_truncation_behavior|Changing Longstring truncation behavior]]. | ||
==Longstrings in expressions== | |||
Like <var>Strings</var>, a <var>Longstring</var> variable can be used in <var class="product">SOUL</var> expressions, as operands or as input to $functions. <var>Longstring</var> variables can also be used as input to intrinsic methods (as can any other string or numeric datatype). | |||
One important point to keep in mind is that <var class="product">Model 204</var>'s expression processing behavior is not changed at all unless <var>Longstring</var> variables or $functions are used, and then only changed in the statements where they are actually used. So the effect of any use of <var>Longstring</var> variables or $functions is limited to the statements that use them. | |||
===Concatenation: the With operator=== | |||
<var class="product">SOUL</var> expressions can contain embedded sub-expressions. For example, in | |||
<p class="code">%x = %a with %b with %c | <p class="code">%x = %a with %b with %c | ||
</p> | </p> | ||
the expression <code>%a with %b</code> is evaluated and an intermediate result is produced. This intermediate result is then used as the first operand in a < | the expression <code>%a with %b</code> is evaluated and an intermediate result is produced. This intermediate result is then used as the first operand in a <var>With</var> operation <code>with %c</code>. With no <var>Longstrings</var> involved, string expressions are silently truncated at 255 bytes, including when producing an intermediate result. So, in the above example, if <code>%a</code> and <code>%b</code> were each 200 bytes long, the intermediate result of <code>%a with %b</code> would be truncated at the 55th byte of <code>%b</code>, and the <code>with %c</code> would simply drop <code>%c</code>, since the intermediate result that was the first operand of the <code>with %c</code> would already be 255 bytes long. In this case, <code>%x</code> would end up containing all of <code>%a</code>, the first 55 bytes of <code>%b</code>, and none of <code>%c</code>. Fortunately, the results would be the same even if the expression were written as follows: | ||
<p class="code">%x = %a with (%b with %c) | <p class="code">%x = %a with (%b with %c) | ||
</p> | </p> | ||
It is still worth working this out mentally to develop a good feel for how intermediate expression results are processed in <var class="product">SOUL</var>. | |||
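The all-<var>String</var> case can be sketched as a small request. This is only an illustration under stated assumptions: the variable names and fill characters are hypothetical, and <var>$Lstr_Left</var> is assumed to pad on the right with its third argument up to the requested length.
<p class="code">begin
%a is string len 255
%b is string len 255
%c is string len 255
%x is string len 255
* Fill each operand with 200 bytes (assumes $Lstr_Left pads with its third argument)
%a = $lstr_left('a', 200, 'a')
%b = $lstr_left('b', 200, 'b')
%c = $lstr_left('c', 200, 'c')
* No Longstring operand or target here, so each With truncates silently at 255 bytes
%x = %a with %b with %c
* By the rules described above, %x should hold all of %a and the first 55 bytes of %b
print $len(%x)
end
</p>
If the description above holds, the printed length is 255 whether or not parentheses are added around either <var>With</var>.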
In any case, the <var>With</var> operation behaves differently in the presence of <var>Longstrings</var>. | In any case, the <var>With</var> operation behaves differently in the presence of <var>Longstrings</var>. Specifically, if either operand of a <var>With</var> operation is a <var>Longstring</var>, the intermediate result of the operation is also a <var>Longstring</var>. If, in the above example, <code>%a</code> is a <var>Longstring</var> and <code>%b</code> and <code>%c</code> are regular <var>Strings</var>, the result of <code>%a with %b</code> is a 400-byte <var>Longstring</var>. When this 400-byte intermediate result <var>Longstring</var> is then concatenated using the <var>With</var> operation on <code>%c</code>, the result is a <var>Longstring</var> of length 400 plus the length of <code>%c</code>. If the target of this expression, <code>%x</code>, is a regular <var>String</var>, this causes a request-cancelling truncation error. | ||
In addition, if the target of a < | In addition, if the target of a <var>With</var> operation is a <var>Longstring</var>, the <var>With</var> operation produces a <var>Longstring</var> result, even if none of the operands are themselves <var>Longstrings</var>. For example, if <code>%x</code> is a <var>Longstring</var>, and <code>%a</code> and <code>%b</code> are <code>String Len 255</code>, each with 200 bytes of data: | ||
<p class="code">%x = %a with %b | <p class="code">%x = %a with %b | ||
</p> | </p> | ||
<code>%x</code> will be 400 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code>. | <code>%x</code> will be 400 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code>. If either of the operands of such a <var>With</var> clause is itself an expression, that expression is treated as if its target were also a <var>Longstring</var>. For example, if in | ||
<p class="code">%x = %a with (%b with %c) | <p class="code">%x = %a with (%b with %c) | ||
</p> | </p> | ||
<code>%x</code> is a <var>Longstring</var>, and <code>%a</code>, <code>%b</code>, and <code>%c</code> are <code>String Len 255</code>, each with 200 bytes of data, <code>%x</code> will end up being 600 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code> with all of <code>%c</code>. | <code>%x</code> is a <var>Longstring</var>, and <code>%a</code>, <code>%b</code>, and <code>%c</code> are <code>String Len 255</code>, each with 200 bytes of data, <code>%x</code> will end up being 600 bytes long, containing all of <code>%a</code> concatenated with all of <code>%b</code> with all of <code>%c</code>. This works the same way if the assignment is written as either of the following: | ||
<p class="code">%x = (%a with %b) with %c | <p class="code">%x = (%a with %b) with %c | ||
%x = %a with %b with %c | %x = %a with %b with %c | ||
Line 98: | Line 101: | ||
<p class="code">%short = (%long with '123') with '456' | <p class="code">%short = (%long with '123') with '456' | ||
</p> | </p> | ||
The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a <var>Longstring</var>, albeit one with less than 255 bytes of data. The cancellation can be avoided with the use of the <var>$ | The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a <var>Longstring</var>, albeit one with less than 255 bytes of data. The cancellation can be avoided with the use of the <var>$Str</var> function, as in the following: | ||
<p class="code">%short = $str((%long with '123') with '456') | <p class="code">%short = $str((%long with '123') with '456') | ||
</p> | </p> | ||
Though, again, this is simply carrying on the dubious <var class="product"> | Though, again, this is simply carrying on the dubious <var class="product">SOUL</var> programming practice of truncation by assignment. | ||
Note that the "upgrading" | Note that the "upgrading" of <var>With</var> operations to <var>Longstring</var> <var>With</var> operations is not induced by a <var>Longstring</var> variable or expression inside a $function call. For example, <code>%long</code> is a <var>Longstring</var> with 30 bytes of data, and <code>%short</code> is <code>String Len 10</code>: | ||
<p class="code">%short = '*' with $substr(%long, 1, 20) | <p class="code">%short = '*' with $substr(%long, 1, 20) | ||
</p> | </p> | ||
<code>%short</code> ends up containing an asterisk followed by the first 9 bytes of <code>%long</code>. | <code>%short</code> ends up containing an asterisk followed by the first 9 bytes of <code>%long</code>. The assignment is made with silent truncation, because the result of a non-longstring-capable $function is always treated as a regular <var>String</var> for the purposes of assignment and <var>With</var> processing. | ||
===Numeric conversion=== | |||
In a context where a <var>Longstring</var> is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the <var>Longstring</var> variable is longer than 255 bytes, even if most or all of these bytes are leading zeros. For example, <code>%long</code> is a <var>Longstring</var> with 300 zeros followed by a one: | In a context where a <var>Longstring</var> is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the <var>Longstring</var> variable is longer than 255 bytes, even if most or all of these bytes are leading zeros. For example, <code>%long</code> is a <var>Longstring</var> with 300 zeros followed by a one: | ||
<p class="code">%a = %long + 1 | <p class="code">%a = %long + 1 | ||
</p> | </p> | ||
The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits. <var>Longstring</var> data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as String data. < | The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits. <var>Longstring</var> data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as <var>String</var> data. | ||
If the result of a numeric operation on a <var>Longstring</var> is then used in a <var>With</var> operation, the <var>With</var> operation is not upgraded to a <var>Longstring</var> <var>With</var> operation, because the intermediate result of the numeric operation is not a <var>Longstring</var> but a numeric, which is then automatically converted to a <var>String</var> intermediate result. For example, <code>%long</code> is a <var>Longstring</var> containing <code>99</code>, and <code>%short</code> is <code>String Len 2</code>: | |||
<p class="code">%short = %long + 1 | |||
</p> | |||
The result is not a request cancellation; instead, a <code>M204.0552: VARIABLE TOO SMALL FOR RESULT</code> message is issued, and an asterisk ( * ) is assigned to <code>%short</code>. Similarly, with these definitions and values: | |||
<p class="code">%short = (%long + 1) with '*' | |||
</p> | |||
The result is a <code>10</code> being assigned to <code>%short</code> with no warnings, exactly the behavior if <code>%long</code> were a <code>String Len 255</code>. | |||
<div id="longstrInvIndex"></div> | |||
====Longstrings not allowed as index %variable in For statement==== | |||
<!--Caution: <div> above--> | |||
One case of automatic conversion to numeric where <var>String</var> and <var>Longstring</var> behaviors differ is index loop control variables. For example, the following loop is valid if <code>%s</code> is a <var>String</var>, but it results in a compilation error if <code>%s</code> is a <var>Longstring</var>: | |||
<p class="code">for %s from 1 to 2 | <p class="code">for %s from 1 to 2 | ||
print %s | print %s | ||
end for | end for | ||
</p> | </p> | ||
===Comparisons=== | |||
Comparison operations such as <var>Eq</var>, <var>Lt</var>, <var>Le</var>, <var>></var>, <var><</var>, and so on perform <var>Longstring</var> comparisons if either of the operands is a <var>Longstring</var>. In other words, comparison operations involving <var>Longstring</var> operands behave as expected. | |||
==Longstrings and $functions== | ==Longstrings and $functions== | ||
<var>Longstrings</var> can be used as inputs to $functions. As mentioned before, if a <var>Longstring</var> expression is assigned to a regular String, a request-cancelling truncation error will occur if the target String variable is not big enough to hold the source <var>Longstring</var>. Request-cancelling truncation errors also occur if a <var>Longstring</var> that is longer than 255 bytes is passed to a non-<var>Longstring</var>-capable $function. For example: | <var>Longstrings</var> can be used as inputs to $functions. As mentioned before, if a <var>Longstring</var> expression is assigned to a regular <var>String</var>, a request-cancelling truncation error will occur if the target <var>String</var> variable is not big enough to hold the source <var>Longstring</var>. Request-cancelling truncation errors also occur if a <var>Longstring</var> that is longer than 255 bytes is passed to a non-<var>Longstring</var>-capable $function. For example: | ||
<p class="code">print $substr(%long, 1, 50) | <p class="code">print $substr(%long, 1, 50) | ||
</p> | </p> | ||
would result in request cancellation if <code>%long</code> was longer than 255 bytes. One way around this would be to use the <var>$str</var> function to tell <var class="product"> | would result in request cancellation if <code>%long</code> was longer than 255 bytes. One way around this would be to use the <var>$str</var> function to tell <var class="product">SOUL</var> to treat the <var>Longstring</var> as a <var>String</var> in this case as in: | ||
<p class="code">print $substr($str(%long), 1, 50) | <p class="code">print $substr($str(%long), 1, 50) | ||
</p> | </p> | ||
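An alternative is to use a <var>Longstring</var>-capable $function instead of converting with <var>$str</var>. A minimal sketch, assuming <code>%long</code> is a declared <var>Longstring</var> holding more than 300 bytes (the positions used are illustrative):
<p class="code">* No request cancellation, because $Lstr_Substr accepts Longstring input
print $lstr_substr(%long, 1, 50)
* Bytes beyond position 255 are reachable this way
print $lstr_substr(%long, 280, 20)
</p>
Because <var>$str</var> treats its argument as a regular <var>String</var>, the <code>$substr($str(%long), ...)</code> form can only see the first 255 bytes of <code>%long</code>.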
Line 144: | Line 153: | ||
In addition to their ability to process more than 255-byte long strings, <var>Longstring</var>-capable $functions have some special characteristics pertaining to expression handling: | In addition to their ability to process more than 255-byte long strings, <var>Longstring</var>-capable $functions have some special characteristics pertaining to expression handling: | ||
<ul> | <ul> | ||
<li>A <var>Longstring</var>-capable $function that returns a string result (as opposed to one that returns a numeric result such as <var>[[$Lstr_Index]]</var>) is treated as a <var>Longstring</var> expression for the purposes of truncation and for the upgrading of < | <li>A <var>Longstring</var>-capable $function that returns a string result (as opposed to one that returns a numeric result such as <var>[[$Lstr_Index]]</var>) is treated as a <var>Longstring</var> expression for the purposes of truncation and for the upgrading of <var>With</var> operations to <var>Longstring</var> <var>With</var> operations. For example, if <code>%short</code> is <code>String Len 5</code> and <code>%junk</code> contains <code>Some text</code>: | ||
<p class="code">%short = $lstr_substr(%junk, 1, 7) | <p class="code">%short = $lstr_substr(%junk, 1, 7) | ||
</p> | </p> | ||
would result in a request-cancelling truncation error. This is true whether <code>%junk</code> was a <var>Longstring</var> or a regular String, though the latter illustrates the point that regular String variables (or expressions) can be used as input to <var>Longstring</var>-capable $functions. If <code>%junk</code> contained 300 bytes of data: | would result in a request-cancelling truncation error. This is true whether <code>%junk</code> was a <var>Longstring</var> or a regular <var>String</var>, though the latter illustrates the point that regular <var>String</var> variables (or expressions) can be used as input to <var>Longstring</var>-capable $functions. If <code>%junk</code> contained 300 bytes of data: | ||
<p class="code">%out = $lstr_substr(%junk, 1, 255) with '*' | <p class="code">%out = $lstr_substr(%junk, 1, 255) with '*' | ||
</p> | </p> | ||
would result in a request-cancelling truncation error if <code>%out</code> were a regular String variable, and would result in 256 bytes, the last byte being an asterisk, being assigned to <code>%out</code> if <code>%out</code> were a <var>Longstring</var>. | would result in a request-cancelling truncation error if <code>%out</code> were a regular <var>String</var> variable, and would result in 256 bytes, the last byte being an asterisk, being assigned to <code>%out</code> if <code>%out</code> were a <var>Longstring</var>. | ||
<li>All string arguments to <var>Longstring</var>-capable $functions are treated as <var>Longstring</var> targets for the purpose of upgrading <var>With</var> operations to <var>Longstring</var> <var>With</var> operations. For example, since <var>[[$Lstr_Right]]</var> is <var>Longstring</var>-capable, <var>With</var> in its string argument is upgraded to <var>Longstring</var>. So, if <code>%medium</code> is a string containing 252 or more characters, then: | <li>All string arguments to <var>Longstring</var>-capable $functions are treated as <var>Longstring</var> targets for the purpose of upgrading <var>With</var> operations to <var>Longstring</var> <var>With</var> operations. For example, since <var>[[$Lstr_Right]]</var> is <var>Longstring</var>-capable, <var>With</var> in its string argument is upgraded to <var>Longstring</var>. So, if <code>%medium</code> is a string containing 252 or more characters, then: | ||
<p class="code">$lstr_right(%medium with '****', 256) | <p class="code">$lstr_right(%medium with '****', 256) | ||
Line 159: | Line 168: | ||
<p class="note"><b>Note:</b> This behavior does not imply that <var>Longstring</var>-capable $functions will always accept strings longer than 255 bytes as their arguments. For example, <var>$Lstr_Index</var> will not accept strings longer than 255 bytes as its second argument (the string being searched for), and <var>$Lstr_Right</var> and <var>[[$Lstr_Left]]</var> won't accept any strings longer than a single byte for their third argument (the pad character). This $function-specific behavior does not affect the treatment of the $function results or arguments as <var>Longstring</var> data for expression handling purposes. </p> | <p class="note"><b>Note:</b> This behavior does not imply that <var>Longstring</var>-capable $functions will always accept strings longer than 255 bytes as their arguments. For example, <var>$Lstr_Index</var> will not accept strings longer than 255 bytes as its second argument (the string being searched for), and <var>$Lstr_Right</var> and <var>[[$Lstr_Left]]</var> won't accept any strings longer than a single byte for their third argument (the pad character). This $function-specific behavior does not affect the treatment of the $function results or arguments as <var>Longstring</var> data for expression handling purposes. </p> | ||
</ul> | </ul> | ||
==Longstrings and complex subroutines== | ==Longstrings and complex subroutines== | ||
Complex subroutine parameters, both < | Complex subroutine parameters, both <var>Input</var> and <var>Output</var> (or <var>InOut</var>, which means the same thing as <var>Output</var>) can be defined as <var>Longstring</var>, as in either of the following: | ||
<p class="code">subroutine chop(%x is longstring input) | <p class="code">subroutine chop(%x is longstring input) | ||
subroutine chop(%x is longstring output) | subroutine chop(%x is longstring output) | ||
</p> | </p> | ||
In addition, <var>Longstring</var> variables and expressions can be passed as parameters to complex subroutines. For Output parameters, <var> | In addition, <var>Longstring</var> variables and expressions can be passed as parameters to complex subroutines. For Output parameters, <var>Longstring</var> issues are fairly straightforward. There are two restrictions: | ||
<ul> | <ul> | ||
<li>You '''cannot''' pass a <var>Longstring</var> as a parameter to a subroutine that defines the parameter as < | <li>You '''cannot''' pass a <var>Longstring</var> as a parameter to a subroutine that defines the parameter as <var>String Output</var>. | ||
<li>You '''cannot''' pass a regular String as a parameter to a subroutine that defines the parameter as < | <li>You '''cannot''' pass a regular <var>String</var> as a parameter to a subroutine that defines the parameter as <var>Longstring Output</var>. | ||
</ul> | </ul> | ||
For Input parameters, things are somewhat more complex, because: | For Input parameters, things are somewhat more complex, because: | ||
<ul> | <ul> | ||
<li>Mismatches in String and <var>Longstring</var> datatypes are allowed between passed value and declared parameter. | <li>Mismatches in <var>String</var> and <var>Longstring</var> datatypes are allowed between passed value and declared parameter. | ||
<li>Input parameters can actually receive the results of expressions as their inputs. | <li>Input parameters can actually receive the results of expressions as their inputs. | ||
</ul> | </ul> | ||
While for Input parameters, Strings and <var>Longstrings</var> may be passed interchangeably as <var>Longstring</var> and String parameters, subroutine declaration statements (< | While for Input parameters, <var>Strings</var> and <var>Longstrings</var> may be passed interchangeably as <var>Longstring</var> and <var>String</var> parameters, subroutine declaration statements (<var>Declare Subroutine</var>) must exactly match the parameter types on the actual subroutine definitions. That is, given a declaration like this: | ||
<p class="code">declare subroutine tender(longstring) | <p class="code">declare subroutine tender(longstring) | ||
</p> | </p> | ||
Line 185: | Line 194: | ||
</p> | </p> | ||
If a <var>Longstring</var> parameter is passed to a subroutine with the parameter defined as < | If a <var>Longstring</var> parameter is passed to a subroutine with the parameter defined as <var>String Input</var>, the request is cancelled if the <var>Longstring</var> value is longer than the length of the <var>String Input</var> parameter (as always, this will happen even if the <var>Longstring</var> value is shorter than 255 bytes). This mimics the behavior of an assignment of a <var>Longstring</var> variable to a regular <var>String</var> variable. | ||
If a <var>Longstring</var> array is passed to a subroutine with the parameter defined as a < | If a <var>Longstring</var> array is passed to a subroutine with the parameter defined as a <var>String Array</var>, the request is cancelled if '''any''' element of the <var>Longstring</var> array is longer than 255 bytes, whether or not that element is ever referenced in the complex subroutine. Outside the functionality issues raised by this limitation, it also suggests an inefficiency in passing a <var>Longstring</var> array to a <var>String</var> parameter: the inefficiency of scanning the array for values longer than 255 bytes. Because of both the functionality and efficiency issues, it is probably best to avoid passing a <var>Longstring</var> array to a <var>String</var> array parameter if at all possible. | ||
Because a String variable or a literal can always fit into a <var>Longstring</var> parameter, there are no truncation or other issues associated with passing String variables and literals as parameters defined as <var>Longstring</var>. | Because a <var>String</var> variable or a literal can always fit into a <var>Longstring</var> parameter, there are no truncation or other issues associated with passing <var>String</var> variables and literals as parameters defined as <var>Longstring</var>. | ||
If a call to a complex subroutine contains a <var>With</var> operation for a <var>Longstring</var> parameter, that <var>With</var> operation is “upgraded” to a <var>Longstring</var> <var>With</var> operation, whether or not any of the operands are themselves <var>Longstrings</var>, exactly as if the target of a <var>With</var> operation were a <var>Longstring</var> variable. As everywhere else, a <var>With</var> operation involving a <var>Longstring</var> in a subroutine call will also be upgraded to a <var>Longstring</var> <var>With</var> operation, meaning that no truncation will occur at 255 bytes, and that if the result is longer than the length of the target <var>String</var> parameter, the request will be cancelled. | |||
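The upgrade of a <var>With</var> operation in a subroutine call can be sketched as follows. The subroutine name <code>show</code>, the variable names, and the fill values are hypothetical, and <var>$Lstr_Left</var> is assumed to pad on the right with its third argument.
<p class="code">begin
%a is string len 255
%b is string len 255
* Hypothetical subroutine with a Longstring Input parameter
subroutine show(%x is longstring input)
print %x
end subroutine
%a = $lstr_left('a', 200, 'a')
%b = $lstr_left('b', 200, 'b')
* The With in the argument is treated as a Longstring With,
* so all 400 bytes should reach %x without truncation at 255
call show(%a with %b)
end
</p>
If the parameter were instead declared as <code>string len 255 input</code>, the same call would not be upgraded by the parameter type, and the concatenation of two regular <var>Strings</var> would be truncated silently at 255 bytes.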
==Changing Longstring truncation behavior== | ==Changing Longstring truncation behavior== | ||
While it is sometimes convenient that <var class="product">Model 204</var> silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect <var class="product">User Language</var> programs. Because of this history and the higher chance of unintentional truncation from a <var>Longstring</var> source, the default behavior for <var>Longstrings</var> is that any truncation on assignment from a <var>Longstring</var>, <var> | While it is sometimes convenient that <var class="product">Model 204</var> silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect <var class="product">[[User Language]]</var> programs. Because of this history and the higher chance of unintentional truncation from a <var>Longstring</var> source, the default behavior for <var>Longstrings</var> is that any truncation on assignment from a <var>Longstring</var>, <var>Longstring</var> $function, or <var>Longstring</var> <var>With</var> operation causes request cancellation. This behavior should facilitate "cleaner" and more robust code — where truncation is intended, it is explicitly indicated (for example, with <var>[[$Lstr_Substr]]</var>, <var>[[$Lstr_Left]]</var>, or <var>[[$Str]]</var>). | ||
Nevertheless, since this cancellation on truncation behavior is inconsistent with <var class="product">Model 204's</var> behavior for strings, it might be viewed as undesirable. If you want to prevent request cancellation on truncation of a <var>Longstring</var> source in an Online, you can < | Nevertheless, since this cancellation on truncation behavior is inconsistent with <var class="product">Model 204's</var> behavior for strings, it might be viewed as undesirable. If you want to prevent request cancellation on truncation of a <var>Longstring</var> source in an Online, you can <var>MSGCTL</var> the error message for <var>Longstring</var> truncation to <var>NOCAN</var>. | ||
The three messages that you might need to MSGCTL are MSIR.0680, MSIR.0681, and MSIR.0682. | The three messages that you might need to MSGCTL are MSIR.0680, MSIR.0681, and MSIR.0682. | ||
Line 203: | Line 212: | ||
<li>MSIR.0682 is issued otherwise.</ul> | <li>MSIR.0682 is issued otherwise.</ul> | ||
Issuing < | Issuing <var>MSGCTL</var> for these messages to <var>NOCAN</var> might prevent request cancellation from the occasional <var>Longstring</var> truncation, but if silent truncation of <var>Longstrings</var> is heavily used as a programming “technique” inside a request, the user running the request will quickly be restarted with a “TOO MANY ERRORS” message. To prevent this, <var>MSGCTL</var> the indicated messages to <var>NOCOUNT</var>. | ||
Even then, a large number of these messages might be viewed as annoying, at best, if the intent is to simply ignore silent truncation of <var>Longstrings</var>. In that case, <var>MSGCTL</var> the indicated messages to <var>NOTERM</var> and maybe even <var>NOAUDIT</var> (if the latter is available). There will still be a little <var class="product">Model 204</var> processing overhead in producing messages that are suppressed everywhere, so it would generally be more efficient to truncate <var>Longstrings</var> explicitly using <var>[[$Str]]</var>, <var>[[$Lstr_Substr]]</var> or <var>[[$Lstr_Left]]</var>. | |||
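As a rough sketch of the first step described above, the commands might look like the following. The exact command form, the ability to combine options, and the privileges required are assumptions here and should be checked against the <var>MSGCTL</var> command documentation.
<p class="code">MSGCTL MSIR.0680 NOCAN
MSGCTL MSIR.0681 NOCAN
MSGCTL MSIR.0682 NOCAN
</p>
The <var>NOCOUNT</var>, <var>NOTERM</var>, and <var>NOAUDIT</var> settings discussed above would be applied to the same three messages in the same way.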
If you use the default <var>Longstring</var> behavior, at least in the development and test environments, you should find it will rapidly catch potential problems and so produce more bug-free code. The request cancellation due to <var>Longstring</var> truncation should therefore be a benefit. In those places that “truncation by assignment” is used in the code, if you change any of the types in the source expression and discover request cancellation, you will probably decide it is better to use an explicit truncation construct, rather than to retain this dubious coding practice. | |||
If you | If there is concern about request cancellation in a production region, you can <var>MSGCTL</var> the indicated messages to <var>NOCAN</var> in production. However, such a switch allows a production request to continue after an unanticipated <var>Longstring</var> truncation, so it could result in data corruption or a more subtle error later in the request that will cause request cancellation anyway, but be more difficult to diagnose. | |
==Longstrings and the Print, Html, and Text statements== | ==Longstrings and the Print, Html, and Text statements== | ||
Using <var>Longstring</var> expressions in the < | Using <var>Longstring</var> expressions in the <var>Print</var> statement works largely "as expected": given the constraints of <var>LOBUFF</var>, <var>OUTCCC</var>, and <var>OUTMRL</var>, and other output target specific parameters — the values of <var>Longstrings</var> are simply displayed to the output target. One minor exception to this is that the <var>To</var> clause on the <var>Print</var> statement is not supported for <var>Longstrings</var>. | ||
It should also be kept in mind that the < | It should also be kept in mind that the <var>With</var> keyword in <var>Print</var> statements is not the <var>With</var> concatenation operator, although the result is usually the same as if it were. Specifically, the <var>With</var> keyword results in the part before the <var>With</var> being printed, followed by the part after. This means that if two regular <var>String</var> variables, each with 255 bytes of data in them, are printed as follows: | ||
<p class="code">print %a with %b | <p class="code">print %a with %b | ||
</p> | </p> | ||
510 bytes of data would be printed, which is different from the < | 510 bytes of data would be printed, which is different from the <var>With</var> operator in an assignment like the following, which will result in <code>%c</code> simply containing the contents of <code>%a</code>, because the <var>With</var> operation results in truncation at 255 bytes: | ||
<p class="code">%c = %a with %b | <p class="code">%c = %a with %b | ||
</p> | </p> | ||
This difference between the < | This difference between the <var>With</var> keyword in the <var>Print</var> statement and the <var>With</var> operator in expressions predates <var>Longstrings</var> and is, in fact, more significant with regular <var>Strings</var> than with <var>Longstrings</var>. | ||
The <var>[[Text_and_Html_statements#The_HTML_or_TEXT_statement|HTML]]</var> and <var>[[Text_and_Html_statements#The_HTML_or_TEXT_statement|Text]]</var> statements allow variable values or expression results to be embedded inside the expression start and end characters (defaults: { and }). As with the < | The <var>[[Text_and_Html_statements#The_HTML_or_TEXT_statement|HTML]]</var> and <var>[[Text_and_Html_statements#The_HTML_or_TEXT_statement|Text]]</var> statements allow variable values or expression results to be embedded inside the expression start and end characters (defaults: { and }). As with the <var>Print</var> statement, this works pretty much “as expected” for <var>Longstrings</var>: the contents of the <var>Longstring</var> variable or the result of a <var>Longstring</var> expression will be displayed in their entirety within display parameter constraints. The only <var>Longstring</var> related issue for <var>[[Text_and_Html_statements#The_HTML_or_TEXT_statement|HTML]]</var> statement expressions is that if an expression is not a <var>Longstring</var> variable, a <var>Longstring</var> $function, or a <var>With</var> operation involving one or more of these, the expression is assumed to be a regular <var>String</var> expression that undergoes silent truncation at 255 bytes. For example, if <code>%a</code> and <code>%b</code> were regular <var>String</var> variables both containing 200 bytes of data, the following would truncate the concatenation of <code>%a</code> and <code>%b</code> at 255 bytes: | ||
<p class="code">text data The result is {%a with %b} | <p class="code">text data The result is {%a with %b} | ||
</p> | </p> | ||
To get around this, one can force the < | To get around this, one can force the <var>With</var> operation to be upgraded to a <var>Longstring</var> <var>With</var> operation, using: | ||
<p class="code">text data The result is {$lstr(%a with %b)} | <p class="code">text data The result is {$lstr(%a with %b)} | ||
</p> | </p> | ||
However, the use of < | However, the use of <var>With</var> operations in <var>Html</var> statements is generally silly, since the same result can be obtained by simply entering each operand in the With expression as a separate expression as in: | ||
<p class="code">text data The result is {%a}{%b} | <p class="code">text data The result is {%a}{%b} | ||
</p> | </p> | ||
==Longstrings and methods== | ==Longstrings and methods== | ||
In addition to their use as local variables and as inputs to or outputs from $functions and complex subroutines, <var>Longstrings</var> can, of course, also be used in the object-oriented constructs made available by | In addition to their use as local variables and as inputs to or outputs from $functions and complex subroutines, <var>Longstrings</var> can, of course, also be used in the object-oriented constructs made available by <var class="product">SOUL</var>. These uses include: | ||
<ul> | <ul> | ||
<li>As structure or class members. | <li>As structure or class members. | ||
<li>As input parameters to both | <li>As input parameters to both user-defined and system methods. | ||
<li>As the result (output value) of <var class="product"> | <li>As the result (output value) of <var class="product">SOUL</var> and system methods. | ||
</ul> | </ul> | ||
In fact, '''all''' system methods are <var>Longstring</var>-capable, so they behave, for the purposes of truncation and upgrading of <var>With</var> operations, the same as <var>Longstring</var>-capable $functions. Therefore any <var>With</var> operation whose result is an input to a system method causes the <var>With</var> operation to be upgraded to a <var>Longstring</var> <var>With</var> operation. Similarly, any implicit truncation of the result of a system method results in request cancellation. | In fact, '''all''' system methods are <var>Longstring</var>-capable, so they behave, for the purposes of truncation and upgrading of <var>With</var> operations, the same as <var>Longstring</var>-capable $functions. Therefore any <var>With</var> operation whose result is an input to a system method causes the <var>With</var> operation to be upgraded to a <var>Longstring</var> <var>With</var> operation. Similarly, any implicit truncation of the result of a system method results in request cancellation. | ||
User-defined methods, on the other hand, can declare their inputs and output as <var>Longstrings</var> or <var>Strings</var> of a specific length. <var>Longstring</var> inputs and results exhibit the same truncation and <var>With</var> operation behavior as string inputs to system methods. For example, consider the following function declaration in some class: | |||
<p class="code">function encode(%in is longstring) is longstring | <p class="code">function encode(%in is longstring) is longstring | ||
</p> | </p> | ||
Line 265: | Line 274: | ||
<p class="code">return %schooner | <p class="code">return %schooner | ||
</p> | </p> | ||
and <code>%schooner</code> was a <var>Longstring</var> with a value longer than two bytes, the request would be cancelled because of <var> | and <code>%schooner</code> was a <var>Longstring</var> with a value longer than two bytes, the request would be cancelled because of <var>Longstring</var> truncation, even if the target variable for the Stubby invocation was, itself, a <var>Longstring</var>. | |
Finally, | Finally, support for [[Intrinsic classes|intrinsic]] methods was introduced. As for all other system method inputs, intrinsic <var>String</var> system methods all behave as if their method string was a <var>Longstring</var>. For example, in: | ||
<p class="code">%x = (%a with %b):right(40, pad='*') | <p class="code">%x = (%a with %b):right(40, pad='*') | ||
</p> | </p> | ||
the <code>%a with %b</code> would be upgraded to a <var>Longstring</var> <var>With</var>, even if neither <code>%a</code> nor <code>%b</code> were a <var>Longstring</var>. | the <code>%a with %b</code> would be upgraded to a <var>Longstring</var> <var>With</var>, even if neither <code>%a</code> nor <code>%b</code> were a <var>Longstring</var>. | ||
For intrinsic and other methods, the fact that all string inputs are treated as <var>Longstrings</var> does not mean that the method will necessarily accept arbitrarily long values. In fact, it's quite possible for a parameter to be restricted to being a single character. For example the intrinsic <var>String</var> <var>[[Right_(String_function)|Right]]</var> method has a named parameter called <var>Pad</var> that cannot be longer than one byte: | For intrinsic and other methods, the fact that all string inputs are treated as <var>Longstrings</var> does not mean that the method will necessarily accept arbitrarily long values. In fact, it's quite possible for a parameter to be restricted to being a single character. For example the intrinsic <var>String</var> <var>[[Right_(String_function)|Right]]</var> method has a named parameter called <var>Pad</var> that cannot be longer than one byte: | ||
<p class="code">%y = %x:right(50, pad=%pad) | <p class="code">%y = %x:right(50, pad=%pad) | ||
</p> | </p> | ||
In this example, if <code>%pad</code> had a value longer than a single byte, the request would be cancelled. This, in spite of the fact that the parameter behaves like a <var>Longstring</var> parameter. | In this example, if <code>%pad</code> had a value longer than a single byte, the request would be cancelled. This, in spite of the fact that the parameter behaves like a <var>Longstring</var> parameter. | ||
==Longstring performance== | ==Longstring performance== | ||
The first 255 bytes of <var>Longstrings</var> are always kept in STBL, so the code path for manipulating a <var>Longstring</var> variable with a value that is shorter than 256 bytes is usually identical to or only slightly greater than the code path for manipulating a regular String variable. A <var>Longstring</var> variable always has 257 bytes of STBL allocated for it at compile time, and it requires somewhat more VTBL space than a regular String variable. <var>Longstring</var> arrays require 257 bytes of STBL per element and some VTBL space per element. This is unlike regular String variables, which require no per-element VTBL space. | The first 255 bytes of <var>Longstrings</var> are always kept in STBL, so the code path for manipulating a <var>Longstring</var> variable with a value that is shorter than 256 bytes is usually identical to or only slightly greater than the code path for manipulating a regular <var>String</var> variable. A <var>Longstring</var> variable always has 257 bytes of STBL allocated for it at compile time, and it requires somewhat more VTBL space than a regular <var>String</var> variable. <var>Longstring</var> arrays require 257 bytes of STBL per element and some VTBL space per element. This is unlike regular <var>String</var> variables, which require no per-element VTBL space. | ||
Yet because of the minor code path issues and the table space issues just mentioned, it is probably not a good idea to use <var>Longstring</var> variables in contexts where the values are never expected to exceed 255 bytes, unless performance is not a major concern, or unless the extra error detection for <var>Longstring</var> truncation is desired. | Yet because of the minor code path issues and the table space issues just mentioned, it is probably not a good idea to use <var>Longstring</var> variables in contexts where the values are never expected to exceed 255 bytes, unless performance is not a major concern, or unless the extra error detection for <var>Longstring</var> truncation is desired. | ||
Line 284: | Line 293: | ||
Of course, variables that need to hold more than 255 bytes of data must be declared as <var>Longstrings</var>, and any data beyond 255 bytes gets stored in CCATEMP. This means manipulation of very long <var>Longstring</var> variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization. In addition, very long <var>Longstring</var> values means large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in <var>Longstrings</var> in applications; quite the contrary. <var>Longstrings</var> are designed for applications that require long values, and the performance of <var>Longstring</var> manipulation, even for very long values, will generally be pretty good. | Of course, variables that need to hold more than 255 bytes of data must be declared as <var>Longstrings</var>, and any data beyond 255 bytes gets stored in CCATEMP. This means manipulation of very long <var>Longstring</var> variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization. In addition, very long <var>Longstring</var> values means large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in <var>Longstrings</var> in applications; quite the contrary. <var>Longstrings</var> are designed for applications that require long values, and the performance of <var>Longstring</var> manipulation, even for very long values, will generally be pretty good. | ||
Nevertheless, it is a good idea to avoid unnecessary, very long, <var>Longstring</var> operations | Nevertheless, it is a good idea to avoid unnecessary, very long, <var>Longstring</var> operations — unnecessary because the application does not require it, or because the operation has already been performed once. Regarding the latter, if a very long <var>Longstring</var> operation occurs in a loop, it would be better to move the operation outside the loop if possible, or to only do it conditionally if it's really required and hasn't already been performed in a previous iteration of the loop. | ||
There is relatively little space overhead for the part of a <var>Longstring</var> that resides in CCATEMP - 6124 of the 6144 bytes on each CCATEMP page actually hold data. So the first 255 bytes of a 60,000 byte long <var>Longstring</var> value are stored in STBL, and the remaining 60,000-255 bytes are stored on (60,000-255)/6124, or 10 CCATEMP pages. | There is relatively little space overhead for the part of a <var>Longstring</var> that resides in CCATEMP: 6124 of the 6144 bytes on each CCATEMP page actually hold data. So the first 255 bytes of a 60,000 byte long <var>Longstring</var> value are stored in STBL, and the remaining <code>60,000-255</code> bytes are stored on <code>(60,000-255)/6124</code>, or 10 CCATEMP pages. Intermediate results will also use some CCATEMP space, though this usage will typically be short-lived, the space being released as soon as the statement completes. So, for example, if <code>%a</code> and <code>%b</code> are <var>Longstring</var> variables each with 60,000 bytes of data, and <code>%c</code> is a <var>Longstring</var> variable, the following statement will temporarily require an extra 120,000 bytes of space (255 of them in STBL) to hold the result of the <code>%a with %b</code> operation: | |
<p class="code">%c = $lstr_substr(%a with %b, 60000, 60000) | <p class="code">%c = $lstr_substr(%a with %b, 60000, 60000) | ||
</p> | </p> | ||
Because concatenation of one string to another is such a common operation, assignment of the concatenation of a <var>Longstring</var> variable and another string to the first <var>Longstring</var> variable is highly optimized. For example, if <code>%long</code> is a <var> | Because concatenation of one string to another is such a common operation, assignment of the concatenation of a <var>Longstring</var> variable and another string to the first <var>Longstring</var> variable is highly optimized. For example, if <code>%long</code> is a <var>Longstring</var> with 50,000 bytes of data: | ||
<p class="code">%long = %long with '!' | <p class="code">%long = %long with '!' | ||
</p> | </p> | ||
Line 301: | Line 310: | ||
In this case, the concatenation of the plus sign and the current time are assigned to an intermediate work <var>Longstring</var>. Then, because this intermediate value is simply being concatenated with <code>%long</code> and then assigned back to <code>%long</code>, the concatenation optimization results in the intermediate work <var>Longstring</var> simply being tacked on to the end of <code>%long</code>, requiring almost no data movement, at all. Even in cases where the concatenation can't be optimized to an append operation, it is usually a good idea to isolate concatenations involving relatively small values from a preceding one involving a (potentially) very long one. | In this case, the concatenation of the plus sign and the current time are assigned to an intermediate work <var>Longstring</var>. Then, because this intermediate value is simply being concatenated with <code>%long</code> and then assigned back to <code>%long</code>, the concatenation optimization results in the intermediate work <var>Longstring</var> simply being tacked on to the end of <code>%long</code>, requiring almost no data movement, at all. Even in cases where the concatenation can't be optimized to an append operation, it is usually a good idea to isolate concatenations involving relatively small values from a preceding one involving a (potentially) very long one. | ||
For example, if a <var> | For example, if a <var>Longstring</var> with a potentially large value is being bracketed by the date and time, using a greater-than and less-than symbol as separators, the following: | ||
<p class="code">%long = $date with '>' with %long with ('<' with $time) | <p class="code">%long = $date with '>' with %long with ('<' with $time) | ||
</p> | </p> | ||
Line 307: | Line 316: | ||
<p class="code">%long = $date with '>' with %long with '<' with $time | <p class="code">%long = $date with '>' with %long with '<' with $time | ||
</p> | </p> | ||
[[Category:SOUL]] | |||
[[Category:Overviews]] | [[Category:Overviews]] | ||
[[Category:User Language syntax enhancements]] | [[Category:User Language syntax enhancements]] |
Latest revision as of 20:42, 27 March 2015
As of Model 204 version 7.5, Longstrings appear as a native Model 204 datatype and are defined in the same way as other variable datatypes:
%name is longstring
Longstring variables are largely interchangeable with String variables, with the exception that a Longstring can have a length up to 2**31-1 bytes, while String variables have a maximum length of 255 bytes. The Variables Are statement and the VTYPE parameter do not allow Longstring to be set as a default type, so all Longstring variables must be explicitly declared as such. Longstring variables can be defined as Common and as subroutine parameters, but there is currently no support for Static Longstring variables. Longstrings may be specified in an Initial clause.
Like other %variables, a Longstring cannot be declared as Global on its declaration. However, a Longstring %variable can be dynamically bound to a global Longstring with the $Lstr_global function, and it can be dynamically bound to a session global Longstring with the $Lstr_session function.
The value of a global or session Longstring can also be retrieved with $Lstr_global_get or $Lstr_session_get, and it can be updated with $Lstr_global_set or $Lstr_session_set.
Longstrings can also be declared as arrays:
%heaps is longstring array(10)
The Longstring datatype is not supported inside images. However, image items with length greater than 255 are now supported:
image foo
  bar is string len 300
end image
While such image items can't have arbitrary lengths up to 2**31-1 like other Longstring variables, they exhibit the same behavior as other Longstring variables in request cancellation in the case of truncation, and in upgrading With operations to Longstring With operations.
While it might be tempting to redefine many or all String Len 255 variables as Longstring, there are a few subtle issues discussed in this chapter that might result in problems should this be done. This is not to say that many such variables shouldn't be converted to Longstring, but it might not be as simple as a one-line editing change.
Truncation
One key difference between a Longstring and a regular String is the default behavior of Longstring truncation: any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation. Two examples of the application of this rule follow:
- An assignment to a String variable from a Longstring results in request cancellation if the value of the Longstring exceeds the declared String length. This cancellation can happen even if the Longstring is less than 255 bytes long. If, say, variable %short were defined as String Len 55, and a Longstring variable called %long contained 60 bytes of data, an assignment like the following results in request cancellation:
%short = %long
Yet, you can successfully use an intermediate assignment to a String Len 255 variable (called %medium in the following example) followed by the assignment of that variable to %short:
%medium = %long
%short = %medium
As a result, the value originally held in %long silently loses its last five bytes when it is assigned to %short.
Of course, since a regular String can never be longer than 255 bytes, any assignment from a Longstring longer than 255 bytes to a regular String will result in request cancellation. There are several ways around this problem, but the simplest is to use the $Str function to silently truncate a Longstring at 255 bytes or whatever is required for assignment to its target. Effectively, the $str function tells Model 204 to treat the Longstring as it would a regular String for truncation purposes, and the assignment succeeds:
%short = $str(%long)
- Although the Longstring datatype is not supported inside images, you can assign from a Longstring to an image item. However, assigning to an image item a Longstring variable that has a value that ends with one or more of the target image item's Pad character (which defaults to the space character) where the target image item is not NoStrip results in an implicit truncation: the trailing pad characters are effectively removed. Since implicit truncation of a Longstring value on assignment is not allowed, this results in request cancellation.
For example, the following request, which prints the result They're different, shows the image item truncation for an assignment from a String:
begin
%str is string len 8
image foo
  x is string len 8
end image
prepare image foo
%str = 'Blank '
%foo:x = %str
if %foo:x ne %str then
  print 'They''re different'
end if
end
If %str is declared as a Longstring above, however, the request is cancelled by a Longstring truncation error. But if %str is declared as a Longstring, and if %foo:x = %str is replaced by %foo:x = $str(%str), the request succeeds.
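The first rule above (assignment from a Longstring to a shorter String) can be sketched as a small Python model. This is only an illustration of the behavior described in this section, not Model 204 code: the names assign, str_fn, and RequestCancelled are hypothetical stand-ins, and the image item Pad stripping case is not modelled.

```python
# Illustrative model only: these names are hypothetical, not a Model 204 API.
class RequestCancelled(Exception):
    """Stands in for a request-cancelling Longstring truncation error."""


def assign(value: str, target_len: int, source_is_longstring: bool) -> str:
    """Model assigning `value` to a String Len `target_len` variable."""
    if len(value) <= target_len:
        return value
    if source_is_longstring:
        # Truncating a Longstring source cancels the request, whatever its length.
        raise RequestCancelled("Longstring truncation")
    # A regular String source is silently truncated.
    return value[:target_len]


def str_fn(value: str) -> str:
    """Model of $Str: the result is treated as a regular String (at most 255 bytes)."""
    return value[:255]


long_value = "x" * 60  # a Longstring holding 60 bytes

# %short = %long : cancelled, even though 60 is well under 255
try:
    assign(long_value, 55, source_is_longstring=True)
    cancelled = False
except RequestCancelled:
    cancelled = True

# %medium = %long, then %short = %medium : silent truncation
medium = assign(long_value, 255, source_is_longstring=True)
short_via_medium = assign(medium, 55, source_is_longstring=False)

# %short = $str(%long) : silent truncation
short_via_str = assign(str_fn(long_value), 55, source_is_longstring=False)
```

In this model the only thing that decides between cancellation and silent truncation is whether the source is still regarded as a Longstring at the moment of assignment, which is why both the intermediate variable and $str avoid the error.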
Using $str to correct for this Longstring truncation behavior is not always appropriate, though. The use of $str might be viewed as a continuation of the dubious Model 204 programming practice of truncation by assignment, so it might be avoided or at least used as a last resort as a matter of policy. In fact, converting many String variables to Longstring might be viewed as a way of detecting possible unintentional truncation in existing applications, although there are some subtle issues one should be aware of before embarking on such an enterprise.
For additional discussion of these truncation issues, see Changing Longstring truncation behavior.
Longstrings in expressions
Like Strings, a Longstring variable can be used in SOUL expressions, as operands or as input to $functions. Longstring variables can also be used as input to intrinsic methods (as can any other string or numeric datatype).
One important point to keep in mind is that Model 204's expression processing behavior is not changed at all unless Longstring variables or $functions are used, and then only changed in the statements where they are actually used. So the effect of any use of Longstring variables or $functions is limited to the statements that use them.
Concatenation: the With operator
SOUL expressions can have embedded sub-expressions or simply expressions. For example, in
%x = %a with %b with %c
the expression %a with %b is evaluated and an intermediate result is produced. This intermediate result is then used as the first operand in a With operation with %c. With no Longstrings involved, string expressions are silently truncated at 255 bytes, including when producing an intermediate result. So, in the above example, if %a and %b were each 200 bytes long, the intermediate result of %a with %b would be truncated at the 55th byte of %b, and the with %c would simply drop %c, since the intermediate result that was the first operand of the with %c would already be 255 bytes long. In this case, %x would end up containing all of %a, the first 55 bytes of %b, and none of %c. Fortunately, the results would be the same even if the expression were written as follows:
%x = %a with (%b with %c)
It is still worth working this out mentally to develop a good feel for how intermediate expression results are processed in SOUL.
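As an aid to working this out, the following Python sketch models With on regular Strings as "concatenate, then truncate at 255 bytes". It is a hypothetical model of the behavior described above, not SOUL code; with_op and STRING_MAX are names invented for the illustration.

```python
STRING_MAX = 255  # maximum length of a regular String


def with_op(left: str, right: str) -> str:
    """Model With on regular Strings: concatenate, then truncate at 255 bytes."""
    return (left + right)[:STRING_MAX]


a = "a" * 200
b = "b" * 200
c = "c" * 200

# %x = %a with %b with %c   (evaluated left to right)
left_to_right = with_op(with_op(a, b), c)

# %x = %a with (%b with %c)
right_first = with_op(a, with_op(b, c))
```

Both groupings leave all of %a followed by the first 55 bytes of %b, because every intermediate result is cut back to 255 bytes before it is used again.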
In any case, the With operation behaves differently in the presence of Longstrings. Specifically, if either operand of a With operation is a Longstring, the intermediate result of the operation is also a Longstring. If, in the above example, %a is a Longstring and %b and %c are regular Strings, the result of %a with %b is a 400-byte Longstring. When this 400-byte intermediate result Longstring is then concatenated using the With operation on %c, the result is a Longstring of length 400 plus the length of %c. If the target of this expression, %x, is a regular String, this causes a request-cancelling truncation error.
In addition, if the target of a With operation is a Longstring, the With operation produces a Longstring result, even if none of the operands are themselves Longstrings. For example, if %x is a Longstring, and %a and %b are String Len 255, each with 200 bytes of data:
%x = %a with %b
%x will be 400 bytes long, containing all of %a concatenated with all of %b. If either of the operands of such a With clause is itself an expression, that expression is treated as if its target were also a Longstring. For example, if in
%x = %a with (%b with %c)
%x is a Longstring, and %a, %b, and %c are String Len 255, each with 200 bytes of data, %x will end up being 600 bytes long, containing all of %a concatenated with all of %b with all of %c. This works the same way if the assignment is written as either of the following:
%x = (%a with %b) with %c
%x = %a with %b with %c
Expression processing is the same for string literals, so if %x is a Longstring, and %a is a String Len 255 with 255 bytes of data, the following assigns 258 bytes to %x:
%x = %a with '...'
Another way of looking at this is that in the presence of Longstring variables, whether as the target or as one of the operands, all concatenation operations are "upgraded" to be Longstring concatenations. One side-effect of this is that if an operand of a concatenation is a Longstring, Longstring truncation rules apply to the ultimate target of the assignment. For example, %long is a Longstring containing 'Testing...', and %short is a String Len 12:
%short = (%long with '123') with '456'
The result is a request-cancelling truncation error, because the result of all the concatenation operations is treated as a Longstring, albeit one with less than 255 bytes of data. The cancellation can be avoided with the use of the $Str function, as in the following:
%short = $str((%long with '123') with '456')
Though, again, this is simply carrying on the dubious SOUL programming practice of truncation by assignment.
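The upgrade rule can be made concrete with a small Python model. It is a sketch under stated assumptions rather than a description of Model 204 internals: Value, with_op, str_fn, assign_to_string, and RequestCancelled are all hypothetical names, and the target_is_long flag has to be passed by hand wherever the text says an operand is treated as if its target were a Longstring.

```python
from dataclasses import dataclass

STRING_MAX = 255  # maximum length of a regular String


class RequestCancelled(Exception):
    """Stands in for a request-cancelling Longstring truncation error."""


@dataclass
class Value:
    text: str
    is_long: bool  # True for a Longstring variable or Longstring intermediate result


def with_op(left: Value, right: Value, target_is_long: bool = False) -> Value:
    """Model With: upgraded to a Longstring With if an operand or the target is one."""
    upgraded = left.is_long or right.is_long or target_is_long
    text = left.text + right.text
    if not upgraded:
        text = text[:STRING_MAX]  # regular String With truncates silently
    return Value(text, upgraded)


def str_fn(value: Value) -> Value:
    """Model of $Str: the result is a regular String of at most 255 bytes."""
    return Value(value.text[:STRING_MAX], False)


def assign_to_string(value: Value, target_len: int) -> str:
    """Model assignment to a String Len `target_len` variable."""
    if value.is_long and len(value.text) > target_len:
        raise RequestCancelled("Longstring truncation")
    return value.text[:target_len]


# %short = (%long with '123') with '456', with %short a String Len 12
long_var = Value("Testing...", True)
expr = with_op(with_op(long_var, Value("123", False)), Value("456", False))
try:
    assign_to_string(expr, 12)
    cancelled = False
except RequestCancelled:
    cancelled = True

# %short = $str((%long with '123') with '456')
short = assign_to_string(str_fn(expr), 12)

# %x = %a with %b, with %x a Longstring and both operands regular Strings
x = with_op(Value("a" * 200, False), Value("b" * 200, False), target_is_long=True)
```

The model shows why the 16-byte result is cancelled rather than truncated: once any operand is a Longstring, the whole expression carries that status through to the final assignment unless $str removes it.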
Note that the "upgrading" of With operations to Longstring With operations is not induced by a Longstring variable or expression inside a $function call. For example, %long is a Longstring with 30 bytes of data, and %short is String Len 10:
%short = '*' with $substr(%long, 1, 20)
%short ends up containing an asterisk followed by the first 9 bytes of %long. The assignment is made with silent truncation, because the result of a non-longstring-capable $function is always treated as a regular String for the purposes of assignment and With processing.
Numeric conversion
In a context where a Longstring is automatically converted to a numeric datatype, a request-cancelling truncation error occurs if the Longstring variable is longer than 255 bytes, even if most or all of these bytes are leading zeros. For example, %long is a Longstring with 300 zeros followed by a one:
%a = %long + 1
The result is a request-cancelling truncation error. Fortunately, one is unlikely to encounter numbers with more than 255 digits in them. Longstring data used in a numeric context will undergo the dubious automatic conversion of invalid numeric data into a zero in the same way as String data.
If the result of a numeric operation on a Longstring is then used in a With operation, the With operation is not upgraded to a Longstring With operation, because the intermediate result of the numeric operation is not a Longstring but a numeric, which is then automatically converted to a String intermediate result. For example, %long is a Longstring containing 99, and %short is String Len 2:
%short = %long + 1
The result is not a request cancellation; instead, a M204.0552: VARIABLE TOO SMALL FOR RESULT message is issued, and an asterisk ( * ) is assigned to %short. Similarly, with these definitions and values:
%short = (%long + 1) with '*'
The result is a 10 being assigned to %short with no warnings, exactly the behavior if %long were a String Len 255.
Longstrings not allowed as index %variable in For statement
One case of automatic conversion to numeric where String and Longstring behaviors differ is index loop control variables. For example, the following loop is valid if %s is a String, but it results in a compilation error if %s is a Longstring:
for %s from 1 to 2
  print %s
end for
Comparisons
Comparison operations such as Eq, Lt, Le, >, <, etc. will perform Longstring comparisons if either of the operands is a Longstring, that is, comparison operations involving Longstring operands behave pretty much as expected.
Longstrings and $functions
Longstrings can be used as inputs to $functions. As mentioned before, if a Longstring expression is assigned to a regular String, a request-cancelling truncation error will occur if the target String variable is not big enough to hold the source Longstring. Request-cancelling truncation errors also occur if a Longstring that is longer than 255 bytes is passed to a non-Longstring-capable $function. For example:
print $substr(%long, 1, 50)
would result in request cancellation if %long was longer than 255 bytes. One way around this would be to use the $str function to tell SOUL to treat the Longstring as a String in this case as in:
print $substr($str(%long), 1, 50)
though a better approach in this case would be to use the Longstring-capable sub-stringing function, $Lstr_Substr, as in:
print $lstr_substr(%long, 1, 50)
The Longstring-capable $functions in this manual typically start with "$lstr", end in "_lstr" (such as $ListInf_Lstr), or belong to a family of $functions (such as the $Regex family) that are completely Longstring-capable. Longstring-capable $functions specific to other Sirius products (like the Janus Web Server and Janus Sockets $functions) typically do not use an "lstr" prefix or suffix, but they are identified in their documentation as Longstring-capable.
In addition to their ability to process strings longer than 255 bytes, Longstring-capable $functions have some special characteristics pertaining to expression handling:
- A Longstring-capable $function that returns a string result (as opposed to one that returns a numeric result such as $Lstr_Index) is treated as a Longstring expression for the purposes of truncation and for the upgrading of With operations to Longstring With operations. For example, if %short is String Len 5 and %junk contains Some text:
  %short = $lstr_substr(%junk, 1, 7)
  would result in a request-cancelling truncation error. This is true whether %junk was a Longstring or a regular String, though the latter illustrates the point that regular String variables (or expressions) can be used as input to Longstring-capable $functions. If %junk contained 300 bytes of data:
  %out = $lstr_substr(%junk, 1, 255) with '*'
  would result in a request-cancelling truncation error if %out were a regular String variable, and would result in 256 bytes, the last byte being an asterisk, being assigned to %out if %out were a Longstring.
- All string arguments to Longstring-capable $functions are treated as Longstring targets for the purpose of upgrading With operations to Longstring With operations. For example, since $Lstr_Right is Longstring-capable, With in its string argument is upgraded to Longstring. So, if %medium is a string containing 252 or more characters, then:
  $lstr_right(%medium with '****', 256)
  returns the right-most 252 bytes of %medium, concatenated with four asterisks.
  Note: This behavior does not imply that Longstring-capable $functions will always accept strings longer than 255 bytes as their arguments. For example, $Lstr_Index will not accept strings longer than 255 bytes as its second argument (the string being searched for), and $Lstr_Right and $Lstr_Left won't accept any strings longer than a single byte for their third argument (the pad character). This $function-specific behavior does not affect the treatment of the $function results or arguments as Longstring data for expression handling purposes.
Longstrings and complex subroutines
Complex subroutine parameters, both Input and Output (or InOut, which means the same thing as Output) can be defined as Longstring, as in either of the following:
subroutine chop(%x is longstring input)
subroutine chop(%x is longstring output)
In addition, Longstring variables and expressions can be passed as parameters to complex subroutines. For Output parameters, Longstring issues are fairly straightforward. There are two restrictions:
- You cannot pass a Longstring as a parameter to a subroutine that defines the parameter as String Output.
- You cannot pass a regular String as a parameter to a subroutine that defines the parameter as Longstring Output.
For Input parameters, things are somewhat more complex, because:
- Mismatches in String and Longstring datatypes are allowed between passed value and declared parameter.
- Input parameters can actually receive the results of expressions as their inputs.
While for Input parameters, Strings and Longstrings may be passed interchangeably as Longstring and String parameters, subroutine declaration statements (Declare Subroutine) must exactly match the parameter types on the actual subroutine definitions. That is, given a declaration like this:
declare subroutine tender(longstring)
One cannot later specify the subroutine as
subroutine tender(%mercy is string len 255)
If a Longstring parameter is passed to a subroutine with the parameter defined as String Input, the request is cancelled if the Longstring value is longer than the length of the String Input parameter (as always, this will happen even if the Longstring value is shorter than 255 bytes). This mimics the behavior of an assignment of a Longstring variable to a regular String variable.
If a Longstring array is passed to a subroutine with the parameter defined as a String Array, the request is cancelled if any element of the Longstring array is longer than 255 bytes, whether or not that element is ever referenced in the complex subroutine. Outside the functionality issues raised by this limitation, it also suggests an inefficiency in passing a Longstring array to a String parameter: the inefficiency of scanning the array for values longer than 255 bytes. Because of both the functionality and efficiency issues, it is probably best to avoid passing a Longstring array to a String array parameter if at all possible.
Because a String variable or a literal can always fit into a Longstring parameter, there are no truncation or other issues associated with passing String variables and literals as parameters defined as Longstring.
If a call to a complex subroutine contains a With operation for a Longstring parameter, that With operation is “upgraded” to a Longstring With operation, whether or not any of the operands are themselves Longstrings, exactly as if the target of a With operation were a Longstring variable. As everywhere else, a With operation involving a Longstring in a subroutine call will also be upgraded to a Longstring With operation, meaning that no truncation will occur at 255 bytes, and that if the result is longer than the length of the target String parameter, the request will be cancelled.
Changing Longstring truncation behavior
While it is sometimes convenient that Model 204 silently truncates string data on assignment to a variable or intermediate result, it has also been the source of a vast number of incorrect User Language programs. Because of this history and the higher chance of unintentional truncation from a Longstring source, the default behavior for Longstrings is that any truncation on assignment from a Longstring, Longstring $function, or Longstring With operation causes request cancellation. This behavior should facilitate "cleaner" and more robust code — where truncation is intended, it is explicitly indicated (for example, with $Lstr_Substr, $Lstr_Left, or $Str).
Nevertheless, since this cancellation on truncation behavior is inconsistent with Model 204's behavior for strings, it might be viewed as undesirable. If you want to prevent request cancellation on truncation of a Longstring source in an Online, you can MSGCTL the error message for Longstring truncation to NOCAN.
The three messages that you might need to MSGCTL are MSIR.0680, MSIR.0681, and MSIR.0682.
- MSIR.0680 is issued if the SIRFACT system parameter X'01' bit is set, or if the Model 204 DEBUGUL user parameter is set to a non-zero value.
- MSIR.0681 is issued for requests entered at command level rather than run from a procedure.
- MSIR.0682 is issued otherwise.
Issuing MSGCTL for these messages to NOCAN might prevent request cancellation from the occasional Longstring truncation, but if silent truncation of Longstrings is heavily used as a programming “technique” inside a request, the user running the request will quickly be restarted with a “TOO MANY ERRORS” message. To prevent this, MSGCTL the indicated messages to NOCOUNT.
Even then, a large number of these messages might be viewed as being annoying, at best, if the intent is to simply ignore silent truncation of Longstrings. In that case, MSGCTL the indicated messages to NOTERM and maybe even NOAUDIT (if this latter is available). Even then, there will be a little Model 204 processing overhead in producing the messages that are everywhere suppressed, so it would still generally be more efficient to truncate Longstrings explicitly using $Str, $Lstr_Substr or $Lstr_Left.
If you use the default Longstring behavior, at least in the development and test environments, you should find it will rapidly catch potential problems and so produce more bug-free code. The request cancellation due to Longstring truncation should therefore be a benefit. In those places that “truncation by assignment” is used in the code, if you change any of the types in the source expression and discover request cancellation, you will probably decide it is better to use an explicit truncation construct, rather than to retain this dubious coding practice.
If there is concern about request cancellation in a production region, you can MSGCTL the indicated messages to NOCAN in production. However, such a switch allows a production request to continue after an unanticipated Longstring truncation, so it could result in data corruption or a more subtle error later in the request that will cause request cancellation anyway, but be more difficult to diagnose.
Longstrings and the Print, Html, and Text statements
Using Longstring expressions in the Print statement works largely "as expected": within the constraints of LOBUFF, OUTCCC, OUTMRL, and other output-target-specific parameters, the values of Longstrings are simply displayed to the output target. One minor exception to this is that the To clause on the Print statement is not supported for Longstrings.
It should also be kept in mind that the With keyword in Print statements is not the With concatenation operator, although the result is usually the same as if it were. Specifically, the With keyword results in the part before the With being printed, followed by the part after. This means that if two regular String variables, each with 255 bytes of data in them, are printed as follows:
print %a with %b
510 bytes of data would be printed, which is different from the With operator in an assignment like the following, which will result in %c simply containing the contents of %a, because the With operation results in truncation at 255 bytes:
%c = %a with %b
This difference between the With keyword in the Print statement and the With operator in expressions predates Longstrings and is, in fact, more significant with regular Strings than with Longstrings.
The HTML and Text statements allow variable values or expression results to be embedded inside the expression start and end characters (defaults: { and }). As with the Print statement, this works pretty much “as expected” for Longstrings: the contents of the Longstring variable or the result of a Longstring expression will be displayed in their entirety within display parameter constraints. The only Longstring related issue for HTML statement expressions is that if an expression is not a Longstring variable, a Longstring $function, or a With operation involving one or more of these, the expression is assumed to be a regular String expression that undergoes silent truncation at 255 bytes. For example, if %a and %b were regular String variables both containing 200 bytes of data, the following would truncate the concatenation of %a and %b at 255 bytes:
text data The result is {%a with %b}
To get around this, one can force the With operation to be upgraded to a Longstring With operation, using:
text data The result is {$lstr(%a with %b)}
However, the use of With operations in Html statements is generally silly, since the same result can be obtained by simply entering each operand in the With expression as a separate expression as in:
text data The result is {%a}{%b}
Longstrings and methods
In addition to their use as local variables and as inputs to or outputs from $functions and complex subroutines, Longstrings can, of course, also be used in the object-oriented constructs made available by SOUL. These uses include:
- As structure or class members.
- As input parameters to both user-defined and system methods.
- As the result (output value) of SOUL and system methods.
In fact, all system methods are Longstring-capable, so they behave, for the purposes of truncation and upgrading of With operations, the same as Longstring-capable $functions. Therefore any With operation whose result is an input to a system method causes the With operation to be upgraded to a Longstring With operation. Similarly, any implicit truncation of the result of a system method results in request cancellation.
User-defined methods, on the other hand, can declare their inputs and output as Longstrings or Strings of a specific length. Longstring inputs and results exhibit the same truncation and With operation behavior as string inputs to system methods. For example, consider the following function declaration in some class:
function encode(%in is longstring) is longstring
If the method is invoked as follows:
%x = %foo:encode(%a with %b)
the %a with %b is upgraded to a Longstring With operation, because its target (the %in parameter in the Encode function) is a Longstring. Similarly, if %x is a standard String variable with some specific length, the request will be cancelled if the result of the Encode method is longer than %x's declared length.
String inputs and output, on the other hand, will behave like standard String variables for the purposes of truncation and With operation behavior. For example, consider the following function declaration in some class:
function stubby(%in is string len 4) is string len 2
If the method is invoked as follows:
%x = %foo:stubby(%a with %b)
the %a with %b is not upgraded to a Longstring With operation because its target (the %in parameter in the Stubby function) is not a Longstring. Of course, if either %a or %b is a Longstring, then the With operation will be a Longstring With operation, anyway. If neither %a nor %b is a Longstring and %a contains foo and %b contains bar, the result of the With operation would be foobar, which would be silently truncated to foob when assigned to the input parameter %in.
Similarly, if %x is a String Len 1, and the Stubby method returns ok, the ok would be silently truncated to o when assigned to %x. In fact, if the Stubby method had the following statement:
return 'Not OK'
the return value would be silently truncated to No before being assigned to the target variable, even if the target variable for the Stubby invocation was longer than two bytes. On the other hand, if the Stubby method had the following statement:
return %schooner
and %schooner was a Longstring with a value longer than two bytes, the request would be cancelled because of Longstring truncation, even if the target variable for the Stubby invocation was, itself, a Longstring.
Finally, support for intrinsic methods was introduced. As for all other system method inputs, intrinsic String system methods all behave as if their method string was a Longstring. For example, in:
%x = (%a with %b):right(40, pad='*')
the %a with %b would be upgraded to a Longstring With, even if neither %a nor %b were a Longstring.
For intrinsic and other methods, the fact that all string inputs are treated as Longstrings does not mean that the method will necessarily accept arbitrarily long values. In fact, it's quite possible for a parameter to be restricted to being a single character. For example the intrinsic String Right method has a named parameter called Pad that cannot be longer than one byte:
%y = %x:right(50, pad=%pad)
In this example, if %pad had a value longer than a single byte, the request would be cancelled, in spite of the fact that the parameter behaves like a Longstring parameter.
Longstring performance
The first 255 bytes of Longstrings are always kept in STBL, so the code path for manipulating a Longstring variable with a value that is shorter than 256 bytes is usually identical to or only slightly greater than the code path for manipulating a regular String variable. A Longstring variable always has 257 bytes of STBL allocated for it at compile time, and it requires somewhat more VTBL space than a regular String variable. Longstring arrays require 257 bytes of STBL per element and some VTBL space per element. This is unlike regular String variables, which require no per-element VTBL space.
Yet because of the minor code path issues and the table space issues just mentioned, it is probably not a good idea to use Longstring variables in contexts where the values are never expected to exceed 255 bytes, unless performance is not a major concern, or unless the extra error detection for Longstring truncation is desired.
Of course, variables that need to hold more than 255 bytes of data must be declared as Longstrings, and any data beyond 255 bytes gets stored in CCATEMP. This means manipulation of very long Longstring variables could result in significant logical and even physical CCATEMP I/O and higher CCATEMP utilization. In addition, very long Longstring values means large quantities of data need to be scanned or copied, which in itself could be a source of CPU overhead. This is not to say that long values should not be used in Longstrings in applications; quite the contrary. Longstrings are designed for applications that require long values, and the performance of Longstring manipulation, even for very long values, will generally be pretty good.
Nevertheless, it is a good idea to avoid unnecessary, very long, Longstring operations — unnecessary because the application does not require it, or because the operation has already been performed once. Regarding the latter, if a very long Longstring operation occurs in a loop, it would be better to move the operation outside the loop if possible, or to only do it conditionally if it's really required and hasn't already been performed in a previous iteration of the loop.
There is relatively little space overhead for the part of a Longstring that resides in CCATEMP: 6124 of the 6144 bytes on each CCATEMP page actually hold data. So the first 255 bytes of a 60,000 byte long Longstring value are stored in STBL, and the remaining 60,000-255 bytes are stored on (60,000-255)/6124 (rounded up), or 10 CCATEMP pages. Intermediate results will also use some CCATEMP space, though this usage will typically be short-lived, the space being released as soon as the statement completes. So, for example, if %a and %b are Longstring variables each with 60,000 bytes of data, and %c is a Longstring variable, the following statement will temporarily require an extra 120,000 bytes of space (255 of them in STBL) to hold the result of the %a with %b operation:
%c = $lstr_substr(%a with %b, 60000, 60000)
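The page arithmetic above can be checked with a short Python sketch. It uses only the figures given in this section (255 bytes held in STBL, 6124 data bytes per CCATEMP page); the function name `ccatemp_pages` is hypothetical, and the calculation is an estimate rather than a statement of how Model 204 allocates pages internally.

```python
import math

STBL_BYTES = 255        # leading bytes of a Longstring value held in STBL
PAGE_DATA_BYTES = 6124  # data bytes on each 6144-byte CCATEMP page


def ccatemp_pages(length: int) -> int:
    """Estimate the CCATEMP pages needed for a Longstring of `length` bytes."""
    overflow = max(0, length - STBL_BYTES)  # bytes that do not fit in STBL
    return math.ceil(overflow / PAGE_DATA_BYTES)


print(ccatemp_pages(255))      # 0  (fits entirely in STBL)
print(ccatemp_pages(60_000))   # 10
print(ccatemp_pages(120_000))  # 20 (the 120,000-byte intermediate result)
```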
Because concatenation of one string to another is such a common operation, assignment of the concatenation of a Longstring variable and another string to the first Longstring variable is highly optimized. For example, if %long is a Longstring with 50,000 bytes of data:
%long = %long with '!'
would simply tack an exclamation mark on the end of %long rather than copying all of %long and an exclamation mark and then assigning that string to %long. Note, however, that this optimization is only performed if a single string is being concatenated with the current value of the target variable. That is, in the following:
%long = %long with '+' with $time
an intermediate Longstring containing the concatenation of %long and a plus sign will be created. That intermediate Longstring will then be concatenated with the current time (as returned by $time) and then assigned to %long. This means that the current contents of %long end up being copied twice in such a case which, if %long contains 50,000 bytes, is 100,000 bytes worth of data movement, which will be quite expensive by any standard. Fortunately, it is easy to “help out” the compiler to make this operation more efficient:
%long = %long with ('+' with $time)
In this case, the concatenation of the plus sign and the current time is assigned to an intermediate work Longstring. Then, because this intermediate value is simply being concatenated with %long and then assigned back to %long, the concatenation optimization results in the intermediate work Longstring simply being tacked on to the end of %long, requiring almost no data movement at all. Even in cases where the concatenation can't be optimized to an append operation, it is usually a good idea to isolate concatenations involving relatively small values from a preceding one involving a (potentially) very long one.
For example, if a Longstring with a potentially large value is being bracketed by the date and time, using a greater-than and less-than symbol as separators, the following:
%long = $date with '>' with %long with ('<' with $time)
will be more efficient than
%long = $date with '>' with %long with '<' with $time
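To make the difference concrete, here is a rough Python cost model for the four statements above. It is a simplification, not a description of Model 204 internals: it assumes that each non-optimized With copies both operands into a new intermediate, that the final assignment is free, and that $time and $date each return 8 bytes. The names `with_cost` and `append_cost` are hypothetical.

```python
# Rough cost model for With expressions (an illustration only).
# Assumption: a non-optimized With copies both operands into a new intermediate.

def with_cost(expr):
    """Return (result_length, bytes_copied) for a With expression.

    expr is an int (the length of a plain operand) or a (left, right) tuple.
    """
    if isinstance(expr, int):
        return expr, 0
    left_len, left_cost = with_cost(expr[0])
    right_len, right_cost = with_cost(expr[1])
    total = left_len + right_len
    return total, left_cost + right_cost + total


def append_cost(right_expr):
    """Bytes copied for '%long = %long with <right_expr>' when the append
    optimization applies: build the right side, then append it."""
    right_len, right_cost = with_cost(right_expr)
    return right_cost + right_len


LONG = 50_000  # current length of %long
STAMP = 8      # assumed length of the $time and $date results

# %long = %long with '+' with $time
print(with_cost(((LONG, 1), STAMP))[1])                # 100010
# %long = %long with ('+' with $time)
print(append_cost((1, STAMP)))                         # 18
# %long = $date with '>' with %long with ('<' with $time)
print(with_cost((((STAMP, 1), LONG), (1, STAMP)))[1])  # 100045
# %long = $date with '>' with %long with '<' with $time
print(with_cost(((((STAMP, 1), LONG), 1), STAMP))[1])  # 150046
```

Under these assumptions, grouping the small operands saves one full copy of the long value in the bracketing case, and nearly all of the data movement in the append case.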