RegexReplaceCorresponding (Stringlist function): Difference between revisions

From m204wiki
m (1 revision)
m (syntax diagram, tags and links)
Line 1: Line 1:
{{Template:Stringlist:RegexReplaceCorresponding subtitle}}
{{Template:Stringlist:RegexReplaceCorresponding subtitle}}
 
<p>This method searches a given string for matches to one of multiple regular expressions contained in a list, and it replaces found matches with or according to a string contained in a list that corresponds to the regex list.</p><p>The regex list items are treated as mutually exclusive alternatives, and the function stops as soon as an item matches and the replacement is made. A "global" option is also available to continue searching and replacing within the given string using the matching regex item until no more matches are found.</p><p><var>RegexReplaceCorresponding</var> uses the rules of regular expression matching (information about which is provided in [[Regex processing]]).</p><p><var>RegexReplaceCorresponding</var> accepts two required and two optional arguments, and it returns a string.</p>
This method searches a given string for matches to one of multiple regular expressions contained in a list, and it replaces found matches with or according to a string contained in a list that corresponds to the regex list. The method is available as of Version 6.9 of the <var class=product>Sirius Mods</var>. The regex list items are treated as mutually exclusive alternatives, and the function stops as soon as an item matches and the replacement is made. A "global" option is also available to continue searching and replacing within the given string using the matching regex item until no more matches are found. <var>RegexReplaceCorresponding</var> uses the rules of regular expression matching (information about which is provided in [[Regex processing]]). <var>RegexReplaceCorresponding</var> accepts two required and two optional arguments, and it returns a string. Specifying an invalid argument results in request cancellation.


==Syntax==
==Syntax==
Line 7: Line 6:
===Syntax terms===
===Syntax terms===
<table class="syntaxTable">
<table class="syntaxTable">
<tr><th>outStr</th>
<tr><th>outString</th>
<td>A string set to the value of '''inStr''' with each matched substring replaced by the value of the '''replacementList''' item that corresponds to the matching '''%regList''' item.</td></tr>
<td>A string set to the value of <var class="term">inString</var> with each matched substring replaced by the value of the <var class="term">replacementList</var> item that corresponds to the matching <var class="term">sl</var> item.</td></tr>
<tr><th>%regList</th>
<tr><th>sl</th>
<td>A <var>Stringlist</var> object whose items are interpreted as regular expressions and applied to the '''inStr''' value.</td></tr>
<td>A <var>Stringlist</var> object whose items are interpreted as regular expressions and applied to the <var class="term">inString</var> value.</td></tr>
<tr><th>inStr</th>
<tr><th>inString</th>
<td>The input string, to which the regular expressions in '''%regList''' are applied.</td></tr>
<td>The input string, to which the regular expressions in <var class="term">sl</var> are applied.</td></tr>
<tr><th>replacementList</th>
<tr><th>replacementList</th>
<td>A <var>Stringlist</var>, each of whose items is a potential replacement string for the substring of '''inStr''' that is matched by the corresponding item of '''%regList'''. Except when the <tt>.A</tt> option is specified (as described below for the Options argument), you can include <tt>.$0</tt> markers in '''replacementList''' items as placeholders for the substring of '''inStr''' that the item matches. <tt>.xxx$0</tt> is an example of a valid replacement string, and <tt>.xxx</tt> concatenated with the portion of '''inStr''' that gets matched (by the corresponding '''%regList''' item) constitute the replacement string. Any character after the dollar sign other than a zero is an error. Multiple zeroes (as many as 9) are permitted; a digit following such a string of zeroes must be escaped. You can also use the format <tt>.$m0</tt>, where '''m''' is one of the following modifiers:
<td><p>A <var>Stringlist</var>, each of whose items is a potential replacement string for the substring of <var class="term">inString</var> that is matched by the corresponding item of <var class="term">sl</var>.</p><p>Except when the <code>A</code> option is specified (as described below for the Options argument), you can include <code>$0</code> markers in <var class="term">replacementList</var> items as placeholders for the substring of <var class="term">inString</var> that the item matches.</p><p><code>xxx$0</code> is an example of a valid replacement string, and <code>xxx</code> concatenated with the portion of <var class="term">inString</var> that gets matched (by the corresponding <var class="term">sl</var> item) constitute the replacement string.</p><p>Any character after the dollar sign other than a zero is an error. Multiple zeroes (as many as 9) are permitted; a digit following such a string of zeroes must be escaped.</p><p>You can also use the format <code>$m0</code>, where <i>m</i> is one of the following modifiers:</p>
<table class="syntaxNested">
<table class="syntaxNested">
<tr><th>U or u</th>
<tr><th>U or u</th>
Line 21: Line 20:
<td>Indicates that the matched substring should be lowercased when inserted.</td></tr>
<td>Indicates that the matched substring should be lowercased when inserted.</td></tr>
</table>
</table>
The only characters you can escape in a replacement string are dollar sign (<tt>.$</tt>), backslash (<tt>.\</tt>), and the digits <tt>.0</tt> through <tt>.9</tt>. So only these escapes are respected:<tt>.\\</tt>, <tt>.\$</tt>, and <tt>.\0</tt> through <tt>.\9</tt>. No other escapes are allowed in a replacement string -- this includes "shorthand" escapes like <tt>.\d</tt> -- and an "unaccompanied" backslash (<tt>.\</tt>) is an error. For example, since the scan for the number that accompanies the meta-$ stops at the first nonnumeric, you use <tt>.1$0\0</tt> to indicate that the first matched substring should go between the numbers 1 and 0 in the replacement string.</td></tr>
<p>The only characters you can escape in a replacement string are dollar sign (<code>$</code>), backslash (<code>\</code>), and the digits <code>0</code> through <code>9</code>. So only these escapes are respected:<code>\\</code>, <code>\$</code>, and <code>\0</code> through <code>\9</code>. No other escapes are allowed in a replacement string -- this includes "shorthand" escapes like <code>\d</code> -- and an "unaccompanied" backslash (<code>\</code>) is an error.</p><p>For example, since the scan for the number that accompanies the meta-$ stops at the first nonnumeric, you use <code>1$0\0</code> to indicate that the first matched substring should go between the numbers <code>1</code> and <code>0</code> in the replacement string.</p></td></tr>
<tr><th><b>Options=</b> string</th>
<tr><th><b>Options</b></th>
<td>The Options argument (name required) is an optional string of options. The options are single letters, which may be specified in uppercase or lowercase, in any combination, and separated by blanks or not separated. For more information about these options, see [[Regex processing]].
<td>The Options argument (name required) is an optional string of <var class="term">options</var>. The options are single letters, which may be specified in uppercase or lowercase, in any combination, and separated by blanks or not separated. For more information about these options, see <var>[[Regex processing]]</var>.
<table class="syntaxNested">
<table class="syntaxNested">
<tr><th>I</th>
<tr><th>I</th>
<td>Do case-insensitive matching between '''string''' and '''regex'''.</td></tr>
<td>Do case-insensitive matching between <var class="term">inString</var> and <var class="term">sl</var>.</td></tr>
<tr><th>S</th>
<tr><th>S</th>
<td>Dot-All mode: a dot (<tt>..</tt>) can match any character, including carriage return and linefeed.</td></tr>
<td>Dot-All mode: a dot (<code>.</code>) can match any character, including carriage return and linefeed.</td></tr>
<tr><th>M</th>
<tr><th>M</th>
<td>Multi-line mode: let anchor characters match end-of-line indicators '''wherever''' the indicator appears in the input string. M mode is ignored if C (XML Schema) mode is specified.</td></tr>
<td>Multi-line mode: let anchor characters match end-of-line indicators <b><i>wherever</i></b> the indicator appears in the input string. <code>M</code> mode is ignored if <code>C</code> (XML Schema) mode is specified.</td></tr>
<tr><th>C</th>
<tr><th>C</th>
<td>Do the match according to XML Schema regex rules. Each regex is implicitly anchored at the beginning and end, and no characters serve as anchors. For more information, see [[Regex processing]]. </td></tr>
<td>Do the match according to XML Schema regex rules. Each regex is implicitly anchored at the beginning and end, and no characters serve as anchors. For more information, see [[Regex processing]]. </td></tr>
<tr><th>G</th>
<tr><th>G</th>
<td>Replace every occurrence of the match, not just (as in non-G mode) the first matched substring only.</td></tr>
<td>Replace every occurrence of the match, not just (as in non-<code>G</code> mode) the first matched substring only.</td></tr>
<tr><th>A</th>
<tr><th>A</th>
<td>Copy the '''replacement''' string as is. Do not recognize escapes; interpret a <tt>.$n</tt> combination as a literal and '''not''' as a special marker; and so on.</td></tr>
<td>Copy the <i>replacement</i> string as is. Do not recognize escapes; interpret a <code>$n</code> combination as a literal and <b><i>not</i></b> as a special marker; and so on.</td></tr>
</table>
</table>
</td></tr>
</td></tr>
<tr><th><b>Status=</b> num</th>
<tr><th><b>Status</b></th>
<td>The Status argument (name required) is optional; if specified, it is set to an integer code. These values are possible:
<td>The Status argument (name required) is optional; if specified, it is set to an integer code. These values are possible:
<table class="syntaxNested">
<table class="syntaxNested">
<tr><th>&thinsp.<i>n</i></th>
<tr><th><i>n</i></th>
<td>The number of replacements made. A value greater than 1 indicates option <tt>.G</tt> was in effect.</td></tr>
<td>The number of replacements made. A value greater than 1 indicates option <code>G</code> was in effect.</td></tr>
<tr><th>&thinsp.0</th>
<tr><th>0</th>
<td>No match: :hp1.inStr:ehp1. not matched by any :hp1.%regList:ehp1. items.</td></tr>
<td>No match: <var class="term">inString</var> not matched by any <var class="term">sl</var> items.</td></tr>
<tr><th>-2</th>
<tr><th>-2</th>
<td>Syntax or other error: for example, the number of items in '''%regList''' does not equal the number in '''replacementList'''; or a '''%regList''' item exceeds 6124 bytes; or '''%regList''' is empty.</td></tr>
<td>Syntax or other error: for example, the number of items in <var class="term">sl</var> does not equal the number in <var class="term">replacementList</var>; or a <var class="term">sl</var> item exceeds 6124 bytes; or <var class="term">sl</var> is empty.</td></tr>
<tr><th>-5</th>
<tr><th>-5</th>
<td>An invalid string in a '''replacementList''' item. For example, an invalid escape sequence, or a <tt>.$</tt> followed by any characters other than one or more (but no more than 9) zeroes.</td></tr>
<td>An invalid string in a <var class="term">replacementList</var> item. For example, an invalid escape sequence, or a <code>$</code> followed by any characters other than one or more (but no more than 9) zeroes.</td></tr>
<tr><th>-1<i>nnn</i></th>
<tr><th>-1<i>nnn</i></th>
<td>A regex in '''%regList''' is invalid.<i>nnn</i>, the absolute value of the return minus 1000, gives the 1-based position of the character being scanned when the error was discovered. The value for an error occurring at end-of-string is the length of the string + 1. Prior to Version 7.0 of the <var class=product>Sirius Mods</var>, an invalid regex results in a Status value of <tt>.-1</tt>.</td></tr>
<td><p>A regex in <var class="term">sl</var> is invalid. <i>nnn</i> (the absolute value of the return minus 1000) gives the 1-based position of the character being scanned when the error was discovered. The value for an error occurring at end-of-string is the length of the string + 1.</p><p>Prior to <var class="product">Sirius Mods</var> Version 7.0, an invalid regex results in a Status value of <code>-1</code>.</p></td></tr>
</table>
</table>
<p class="code"><blockquote> If you omit this argument and a negative Status value is to be returned, the run is cancelled.</blockquote></td></tr>
<b>Note:</b> If you omit this argument and a negative <var class="Term">Status</var> value is to be returned, the run is cancelled.</td></tr>
</p>
</table>
</table>


==Usage notes==
==Usage notes==
<ul><li>It is strongly recommended that you protect your environment from regex processing demands on PDL and STBL space by setting, say, <tt>.UTABLE LPDLST 3000</tt> and <tt>.UTABLE LSTBL 9000</tt>. For further discussion of this, see [[User Language]].<li>Items in '''%regList''' must '''not''' exceed 6124 bytes. However, the '''inStr''' value and items in '''replacementList''' may exceed 6124 bytes.<li>For information about additional methods and $functions that support regular expressions, see [[Regex processing]].</ul>
<ul><li>All errors in <var class="term">RegexReplaceCorresponding</var>, including invalid argument(s) result in request cancellation.<li>It is strongly recommended that you protect your environment from regex processing demands on PDL and STBL space by setting, say, <code>UTABLE LPDLST 3000</code> and <code>UTABLE LSTBL 9000</code>. For further discussion of this, see [[User Language]].<li>Items in <var  class="term">sl</var> must <b><i>not</i></b> exceed 6124 bytes. However, the <var class="term">inString</var> value and items in <var class="term">replacementList</var> may exceed 6124 bytes.<li>For information about additional methods and $functions that support regular expressions, see [[Regex processing]].<li><var class="term">RegexReplaceCorresponding</var> is available as of <var class="product">Sirius Mods</var> Version 6.9.</ul>


==Examples==
==Examples==
 
<ol><li>In the following code fragment, the second item in regex list <code>%regList</code> is the first to match the input string <code>inStr</code>. The subexpression in that item performs no special capturing function -- the parentheses are for grouping only. Since <code>%opt='g'</code> is specified, three replacements are made (using the corresponding, second, item in <code>%repList</code>):
In the following code fragment, the second item in regex list '''%regList''' is the first to match the input string '''inStr'''. The subexpression in that item performs no special capturing function -- the parentheses are for grouping only. Since <tt>.%opt='g'</tt> is specified, three replacements are made (using the corresponding, second, item in '''%repList'''):
<p class="code"> ...
 
<p class="code">...
%regList = new
%regList = new
text to %regList
text to %regList
abcx
  abcx
a(bc?)
  a(bc?)
abcd
  abcd
end text
end text


%repList = new
%repList = new
text to %repList
text to %repList
&
  &
&&
  &&
&&&
  &&&
end text
end text


%inStr = 'abc1abc2abcd'
%inStr = 'abc1abc2abcd'
%opt='g'
%opt='g'
%outStr = %regList:<var>RegexReplaceCorresponding</var> (%inStr, %repList, Options=%opt, Status=%st)
%outStr = %regList:RegexReplaceCorresponding(%inStr, %repList, Options=%opt, Status=%st)


Print 'Status from ReplaceCorresponding is ' %st
Print 'Status from ReplaceCorresponding is ' %st
Print 'Output<var>String</var>: ' %outStr
Print 'Output String: ' %outStr
...
  ...
</p>
</p>


Line 91: Line 87:


<p class="code">Status from ReplaceCorresponding is 3
<p class="code">Status from ReplaceCorresponding is 3
Output<var>String</var>: &&1&&2&&d
Output String: &&1&&2&&d
</p>
</p></ol>




==See also==
==See also==
{{Template:Stringlist:RegexReplaceCorresponding footer}}
{{Template:Stringlist:RegexReplaceCorresponding footer}}

Revision as of 22:41, 27 January 2011

Replace substrings that match regex with items in a Stringlist (Stringlist class)

This method searches a given string for a match to one of multiple regular expressions contained in a list, and it replaces the matched substring with, or according to, the corresponding item of a second list.

The regex list items are treated as mutually exclusive alternatives, and the function stops as soon as an item matches and the replacement is made. A "global" option is also available to continue searching and replacing within the given string using the matching regex item until no more matches are found.

RegexReplaceCorresponding uses the rules of regular expression matching (information about which is provided in Regex processing).

RegexReplaceCorresponding accepts two required and two optional arguments, and it returns a string.

Syntax

%outString = sl:RegexReplaceCorresponding( inString, replacementList, -
                                           [Options= string], -
                                           [Status= %output]) Throws InvalidRegex

Syntax terms

outString A string set to the value of inString with each matched substring replaced by the value of the replacementList item that corresponds to the matching sl item.
sl A Stringlist object whose items are interpreted as regular expressions and applied to the inString value.
inString The input string, to which the regular expressions in sl are applied.
replacementList

A Stringlist, each of whose items is a potential replacement string for the substring of inString that is matched by the corresponding item of sl.

Except when the A option is specified (as described below for the Options argument), you can include $0 markers in replacementList items as placeholders for the substring of inString that the item matches.

For example, xxx$0 is a valid replacement string: the replacement is xxx followed by the portion of inString matched by the corresponding sl item.

Any character after the dollar sign other than a zero or one of the modifiers listed below is an error. Multiple zeroes (as many as 9) are permitted; a digit following such a string of zeroes must be escaped.

You can also use the format $m0, where m is one of the following modifiers:

U or u Specifies that the matched substring should be uppercased when inserted.
L or l Indicates that the matched substring should be lowercased when inserted.

The only characters you can escape in a replacement string are dollar sign ($), backslash (\), and the digits 0 through 9. So only these escapes are respected: \\, \$, and \0 through \9. No other escapes are allowed in a replacement string -- this includes "shorthand" escapes like \d -- and an "unaccompanied" backslash (\) is an error.

For example, since the scan for the number that accompanies the meta-$ stops at the first nonnumeric, you use 1$0\0 to indicate that the first matched substring should go between the numbers 1 and 0 in the replacement string.

Options The Options argument (name required) is an optional string of options. The options are single letters, which may be specified in uppercase or lowercase, in any combination, and separated by blanks or not separated. For more information about these options, see Regex processing.
I Do case-insensitive matching between inString and sl.
S Dot-All mode: a dot (.) can match any character, including carriage return and linefeed.
M Multi-line mode: let anchor characters match end-of-line indicators wherever the indicator appears in the input string. M mode is ignored if C (XML Schema) mode is specified.
C Do the match according to XML Schema regex rules. Each regex is implicitly anchored at the beginning and end, and no characters serve as anchors. For more information, see Regex processing.
G Replace every occurrence of the match, not just (as in non-G mode) the first matched substring only.
A Copy the replacement string as is. Do not recognize escapes; interpret a $n combination as a literal and not as a special marker; and so on.
Status The Status argument (name required) is optional; if specified, it is set to an integer code. These values are possible:
n The number of replacements made. A value greater than 1 indicates option G was in effect.
0 No match: inString not matched by any sl items.
-2 Syntax or other error: for example, the number of items in sl does not equal the number in replacementList; or an sl item exceeds 6124 bytes; or sl is empty.
-5 An invalid string in a replacementList item. For example, an invalid escape sequence, or a $ followed by any characters other than one or more (but no more than 9) zeroes.
-1nnn

A regex in sl is invalid. nnn (the absolute value of the return minus 1000) gives the 1-based position of the character being scanned when the error was discovered. The value for an error occurring at end-of-string is the length of the string + 1.

Prior to Sirius Mods Version 7.0, an invalid regex results in a Status value of -1.

Note: If you omit this argument and a negative Status value is to be returned, the run is cancelled.
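
The replacement-string rules given above for replacementList can be modeled in a few lines. The following Python sketch is only an illustration of those documented rules, not the Sirius Mods implementation: the function name expand_replacement, its parameters, and the use of ValueError to stand in for a Status of -5 are all assumptions made for the example. It also assumes that an unescaped digit directly after the zeroes is rejected, which is one reading of "must be escaped".

```python
def expand_replacement(template: str, matched: str, literal: bool = False) -> str:
    """Expand one replacementList item for a matched substring.

    Hypothetical model of the documented rules; ValueError stands in
    for the documented Status value of -5.
    """
    if literal:
        # Option A: copy the replacement string as is.
        return template

    out = []
    i = 0
    n = len(template)
    while i < n:
        ch = template[i]
        if ch == "\\":
            # Only \\, \$ and \0 through \9 are valid escapes;
            # a trailing backslash is an error.
            if i + 1 >= n or template[i + 1] not in "\\$0123456789":
                raise ValueError("invalid escape sequence")
            out.append(template[i + 1])
            i += 2
        elif ch == "$":
            i += 1
            # Optional case modifier: U/u or L/l.
            modifier = ""
            if i < n and template[i] in "UuLl":
                modifier = template[i].upper()
                i += 1
            # One to nine zeroes must follow.
            start = i
            while i < n and template[i] == "0":
                i += 1
            zeroes = i - start
            if not 1 <= zeroes <= 9:
                raise ValueError("invalid $ marker")
            # Assumption: an unescaped digit right after the zeroes is an error.
            if i < n and template[i] in "123456789":
                raise ValueError("digit after $0 must be escaped")
            if modifier == "U":
                out.append(matched.upper())
            elif modifier == "L":
                out.append(matched.lower())
            else:
                out.append(matched)
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_replacement("xxx$0", "abc"))    # xxxabc
print(expand_replacement("1$0\\0", "abc"))   # 1abc0
print(expand_replacement("$U0", "abc"))      # ABC
```

The second call corresponds to the 1$0\0 case described above: the escaped zero ends the $0 marker, so the matched text lands between the literal 1 and 0.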

Usage notes

  • All errors in RegexReplaceCorresponding, including invalid argument(s), result in request cancellation.
  • It is strongly recommended that you protect your environment from regex processing demands on PDL and STBL space by setting, say, UTABLE LPDLST 3000 and UTABLE LSTBL 9000. For further discussion of this, see User Language.
  • Items in sl must not exceed 6124 bytes. However, the inString value and items in replacementList may exceed 6124 bytes.
  • For information about additional methods and $functions that support regular expressions, see Regex processing.
  • RegexReplaceCorresponding is available as of Sirius Mods Version 6.9.

Examples

  1. In the following code fragment, the second item in regex list %regList is the first to match the input string %inStr. The subexpression in that item performs no special capturing function -- the parentheses are for grouping only. Since %opt='g' is specified, three replacements are made (using the corresponding, second, item in %repList):

    ...
    %regList = new
    text to %regList
    abcx
    a(bc?)
    abcd
    end text

    %repList = new
    text to %repList
    &
    &&
    &&&
    end text

    %inStr = 'abc1abc2abcd'
    %opt='g'
    %outStr = %regList:RegexReplaceCorresponding(%inStr, %repList, Options=%opt, Status=%st)

    Print 'Status from ReplaceCorresponding is ' %st
    Print 'Output String: ' %outStr
    ...

    The result would be:

    Status from ReplaceCorresponding is 3
    Output String: &&1&&2&&d
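
The selection logic in example 1 (the first list item that matches anywhere in the input wins, and G repeats the replacement with that same item) can be traced with a small Python model. This is a sketch under stated assumptions, not the Sirius Mods code. The function name and signature are hypothetical. Python's re module stands in for the Sirius regex engine. Replacement items are copied literally, as with option A. Only the G, I, S, and M options are modeled. Returning the input unchanged alongside -2 is an assumption about the error case.

```python
import re


def regex_replace_corresponding(regexes, in_string, replacements, options=""):
    """Return (out_string, status) for a simplified model of the method.

    Hypothetical helper: replacement items are treated as literal text
    and invalid regexes are not mapped to -1nnn codes.
    """
    opts = options.upper().replace(" ", "")

    # Documented -2 cases: empty regex list or mismatched list lengths.
    if not regexes or len(regexes) != len(replacements):
        return in_string, -2

    flags = 0
    if "I" in opts:
        flags |= re.IGNORECASE   # case-insensitive matching
    if "S" in opts:
        flags |= re.DOTALL       # dot matches any character
    if "M" in opts:
        flags |= re.MULTILINE    # anchors match at line boundaries

    # Without G only the first match is replaced; count=0 means "all".
    count = 0 if "G" in opts else 1

    # Try the list items in order; the first one that matches is used.
    for pattern, replacement in zip(regexes, replacements):
        if re.search(pattern, in_string, flags):
            # A function replacement keeps the item literal.
            return re.subn(pattern, lambda m: replacement, in_string,
                           count=count, flags=flags)

    return in_string, 0          # no item matched


reg_list = ["abcx", "a(bc?)", "abcd"]
rep_list = ["&", "&&", "&&&"]
print(regex_replace_corresponding(reg_list, "abc1abc2abcd", rep_list, "g"))
# ('&&1&&2&&d', 3)
print(regex_replace_corresponding(reg_list, "abc1abc2abcd", rep_list))
# ('&&1abc2abcd', 1)
```

The first call reproduces the result of example 1. The second shows the same input without G, where only the first matched substring is replaced and the status is 1.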


See also