$RegexMatch: Difference between revisions

Revision as of 22:23, 23 October 2012

Whether string matches regex

Most Sirius $functions have been deprecated in favor of Object Oriented methods. The OO equivalent for the $RegexMatch function is the RegexMatch (String function).

This function determines whether a given pattern (regular expression, or "regex") matches within a given string according to the "rules" of regular expression matching (information about the rules observed is provided in "Regex processing"). The function is available as of Version 6.9 of the Sirius Mods.

$RegexMatch accepts two required and two optional arguments, and it returns a numeric value. It is also callable. Specifying an invalid argument results in request cancellation.

Syntax

[%rc =] $RegexMatch(inStr, regex, [options], [%status])

Syntax terms

%rc       A number that is either 0 (if the regular expression was invalid or no match was found) or the position of the character after the last character matched.
inStr     The input string, to which the regular expression regex is applied. This is a required argument.
regex     A string that is interpreted as a regular expression and is applied to the inStr argument to determine whether the regex matches inStr. This is a required argument.
options   An optional string of options. The options are single letters, which may be specified in uppercase or lowercase, in any combination, and separated by blanks or not separated. For more information about these options, see "Common regex options".
            I   Do case-insensitive matching between inStr and regex.
            S   Dot-All mode: a dot (.) can match any character, including carriage return and linefeed.
            M   Multi-line mode: let anchor characters match end-of-line indicators wherever the indicator appears in the input string. M mode is ignored if C (XML Schema) mode is specified.
            C   Do the match according to XML Schema regex rules. Each regex is implicitly anchored at the beginning and end, and no characters serve as anchors. For more information, see "XML Schema mode".
%status   The fourth argument is optional; if specified, it is set to an integer status value. These values are possible:
            1       A successful match was obtained.
            0       No match: inStr was not matched by regex.
            -1nnn   The pattern in regex is invalid. nnn, the absolute value of the status minus 1000, gives the 1-based position of the character being scanned when the error was discovered. For example, a status of -1005 means the error was discovered at position 5. The value for an error occurring at end-of-string is the length of the string + 1. Prior to Version 7.0 of the Sirius Mods, an invalid regex results in a %status value of -1.

Note: If you omit the %status argument and a negative %status value is to be returned, the run is cancelled.
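
The three kinds of status value can be told apart with a few comparisons. The following is a minimal, untested sketch: the input string, the deliberately malformed regex a(b (assumed to be rejected because of its unclosed parenthesis), and the variable %errPos are illustrative assumptions, not part of the original documentation. The position arithmetic applies to the -1nnn form; a pre-7.0 status of -1 falls through to the final Else branch.

    Begin
    %rc float
    %status float
    %errPos float

    * Supplying %status keeps an invalid regex from cancelling the run
    %rc = $RegexMatch ('abc', 'a(b', '', %status)

    If (%status EQ 1) then
       Print 'Match found, return value is ' %rc
    End If
    If (%status EQ 0) then
       Print 'No match'
    End If
    If (%status LT 0) then
    * For a -1nnn status, nnn is the absolute value minus 1000
       %errPos = 0 - %status - 1000
       If (%errPos GT 0) then
          Print 'Invalid regex, error detected at position ' %errPos
       Else
          Print 'Invalid regex, status ' %status
       End If
    End If
    End

Testing %status rather than %rc separates "no match" from "invalid regex", which both produce a return value of 0.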

Usage notes

  • It is strongly recommended that you protect your environment from regex processing demands on PDL and STBL space by setting, say, UTABLE LPDLST 3000 and UTABLE LSTBL 9000. For further discussion of this, see "User Language programming considerations".
  • $RegexMatch is considered Longstring-capable. Its string inputs and outputs are considered Longstrings for expression-compilation purposes, and they have standard Longstring truncation behavior: truncation by assignment results in request cancellation. For more information, see "Longstrings and $functions".
  • If %rc is zero, either regex did not match inStr, or there was an error in the regex. The %status argument returns additional information. If it is negative, it indicates an error. If it is zero, it indicates there was no error, but the regex did not match.
  • For information about additional methods and $functions that support regular expressions, see "Regex processing".
  • $RegexMatch is available as of Version 6.9.

Examples

The following example tests whether the regex \*bc?[5-8] matches the string a*b6. If the return code is 0 (no match, or an invalid regex), the status variable is printed to show which of the two occurred.

    Begin
    %rc float
    %regex Longstring
    %String Longstring
    %Options string len 10
    %status float

    * No options are specified for this match
    %Options = ''
    %regex = '\*bc?[5-8]'
    %String = 'a\*b6'

    %rc = $RegexMatch (%String, %regex, %Options, %status)
    If (%rc EQ 0) then
       Print 'Status from $RegexMatch is ' %status
    Else
       Print %regex ' matches ' %String
    End If
    End

The regex matches the input string; the example result is:

\*bc?[5-8] matches a\*b6

This regex demonstrates the following:

  • To match a string, a regex pattern must merely "fit" a substring of the string.
  • Metacharacters, in this case star (*), must be escaped.
  • An optional character (c?) may fail to find a match, but this does not prevent the success of the overall match.
  • The character class range ([5-8]) matches the 6 in the input string.
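
The effect of the options argument can be seen by running the same regex twice against an uppercase input. This is a hedged sketch modeled on the example above, not tested code; the input string A*B6 and the printed labels are illustrative assumptions. Going by the option descriptions, the first call should report no match (return value and status both 0), and the second call, with the I option, should report a match.

    Begin
    %rc float
    %status float

    * No options: lowercase b in the regex is compared case-sensitively
    %rc = $RegexMatch ('A*B6', '\*bc?[5-8]', '', %status)
    Print 'No options: rc=' %rc ' status=' %status

    * I option: case-insensitive matching between inStr and regex
    %rc = $RegexMatch ('A*B6', '\*bc?[5-8]', 'I', %status)
    Print 'I option: rc=' %rc ' status=' %status
    End

Passing %status on both calls means that a mistyped regex would show up as a negative status instead of cancelling the request.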

Products authorizing $RegexMatch