UnicodeRegexMatch (Unicode function)
Revision as of 18:57, 11 May 2016
Position after match of regex (Unicode class)
The UnicodeRegexMatch intrinsic function determines whether a given pattern (regular expression, or "regex") matches within a given string according to the rules of regular expression matching.
Syntax
%number = unicode:UnicodeRegexMatch( regex, [Options= string], -
                                     [CaptureList= stringlist])
          Throws InvalidRegex
Syntax terms
| Term | Description |
|---|---|
| %number | A variable to return the position of the character after the last character matched, or 0 if the regular expression matches nowhere in the method object Unicode string. |
| unicode | The input Unicode string, to which the regular expression regex is applied. |
| regex | A Unicode string that is interpreted as a regular expression and applied to the method object unicode to determine whether the regular expression matches within unicode. |
| Options | An optional, name-required parameter supplying a string of single-letter options, which may be specified in uppercase or lowercase, in any combination, and blank-separated or not as you prefer. For more information about these options, see Common regex options. |
| CaptureList | This argument is available for Rocket development testing purposes only. It is not an ordinary user parameter. |
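To make the %number result concrete, here is a minimal Python sketch that models it. This is not SOUL code: it assumes Python's `re` module as a stand-in for the SOUL regex engine (the syntaxes agree for simple patterns like these but are not identical), and `unicode_regex_match` is a hypothetical name. Positions are 1-based, as in SOUL.

```python
import re


def unicode_regex_match(unicode: str, regex: str) -> int:
    """Hypothetical model of %number = unicode:UnicodeRegexMatch(regex).

    Returns the 1-based position of the character after the last character
    of the first match, or 0 if the regex matches nowhere in the string.
    Python's re syntax is used here as an approximation of SOUL regex rules.
    """
    match = re.search(regex, unicode)
    if match is None:
        return 0
    # match.end() is the 0-based index just past the match; adding 1
    # converts it to the 1-based position of the following character.
    return match.end() + 1


print(unicode_regex_match('That quick brown fox', 'quick'))  # 11
print(unicode_regex_match('That quick brown fox', 'xyz'))    # 0
```

In the first call, "quick" occupies positions 6 through 10, so the model returns 11, the position of the blank that follows it.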
Usage notes
- It is strongly recommended that you protect your environment from regular expression processing demands on PDL and STBL space by setting, say, UTABLE LPDLST 3000 and UTABLE LSTBL 9000. See User Language programming considerations.
- For information about additional methods that support regular expressions, see Regex processing.
- UnicodeRegexMatch may be something of a misnomer. It does not determine whether a string matches a regular expression; it determines whether a string contains a substring that matches a regular expression. UnicodeRegexMatch behaves more like a matching method if the regular expression is "anchored" (begins with a caret (^) and ends with a dollar sign ($)), or if the C option indicates XML Schema mode.
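The difference between "contains a matching substring" and "matches" can be sketched in Python. This assumes that Python's `re.search` and its `^`/`$` anchors behave like the SOUL rules for these simple patterns; `contains_match` is a hypothetical helper, and Python's own whole-string test, `re.fullmatch`, is shown for comparison.

```python
import re


def contains_match(s: str, regex: str) -> bool:
    """Hypothetical helper: True if some substring of s matches regex,
    which is the search behavior described for UnicodeRegexMatch."""
    return re.search(regex, s) is not None


text = 'albatross'
print(contains_match(text, 'bat'))            # True: 'bat' occurs inside the string
print(contains_match(text, '^bat$'))          # False: the anchors demand the whole string
print(contains_match(text, '^albatross$'))    # True: the whole string matches
print(re.fullmatch('bat', text) is not None)  # False: Python's whole-string analogue
```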
Examples
Finding the first position of one of several characters
A common programming problem is to "scan" a string and find the first position that contains one of several characters. This can be readily accomplished with UnicodeRegexMatch. Here is an example:
%regex = '[aeiou]'; * Scan for any vowel
%str = 'That quick brown fox'
%i = %str:unicodeRegexMatch(%regex)
if %i then
   printText Before vowel: {%str:unicodeLeft(%i - 2)}
   printText The vowel: {%str:unicodeChar(%i - 1)}
   printText After vowel: {%str:unicodeSubstring(%i)}
end if
The result of the above fragment is:
Before vowel: Th
The vowel: a
After vowel: t quick brown fox
Notes:
- The position returned by UnicodeRegexMatch is the position of the character after the first successful match.
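The offsets in the example (%i - 2, %i - 1, and %i) follow from the returned value being one past the matched character. The Python sketch below reproduces the printed lines by translating SOUL's 1-based unicodeLeft, unicodeChar, and unicodeSubstring into 0-based slices. It is a model only: `split_at_first` is a hypothetical helper, Python's `re` stands in for the SOUL regex engine, and it assumes the regex matches a single character (as a bracket expression does).

```python
import re


def split_at_first(s: str, regex: str):
    """Hypothetical helper mirroring the SOUL example above.

    Returns (before, matched, after) for the first single-character match,
    or None if there is no match.
    """
    match = re.search(regex, s)
    if match is None:
        return None
    i = match.end() + 1   # models %i: 1-based position after the match
    before = s[:i - 2]    # unicodeLeft(%i - 2): the first %i - 2 characters
    matched = s[i - 2]    # unicodeChar(%i - 1): 1-based %i - 1 is 0-based i - 2
    after = s[i - 1:]     # unicodeSubstring(%i): from 1-based position %i onward
    return before, matched, after


parts = split_at_first('That quick brown fox', '[aeiou]')
if parts:
    before, matched, after = parts
    print('Before vowel:', before)
    print('The vowel:', matched)
    print('After vowel:', after)
```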
Finding the first position that is not one of several characters
A programming task similar to that in the preceding example is finding the position of the first character that is not one of a set of characters. This task is readily accomplished with UnicodeRegexMatch. Here is an example:
%regex = '[^aeiou]'; * Scan for any non-vowel
%str = 'albatross'
%i = %str:unicodeRegexMatch(%regex)
if %i then
   printText Before non-vowel: {%str:unicodeLeft(%i - 2)}
   printText The non-vowel: {%str:unicodeChar(%i - 1)}
   printText After non-vowel: {%str:unicodeSubstring(%i)}
end if
The result of the above fragment is:
Before non-vowel: a
The non-vowel: l
After non-vowel: batross
Notes:
- The regex is specified with the following statement:
  %regex = '[^aeiou]'
  Comparing this to the example using circumflex for RegexMatch illustrates one benefit of UnicodeRegexMatch: since the input is Unicode, the circumflex character can simply be specified directly, without concern for whether the program was entered with codepage 1047 or 0037.
- The right-hand side of the corresponding statement in the RegexMatch example ('[' '5F':HexToString 'aeiou]') uses the implicit concatenation feature to supply the circumflex by its hexadecimal value; no such workaround is needed here.
- This use of UnicodeRegexMatch is like the standard SOUL $Verify function, although it indicates not only whether all characters in the given string are among those listed in the regex, but also the position (plus one) of the first character that is not.