UnicodeRegexReplace (Unicode function)
Latest revision as of 22:06, 21 January 2022
Replace regex match(es) (Unicode class)
The UnicodeRegexReplace intrinsic function searches a given Unicode string for matches of a regular expression, and it replaces matches with, or according to, a specified replacement string.
The function either stops after the first match and replacement, or continues searching and replacing until no more matches are found.
Matches are obtained according to the rules of regular expression matching.
Syntax
%outUnicode = unicode:UnicodeRegexReplace( regex, replacement, -
                                           [Options= string])
              Throws InvalidRegex
Syntax terms
Term | Description
---|---
%outUnicode | Unicode
unicode | The input Unicode string, to which the regular expression regex is applied.
regex | A Unicode string that is interpreted as a regular expression and that is applied to the method object, unicode, to find the one or more substrings matched by regex.
replacement | The Unicode string that replaces the substrings of unicode that regex matches. Except when the A option is specified (as described below for the Options argument), you can include markers in the replacement value to indicate where to insert corresponding captured strings, that is, strings matched by capturing groups (parenthesized subexpressions) in regex, if any. As in Perl, these markers are in the form $n, where n is the number of the capture group, and 1 is the number of the first capture group. n must not be 0 or contain more than 9 digits. If a capturing group makes no matches (is positional, for example), or if there was no nth capture group corresponding to the $n marker in a replacement string, the (literal) value of $n is used in the replacement string instead of the empty string. The only characters you can escape in a replacement string are dollar sign ( An invalid replacement string results in request cancellation.
Options | This optional, name required, parameter is a String of single-letter options, which may be specified in uppercase or lowercase, in any combination, and blank separated or not, as you prefer. For more information about these options, see Common regex options.
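The first-match versus replace-all behavior and the capture-group markers described above can be illustrated by analogy in Python. This is only a sketch under stated assumptions: Python's re module is a different regex engine from the one SOUL uses, its count argument stands in for the global option, and its \g<n> marker stands in for the $n marker. The helper names replace_first and replace_all are hypothetical.

```python
import re

def replace_first(pattern, replacement, text):
    # Stop after the first match (count=1), like a non-global replace.
    return re.sub(pattern, replacement, text, count=1)

def replace_all(pattern, replacement, text):
    # Keep replacing until no more matches are found (count=0 is the default).
    return re.sub(pattern, replacement, text)

text = 'centre stage, centre court'

print(replace_first(r'centre', 'center', text))
print(replace_all(r'centre', 'center', text))

# \g<1> and \g<2> insert the strings captured by the first and second
# capturing groups, playing the role that $1 and $2 play in the text above.
print(replace_all(r'(\w+) (\w+)', r'\g<2> \g<1>', text))
```

The three calls print "center stage, centre court", "center stage, center court", and "stage centre, court centre" respectively.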
Usage notes
- It is strongly recommended that you protect your environment from regular expression processing demands on PDL and STBL space by setting, say, UTABLE LPDLST 3000 and UTABLE LSTBL 9000. See SOUL programming considerations.
- Within a regular expression, characters enclosed by a pair of unescaped parentheses form a "subexpression." A subexpression is a capturing group if the opening parenthesis is not followed by a question mark (?). A capturing group that is nested within a non-capturing subexpression is still a capturing group.
- In Perl, $n markers ($1, for example) enclosed in single quotes are treated as literals instead of as "that which was captured by the first capturing parentheses." UnicodeRegexReplace uses the A option of the Options argument for this purpose.
- Matching of regex may "succeed" and yet match no characters. For example, a quantifier like ? is allowed by definition to match no characters, though it tries to match one. UnicodeRegexReplace honors such a zero-length match by substituting the specified replacement string at the current position. If the global option is in effect, the regex is then applied again one position to the right in the input string, and again, until the end of the string. The regex 9? globally applied to the string abc with a comma-comma (,,) replacement string results in this output string: ,,a,,b,,c,,
- For information about additional methods that support regular expressions, see Regex processing.
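Two of the notes above, the nested capturing group and the zero-length match, can be checked by analogy with Python's re module. This is a sketch, not SOUL code: the engine is different, and the helper names captured_groups and replace_globally are hypothetical. Python's re.sub happens to treat the 9? case the same way the note describes.

```python
import re

def captured_groups(pattern, text):
    # Return the strings captured by each capturing group, or None if no match.
    match = re.search(pattern, text)
    return match.groups() if match else None

def replace_globally(pattern, replacement, text):
    # Replace every match, including zero-length matches.
    return re.sub(pattern, replacement, text)

# (?:...) is non-capturing, but the (b) nested inside it still captures.
print(captured_groups(r'(?:a(b))c', 'abc'))

# 9? matches the empty string at every position of 'abc', so the
# replacement is inserted before each character and at the end.
print(replace_globally(r'9?', ',,', 'abc'))
```

The first call prints ('b',) and the second prints ,,a,,b,,c,, which is the output string given in the note.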
Examples
This request replaces a word with a different spelling, then adds an entirely lowercased copy of an initial phrase:
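begin
  %regexU is unicode
  %strU is unicode
  %oU is unicode
  %strU = 'At the centre of it all, your eyes'
  * Replace the British spelling with the American one
  %regexU = 'centre'
  %oU = %strU:unicodeRegexReplace(%regexU, 'center')
  printText 'center': {%strU:unicodeRegexReplace(%regexU, 'center')}
  * Capture everything up to and including the comma, then
  * follow it with a lowercased copy ($L1) of the capture
  %regexU = '(.*[,])'
  printText $1 $L1 : {%oU:unicodeRegexReplace(%regexU, '$1 $L1')}
end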
The result of the above fragment is:
'center': At the center of it all, your eyes
$1 $L1 : At the center of it all, at the center of it all, your eyes
Note: To add a third occurrence of the phrase "at the center of it all" to the result above, change the regex to (.*?[,]), which non-greedily captures only the first occurrence. Using the (.*[,]) regex against an initial string that has two occurrences of the phrase and two commas, the method would capture to and including the second comma, and would output four occurrences.
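The greedy versus non-greedy difference in the note can be reproduced by analogy in Python. This is a sketch under assumptions: Python has no $L1 marker, so a hypothetical helper, append_lowercased_copy, uses a replacement function to emit the capture followed by its lowercased copy, and count=1 stands in for a non-global replace.

```python
import re

def append_lowercased_copy(pattern, text):
    # Replace the first match with the capture followed by a lowercased
    # copy of it, mimicking a '$1 $L1' replacement string.
    def expand(match):
        captured = match.group(1)
        return captured + ' ' + captured.lower()
    return re.sub(pattern, expand, text, count=1)

def count_phrase(text):
    # Count occurrences of the phrase, ignoring case.
    return text.lower().count('the center of it all')

once = append_lowercased_copy(r'(.*[,])', 'At the center of it all, your eyes')
print(once)

# Applied to a string that already has two occurrences and two commas:
greedy = append_lowercased_copy(r'(.*[,])', once)       # captures to the second comma
non_greedy = append_lowercased_copy(r'(.*?[,])', once)  # captures to the first comma
print(count_phrase(greedy), count_phrase(non_greedy))
```

The first print reproduces the second output line of the example, and the final print shows 4 occurrences for the greedy regex and 3 for the non-greedy one.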