RegexReplace (String function)
Latest revision as of 01:43, 26 March 2022
Replace regex match(es) (String class)
The RegexReplace intrinsic function searches a given string for matches of a regular expression, and replaces matches with, or according to, a specified replacement string.
The function either stops after the first match and replacement or, when the global option is in effect, continues searching and replacing until no more matches are found.
Matches are obtained according to the rules of regular expression matching.
Syntax
%outString = string:RegexReplace( regex, replacement, [Options= string]) Throws InvalidRegex
Syntax terms
Term | Description
---|---
%outString | A string set to the value of method object string with each matched substring replaced by the value of replacement.
string | The method object string, within which matches for regex are sought.
regex | A string that is interpreted as a regular expression and that is applied to the method object string to find the one or more substrings matched by regex.
replacement | The string that replaces the substrings of string that regex matches. Except when the A option is specified (as described below for the Options argument), you can include markers in the replacement value to indicate where to insert corresponding captured strings, that is, strings matched by capturing groups (parenthesized subexpressions) in regex, if any. As in Perl, these markers are in the form $n, where n is the number of the capture group, and 1 is the number of the first capture group. n must not be 0 or contain more than 9 digits. If a capturing group makes no matches (is positional, for example), or if there is no nth capture group corresponding to the $n marker in a replacement string, the (literal) value of $n is used in the replacement string instead of the empty string. Only a limited set of characters can be escaped in a replacement string; the dollar sign ($) is one of them. An invalid replacement string results in request cancellation.
Options | This optional, name required, parameter is a string of single-letter options, which may be specified in uppercase or lowercase, in any combination, and blank separated or not, as you prefer. For more information about these options, see Common regex options.
Usage notes
- It is strongly recommended that you protect your environment from regular expression processing demands on PDL and STBL space by setting, say, `UTABLE LPDLST 3000` and `UTABLE LSTBL 9000`. See SOUL programming considerations.
- Within a regular expression, characters enclosed by a pair of unescaped parentheses form a "subexpression." A subexpression is a capturing group if the opening parenthesis is not followed by a question mark (`?`). A capturing group that is nested within a non-capturing subexpression is still a capturing group.
- In Perl, $n markers (`$1`, for example) enclosed in single quotes are treated as literals instead of as "that which was captured by the first capturing parentheses." RegexReplace uses the `A` option of the Options argument for this purpose.
- Matching of regex may "succeed" yet match no characters. For example, a quantifier like `?` is allowed by definition to match no characters, though it tries to match one. RegexReplace honors such a zero-length match by substituting the specified replacement string at the current position. If the global option is in effect, the regex is then applied again one position to the right in the input string, and again, until the end of the string. The regex `9?` globally applied to the string `abc` with a comma-comma (`,,`) replacement string results in this output string: `,,a,,b,,c,,`
- For information about additional methods that support regular expressions, see Regex processing.
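The zero-length match case described in the usage notes can be tried with a short request. This is a minimal sketch modeled on the declaration and call style used in the Examples section; the variable names are illustrative, and the output shown in the closing sentence is the one stated in the usage note, not a separately verified run.

```soul
begin
* Illustrative variable names, mirroring the example in this article
%inStr longstring
%regex longstring
%replacement longstring
%outStr longstring
%opt string len 10
%inStr='abc'
* 9? matches one "9" if present, otherwise it matches zero characters
%regex='9?'
%replacement=',,'
* g requests global (repeated) matching
%opt='g'
%outStr = %inStr:regexReplace(%regex, %replacement, options=%opt)
printText OutputString: '{%outStr}'
end
```

Because `abc` contains no `9`, every match is zero-length, so per the note above the replacement should appear before each character and once at the end of the string: `,,a,,b,,c,,`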
Examples
- In the following example, the regex `(5.)` is applied repeatedly (global option) to the string `5A5B5C5D5E` to replace the uppercase letters with their lowercase counterparts. The `$L1` replacement value makes the replacement string equal to whatever is matched by the capturing group, `(5.)`, in the regex (the `L` causes the lowercase versions of the captured letters to be used).

      begin
      %regex longstring
      %inStr longstring
      %replacement longstring
      %outStr longstring
      %opt string len 10
      %inStr='5A5B5C5D5E'
      %regex='(5.)'
      %replacement='$L1'
      %opt='g'
      %outStr = %inStr:regexReplace(%regex, %replacement, options=%opt)
      printText OutputString: '{%outStr}'
      end

  The example result is:

      OutputString: '5a5b5c5d5e'

  The non-capturing regex `5.` matches and replaces the same substrings as the capturing group `(5.)`, but `(5.)` is used above to take advantage of the self-referring marker for the replacement string, `$L1`, which is valid only for capturing groups.
- Say you want to supply end tags to items of the form `<img foo="bar">`, converting them to `<img foo="bar"></img>`. You decide to use the following regex to capture `img` tags that have attributes:

      (<img .*>)

  And you use the following replacement string to replace the captured string with the captured string plus an appended `</img>`:

      $1</img>

  However, if the regex above is applied to the string `<body><img src="foo" width="24"></body>`, the end tag `</img>` is not inserted after the first closing angle bracket (`>`) after `24` as you want. Instead, the matched string greedily extends to the second closing angle bracket, and the tag `</img>` is positioned at the end:

      <body><img src="foo" width="24"></body></img>

  One remedy for this situation is to use the following regex, which employs a negated character class to match non-closing-bracket characters:

      (<img [^>]*>)

  This regex does not extend beyond the first closing angle bracket in the target input string, and the resulting output string is:

      <body><img src="foo" width="24"></img></body>
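
The two regexes from the `img` example can be compared side by side in one request. This is a sketch that reuses the calling pattern of the first example; the variable names `%greedy` and `%bounded` are illustrative, and the Options argument is omitted because the input contains only one `img` tag.

```soul
begin
* Illustrative variable names; not part of the RegexReplace interface
%inStr longstring
%replacement longstring
%greedy longstring
%bounded longstring
%inStr='<body><img src="foo" width="24"></body>'
* $1 inserts the captured tag; the end tag is appended after it
%replacement='$1</img>'
* Greedy .* extends the match to the last closing angle bracket
%greedy = %inStr:regexReplace('(<img .*>)', %replacement)
* The negated character class stops at the first closing angle bracket
%bounded = %inStr:regexReplace('(<img [^>]*>)', %replacement)
printText Greedy:  '{%greedy}'
printText Bounded: '{%bounded}'
end
```

Based on the results given above, the first line should show `</img>` after `</body>`, and the second should show it immediately after the `img` tag.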