UnicodeReplace (Regex function)

From m204wiki
Latest revision as of 14:57, 24 March 2022

Replace regex match(es) for Unicode (Regex class)


This function replaces the parts of a Unicode string that match the regular expression in the Regex object and returns the Unicode string with the replacements.

Syntax

%unicode = regex:UnicodeReplace( unicode, [replacement], [Options= string])

Syntax terms

%unicode A copy of the input Unicode string in which the matches have been replaced by the replacement value.
regex The Regex object.
unicode The Unicode string to match against the Regex object.
replacement The Unicode string that replaces the substrings of unicode that regex matches. Except when the A option is specified (as described at Common regex options), you can include markers in the replacement value to indicate where to insert captured strings, that is, the strings matched by capturing groups (parenthesized subexpressions) in the regular expression, if any.

These markers are in the form $n, where n is the number of the capture group, and 1 is the number of the first capture group. n must not be 0 or contain more than 9 digits. If the regular expression has no nth capture group corresponding to a $n marker, the literal text $n is used in the replacement string instead of the empty string. xxx$1 is an example of a valid replacement string, and $0yyy is an example of an invalid one. Alternatively, you can use the format $mn, where m is one of the following modifiers:

U or u Specifies that the captured string is uppercased when inserted.
L or l Specifies that the captured string is lowercased when inserted.

The only characters you can escape in a replacement string are the dollar sign ($), the backslash (\), and the digits 0 through 9, so only these escapes are respected: \\, \$, and \0 through \9. No other escapes are allowed in a replacement string, including "shorthand" escapes like \d, and an "unaccompanied" backslash (\) is an error. Escaped digits are useful because the scan for the number that follows $ stops at the first non-digit: use 1$1\2 to place the first captured string between the digits 1 and 2, since an unescaped 1$12 would instead refer to capture group 12.

An invalid replacement string results in request cancellation.

Options A string of single-letter options, which may be specified in uppercase or lowercase, in any combination, with or without separating blanks. These options are a subset of Common regex options. The only acceptable options are A for "as-is", G for "global" (replace all occurrences), and T for trace.
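
The marker and escape rules described for the replacement argument can be sketched in Python. This is an illustrative emulation, not the Model 204 implementation: the names expand_replacement and unicode_replace, the replace_all and as_is flags (standing in for the G and A options), and the use of ValueError in place of request cancellation are all assumptions made for the sketch. Two behaviors the text above does not specify are also assumed: a $ that is not followed by a group number is treated as invalid, and a group that exists but did not take part in the match inserts an empty string.

```python
import re

DIGITS = "0123456789"


def expand_replacement(replacement, match):
    """Expand $n, $Un, $Ln markers and backslash escapes for one match."""
    group_count = match.re.groups
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\":
            # Only \\, \$ and \0 through \9 are valid; a lone backslash is an error.
            nxt = replacement[i + 1:i + 2]
            if nxt == "" or nxt not in "\\$" + DIGITS:
                raise ValueError("invalid escape in replacement string")
            out.append(nxt)
            i += 2
        elif ch == "$":
            j = i + 1
            modifier = ""
            if replacement[j:j + 1] in ("U", "u", "L", "l"):
                modifier = replacement[j].upper()
                j += 1
            # The scan for the group number stops at the first non-digit.
            k = j
            while k < len(replacement) and replacement[k] in DIGITS:
                k += 1
            digits = replacement[j:k]
            # Assumption: a "$" with no number is invalid, like $0 or 10+ digits.
            if not digits or len(digits) > 9 or int(digits) == 0:
                raise ValueError("invalid $ marker in replacement string")
            n = int(digits)
            if n > group_count:
                # No such capture group: keep the marker text literally.
                out.append(replacement[i:k])
            else:
                # Assumption: a non-participating group inserts nothing.
                captured = match.group(n) or ""
                if modifier == "U":
                    captured = captured.upper()
                elif modifier == "L":
                    captured = captured.lower()
                out.append(captured)
            i = k
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def unicode_replace(pattern, text, replacement, replace_all=False, as_is=False):
    """Replace the first match, or every match when replace_all is true."""
    def substitute(match):
        # as_is mimics the A option: the replacement is inserted unchanged.
        return replacement if as_is else expand_replacement(replacement, match)
    return re.sub(pattern, substitute, text, count=0 if replace_all else 1)


if __name__ == "__main__":
    print(unicode_replace(r"(\d)", "a5b", "1$1\\2"))  # prints a152b
    print(unicode_replace(r"([a-z]+)", "abc def", "$U1", replace_all=True))
```

In this sketch an invalid replacement string is detected only when a match is actually replaced; whether Model 204 validates the replacement earlier is not stated above.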

Usage notes

  • If the regular expression specified in the constructor call was not Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.
  • There is no way to undo the A, G, and T options if they were specified on the constructor, so if a Regex object needs these options on some calls but not others, specify them on each UnicodeReplace call rather than on the constructor.
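
The rules for the Options string (single letters, case-independent, blanks optional, only A, G, and T accepted) can be illustrated with a small Python sketch. The function name parse_options is hypothetical, and raising ValueError for an unsupported letter is an assumption: this page does not say how Model 204 reacts to an unknown option.

```python
ALLOWED_OPTIONS = {"A", "G", "T"}


def parse_options(options):
    """Return the set of uppercase option letters found in an options string."""
    found = set()
    for ch in options:
        if ch == " ":
            continue  # blanks between options are optional and ignored
        letter = ch.upper()  # options are case-independent
        if letter not in ALLOWED_OPTIONS:
            # Assumption: an unsupported letter is rejected.
            raise ValueError("unsupported option: " + repr(ch))
        found.add(letter)
    return found


if __name__ == "__main__":
    print(sorted(parse_options("g T")))  # prints ['G', 'T']
```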

Examples

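The following example applies the replacement specified on the constructor in the first call and overrides it with an explicit replacement in the second: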

b
%regex is object regex
%regex = new("([&alpha;-&omega;]{3,})(\d{3,})":u, replace="$2&ndash;$1":u)
print %regex:unicodeReplace("My license plate says &phi;&beta;&kappa;7643":u)
print %regex:unicodeReplace("My license plate says &phi;&beta;&kappa;7643":u, "nothing")
end

displays:

My license plate says 7643&#x2013;&#x03C6;&#x03B2;&#x03BA;
My license plate says nothing
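
As a cross-check of the first output line, the same match and swap can be reproduced with Python's re module. This is only an analogy, not Model 204 code: html.unescape stands in for the entity decoding performed by :u, Python's \2 and \1 template syntax stands in for $2 and $1, and the helper to_char_refs is a hypothetical function that assumes every non-ASCII character is displayed as a hexadecimal character reference, as the output above suggests.

```python
import html
import re


def to_char_refs(text):
    """Show non-ASCII characters as hexadecimal character references."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:04X};".format(ord(ch)) for ch in text
    )


# Decode the character entities, much as the :u suffix does in the example.
pattern = html.unescape(r"([&alpha;-&omega;]{3,})(\d{3,})")
template = html.unescape(r"\2&ndash;\1")
subject = html.unescape("My license plate says &phi;&beta;&kappa;7643")

# count=1 replaces only the first match, as when the G option is absent.
result = re.sub(pattern, template, subject, count=1)
print(to_char_refs(result))
```

The three Greek letters fall inside the α to ω range and are followed by four digits, so the whole plate matches and the two captured groups are swapped around an en dash.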

See also