UnicodeReplace (Regex function): Difference between revisions

Latest revision as of 14:57, 24 March 2022

Replace regex match(es) for Unicode (Regex class)

This function replaces the parts of a Unicode string that match the regular expression in the Regex object and returns the Unicode string with the replacements.

Syntax

%unicode = regex:UnicodeReplace( unicode, [replacement], [Options= string])

Syntax terms

%unicode A copy of the input unicode after matches are replaced using the appropriate replacement unicode.

regex The Regex object.

unicode The unicode to test against the Regex object.

replacement

The unicode that replaces the substrings of unicode that regex matches. Except when the A option is specified (as described at Common regex options), you can include markers in the replacement value to indicate where to insert corresponding captured strings, that is, strings matched by capturing groups (parenthesized subexpressions) in the regular expression, if any.

These markers are in the form $n, where n is the number of the capture group, and 1 is the number of the first capture group. n must not be 0 or contain more than 9 digits. If there is no nth capture group corresponding to a $n marker in the replacement string, the literal text $n is used in the replacement instead of the empty string. xxx$1 is an example of a valid replacement string, and $0yyy is an example of an invalid one. Alternatively, you can use the format $mn, where m is one of the following modifiers:

`U` or `u`	Specifies that the captured string should be uppercased when inserted.
`L` or `l`	Specifies that the captured string should be lowercased when inserted.
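The marker rules above can be modeled in a few lines of Python. This is only an illustrative sketch of the documented behavior, not SOUL code and not the Regex class implementation. The name expand_markers is hypothetical, ValueError stands in for request cancellation, and two points the text does not settle are assumptions: leading zeros are ignored when reading n, and a $Un or $Ln marker with no matching group is kept literally, like a plain $n.

```python
import re

# "$", an optional U/u/L/l modifier, then a run of ASCII digits.
MARKER = re.compile(r"\$([UuLl]?)([0-9]+)")


def expand_markers(replacement, groups):
    """Expand $n, $Un and $Ln markers; groups[0] holds capture group 1."""

    def substitute(match):
        modifier, digits = match.group(1), match.group(2)
        n = int(digits)  # assumption: leading zeros are simply ignored
        if n == 0 or len(digits) > 9:
            # Stand-in for "invalid replacement string" cancellation.
            raise ValueError("invalid marker: " + match.group(0))
        if n > len(groups):
            # No nth capture group: the literal marker text is kept.
            return match.group(0)
        captured = groups[n - 1]
        if modifier in ("U", "u"):
            return captured.upper()
        if modifier in ("L", "l"):
            return captured.lower()
        return captured

    return MARKER.sub(substitute, replacement)


print(expand_markers("xxx$1", ["abc"]))             # xxxabc
print(expand_markers("$U1-$L2", ["abc", "DEF"]))    # ABC-def
print(expand_markers("$2", ["abc"]))                # $2
```

Escapes are deliberately left out of this sketch so that it shows only how the markers themselves are resolved.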

The only characters you can escape in a replacement string are dollar sign ($), backslash (\), and the digits 0 through 9. So only these escapes are respected: \\, \$, and \0 through \9. No other escapes are allowed in a replacement string (this includes "shorthand" escapes like \d), and an "unaccompanied" backslash (\) is an error. For example, since the scan for the number that accompanies the meta-$ stops at the first non-numeric character, you use 1$1\2 to indicate that the first captured string should go between the numbers 1 and 2 in the replacement string.

An invalid replacement string results in request cancellation.
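To see why 1$1\2 and 1$12 mean different things, the escape rules can be modeled as a small scanner. The Python below is a sketch under stated assumptions, not the actual parser: tokenize is a hypothetical name, ValueError stands in for request cancellation, the U and L modifiers are omitted for brevity, and treating a $ with no digits after it as invalid is a guess that the text does not confirm.

```python
DIGITS = "0123456789"


def tokenize(replacement):
    """Split a replacement string into ("text", s) and ("group", n) tokens."""
    tokens = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\":
            nxt = replacement[i + 1:i + 2]
            # Only \\, \$ and \0 through \9 are respected; any other escape,
            # including a trailing lone backslash, is rejected.
            if nxt == "" or nxt not in "\\$" + DIGITS:
                raise ValueError("invalid escape at offset %d" % i)
            tokens.append(("text", nxt))
            i += 2
        elif ch == "$":
            # The scan for the group number stops at the first non-digit.
            j = i + 1
            while j < len(replacement) and replacement[j] in DIGITS:
                j += 1
            digits = replacement[i + 1:j]
            # Assumption: "$" with no digits after it is treated as invalid.
            if digits == "" or int(digits) == 0 or len(digits) > 9:
                raise ValueError("invalid marker at offset %d" % i)
            tokens.append(("group", int(digits)))
            i = j
        else:
            tokens.append(("text", ch))
            i += 1
    return tokens


print(tokenize(r"1$1\2"))  # [('text', '1'), ('group', 1), ('text', '2')]
print(tokenize(r"1$12"))   # [('text', '1'), ('group', 12)]
```

Without the backslash, the digit 2 is absorbed into the group number, so the marker refers to group 12 rather than group 1 followed by a literal 2.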

Options A string of single letter options, which may be specified in uppercase or lowercase, in any combination, and blank separated or not. These options are a subset of Common regex options. The only acceptable options (case-independent) are A for "as-is", G for "global" (replace all occurrences), and T for trace.
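The way an options string is read can be illustrated with a short Python model. This is a sketch, not the method's implementation: parse_options and replace_literal are hypothetical names, raising ValueError for an unsupported letter is an assumption about error handling, and the sketch assumes that without G only the first match is replaced. The replacement is always inserted literally here, and T (trace) is accepted but has no effect.

```python
import re

ALLOWED = {"A", "G", "T"}


def parse_options(options):
    """Return the set of option letters, ignoring case and blanks."""
    letters = {ch.upper() for ch in options if ch != " "}
    unknown = letters - ALLOWED
    if unknown:
        raise ValueError("unsupported option(s): " + "".join(sorted(unknown)))
    return letters


def replace_literal(pattern, replacement, text, options=""):
    """Replace the first match, or every match when G is among the options."""
    opts = parse_options(options)
    count = 0 if "G" in opts else 1  # for re.sub, 0 means "replace all"
    # A function is used so the replacement text is inserted as-is.
    return re.sub(pattern, lambda match: replacement, text, count=count)


print(sorted(parse_options("g T")))                      # ['G', 'T']
print(replace_literal(r"\d+", "#", "a1b22c333"))         # a#b22c333
print(replace_literal(r"\d+", "#", "a1b22c333", "G"))    # a#b#c#
```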

Usage notes

If the regular expression specified in the constructor call was not Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.
There is no way to undo the A, G, and T options if they were specified on the constructor, so if a Regex object sometimes needs these options and sometimes does not, specify them on the individual UnicodeReplace calls that need them rather than on the constructor.

Examples

The following example:

b
%regex is object regex
%regex = new("([α-ω]{3,})(\d{3,})":u, replace="$2–$1":u)
print %regex:unicodeReplace("My license plate says φβκ7643":u)
print %regex:unicodeReplace("My license plate says φβκ7643":u, "nothing")
end

displays:

My license plate says 7643–φβκ
My license plate says nothing


See also

Regex class
List of Regex methods
Regex methods syntax
Notation conventions for methods
