UnicodeReplace (Regex function)

From m204wiki
Revision as of 14:57, 24 March 2022 by Alex (talk | contribs) (→‎Syntax terms)

Replace regex match(es) for Unicode (Regex class)


This function replaces the parts of a Unicode string that match the regular expression in the Regex object and returns the Unicode string with the replacements.

Syntax

%unicode = regex:UnicodeReplace( unicode, [replacement], [Options= string])

Syntax terms

%unicode A copy of the input unicode after matches are replaced using the appropriate replacement unicode.
regex The Regex object.
unicode The unicode to test against the Regex object.
replacement The unicode that replaces the substrings of unicode that regex matches. Except when the A option is specified (as described at Common regex options), you can include markers in the replacement value to indicate where to insert corresponding captured strings, that is, strings matched by capturing groups (parenthesized subexpressions) in the regular expression, if any.

These markers are in the form $n, where n is the number of the capture group, and 1 is the number of the first capture group. n must not be 0 or contain more than 9 digits. If there is no nth capture group corresponding to a $n marker in the replacement string, the literal text $n is used in the result rather than the empty string. xxx$1 is an example of a valid replacement string, and $0yyy is an example of an invalid one. You can also use the format $mn, where m is one of the following modifiers:

U or u Specifies that the captured string should be uppercased when inserted.
L or l Specifies that the captured string should be lowercased when inserted.
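
The marker rules above can be modeled in a few lines. The following Python sketch is only an illustration of the described behavior, not the SOUL implementation: the name expand_markers and the list-of-groups argument are hypothetical, escape handling is left out, and keeping the literal text for a missing group when a modifier is present is an assumption.

```python
import re

# Hypothetical model of the $n and $mn marker rules; not the SOUL implementation.
_MARKER = re.compile(r"\$([UuLl]?)([0-9]+)")


def expand_markers(template, groups):
    """Expand $n and $mn markers; groups[0] holds capture group 1."""

    def substitute(match):
        modifier, digits = match.group(1), match.group(2)
        number = int(digits)
        # n must not be 0 or contain more than 9 digits.
        if number == 0 or len(digits) > 9:
            raise ValueError("invalid marker: " + match.group(0))
        # No such capture group: keep the literal marker text
        # (assumed to apply to modified markers as well).
        if number > len(groups):
            return match.group(0)
        captured = groups[number - 1]
        if modifier in ("U", "u"):
            return captured.upper()
        if modifier in ("L", "l"):
            return captured.lower()
        return captured

    return _MARKER.sub(substitute, template)


print(expand_markers("xxx$1", ["abc"]))           # xxxabc
print(expand_markers("$U1-$L2", ["abc", "DEF"]))  # ABC-def
print(expand_markers("$3", ["abc"]))              # $3
```

In this model, $0yyy raises an error, mirroring the invalid replacement string mentioned above.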

The only characters you can escape in a replacement string are dollar sign ($), backslash (\), and the digits 0 through 9, so the only valid escapes are \\, \$, and \0 through \9. No other escapes are allowed in a replacement string, including "shorthand" escapes like \d, and an "unaccompanied" backslash (\) is an error. Escaping a digit matters because the scan for the number that accompanies the meta-$ stops at the first non-numeric character: to place the first captured string between the numbers 1 and 2, you use 1$1\2, since 1$12 would refer to capture group 12.

An invalid replacement string results in request cancellation.
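
As a rough illustration of the escape rules, here is a minimal Python sketch of a left-to-right scan over a replacement string. It is a model under stated assumptions, not the SOUL implementation: the name expand_replacement is hypothetical, the U and L modifiers are omitted for brevity, a Python exception stands in for request cancellation, and treating a $ that is not followed by a digit as invalid is an assumption.

```python
DIGITS = "0123456789"


def expand_replacement(template, groups):
    """Expand escapes and $n markers; groups[0] holds capture group 1."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\":
            nxt = template[i + 1:i + 2]
            # Only \\, \$ and \0 through \9 are valid; a trailing
            # backslash or any other escape is an error.
            if nxt == "" or nxt not in "\\$" + DIGITS:
                raise ValueError("invalid escape at position %d" % i)
            out.append(nxt)
            i += 2
        elif ch == "$":
            # The scan for the group number stops at the first non-digit.
            j = i + 1
            while j < len(template) and template[j] in DIGITS:
                j += 1
            digits = template[i + 1:j]
            # Assumption: a "$" with no digits after it is invalid.
            if digits == "" or len(digits) > 9 or int(digits) == 0:
                raise ValueError("invalid marker at position %d" % i)
            number = int(digits)
            if number <= len(groups):
                out.append(groups[number - 1])
            else:
                out.append("$" + digits)  # no such group: literal text
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_replacement(r"1$1\2", ["X"]))  # 1X2
print(expand_replacement(r"1$12", ["X"]))   # 1$12
print(expand_replacement(r"\$5", ["X"]))    # $5
```

The second call shows why the escape is needed: without the backslash, the digits 1 and 2 are read together as group 12, which does not exist in this example.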

Options A string of single-letter options, which may be specified in uppercase or lowercase, in any combination, and blank-separated or not. These options are a subset of Common regex options: the only acceptable options are A for "as-is", G for "global" (replace all occurrences), and T for trace.
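
The option string format can be sketched as follows in Python. This is only a model: parse_options and replace_with_options are hypothetical names, rejecting unknown letters with an exception is an assumption, the replacement is always inserted literally (roughly the A behavior), and replacing only the first match when G is absent is inferred from the description of G.

```python
import re

ALLOWED = {"A", "G", "T"}


def parse_options(options):
    """Return the set of option letters, uppercased, ignoring blanks."""
    letters = {c.upper() for c in options if c != " "}
    unknown = letters - ALLOWED
    if unknown:
        # Assumption: unknown option letters are rejected.
        raise ValueError("unsupported options: " + "".join(sorted(unknown)))
    return letters


def replace_with_options(pattern, text, replacement, options=""):
    """Replace the first match, or every match when G is given."""
    opts = parse_options(options)
    count = 0 if "G" in opts else 1  # in re.sub, count=0 means "all"
    # A function as the replacement inserts the text literally.
    return re.sub(pattern, lambda m: replacement, text, count=count)


print(sorted(parse_options("g T")))                            # ['G', 'T']
print(replace_with_options(r"[0-9]+", "a1b22c333", "#"))       # a#b22c333
print(replace_with_options(r"[0-9]+", "a1b22c333", "#", "g"))  # a#b#c#
```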

Usage notes

  • If the regular expression specified in the constructor call was not Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.
  • There is no way to undo the A, G, and T options if they were specified on the constructor, so if a Regex object sometimes needs these options and sometimes does not, they should be specified on each UnicodeReplace call instead.

Examples

The following example:

b
%regex is object regex
%regex = new("([α-ω]{3,})(\d{3,})":u, replace="$2–$1":u)
print %regex:unicodeReplace("My license plate says φβκ7643":u)
print %regex:unicodeReplace("My license plate says φβκ7643":u, "nothing")
end

displays:

My license plate says 7643–φβκ
My license plate says nothing
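
For readers who want to cross-check the output, the same two calls can be approximated in Python. This is a loose model, not SOUL: the RegexModel class and its unicode_replace method are hypothetical, the idea that the constructor's replace value serves as the default when no replacement is passed is inferred from the example, and only plain $n markers are translated.

```python
import re


class RegexModel:
    """Hypothetical stand-in for a Regex object with a default replacement."""

    def __init__(self, pattern, replace=None):
        self._compiled = re.compile(pattern)
        self._default = replace

    def unicode_replace(self, text, replacement=None):
        # Assumption: a replacement passed on the call overrides the
        # one given to the constructor.
        chosen = self._default if replacement is None else replacement
        if chosen is None:
            raise ValueError("no replacement available")
        # Translate $n markers into Python's \g<n> group references.
        template = re.sub(
            r"\$([0-9]+)", lambda m: "\\g<" + m.group(1) + ">", chosen
        )
        # Without the G option only the first match is replaced.
        return self._compiled.sub(template, text, count=1)


regex = RegexModel(r"([α-ω]{3,})(\d{3,})", replace="$2–$1")
print(regex.unicode_replace("My license plate says φβκ7643"))
print(regex.unicode_replace("My license plate says φβκ7643", "nothing"))
```

Under these assumptions the two printed lines match the output shown above.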

See also