Split (Regex function)

From m204wiki
Revision as of 17:37, 24 March 2022 by Alex

Split string using regex, creating new Stringlist (Regex class)


This function repeatedly locates matches of the regular expression in the Regex object against a string and, based on the result, adds parts of the string to a new Stringlist. It provides similar functionality to the String RegexSplit and Stringlist RegexSplit functions.

Syntax

%stringlist = regex:Split( string, [Add= regexSplitOutputOptions])

Syntax terms

%stringlist A new Stringlist object that receives parts of the input string.
regex The Regex object.
string The string that is searched for matches to the regular expression in the Regex object.
Add A RegexSplitOutputOptions enumeration value, which specifies which substrings of string to store in %stringlist. Unmatched is the default.

RegexSplitOutputOptions enumeration

The values of this enumeration, used for the Add argument, are the following:

Unmatched Store only each unmatched substring and any empty substrings due to adjacent separators (consecutive matching substrings). For example, if the regular expression in regex is #, and string is C###D, the Unmatched option adds four Stringlist items: C, two empty items, then D.
Matched Store each matched substring only. Include those characters matched by capturing or non-capturing groups.
MatchedAndUnmatched Store each matched and each unmatched substring in alternating Stringlist items. The first item contains the first unmatched substring, the second item contains the first matched substring, and so on, ending with the last matched substring and the last unmatched substring.
Captured Store only those substrings matched by capturing groups in regex — as if RegexMatch were applied repeatedly using the Capture parameter.
CapturedAndUnmatched Store in alternating Stringlist items a) those substrings matched by capturing groups in regex, and b) each unmatched substring.

The first item contains the first unmatched substring, if any; otherwise, it contains the substring captured by the first capturing group. The next item contains the substring captured by the next, if any, capturing group; otherwise, it contains the next unmatched string, and so on.
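The three options that do not involve capturing groups can be illustrated with a rough emulation. The sketch below is written in Python with its re module, not in SOUL. The helper name split_emulated is hypothetical and is not part of the Regex class, and Python's regex engine may differ from the Regex class in details such as empty matches or an empty trailing item. Under those assumptions, it reproduces the C###D case described for Unmatched:

```python
import re

def split_emulated(pattern, text, add="Unmatched"):
    """Emulate the Unmatched, Matched, and MatchedAndUnmatched options.

    Hypothetical helper for illustration only; it is not the SOUL API.
    """
    if add not in ("Unmatched", "Matched", "MatchedAndUnmatched"):
        raise ValueError("unsupported option: " + add)
    keep_unmatched = add in ("Unmatched", "MatchedAndUnmatched")
    keep_matched = add in ("Matched", "MatchedAndUnmatched")

    items = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, text):
        if keep_unmatched:
            # Text between the previous match and this one (may be empty).
            items.append(text[pos:m.start()])
        if keep_matched:
            # The whole match, including any grouped characters.
            items.append(m.group(0))
        pos = m.end()
    if keep_unmatched:
        # Text after the last match.
        items.append(text[pos:])
    return items

print(split_emulated("#", "C###D"))
# ['C', '', '', 'D']
print(split_emulated("#", "C###D", add="Matched"))
# ['#', '#', '#']
print(split_emulated("#", "C###D", add="MatchedAndUnmatched"))
# ['C', '#', '', '#', '', '#', 'D']
```

The Captured and CapturedAndUnmatched options are left out of the sketch because their handling of groups that do not participate in a match is not specified here.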

Usage notes

  • If the regular expression specified in the constructor call was Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.

Examples

The following example splits a string into blank- or comma-separated values:

    b
    %regex is object regex
    %regex = new("( +| *, *)")
    %regex:split("Hickory, dickory, doc, the mouse ran up the clock"):print
    end

The above displays:

    Hickory
    dickory
    doc
    the
    mouse
    ran
    up
    the
    clock

The following example extracts all words that consist only of alphabetic characters from a string:

    b
    %regex is object regex
    %regex = new("[a-z]+", options="i")
    %regex:split("Humpty Dumpty sat on a wall", add=matched):print
    end

The above displays:

    Humpty
    Dumpty
    sat
    on
    a
    wall

See also