Split (Regex function): Difference between revisions

Latest revision as of 17:37, 24 March 2022

Split string using regex, creating new Stringlist (Regex class)

This function repeatedly locates matches of the regular expression in the Regex object against a string and, based on the result, adds parts of the string to a new Stringlist. It provides similar functionality to the String RegexSplit and Stringlist RegexSplit functions.

Syntax

%stringlist = regex:Split( string, [Add= regexSplitOutputOptions])

Syntax terms

%stringlist	A new `Stringlist` object that receives parts of the input string.
regex	The `Regex` object
string	The string that is searched for matches to the regular expression in the `Regex` object.
`Add`	A `RegexSplitOutputOptions` enumeration value, which specifies what substrings of `string` to store into `%stringlist`. `Unmatched` is the default.

RegexSplitOutputOptions enumeration

The values of this enumeration, used for the `Add` argument, are the following:

Unmatched	Store only each unmatched substring and any empty substrings due to adjacent separators (consecutive matching substrings). For example, if the value of `regex` is `#`, and `string` is `C###D`, the `Unmatched` option adds four `Stringlist` items: `C`, two empty items, then `D`.
Matched	Store each matched substring only. Include those characters matched by capturing or non-capturing groups.
MatchedAndUnmatched	Store each matched and each unmatched substring in alternating `Stringlist` items. The first item contains the first unmatched substring, the second item contains the first matched substring, and so on, ending with the last matched substring and the last unmatched substring.
Captured	Store only those substrings matched by capturing groups in `regex` — as if `RegexMatch` were applied repeatedly using the `Capture` parameter.
CapturedAndUnmatched	Store in alternating `Stringlist` items a) those substrings matched by capturing groups in `regex`, and b) each unmatched substring. The first item contains the first unmatched substring, if any; otherwise, it contains the substring captured by the first capturing group. The next item contains the substring captured by the next, if any, capturing group; otherwise, it contains the next unmatched string, and so on.
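
The option descriptions above can be sketched outside SOUL. The following Python function is only an approximation built on Python's `re` module, not the `Regex` class itself: the name `split_like` is hypothetical, skipping capturing groups that did not participate in a match is an assumption, and `CapturedAndUnmatched` is left out because its interleaving rules are not fully determined by the description.

```python
import re

def split_like(pattern, text, add="Unmatched", flags=0):
    """Approximate the described Add= options with Python's re module.

    Hypothetical helper for illustration only; it is not the SOUL Regex class.
    """
    if add not in ("Unmatched", "Matched", "MatchedAndUnmatched", "Captured"):
        raise ValueError("option not covered by this sketch: " + add)
    keep_unmatched = add in ("Unmatched", "MatchedAndUnmatched")
    items = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, text, flags):
        if keep_unmatched:
            # Text between the previous match and this one (may be empty).
            items.append(text[pos:m.start()])
        if add in ("Matched", "MatchedAndUnmatched"):
            # The whole match, including text matched by any groups.
            items.append(m.group(0))
        if add == "Captured":
            # Assumption: groups that did not participate are skipped.
            items.extend(g for g in m.groups() if g is not None)
        pos = m.end()
    if keep_unmatched:
        # Trailing text after the last match.
        items.append(text[pos:])
    return items

print(split_like("#", "C###D"))                         # ['C', '', '', 'D']
print(split_like("#", "C###D", "MatchedAndUnmatched"))  # ['C', '#', '', '#', '', '#', 'D']
```

With `Unmatched`, the two empty items come from the zero-length gaps between the three adjacent `#` separators, which matches the `C###D` case described for that option.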

Usage notes

If the regular expression specified in the constructor call was Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.

Examples

The following example splits a string into blank- or comma-separated values:

    b
    %regex   is object regex
    %regex = new("( +| *, *)")
    %regex:split("Hickory, dickory, doc, the mouse  ran up  the clock"):print
    end

The above displays:

    Hickory
    dickory
    doc
    the
    mouse
    ran
    up
    the
    clock

The following example extracts all words that consist only of alphabetic characters from a string:

    b
    %regex   is object regex
    %regex = new("[a-z]+", options="i")
    %regex:split("Humpty Dumpty sat on a wall", add=matched):print
    end

The above displays:

    Humpty
    Dumpty
    sat
    on
    a
    wall

@@ Line 1: / Line 1: @@
 {{Template:Regex:Split subtitle}}
-This page is [[under construction]].
+This function repeatedly locates matches of the regular expression in the <var>Regex</var> object against a string and, based on the result, adds parts of the string to a new <var>Stringlist</var>. It provides similar functionality to the [[RegexSplit (String function)|String RegexSplit]] and [[RegexSplit (Stringlist function)|Stringlist RegexSplit]] functions.
 ==Syntax==
 {{Template:Regex:Split syntax}}
 ===Syntax terms===
 <table class="syntaxTable">
-<tr><th>%stringlist</th><td><var>Stringlist</var> object</td></tr>
+<tr><th>%stringlist</th><td>A new <var>Stringlist</var> object that receives parts of the input string.</td></tr>
 <tr><th>regex</th>
-<td><var>Regex</var> object</td></tr>
+<td>The <var>Regex</var> object</td></tr>
 <tr><th>string</th>
-<td>string</td></tr>
+<td>The string that is searched for matches to the regular expression in the <var>Regex</var> object.</td></tr>
 <tr><th><var>Add</var></th>
-<td><var>RegexSplitOutputOptions</var> value<br/>The default value of this argument is [[??]].</td></tr>
+<td>A <var>RegexSplitOutputOptions</var> enumeration value, which specifies what substrings of <var>string</var> to store into <var>%stringlist</var>. <var>Unmatched</var> is the default.
+</td></tr>
+</table>
+===RegexSplitOutputOptions enumeration===
+The values of this [[Enumerations|enumeration]], used for the <var>Add</var> argument, are the following:
+<table class="thJustBold">
+<tr><th>Unmatched</th>
+<td>Store only each unmatched substring and any empty substrings due to adjacent separators (consecutive matching substrings). For example, if the value of <var class="term">regex</var> is <code>#</code>, and <var class="term">inString</var> is <code>C###D</code>, the <code>UnMatched</code> option adds four <var>Stringlist</var> items: <code>C</code>, two empty items, then <code>D</code>.</td></tr>
+<tr><th>Matched</th>
+<td>Store each matched substring only. Include those characters matched by capturing or non-capturing groups.</td></tr>
+<tr><th>MatchedAndUnmatched</th>
+<td>Store each matched and each unmatched substring in alternating <var>Stringlist</var> items. The first item contains the first unmatched substring, the second item contains the first matched substring, and so on, ending with the last matched substring and the last unmatched substring.</td></tr>
+<tr><th>Captured</th>
+<td>Store only those substrings matched by capturing groups in <var class="term">regex</var> &mdash; as if <var>[[Match (Regex function)|RegexMatch]]</var> were applied repeatedly using the <var>Capture</var> parameter.</td></tr>
+<tr><th>CapturedAndUnmatched</th>
+<td>Store in alternating <var>Stringlist</var> items a) those substrings matched by capturing groups in <var class="term">regex</var>, and b) each unmatched substring.
+<p>
+The first item contains the first unmatched substring, if any; otherwise, it contains the substring captured by the first capturing group. The next item contains the substring captured by the next, if any, capturing group; otherwise, it contains the next unmatched string, and so on.</p>
+</td></tr>
 </table>
 ==Usage notes==
+<ul>
+<li>If the regular expression specified in the constructor call was Unicode, this method causes request cancellation. To test if a <var>Regex</var> object was created with a Unicode regular expression check the [[IsUnicode (Regex property)|IsUnicode property]].</li>
+</ul>
 ==Examples==
+The following example splits a string into blank or comma separated values:
+<p class="code">b
+%regex   is object regex
+%regex = new("( +| *, *)")
+%regex:split("Hickory, dickory, doc, the mouse  ran up  the clock"):print
+end
+</p>
+The above displays:
+<p class="code">Hickory
+dickory
+doc
+the
+mouse
+ran
+up
+the
+clock
+</p>
+The following example extracts all words that consist only of alphabetic characters from a string:
+<p class="code">b
+%regex   is object regex
+%regex = new("[a-z]+", options="i")
+%regex:split("Humpty Dumpty sat on a wall", add=matched):print
+end
+</p>
+The above displays:
+<p class="code">Humpty
+Dumpty
+sat
+on
+a
+wall
+</p>
 ==See also==
 {{Template:Regex:Split footer}}
+[[Category:Regular expression processing]]

Regex class	List of Regex methods	Regex methods syntax
Notation conventions for methods
