Split (Regex function)
Latest revision as of 17:37, 24 March 2022
Split string using regex, creating new Stringlist (Regex class)
This function repeatedly locates matches of the regular expression in the Regex object against a string and, based on the result, adds parts of the string to a new Stringlist. It provides similar functionality to the String RegexSplit and Stringlist RegexSplit functions.
Syntax
%stringlist = regex:Split( string, [Add= regexSplitOutputOptions])
Syntax terms
%stringlist | A new Stringlist object that receives parts of the input string. |
---|---|
regex | The Regex object whose regular expression is matched against string. |
string | The string that is searched for matches to the regular expression in the Regex object. |
Add | A RegexSplitOutputOptions enumeration value, which specifies what substrings of string to store into %stringlist. Unmatched is the default. |
RegexSplitOutputOptions enumeration
The values of this enumeration, used for the Add argument, are the following:
Unmatched | Store only each unmatched substring and any empty substrings that result from adjacent separators (consecutive matching substrings). For example, if the regular expression in regex is #, and string is C###D, the Unmatched option adds four Stringlist items: C, two empty items, then D. |
---|---|
Matched | Store only each matched substring, including any characters matched by capturing or non-capturing groups. |
MatchedAndUnmatched | Store each matched and each unmatched substring in alternating Stringlist items. The first item contains the first unmatched substring, the second item contains the first matched substring, and so on, ending with the last matched substring and the last unmatched substring. |
Captured | Store only those substrings matched by capturing groups in regex, as if RegexMatch were applied repeatedly using the Capture parameter. |
CapturedAndUnmatched | Store in alternating Stringlist items (a) those substrings matched by capturing groups in regex, and (b) each unmatched substring. The first item contains the first unmatched substring, if any; otherwise, it contains the substring captured by the first capturing group. The next item contains the substring captured by the next capturing group, if any; otherwise, it contains the next unmatched substring, and so on. |
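
The Unmatched case described in the table can be sketched as follows. This is a minimal illustration that reuses the pattern # and the input C###D from the description; it follows the form of the examples on this page and is not taken from the product documentation.

```soul
b
* Sketch only: same regex (#) and input (C###D) as in the Unmatched description
%regex is object regex
%regex = new("#")
* Add is omitted, so the default (Unmatched) applies
%regex:split("C###D"):print
end
```

Going by the Unmatched description, the resulting Stringlist should hold four items: C, two empty items, then D.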
Usage notes
- If the regular expression specified in the constructor call was Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.
Examples
The following example splits a string into blank-separated or comma-separated values:

    b
    %regex is object regex
    %regex = new("( +| *, *)")
    %regex:split("Hickory, dickory, doc, the mouse ran up the clock"):print
    end

The above displays:

    Hickory
    dickory
    doc
    the
    mouse
    ran
    up
    the
    clock

The following example extracts all words that consist only of alphabetic characters from a string:

    b
    %regex is object regex
    %regex = new("[a-z]+", options="i")
    %regex:split("Humpty Dumpty sat on a wall", add=matched):print
    end

The above displays:

    Humpty
    Dumpty
    sat
    on
    a
    wall
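
As a further illustration of the Add argument, the following sketch keeps the separators as well as the text between them by specifying MatchedAndUnmatched. The input string and the pattern [,;] are made up for this illustration, and the spelling add=matchedAndUnmatched is assumed by analogy with add=matched above.

```soul
b
* Sketch only: the input string and the pattern are illustrative assumptions
%regex is object regex
%regex = new("[,;]")
* Store unmatched and matched substrings in alternating Stringlist items
%regex:split("a,b;c", add=matchedAndUnmatched):print
end
```

Going by the description of MatchedAndUnmatched, the items would be expected to alternate between unmatched and matched substrings: a, then the comma, then b, then the semicolon, then c.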