Split (Regex function)


Latest revision as of 17:37, 24 March 2022

Split string using regex, creating new Stringlist (Regex class)


This function repeatedly locates matches of the regular expression in the Regex object against a string and, based on the result, adds parts of the string to a new Stringlist. It provides similar functionality to the String RegexSplit and Stringlist RegexSplit functions.

Syntax

%stringlist = regex:Split( string, [Add= regexSplitOutputOptions])

Syntax terms

%stringlist A new Stringlist object that receives parts of the input string.
regex The Regex object.
string The string that is searched for matches to the regular expression in the Regex object.
Add A RegexSplitOutputOptions enumeration value, which specifies which substrings of string to store into %stringlist. Unmatched is the default.
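
As a minimal sketch of the syntax above, the following request assigns the result of Split to a Stringlist variable and then walks the items. The variable names %words and %i, the pattern, and the input string are illustrative assumptions, not part of the original page.

```soul
b
%regex is object regex
%words is object stringlist
%i is float

* Separator: one or more blanks
%regex = new(" +")

* Add is omitted, so the default (Unmatched) applies
%words = %regex:split("one two  three")

* Show each item with its item number
for %i from 1 to %words:count
   printText Item {%i}: {%words(%i)}
end for
end
```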

RegexSplitOutputOptions enumeration

The values of this enumeration, used for the Add argument, are the following:

Unmatched Store only each unmatched substring, plus any empty substrings due to adjacent separators (consecutive matching substrings). For example, if the value of regex is #, and string is C###D, the Unmatched option adds four Stringlist items: C, two empty items, then D.
Matched Store each matched substring only. Include those characters matched by capturing or non-capturing groups.
MatchedAndUnmatched Store each matched and each unmatched substring in alternating Stringlist items. The first item contains the first unmatched substring, the second item contains the first matched substring, and so on, ending with the last matched substring and the last unmatched substring.
Captured Store only those substrings matched by capturing groups in regex, as if RegexMatch were applied repeatedly using the Capture parameter.
CapturedAndUnmatched Store in alternating Stringlist items a) those substrings matched by capturing groups in regex, and b) each unmatched substring. The first item contains the first unmatched substring, if any; otherwise, it contains the substring captured by the first capturing group. The next item contains the substring captured by the next capturing group, if any; otherwise, it contains the next unmatched substring, and so on.
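
To compare the first three options side by side, the following sketch applies the C###D case described above with three different Add values. It uses only the calls shown on this page; the variable name %sl is an illustrative assumption. No output is shown for the Matched and MatchedAndUnmatched calls, because the page does not state it.

```soul
b
%regex is object regex
%sl is object stringlist

%regex = new("#")

* Default (Unmatched): per the description above, C, two empty items, then D
%sl = %regex:split("C###D")
%sl:print

* Matched: store only the substrings that matched the regex
%sl = %regex:split("C###D", add=matched)
%sl:print

* MatchedAndUnmatched: alternate unmatched and matched substrings
%sl = %regex:split("C###D", add=matchedAndUnmatched)
%sl:print
end
```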

Usage notes

  • If the regular expression specified in the constructor call was Unicode, this method causes request cancellation. To test whether a Regex object was created with a Unicode regular expression, check the IsUnicode property.
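
One way to avoid the cancellation described above is to test IsUnicode before calling Split. This is a sketch only: it assumes that IsUnicode returns a Boolean value that can be tested directly in an If statement, and the pattern, input string, and message text are illustrative.

```soul
b
%regex is object regex

%regex = new("[0-9]+")

* Guard the call: Split cancels the request for a Unicode regex
if %regex:isUnicode then
   print 'Regex is Unicode: Split not attempted'
else
   %regex:split("a1b22c333", add=matched):print
end if
end
```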

Examples

The following example splits a string into blank-separated or comma-separated values:

    b
    %regex is object regex
    %regex = new("( +| *, *)")
    %regex:split("Hickory, dickory, doc, the mouse  ran up  the clock"):print
    end

The above displays:

    Hickory
    dickory
    doc
    the
    mouse
    ran
    up
    the
    clock

The following example extracts all words that consist only of alphabetic characters from a string:

    b
    %regex is object regex
    %regex = new("[a-z]+", options="i")
    %regex:split("Humpty Dumpty sat on a wall", add=matched):print
    end

The above displays:

    Humpty
    Dumpty
    sat
    on
    a
    wall
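
The examples above do not exercise the capturing-group options. The following sketch, with an assumed pattern and input string, shows how Captured and CapturedAndUnmatched might be requested. The resulting items are not listed here because the page does not give output for these options.

```soul
b
%regex is object regex

* Two capturing groups: a lowercase name and a run of digits
%regex = new("([a-z]+)=([0-9]+)")

* Captured: store only the text matched by the capturing groups
%regex:split("a=1, b=22", add=captured):print

* CapturedAndUnmatched: also store the text between matches
%regex:split("a=1, b=22", add=capturedAndUnmatched):print
end
```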

See also