StringTokenizer (String function)
Revision as of 00:21, 13 April 2011
Create a tokenizer using the method object string (String class)
[Introduced in Sirius Mods 7.8]
This method returns a new instance of a StringTokenizer object using the method string as the tokenizer string.
It has three optional arguments that let you specify the delimiter characters
that determine the tokens in the string that is being tokenized.
Syntax
    %stringTokenizer = string:StringTokenizer[( [TokenChars= string], -
                                                [Spaces= string], -
                                                [Quotes= string], -
                                                [Separators= string])]
Syntax terms
Term | Description |
---|---|
%stringTokenizer | A StringTokenizer object expression to contain the new object instance. |
string | The string to be tokenized. |
TokenChars | A name-required string argument: a set of single-character token-delimiters (delimiters that are also tokens) that may be separated by whitespace characters. TokenChars is optional and defaults to a null string. |
Spaces | A name-required string argument: a set of "whitespace" characters, that is, characters that separate tokens. Each of these characters is a "non-token delimiter," a delimiter that is not itself a token. Spaces is optional and defaults to a blank character. |
Quotes | A name-required string argument: a set of quotation characters. The text between each disjoint pair of identical quotation characters (a "quoted region") is treated as a single token, and any delimiter characters (Quote, Space, or TokenChar) within a quoted region are treated as non-delimiters. Quotes is optional and defaults to a null string. |
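
To make the interplay of the three character sets concrete, here is a minimal Python model of the rules in the table. It is only an illustration and not the Sirius implementation: the function name `tokenize` and its parameter names are hypothetical. It also rests on several assumptions that the text above does not settle: the enclosing quotation characters are dropped from the returned token, a quote character always starts a new token, and an unterminated quoted region runs to the end of the string.

```python
def tokenize(text, token_chars="", spaces=" ", quotes=""):
    """Model of the TokenChars / Spaces / Quotes rules (illustrative only).

    Assumptions (not stated in the source): quote characters are stripped
    from the token, a quote always begins a new token, and an unterminated
    quoted region extends to the end of the string.
    """
    tokens = []
    current = []  # characters of the ordinary token being accumulated

    def flush():
        # Emit the accumulated ordinary token, if there is one.
        if current:
            tokens.append("".join(current))
            current.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch in quotes:
            # Quoted region: everything up to the next identical quote
            # character is one token; delimiters inside it are inert.
            flush()
            end = text.find(ch, i + 1)
            if end == -1:
                end = len(text)
            tokens.append(text[i + 1:end])
            i = end + 1
            continue
        if ch in spaces:
            # Non-token delimiter: ends the current token, is not returned.
            flush()
        elif ch in token_chars:
            # Token delimiter: ends the current token and is itself a token.
            flush()
            tokens.append(ch)
        else:
            current.append(ch)
        i += 1
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("a=b, c=d", token_chars="=", spaces=", "))
    print(tokenize('say "hello, world" now', token_chars=",", quotes='"'))
```

In this model, a character from `spaces` only separates tokens, a character from `token_chars` separates tokens and is also returned as a one-character token, and the comma inside the quoted region is not treated as a delimiter.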
Usage notes
- A character may belong to at most one of the Spaces, Quotes, or TokenChars sets of characters.
- When you specify Spaces, Quotes, or TokenChars, every character in the argument string is taken as a member of that set (you may not put separators between the characters), and no character may repeat (except for apostrophe, which may be doubled).
- A quoted region is not affected by the TokensToLower and TokensToUpper properties.
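
The first two usage notes amount to a validity check on the three argument strings. It can be sketched in Python as below. This is a model under stated assumptions, not the product's actual checking: the function name `check_delimiter_sets` and the error messages are hypothetical, and "apostrophe may be doubled" is interpreted as "a pair of adjacent apostrophes counts as one apostrophe".

```python
def check_delimiter_sets(token_chars="", spaces=" ", quotes=""):
    """Validate the three delimiter sets (illustrative model only).

    Returns a dict mapping each delimiter character to the name of the
    set it belongs to, or raises ValueError if a rule is broken.
    """
    owner = {}
    named_sets = (
        ("TokenChars", token_chars),
        ("Spaces", spaces),
        ("Quotes", quotes),
    )
    for name, chars in named_sets:
        # Assumption: a doubled apostrophe stands for a single apostrophe.
        chars = chars.replace("''", "'")
        for ch in chars:
            if ch in owner:
                if owner[ch] == name:
                    raise ValueError(f"{name}: character {ch!r} is repeated")
                raise ValueError(
                    f"character {ch!r} is in both {owner[ch]} and {name}"
                )
            owner[ch] = name
    return owner


if __name__ == "__main__":
    print(check_delimiter_sets(token_chars="=,", spaces=" ", quotes="\"'"))
```

Every character of each string is treated as a set member, so a string such as `"=,"` declares two delimiters rather than one delimiter and a separator.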
Examples
    begin
       %tok is object stringtokenizer
       %tok = 'foo bar':stringTokenizer
       printText {~} is '{%tok:string}'
       repeat while not %tok:atEnd
          printText {~} is '{%tok:nextToken}'
       end repeat
    end
The result is:
    %tok:string is 'foo bar'
    %tok:nextToken is 'foo'
    %tok:nextToken is 'bar'
See also
New (StringTokenizer constructor)