StringTokenizer class
The StringTokenizer class is used to divide an input string (method object) into substrings (tokens). The tokens are separated by either of two types of delimiters:
- Delimiters that are not tokens themselves (analogs of whitespace and quotation-mark characters in the English language)
- Delimiters that are also tokens themselves (token characters that may be of interest or significance, such as a character escape)
A token is thus a sequence of consecutive characters that are not delimiters, or it is a single token-delimiter character. The delimiters are user definable, are specified per StringTokenizer object at creation time, and can be modified thereafter.
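The token definition above can be modeled in a few lines. The following is a minimal sketch in Python, not the StringTokenizer implementation: the function name `tokenize` and its `spaces` and `token_chars` parameters are hypothetical names chosen for illustration, standing in for the two kinds of delimiters.

```python
def tokenize(text, spaces=" ", token_chars=""):
    """Illustrative model only: split text using two kinds of delimiters.

    spaces      -- hypothetical name for delimiters that are not tokens
    token_chars -- hypothetical name for delimiters that are also tokens
    """
    tokens = []
    current = []  # characters of the token being accumulated
    for ch in text:
        if ch in spaces or ch in token_chars:
            # Any delimiter ends the token in progress, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
            # A token-delimiter is itself returned as a one-character token.
            if ch in token_chars:
                tokens.append(ch)
        else:
            current.append(ch)
    # Flush a trailing token that ran to the end of the string.
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("a tokenization example"))
# ['a', 'tokenization', 'example']
print(tokenize("x = (y+1)", token_chars="=()+"))
# ['x', '=', '(', 'y', '+', '1', ')']
```

In the second call, the blanks only separate tokens, while `=`, `(`, `+`, and `)` both separate tokens and appear in the result as single-character tokens.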
StringTokenizer operations maintain two positions within the method object string:
- The location of the most recent token's first character (the "current token position")
- The location from which to begin parsing for the next token (the "tokenizing position")
You can explicitly modify these positions individually.
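To illustrate how the two positions interact, here is a small Python model. It is a sketch under stated assumptions, not the class's actual interface: `TokenWalker`, `current_token_pos`, `next_pos`, and `next_token` are hypothetical names, positions are 0-based as is usual in Python, and only blank non-token delimiters are handled.

```python
class TokenWalker:
    """Hypothetical model of a tokenizer that tracks two positions."""

    def __init__(self, text, spaces=" "):
        self.text = text
        self.spaces = spaces
        self.current_token_pos = None  # start of the most recent token (none yet)
        self.next_pos = 0              # where parsing for the next token begins

    def next_token(self):
        i = self.next_pos
        n = len(self.text)
        # Skip non-token delimiters in front of the next token.
        while i < n and self.text[i] in self.spaces:
            i += 1
        if i == n:
            raise ValueError("no tokens remain")
        start = i
        # Consume characters until the next delimiter or the end of the string.
        while i < n and self.text[i] not in self.spaces:
            i += 1
        # Update both positions: where this token began, and where to resume.
        self.current_token_pos = start
        self.next_pos = i
        return self.text[start:i]


walker = TokenWalker("a tokenization example")
print(walker.next_token(), walker.current_token_pos, walker.next_pos)
# a 0 1
print(walker.next_token(), walker.current_token_pos, walker.next_pos)
# tokenization 2 14

# Explicitly moving the tokenizing position changes which token comes next.
walker.next_pos = 0
print(walker.next_token())
# a
```

Resetting `next_pos` in the last step leaves the string untouched and affects only where the next parse starts, which is the effect of modifying the tokenizing position independently of the current token position.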
To navigate the simplest path through the given string, you "walk" forward (left to right) from the beginning of the string using token-sized steps (that is, from whole token to next whole token to next whole token, and so on). The following is a simple example of this in which three tokens are separated by blank, non-token delimiters:
    %tok = new
    %tok:string = 'a tokenization example'
    %tok:nextToken
    %tok:nextToken
    %tok:nextToken
Each of the NextToken method calls above returns a token: respectively, "a", "tokenization", and "example".
The StringTokenizer class also has methods that let you take character-sized steps forward in the string, as well as methods that let you modify the position markers and thereby select tokens or sub-tokens in the order you require. You can also locate specified tokens, and you can return substrings that are the characters in the entire string that precede a position or that follow a position.
The StringTokenizer class methods are described in the following subsections. In the method templates, %tok is used to represent the object to which the method is being applied, sometimes called the "method object". Additional conventions are described in "Conventions and terminology". Many of the method examples make use of the PrintText statement, which is new as of version 7.2 of the Sirius Mods.
The StringTokenizer class is new as of Sirius Mods version 7.3.