Class StreamTokenizer
Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.
Inheritance
Inherited Members
Namespace: Lucene.Net.Support.IO
Assembly: Lucene.Net.dll
Syntax
public class StreamTokenizer
Constructors
StreamTokenizer(Stream)
Constructs a new StreamTokenizer with input as the source input stream. This constructor is deprecated; use the constructor that takes a System.IO.TextReader as an argument instead.
Declaration
[Obsolete("Use StreamTokenizer(TextReader)")]
public StreamTokenizer(Stream input)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | input | the source stream from which to parse tokens. |
Exceptions
Type | Condition |
---|---|
System.ArgumentNullException | If input is null. |
StreamTokenizer(TextReader)
Constructs a new StreamTokenizer with reader as the source reader. The tokenizer's initial state is as follows:
- All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
- All byte values '\u0000' through '\u0020' are considered to be white space.
- '/' is a comment character.
- Single quote '\'' and double quote '"' are string quote characters.
- Numbers are parsed.
- End of lines are considered to be white space rather than separate tokens.
- C-style and C++-style comments are not recognized.
Declaration
public StreamTokenizer(TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | The source text reader from which to parse tokens. |
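
The initial state listed above can be exercised with a short loop. This is a minimal sketch, assuming the Lucene.Net assembly is referenced; `DefaultSyntaxSketch` and `ReadWords` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class DefaultSyntaxSketch
{
    // Tokenizes text with the default syntax table. Returns the word tokens
    // and reports the sum of all number tokens through numberTotal.
    public static List<string> ReadWords(string text, out double numberTotal)
    {
        var words = new List<string>();
        numberTotal = 0;
        var tokenizer = new StreamTokenizer(new StringReader(text));

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
            else if (tokenizer.TokenType == StreamTokenizer.TT_NUMBER)
            {
                numberTotal += tokenizer.NumberValue;
            }
            // Any other token type (single characters, quoted strings) is ignored here.
        }
        return words;
    }
}
```

Calling `ReadWords("apples 3 pears 4.5", out total)` should yield the two words and accumulate the two numbers in `total`.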
Fields
TT_EOF
The constant representing the end of the stream.
Declaration
public const int TT_EOF = -1
Field Value
Type | Description |
---|---|
System.Int32 |
TT_EOL
The constant representing the end of the line.
Declaration
public const int TT_EOL = 10
Field Value
Type | Description |
---|---|
System.Int32 |
TT_NUMBER
The constant representing a number token.
Declaration
public const int TT_NUMBER = -2
Field Value
Type | Description |
---|---|
System.Int32 |
TT_WORD
The constant representing a word token.
Declaration
public const int TT_WORD = -3
Field Value
Type | Description |
---|---|
System.Int32 |
Properties
IsEOLSignificant
Specifies whether the end of a line is significant and should be returned as TT_EOL in TokenType by this tokenizer. true if EOL is significant, false otherwise.
Declaration
public virtual bool IsEOLSignificant { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
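
To illustrate the effect of this property, the following sketch counts word tokens per line by watching for TT_EOL. It assumes the Lucene.Net assembly is referenced; `EolSketch` and `WordsPerLine` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class EolSketch
{
    // Returns the number of word tokens found on each line of the text.
    public static List<int> WordsPerLine(string text)
    {
        var counts = new List<int>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.IsEOLSignificant = true; // line ends now surface as TT_EOL

        int current = 0;
        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == StreamTokenizer.TT_EOL)
            {
                counts.Add(current);
                current = 0;
            }
            else if (type == StreamTokenizer.TT_WORD)
            {
                current++;
            }
        }
        if (current > 0)
        {
            counts.Add(current); // last line without a trailing newline
        }
        return counts;
    }
}
```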
LineNumber
Gets the current line number.
Declaration
public int LineNumber { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
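
As a hedged illustration, LineNumber can be read after each token to record where a word was found. The sketch assumes the Lucene.Net assembly is referenced and that line numbering starts at 1, as in the Java class this type mirrors; `LineNumberSketch` and `WordLines` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class LineNumberSketch
{
    // Pairs every word token with the line number reported right after it was read.
    public static List<KeyValuePair<string, int>> WordLines(string text)
    {
        var result = new List<KeyValuePair<string, int>>();
        var tokenizer = new StreamTokenizer(new StringReader(text));

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                result.Add(new KeyValuePair<string, int>(tokenizer.StringValue, tokenizer.LineNumber));
            }
        }
        return result;
    }
}
```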
LowerCaseMode
Specifies whether word tokens should be converted to lower case when they are stored in StringValue. true if StringValue should be converted to lower case, false otherwise.
Declaration
public bool LowerCaseMode { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
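
A minimal sketch of this property in use, assuming the Lucene.Net assembly is referenced; `LowerCaseSketch` and `LowerWords` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class LowerCaseSketch
{
    // Returns the word tokens of the text, lower-cased by the tokenizer itself.
    public static List<string> LowerWords(string text)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.LowerCaseMode = true;

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```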
NumberValue
Declaration
public double NumberValue { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
SlashSlashComments
Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line. true if // should be recognized as the start of a comment, false otherwise.
Declaration
public bool SlashSlashComments { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
SlashStarComments
Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found. true if /* should be recognized as the start of a comment, false otherwise.
Declaration
public bool SlashStarComments { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
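
The two comment properties can be combined. The sketch below first makes '/' an ordinary character, so that a lone slash is no longer a single-character comment, and then enables both comment styles. It assumes the Lucene.Net assembly is referenced and that the class follows the behavior of the Java StreamTokenizer it mirrors; `CommentSketch` and `Tokens` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class CommentSketch
{
    // Returns words and single-character tokens as strings, skipping // and /* */ comments.
    public static List<string> Tokens(string text)
    {
        var tokens = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.OrdinaryChar('/');          // '/' alone is no longer a comment character
        tokenizer.SlashSlashComments = true;  // recognize // ... end of line
        tokenizer.SlashStarComments = true;   // recognize /* ... */

        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == StreamTokenizer.TT_WORD)
            {
                tokens.Add(tokenizer.StringValue);
            }
            else if (type >= 0)
            {
                tokens.Add(((char)type).ToString()); // ordinary character token
            }
        }
        return tokens;
    }
}
```

With the input `"a / b // c\n/* d */ e"`, the comment contents `c` and `d` should not appear among the returned tokens.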
StringValue
Declaration
public string StringValue { get; set; }
Property Value
Type | Description |
---|---|
System.String |
TokenType
After calling NextToken(), TokenType contains the type of token that has been read. When a single character is read, its value converted to an integer is stored in TokenType. For a quoted string, the value is the quote character. Otherwise, its value is one of the following:
- TT_WORD - the token is a word.
- TT_NUMBER - the token is a number.
- TT_EOL - the end of line has been reached. Returned only if IsEOLSignificant is true.
- TT_EOF - the end of the stream has been reached.
Declaration
public int TokenType { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
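
The cases above map naturally onto a switch statement. This is a sketch under the assumption that only the two default quote characters are in use; `TokenTypeSketch` and `Describe` are hypothetical names, and the Lucene.Net assembly is assumed to be referenced.

```csharp
using System.Globalization;
using System.IO;
using Lucene.Net.Support.IO;

public static class TokenTypeSketch
{
    // Describes the token most recently read by tokenizer.NextToken().
    public static string Describe(StreamTokenizer tokenizer)
    {
        int type = tokenizer.TokenType;
        switch (type)
        {
            case StreamTokenizer.TT_EOF:
                return "end of stream";
            case StreamTokenizer.TT_EOL:
                return "end of line";
            case StreamTokenizer.TT_WORD:
                return "word " + tokenizer.StringValue;
            case StreamTokenizer.TT_NUMBER:
                return "number " + tokenizer.NumberValue.ToString(CultureInfo.InvariantCulture);
            case '"':
            case '\'':
                // For a quoted string the token type is the quote character itself.
                return "quoted string " + tokenizer.StringValue;
            default:
                return "character " + (char)type;
        }
    }
}
```

A caller would invoke `tokenizer.NextToken()` and then pass the tokenizer to `Describe` until it reports the end of the stream.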
Methods
CommentChar(Int32)
Specifies that the character ch shall be treated as a comment character.
Declaration
public virtual void CommentChar(int ch)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | ch | The character to be considered a comment character. |
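
For example, '#' can be registered as a comment character so that the rest of a line is skipped, as in many configuration formats. A minimal sketch, assuming the Lucene.Net assembly is referenced; `HashCommentSketch` and `Words` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class HashCommentSketch
{
    // Returns the word tokens of the text, ignoring everything from '#' to the end of a line.
    public static List<string> Words(string text)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.CommentChar('#');

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```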
NextToken()
Parses the next token from this tokenizer's source stream or reader. The type of the token is stored in the TokenType property; additional information may be stored in the NumberValue or StringValue properties.
Declaration
public int NextToken()
Returns
Type | Description |
---|---|
System.Int32 | The value of TokenType. |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | If an I/O error occurs while parsing the next token. |
OrdinaryChar(Int32)
Specifies that the character ch shall be treated as an ordinary character by this tokenizer. That is, it has no special meaning as a comment character, word component, white space, string delimiter or number.
Declaration
public void OrdinaryChar(int ch)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | ch | The character to be considered an ordinary character. |
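
One use of this method is to make '.' an ordinary character so that it is returned as its own token instead of being absorbed into a neighboring word or number. The sketch assumes the Lucene.Net assembly is referenced and that number parsing treats '.' as a number character, as in the Java class this type mirrors; `DotSketch` and `Split` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class DotSketch
{
    // Returns words and single-character tokens, with '.' reported as its own token.
    public static List<string> Split(string text)
    {
        var tokens = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.OrdinaryChar('.');

        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == StreamTokenizer.TT_WORD)
            {
                tokens.Add(tokenizer.StringValue);
            }
            else if (type >= 0)
            {
                tokens.Add(((char)type).ToString());
            }
        }
        return tokens;
    }
}
```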
OrdinaryChars(Int32, Int32)
Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. That is, they have no special meaning as a comment character, word component, white space, string delimiter or number.
Declaration
public void OrdinaryChars(int low, int hi)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | low | The first character in the range of ordinary characters. |
System.Int32 | hi | The last character in the range of ordinary characters. |
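
As an illustration, marking the digits '0' through '9' as ordinary turns each digit into a single-character token instead of part of a number. A minimal sketch, assuming the Lucene.Net assembly is referenced; `OrdinaryDigitsSketch` and `Split` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class OrdinaryDigitsSketch
{
    // Returns words and single-character tokens; digits are reported one character at a time.
    public static List<string> Split(string text)
    {
        var tokens = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.OrdinaryChars('0', '9');

        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == StreamTokenizer.TT_WORD)
            {
                tokens.Add(tokenizer.StringValue);
            }
            else if (type >= 0)
            {
                tokens.Add(((char)type).ToString());
            }
        }
        return tokens;
    }
}
```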
ParseNumbers()
Specifies that this tokenizer shall parse numbers.
Declaration
public void ParseNumbers()
PushBack()
Indicates that the current token should be pushed back and returned again the next time NextToken() is called.
Declaration
public void PushBack()
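
PushBack() makes one-token lookahead possible: read a token, inspect its type, and push it back so the next caller sees it again. This is a sketch, assuming the Lucene.Net assembly is referenced; `LookaheadSketch` and `NextIsNumber` are hypothetical names.

```csharp
using System.IO;
using Lucene.Net.Support.IO;

public static class LookaheadSketch
{
    // Reports whether the next token is a number without consuming it.
    public static bool NextIsNumber(StreamTokenizer tokenizer)
    {
        int type = tokenizer.NextToken();
        tokenizer.PushBack(); // the same token is returned by the next NextToken() call
        return type == StreamTokenizer.TT_NUMBER;
    }
}
```

After `NextIsNumber` returns, a following `NextToken()` call yields the token that was inspected.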
QuoteChar(Int32)
Specifies that the character ch shall be treated as a quote character.
Declaration
public void QuoteChar(int ch)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | ch | The character to be considered a quote character. |
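
For instance, '|' can be registered as an additional quote character. The sketch below collects the contents of every '|'-delimited string; it assumes the Lucene.Net assembly is referenced, and `PipeQuoteSketch` and `QuotedParts` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class PipeQuoteSketch
{
    // Returns the contents of every |...| delimited string in the text.
    public static List<string> QuotedParts(string text)
    {
        var parts = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.QuoteChar('|');

        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == '|')
            {
                // For a quoted string the token type is the quote character,
                // and the text between the quotes is in StringValue.
                parts.Add(tokenizer.StringValue);
            }
        }
        return parts;
    }
}
```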
ResetSyntax()
Specifies that all characters shall be treated as ordinary characters.
Declaration
public void ResetSyntax()
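
After ResetSyntax(), white space, letters and digits lose their special meaning, so each character in the byte range is returned as its own token. A minimal sketch, assuming the Lucene.Net assembly is referenced and ASCII input; `ResetSketch` and `Characters` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class ResetSketch
{
    // Returns every character of an ASCII text as a separate token.
    public static List<char> Characters(string text)
    {
        var chars = new List<char>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.ResetSyntax(); // spaces are no longer skipped, letters no longer form words

        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type >= 0)
            {
                chars.Add((char)type);
            }
        }
        return chars;
    }
}
```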
ToString()
Returns the state of this tokenizer in a readable format.
Declaration
public override string ToString()
Returns
Type | Description |
---|---|
System.String | The current state of this tokenizer. |
Overrides
WhitespaceChars(Int32, Int32)
Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.
Declaration
public void WhitespaceChars(int low, int hi)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | low | The first character in the range of whitespace characters. |
System.Int32 | hi | The last character in the range of whitespace characters. |
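
One possible use is treating a separator such as ',' as white space so that a comma-separated line splits into words. This is a sketch that ignores quoting and empty fields; it assumes the Lucene.Net assembly is referenced, and `CommaSketch` and `Fields` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class CommaSketch
{
    // Splits a comma-separated list of words by treating ',' as white space.
    public static List<string> Fields(string text)
    {
        var fields = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.WhitespaceChars(',', ','); // a range containing only ','

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                fields.Add(tokenizer.StringValue);
            }
        }
        return fields;
    }
}
```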
WordChars(Int32, Int32)
Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. A word consists of a word character followed by zero or more word or number characters.
Declaration
public void WordChars(int low, int hi)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | low | The first character in the range of word characters. |
System.Int32 | hi | The last character in the range of word characters. |
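
By default '_' is not a word character, so an identifier such as max_value would be split. The sketch below adds it to the word characters; it assumes the Lucene.Net assembly is referenced, and `IdentifierSketch` and `Identifiers` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class IdentifierSketch
{
    // Returns word tokens, allowing '_' inside (and at the start of) a word.
    public static List<string> Identifiers(string text)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.WordChars('_', '_'); // a range containing only '_'

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```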