Class StreamTokenizer

Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.

Inheritance

System.Object

StreamTokenizer

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

Namespace: Lucene.Net.Support.IO

Assembly: Lucene.Net.dll

Syntax

public class StreamTokenizer

Constructors

StreamTokenizer(Stream)

Constructs a new StreamTokenizer with input as the source stream. This constructor is deprecated; use the constructor that takes a System.IO.TextReader as an argument instead.

Declaration

[Obsolete("Use StreamTokenizer(TextReader)")]
public StreamTokenizer(Stream input)

Parameters

Type	Name	Description
System.IO.Stream	input	The source stream from which to parse tokens.

Exceptions

Type	Condition
System.ArgumentNullException	If `input` is `null`.

StreamTokenizer(TextReader)

Constructs a new StreamTokenizer with reader as the source reader. The tokenizer's initial state is as follows:

All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
All byte values '\u0000' through '\u0020' are considered to be white space.
'/' is a comment character.
Single quote ''' and double quote '"' are string quote characters.
Numbers are parsed.
End of lines are considered to be white space rather than separate tokens.
C-style and C++-style comments are not recognized.

Declaration

public StreamTokenizer(TextReader reader)

Parameters

Type	Name	Description
System.IO.TextReader	reader	The source text reader from which to parse tokens.
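
The following is a minimal sketch of typical use, not code taken from Lucene.Net itself. It assumes a Lucene.Net build in which this type is public, and the class name BasicTokenizerExample and the helper Tokenize are hypothetical names introduced for illustration. It wraps a string in a StringReader, relies on the initial state listed above, and collects word and number tokens until TT_EOF is returned.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Lucene.Net.Support.IO;

public static class BasicTokenizerExample
{
    // Hypothetical helper: returns one description per word or number token.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        using (TextReader reader = new StringReader(text))
        {
            var tokenizer = new StreamTokenizer(reader);

            // NextToken() returns the value that TokenType then exposes.
            while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
            {
                if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
                {
                    tokens.Add("word:" + tokenizer.StringValue);
                }
                else if (tokenizer.TokenType == StreamTokenizer.TT_NUMBER)
                {
                    // NumberValue is a double; format it culture-independently.
                    tokens.Add("number:" + tokenizer.NumberValue.ToString(CultureInfo.InvariantCulture));
                }
            }
        }
        return tokens;
    }
}
```

Any other TextReader, such as a StreamReader over a file, could be passed in the same way.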

Fields

TT_EOF

The constant representing the end of the stream.

Declaration

public const int TT_EOF = -1

Field Value

Type	Description
System.Int32

TT_EOL

The constant representing the end of the line.

Declaration

public const int TT_EOL = 10

Field Value

Type	Description
System.Int32

TT_NUMBER

The constant representing a number token.

Declaration

public const int TT_NUMBER = -2

Field Value

Type	Description
System.Int32

TT_WORD

The constant representing a word token.

Declaration

public const int TT_WORD = -3

Field Value

Type	Description
System.Int32

Properties

IsEOLSignificant

Specifies whether the end of a line is significant and should be returned as TT_EOL in TokenType by this tokenizer. true if EOL is significant, false otherwise.

Declaration

public virtual bool IsEOLSignificant { get; set; }

Property Value

Type	Description
System.Boolean
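
As a rough illustration of why line ends might be made significant, the sketch below counts word tokens per line. The class name LineAwareExample and the helper CountWordsPerLine are hypothetical, and the sketch assumes that a line break is reported as a single TT_EOL token once IsEOLSignificant is true.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class LineAwareExample
{
    // Hypothetical helper: number of word tokens found on each line.
    public static List<int> CountWordsPerLine(string text)
    {
        var counts = new List<int>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        tokenizer.IsEOLSignificant = true; // report line ends as TT_EOL

        int current = 0;
        int type;
        while ((type = tokenizer.NextToken()) != StreamTokenizer.TT_EOF)
        {
            if (type == StreamTokenizer.TT_EOL)
            {
                counts.Add(current);
                current = 0;
            }
            else if (type == StreamTokenizer.TT_WORD)
            {
                current++;
            }
        }

        // Record a final line that has words but no trailing line break.
        if (current > 0)
        {
            counts.Add(current);
        }
        return counts;
    }
}
```

With the default value of false, the same input would produce no TT_EOL tokens, because line ends are then treated as white space.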

LineNumber

Gets the current line number.

Declaration

public int LineNumber { get; }

Property Value

Type	Description
System.Int32

LowerCaseMode

Specifies whether word tokens should be converted to lower case when they are stored in StringValue. true if StringValue should be converted to lower case, false otherwise.

Declaration

public bool LowerCaseMode { get; set; }

Property Value

Type	Description
System.Boolean

NumberValue

Contains a number if the current token is a number (TokenType == TT_NUMBER).

Declaration

public double NumberValue { get; set; }

Property Value

Type	Description
System.Double

SlashSlashComments

Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line. true if // should be recognized as the start of a comment, false otherwise.

Declaration

public bool SlashSlashComments { get; set; }

Property Value

Type	Description
System.Boolean

SlashStarComments

Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found. true if /* should be recognized as the start of a comment, false otherwise.

Declaration

public bool SlashStarComments { get; set; }

Property Value

Type	Description
System.Boolean
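
The two comment properties can be combined. The sketch below is illustrative only; the class name CommentExample and the helper WordsIgnoringComments are hypothetical. It enables both comment styles and collects the word tokens that remain, on the assumption that recognized comments are skipped without producing tokens.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class CommentExample
{
    // Hypothetical helper: word tokens left after comments are skipped.
    public static List<string> WordsIgnoringComments(string source)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(source));

        // Both styles are off in the initial state, so enable them explicitly.
        tokenizer.SlashSlashComments = true;
        tokenizer.SlashStarComments = true;

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```

Because '/' is a comment character in the initial state, a lone slash still starts a comment that runs to the end of the line unless it is reclassified, for example with OrdinaryChar.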

StringValue

Contains a string if the current token is a word (TokenType == TT_WORD).

Declaration

public string StringValue { get; set; }

Property Value

Type	Description
System.String

TokenType

After calling NextToken(), TokenType contains the type of the token that has been read. When a single character is read, its value converted to an integer is stored in TokenType. For a quoted string, the value is the quote character. Otherwise, its value is one of the following:

TT_WORD - the token is a word.
TT_NUMBER - the token is a number.
TT_EOL - the end of a line has been reached. Returned only if IsEOLSignificant is true.
TT_EOF - the end of the stream has been reached.

Declaration

public int TokenType { get; }

Property Value

Type	Description
System.Int32
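
One way to handle the possible values of TokenType is a switch, sketched below. The class name TokenTypeExample and its helpers are hypothetical. The sketch also assumes that the body of a quoted string is placed in StringValue, as it is in java.io.StreamTokenizer, which this class mirrors; the text above does not state that explicitly.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Lucene.Net.Support.IO;

public static class TokenTypeExample
{
    // Hypothetical helper: describes the token the tokenizer currently holds.
    public static string Describe(StreamTokenizer tokenizer)
    {
        switch (tokenizer.TokenType)
        {
            case StreamTokenizer.TT_EOF:
                return "end of stream";
            case StreamTokenizer.TT_EOL:
                return "end of line";
            case StreamTokenizer.TT_NUMBER:
                return "number " + tokenizer.NumberValue.ToString(CultureInfo.InvariantCulture);
            case StreamTokenizer.TT_WORD:
                return "word " + tokenizer.StringValue;
            case '"':
            case '\'':
                // Assumption: the quoted text is available in StringValue.
                return "string " + tokenizer.StringValue;
            default:
                // Any other value is the character itself, converted to an int.
                return "character " + (char)tokenizer.TokenType;
        }
    }

    // Hypothetical helper: describes every token before the end of the stream.
    public static List<string> DescribeAll(string text)
    {
        var result = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));
        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            result.Add(Describe(tokenizer));
        }
        return result;
    }
}
```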

Methods

CommentChar(Int32)

Specifies that the character ch shall be treated as a comment character.

Declaration

public virtual void CommentChar(int ch)

Parameters

Type	Name	Description
System.Int32	ch	The character to be considered a comment character.
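
As a small sketch, a format that uses '#' for comments could be handled as shown below. The class name CommentCharExample and the helper WordsWithoutHashComments are hypothetical, and the sketch assumes that a comment character causes the rest of its line to be skipped, as with the default '/' comment character.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class CommentCharExample
{
    // Hypothetical helper: word tokens with '#' comments skipped.
    public static List<string> WordsWithoutHashComments(string text)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));

        // The char literal converts implicitly to the int parameter.
        tokenizer.CommentChar('#');

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```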

NextToken()

Parses the next token from this tokenizer's source stream or reader. The type of the token is stored in the TokenType property; additional information may be stored in the NumberValue or StringValue properties.

Declaration

public int NextToken()

Returns

Type	Description
System.Int32	The value of TokenType.

Exceptions

Type	Condition
System.IO.IOException	If an I/O error occurs while parsing the next token.

OrdinaryChar(Int32)

Specifies that the character ch shall be treated as an ordinary character by this tokenizer. That is, it has no special meaning as a comment character, word component, white space, string delimiter or number.

Declaration

public void OrdinaryChar(int ch)

Parameters

Type	Name	Description
System.Int32	ch	The character to be considered an ordinary character.

OrdinaryChars(Int32, Int32)

Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. That is, they have no special meaning as a comment character, word component, white space, string delimiter or number.

Declaration

public void OrdinaryChars(int low, int hi)

Parameters

Type	Name	Description
System.Int32	low	The first character in the range of ordinary characters.
System.Int32	hi	The last character in the range of ordinary characters.

ParseNumbers()

Specifies that this tokenizer shall parse numbers.

Declaration

public void ParseNumbers()

PushBack()

Indicates that the current token should be pushed back and returned again the next time NextToken() is called.

Declaration

public void PushBack()
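
PushBack() makes one-token lookahead possible. The sketch below shows one way this might be done; the class name LookaheadExample and the helper NextIsNumber are hypothetical.

```csharp
using Lucene.Net.Support.IO;

public static class LookaheadExample
{
    // Hypothetical helper: reports whether the next token is a number
    // without consuming it.
    public static bool NextIsNumber(StreamTokenizer tokenizer)
    {
        int type = tokenizer.NextToken();

        // Push the token back so the next NextToken() call returns it again.
        tokenizer.PushBack();

        return type == StreamTokenizer.TT_NUMBER;
    }
}
```

A caller can use such a check to choose a parsing branch and then call NextToken() as usual to read the token it peeked at.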

QuoteChar(Int32)

Specifies that the character ch shall be treated as a quote character.

Declaration

public void QuoteChar(int ch)

Parameters

Type	Name	Description
System.Int32	ch	The character to be considered a quote character.

ResetSyntax()

Specifies that all characters shall be treated as ordinary characters.

Declaration

public void ResetSyntax()

ToString()

Returns the state of this tokenizer in a readable format.

Declaration

public override string ToString()

Returns

Type	Description
System.String	The current state of this tokenizer.

Overrides

System.Object.ToString()

WhitespaceChars(Int32, Int32)

Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.

Declaration

public void WhitespaceChars(int low, int hi)

Parameters

Type	Name	Description
System.Int32	low	The first character in the range of whitespace characters.
System.Int32	hi	The last character in the range of whitespace characters.

WordChars(Int32, Int32)

Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. A word consists of a word character followed by zero or more word or number characters.

Declaration

public void WordChars(int low, int hi)

Parameters

Type	Name	Description
System.Int32	low	The first character in the range of word characters.
System.Int32	hi	The last character in the range of word characters.
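
In the initial state an underscore is not a word character, so an identifier such as max_size would be split at the underscore. The sketch below extends the word character set so that such identifiers come back as a single token. The class name WordCharsExample and the helper Identifiers are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Support.IO;

public static class WordCharsExample
{
    // Hypothetical helper: word tokens, with '_' treated as part of a word.
    public static List<string> Identifiers(string text)
    {
        var words = new List<string>();
        var tokenizer = new StreamTokenizer(new StringReader(text));

        // A one-character range: low and hi are both '_'.
        tokenizer.WordChars('_', '_');

        while (tokenizer.NextToken() != StreamTokenizer.TT_EOF)
        {
            if (tokenizer.TokenType == StreamTokenizer.TT_WORD)
            {
                words.Add(tokenizer.StringValue);
            }
        }
        return words;
    }
}
```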

Extension Methods

Number.IsNumber(Object)