Class TokenStreamToAutomaton
Consumes a TokenStream and creates an Automaton whose transition labels are UTF-8 bytes (or Unicode code points if UnicodeArcs is true) taken from the ITermToBytesRefAttribute. Between tokens we insert POS_SEP, and for holes we insert HOLE.
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public class TokenStreamToAutomaton
Constructors
TokenStreamToAutomaton()
Sole constructor.
Declaration
public TokenStreamToAutomaton()
Fields
HOLE
We add this arc to represent a hole.
Declaration
public const int HOLE = 30
Field Value
Type | Description |
---|---|
System.Int32 |
POS_SEP
We create this transition between two adjacent tokens.
Declaration
public const int POS_SEP = 31
Field Value
Type | Description |
---|---|
System.Int32 |
Properties
PreservePositionIncrements
Whether to generate holes in the automaton for missing positions; true by default.
Declaration
public virtual bool PreservePositionIncrements { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
UnicodeArcs
Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default.
Declaration
public virtual bool UnicodeArcs { get; set; }
Property Value
Type | Description |
---|---|
System.Boolean |
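As a minimal sketch of how the two properties above might be set, both can be assigned through an object initializer. The variable name `converter` is illustrative only and does not come from the library.

```csharp
using Lucene.Net.Analysis;

// Illustrative configuration: override both documented defaults.
var converter = new TokenStreamToAutomaton
{
    // Default is true. With false, missing positions do not produce HOLE arcs.
    PreservePositionIncrements = false,

    // Default is false. With true, transition labels are Unicode code points
    // rather than UTF-8 bytes.
    UnicodeArcs = true
};
```

Because both properties have public setters, they can also be changed on an existing instance before ToAutomaton is called.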
Methods
ChangeToken(BytesRef)
Subclass and override this method if you need to change the token (for example, to escape certain bytes) before it is turned into a graph.
Declaration
protected virtual BytesRef ChangeToken(BytesRef @in)
Parameters
Type | Name | Description |
---|---|---|
BytesRef | in |
Returns
Type | Description |
---|---|
BytesRef |
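The following is a hedged sketch of what such an override could look like. The class name `EscapingTokenStreamToAutomaton`, the parameter name `input`, and the escaping scheme (prefixing each byte equal to POS_SEP with a 0x00 byte) are assumptions made for illustration; the page does not prescribe any particular scheme, and this one makes no claim to be unambiguous for every input.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Util;

// Hypothetical subclass: rewrites each token before it becomes part of the graph.
public class EscapingTokenStreamToAutomaton : TokenStreamToAutomaton
{
    // Illustrative escape marker; not defined by the library.
    private const byte EscapeByte = 0x00;

    protected override BytesRef ChangeToken(BytesRef input)
    {
        // Worst case every byte gets a marker, so allocate twice the input length.
        var escaped = new byte[input.Length * 2];
        int upto = 0;

        for (int i = 0; i < input.Length; i++)
        {
            // A BytesRef is a slice: valid bytes start at Offset, not at index 0.
            byte b = input.Bytes[input.Offset + i];
            if (b == (byte)POS_SEP)
            {
                escaped[upto++] = EscapeByte; // mark the separator byte inside the term
            }
            escaped[upto++] = b;
        }

        // Return a new slice covering only the bytes actually written.
        return new BytesRef(escaped, 0, upto);
    }
}
```

The sketch returns a fresh BytesRef instead of modifying the incoming one, since the incoming instance may be reused by the token stream's attribute.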
ToAutomaton(TokenStream)
Pulls the graph (including IPositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if UnicodeArcs is true) from each term.
Declaration
public virtual Automaton ToAutomaton(TokenStream @in)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in |
Returns
Type | Description |
---|---|
Automaton |
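A minimal usage sketch follows, under several assumptions that are not stated on this page: that `WhitespaceAnalyzer` from the separate Lucene.Net.Analysis.Common package is available, that ToAutomaton resets and ends the stream itself (as the upstream Java implementation does) so the caller only disposes it, and that the returned Automaton exposes a `ToDot()` method for a Graphviz dump. The field name `"body"` and the sample text are arbitrary.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core; // WhitespaceAnalyzer (assumed: Lucene.Net.Analysis.Common package)
using Lucene.Net.Util;          // LuceneVersion

public static class TokenStreamToAutomatonExample
{
    public static void Main()
    {
        using var analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);

        // Two adjacent tokens, "fast" and "wifi"; per the class summary,
        // a POS_SEP transition is inserted between them.
        using TokenStream stream =
            analyzer.GetTokenStream("body", new StringReader("fast wifi"));

        var converter = new TokenStreamToAutomaton();

        // Assumption: ToAutomaton calls Reset() and End() on the stream itself,
        // so no explicit Reset() is issued here.
        var automaton = converter.ToAutomaton(stream);

        // Assumption: ToDot() is available on the returned automaton.
        Console.WriteLine(automaton.ToDot());
    }
}
```

With the default settings the arcs are UTF-8 bytes; setting UnicodeArcs to true on `converter` before the call would label them with code points instead.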