Class TokenStreamToAutomaton
Consumes a TokenStream and creates an Automaton where the transition labels are UTF-8 bytes (or Unicode code points if UnicodeArcs is true) from the ITermToBytesRefAttribute. Between tokens we insert POS_SEP, and for holes we insert HOLE.
Note
This API is experimental and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Analysis
Assembly: Lucene.Net.dll
Syntax
public class TokenStreamToAutomaton
Constructors
TokenStreamToAutomaton()
Sole constructor.
Declaration
public TokenStreamToAutomaton()
Fields
HOLE
We add this arc to represent a hole.
Declaration
public const int HOLE = 30
Field Value
Type | Description |
---|---|
int |
POS_SEP
We create this transition between two adjacent tokens.
Declaration
public const int POS_SEP = 31
Field Value
Type | Description |
---|---|
int |
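To make the role of the two labels concrete, here is a minimal, self-contained sketch of the label sequence one might expect for a simple linear token stream. It does not use Lucene.NET at all: `LabelSketch` and `Labels` are hypothetical names, and the model assumes every token has a position increment of at least 1 and a position length of 1 (no stacked tokens or graph arcs). The real class builds a full automaton, so treat this only as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LabelSketch
{
    // Same values as TokenStreamToAutomaton.HOLE and TokenStreamToAutomaton.POS_SEP.
    public const int HOLE = 30;
    public const int POS_SEP = 31;

    // Hypothetical helper: returns the arc labels along the single path of a
    // linear token stream. Each token is (term, position increment >= 1).
    public static List<int> Labels((string Term, int PosInc)[] tokens,
                                   bool preservePositionIncrements = true)
    {
        var labels = new List<int>();
        bool first = true;
        foreach (var (term, posInc) in tokens)
        {
            // Assumption: when increments are not preserved, gaps collapse to 1.
            int inc = preservePositionIncrements ? posInc : 1;

            // A separator between adjacent tokens.
            if (!first) labels.Add(POS_SEP);

            // Each skipped position contributes a hole followed by a separator.
            for (int i = 1; i < inc; i++)
            {
                labels.Add(HOLE);
                labels.Add(POS_SEP);
            }

            // The term itself, as UTF-8 bytes.
            foreach (byte b in Encoding.UTF8.GetBytes(term)) labels.Add(b);
            first = false;
        }
        return labels;
    }
}
```

Under these assumptions, the tokens `("a", 1), ("c", 2)` (a removed stop word between them) give the labels 97, 31, 30, 31, 99, and the same input with `preservePositionIncrements` set to false gives 97, 31, 99.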
Properties
PreservePositionIncrements
Whether to generate holes in the automaton for missing positions, true by default.
Declaration
public virtual bool PreservePositionIncrements { get; set; }
Property Value
Type | Description |
---|---|
bool |
UnicodeArcs
Whether to make transition labels Unicode code points instead of UTF-8 bytes, false by default.
Declaration
public virtual bool UnicodeArcs { get; set; }
Property Value
Type | Description |
---|---|
bool |
Methods
ChangeToken(BytesRef)
Subclass and override this method if you need to change the token (for example, to escape certain bytes) before it is turned into a graph.
Declaration
protected virtual BytesRef ChangeToken(BytesRef @in)
Parameters
Type | Name | Description |
---|---|---|
BytesRef | in |
Returns
Type | Description |
---|---|
BytesRef |
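As an illustration of the escaping use case, the following sketch overrides ChangeToken so that any byte in a term that collides with POS_SEP or HOLE is prefixed with an escape byte. The class name `EscapingTokenStreamToAutomaton` and the choice of `0x5C` as the escape byte are assumptions made for this example, not something this page prescribes.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Util;

// Hypothetical subclass: escapes bytes that would otherwise be mistaken for
// the POS_SEP or HOLE labels inserted by TokenStreamToAutomaton.
public class EscapingTokenStreamToAutomaton : TokenStreamToAutomaton
{
    // Arbitrary escape byte chosen for this sketch ('\').
    private const byte Escape = 0x5C;

    protected override BytesRef ChangeToken(BytesRef @in)
    {
        // Worst case: every byte needs an escape prefix.
        var escaped = new byte[@in.Length * 2];
        int upto = 0;

        for (int i = 0; i < @in.Length; i++)
        {
            byte b = @in.Bytes[@in.Offset + i];
            if (b == POS_SEP || b == HOLE || b == Escape)
            {
                escaped[upto++] = Escape;
            }
            escaped[upto++] = b;
        }

        return new BytesRef(escaped, 0, upto);
    }
}
```

The escape byte itself is also escaped so that the transformation stays reversible; whether that matters depends on how the resulting automaton is consumed.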
ToAutomaton(TokenStream)
Pulls the graph (including IPositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if UnicodeArcs is true) from each term.
Declaration
public virtual Automaton ToAutomaton(TokenStream @in)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | in |
Returns
Type | Description |
---|---|
Automaton |
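A typical call might look like the sketch below. It assumes the Lucene.Net.Analysis.Common package is referenced for `WhitespaceAnalyzer`; the field name `"field"` and the wrapper `ToAutomatonExample.Build` are placeholders invented for this example. The token stream is disposed by the caller here, on the assumption that ToAutomaton consumes the stream but does not dispose it.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;
using Lucene.Net.Util.Automaton;

public static class ToAutomatonExample
{
    // Hypothetical helper: analyzes the text and converts the token graph
    // into an automaton whose arcs are UTF-8 bytes.
    public static Automaton Build(string text)
    {
        using (Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48))
        using (TokenStream ts = analyzer.GetTokenStream("field", text))
        {
            var converter = new TokenStreamToAutomaton
            {
                PreservePositionIncrements = true, // the default
                UnicodeArcs = false                // the default: UTF-8 byte labels
            };
            return converter.ToAutomaton(ts);
        }
    }
}
```

Setting `UnicodeArcs = true` instead would label arcs with Unicode code points, which may be more convenient when the automaton is later matched against strings rather than raw bytes.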