Class ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode characters that are not among the first 128 Unicode code points (the "Basic Latin" Unicode block, i.e. ASCII) into their ASCII equivalents, where such equivalents exist.
Only characters with a reasonable ASCII alternative are converted, and only if they belong to one of the following Unicode blocks:
- C1 Controls and Latin-1 Supplement: http://www.unicode.org/charts/PDF/U0080.pdf
- Latin Extended-A: http://www.unicode.org/charts/PDF/U0100.pdf
- Latin Extended-B: http://www.unicode.org/charts/PDF/U0180.pdf
- Latin Extended Additional: http://www.unicode.org/charts/PDF/U1E00.pdf
- Latin Extended-C: http://www.unicode.org/charts/PDF/U2C60.pdf
- Latin Extended-D: http://www.unicode.org/charts/PDF/UA720.pdf
- IPA Extensions: http://www.unicode.org/charts/PDF/U0250.pdf
- Phonetic Extensions: http://www.unicode.org/charts/PDF/U1D00.pdf
- Phonetic Extensions Supplement: http://www.unicode.org/charts/PDF/U1D80.pdf
- General Punctuation: http://www.unicode.org/charts/PDF/U2000.pdf
- Superscripts and Subscripts: http://www.unicode.org/charts/PDF/U2070.pdf
- Enclosed Alphanumerics: http://www.unicode.org/charts/PDF/U2460.pdf
- Dingbats: http://www.unicode.org/charts/PDF/U2700.pdf
- Supplemental Punctuation: http://www.unicode.org/charts/PDF/U2E00.pdf
- Alphabetic Presentation Forms: http://www.unicode.org/charts/PDF/UFB00.pdf
- Halfwidth and Fullwidth Forms: http://www.unicode.org/charts/PDF/UFF00.pdf
See: http://en.wikipedia.org/wiki/Latin_characters_in_Unicode
For example, 'à' will be replaced by 'a'.
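The following is a minimal sketch of the filter used in an analysis chain. It assumes Lucene.NET 4.8 (`LuceneVersion.LUCENE_48`) and a `WhitespaceTokenizer` as the upstream token source; the `FoldingDemo` class and its `Fold` method are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class FoldingDemo // hypothetical helper, not part of Lucene.NET
{
    // Splits the text on whitespace and returns each token after ASCII folding.
    public static List<string> Fold(string text)
    {
        var terms = new List<string>();
        // Assumption: Lucene.NET 4.8, where tokenizers take a LuceneVersion and a TextReader.
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        using (TokenStream stream = new ASCIIFoldingFilter(source))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset(); // must be called before the first IncrementToken()
            while (stream.IncrementToken())
            {
                terms.Add(term.ToString());
            }
            stream.End();
        }
        return terms;
    }
}
```

Calling `FoldingDemo.Fold("à la crème")` should yield the tokens `a`, `la`, and `creme`; tokens that are already pure ASCII pass through unchanged.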
Implements
Inherited Members
Namespace: Lucene.Net.Analysis.Miscellaneous
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public sealed class ASCIIFoldingFilter : TokenFilter, IDisposable
Constructors
ASCIIFoldingFilter(TokenStream)
Declaration
public ASCIIFoldingFilter(TokenStream input)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | input | TokenStream to filter |
ASCIIFoldingFilter(TokenStream, Boolean)
Create a new ASCIIFoldingFilter.
Declaration
public ASCIIFoldingFilter(TokenStream input, bool preserveOriginal)
Parameters
Type | Name | Description |
---|---|---|
TokenStream | input | TokenStream to filter |
System.Boolean | preserveOriginal | Whether to keep each original token in the stream in addition to its folded form, at a position increment of 0 relative to the folded token |
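To make the effect of `preserveOriginal` concrete, the sketch below lists every emitted token together with its position increment. It assumes Lucene.NET 4.8 and a `WhitespaceTokenizer` source; `PreserveOriginalDemo` and `Describe` are hypothetical names, not part of the library.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Miscellaneous;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class PreserveOriginalDemo // hypothetical helper for illustration
{
    // Returns one "term(+positionIncrement)" entry per token emitted by the filter.
    public static List<string> Describe(string text, bool preserveOriginal)
    {
        var result = new List<string>();
        // Assumption: Lucene.NET 4.8 tokenizer constructor (LuceneVersion, TextReader).
        Tokenizer source = new WhitespaceTokenizer(LuceneVersion.LUCENE_48, new StringReader(text));
        using (TokenStream stream = new ASCIIFoldingFilter(source, preserveOriginal))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            IPositionIncrementAttribute posInc = stream.AddAttribute<IPositionIncrementAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                result.Add(term.ToString() + "(+" + posInc.PositionIncrement + ")");
            }
            stream.End();
        }
        return result;
    }
}
```

With `preserveOriginal` set to `true`, a token such as `café` should produce two entries, the folded `cafe` and the original `café` with an increment of 0, so a query can match either form at the same position. Tokens that need no folding are emitted once.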
Properties
PreserveOriginal
Gets a value indicating whether the filter preserves the original tokens.
Declaration
public bool PreserveOriginal { get; }
Property Value
Type | Description |
---|---|
System.Boolean |
Methods
FoldToASCII(Char[], Int32)
Converts characters above ASCII to their ASCII equivalents. For example, accents are removed from accented characters.
Declaration
public void FoldToASCII(char[] input, int length)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | input | The string to fold |
System.Int32 | length | The number of characters in the input string |
FoldToASCII(Char[], Int32, Char[], Int32, Int32)
Converts characters above ASCII to their ASCII equivalents. For example, accents are removed from accented characters.
Declaration
public static int FoldToASCII(char[] input, int inputPos, char[] output, int outputPos, int length)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | input | The characters to fold |
System.Int32 | inputPos | Index of the first character to fold |
System.Char[] | output | The result of the folding. Should be of size >= |
System.Int32 | outputPos | Index of output where to put the result of the folding |
System.Int32 | length | The number of characters to fold |
Returns
Type | Description |
---|---|
System.Int32 | length of output |
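The static overload can be used without building a token stream. The sketch below wraps it for whole strings; `StaticFoldDemo` and `Fold` are hypothetical names. Sizing the output buffer at four slots per input character is an assumption made here as a generous upper bound, since one input character can expand to several ASCII characters.

```csharp
using Lucene.Net.Analysis.Miscellaneous;

public static class StaticFoldDemo // hypothetical helper for illustration
{
    // Folds every character of the text and returns the resulting string.
    public static string Fold(string text)
    {
        char[] input = text.ToCharArray();
        // Assumption: 4 output slots per input char is large enough for any expansion.
        char[] output = new char[input.Length * 4];
        // Both positions start at 0, so the return value is the folded length.
        int outputLength = ASCIIFoldingFilter.FoldToASCII(input, 0, output, 0, input.Length);
        return new string(output, 0, outputLength);
    }
}
```

For instance, `StaticFoldDemo.Fold("straße")` should return `strasse`, which shows why the output can be longer than the input.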
IncrementToken()
Declaration
public override bool IncrementToken()
Returns
Type | Description |
---|---|
System.Boolean |
Overrides
Reset()
Declaration
public override void Reset()