Class CharacterUtils
CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations based on a Lucene.Net.Util.LuceneVersion instance.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Analysis.Util
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public abstract class CharacterUtils
Methods
CodePointAt(ICharSequence, Int32)
Returns the code point at the given index of the J2N.Text.ICharSequence.
Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.
Declaration
public abstract int CodePointAt(ICharSequence seq, int offset)
Parameters
Type | Name | Description |
---|---|---|
J2N.Text.ICharSequence | seq | a character sequence |
System.Int32 | offset | the offset of the char value in the sequence to be converted |
Returns
Type | Description |
---|---|
System.Int32 | the Unicode code point at the given index |
Exceptions
Type | Condition |
---|---|
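System.ArgumentNullException | if the sequence is null. |
System.ArgumentOutOfRangeException | if offset is negative or not less than the length of the character sequence. |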
CodePointAt(Char[], Int32, Int32)
Returns the code point at the given index of the char array, where only elements with an index less than the limit are used.
Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.
Declaration
public abstract int CodePointAt(char[] chars, int offset, int limit)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | chars | a character array |
System.Int32 | offset | the offset to the char values in the chars array to be converted |
System.Int32 | limit | the index after the last element that should be used to calculate the code point |
Returns
Type | Description |
---|---|
System.Int32 | the Unicode code point at the given index |
Exceptions
Type | Condition |
---|---|
System.ArgumentNullException | if the array is null. |
System.ArgumentOutOfRangeException | if offset is negative or not less than the length of the char array. |
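The role of the limit parameter can be illustrated with a minimal sketch that uses only the .NET base class library. It is not the Lucene.NET implementation: the class name CodePointAtSketch and the exact argument checks are assumptions made for illustration. It shows the supplementary-aware behavior. An instance that mimics Java 1.4 would presumably return chars[offset] unchanged.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class CodePointAtSketch
{
    // Decodes the code point starting at `offset`, ignoring every element
    // whose index is >= `limit`.
    public static int CodePointAt(char[] chars, int offset, int limit)
    {
        if (chars == null) throw new ArgumentNullException(nameof(chars));
        // Assumed bounds rules: 0 <= offset < limit <= chars.Length.
        if (limit < 0 || limit > chars.Length) throw new ArgumentOutOfRangeException(nameof(limit));
        if (offset < 0 || offset >= limit) throw new ArgumentOutOfRangeException(nameof(offset));

        char first = chars[offset];
        // Combine a surrogate pair only if the low surrogate lies before the limit.
        if (char.IsHighSurrogate(first) && offset + 1 < limit && char.IsLowSurrogate(chars[offset + 1]))
        {
            return char.ConvertToUtf32(first, chars[offset + 1]);
        }
        return first; // BMP char, or a surrogate whose partner is missing or cut off
    }
}
```

With the array for "a\uD83D\uDE00", a limit of 3 yields 0x1F600 at offset 1, while a limit of 2 cuts the pair and yields the lone high surrogate 0xD83D.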
CodePointAt(String, Int32)
Returns the code point at the given index of the System.String.
Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.
Declaration
public abstract int CodePointAt(string seq, int offset)
Parameters
Type | Name | Description |
---|---|---|
System.String | seq | a character sequence |
System.Int32 | offset | the offset of the char value in the string to be converted |
Returns
Type | Description |
---|---|
System.Int32 | the Unicode code point at the given index |
Exceptions
Type | Condition |
---|---|
System.ArgumentNullException | if the string is null. |
System.ArgumentOutOfRangeException | if offset is negative or not less than the length of the string. |
CodePointCount(ICharSequence)
Returns the number of characters in seq.
Declaration
public abstract int CodePointCount(ICharSequence seq)
Parameters
Type | Name | Description |
---|---|---|
J2N.Text.ICharSequence | seq |
Returns
Type | Description |
---|---|
System.Int32 |
CodePointCount(Char[])
Returns the number of characters in seq.
Declaration
public abstract int CodePointCount(char[] seq)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | seq |
Returns
Type | Description |
---|---|
System.Int32 |
CodePointCount(String)
Returns the number of characters in seq.
Declaration
public abstract int CodePointCount(string seq)
Parameters
Type | Name | Description |
---|---|---|
System.String | seq |
Returns
Type | Description |
---|---|
System.Int32 |
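As a rough illustration of how counting code points differs from counting chars, the following sketch uses only the base class library. The class name CodePointCountSketch is hypothetical, and the sketch assumes the supplementary-aware behavior. A Java 1.4 compatible instance might simply report the char count.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class CodePointCountSketch
{
    public static int CodePointCount(string seq)
    {
        if (seq == null) throw new ArgumentNullException(nameof(seq));

        int count = 0;
        for (int i = 0; i < seq.Length; i++)
        {
            // A well-formed surrogate pair contributes a single code point,
            // so skip its low surrogate.
            if (char.IsHighSurrogate(seq[i]) && i + 1 < seq.Length && char.IsLowSurrogate(seq[i + 1]))
            {
                i++;
            }
            count++; // unpaired surrogates are counted individually
        }
        return count;
    }
}
```

For "a\uD83D\uDE00b" the sketch returns 3, although the string holds 4 chars.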
CodePointCount(StringBuilder)
Returns the number of characters in seq.
Declaration
public abstract int CodePointCount(StringBuilder seq)
Parameters
Type | Name | Description |
---|---|---|
System.Text.StringBuilder | seq |
Returns
Type | Description |
---|---|
System.Int32 |
Fill(CharacterUtils.CharacterBuffer, TextReader)
Convenience method which calls Fill(buffer, reader, buffer.Buffer.Length).
Declaration
public virtual bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader)
Parameters
Type | Name | Description |
---|---|---|
CharacterUtils.CharacterBuffer | buffer | |
System.IO.TextReader | reader |
Returns
Type | Description |
---|---|
System.Boolean |
Fill(CharacterUtils.CharacterBuffer, TextReader, Int32)
Fills the CharacterUtils.CharacterBuffer with characters read from the given System.IO.TextReader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer; each call to Fill starts filling the buffer from offset 0 up to numChars.
When code points can span two chars (a surrogate pair), this method may fill only numChars - 1 characters in order not to split a surrogate pair, even if there are remaining characters in the System.IO.TextReader.
Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method implements supplementary character awareness when filling the given buffer. For all Lucene.Net.Util.LuceneVersion > 3.0, Fill(CharacterUtils.CharacterBuffer, TextReader, Int32) guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.
A return value of false means that this method call exhausted the reader, but some characters may still have been read, which can be verified by checking whether buffer.Length > 0.
Declaration
public abstract bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader, int numChars)
Parameters
Type | Name | Description |
---|---|---|
CharacterUtils.CharacterBuffer | buffer | the buffer to fill. |
System.IO.TextReader | reader | the reader to read characters from. |
System.Int32 | numChars | the number of chars to read |
Returns
Type | Description |
---|---|
System.Boolean | false if and only if the reader reported the end of its input while trying to fill the buffer |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | if the reader throws a System.IO.IOException. |
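The surrogate-preserving behavior described for Fill can be sketched as follows. This is an illustration under stated assumptions, not the library's code: SketchBuffer is a hypothetical stand-in for CharacterUtils.CharacterBuffer, and the approach of holding back a trailing high surrogate for the next call is one plausible way to meet the documented guarantee.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for CharacterUtils.CharacterBuffer; names are illustrative.
public sealed class SketchBuffer
{
    public readonly char[] Chars;
    public int Length;                 // number of valid chars after the last Fill
    public char PendingHighSurrogate;  // '\0' when nothing is carried over

    public SketchBuffer(int size)
    {
        // Assumption of this sketch: room for at least one surrogate pair.
        if (size < 2) throw new ArgumentOutOfRangeException(nameof(size));
        Chars = new char[size];
    }
}

public static class FillSketch
{
    // Returns false when the reader ran out before numChars chars were available.
    public static bool Fill(SketchBuffer buffer, TextReader reader, int numChars)
    {
        if (numChars < 2 || numChars > buffer.Chars.Length)
            throw new ArgumentOutOfRangeException(nameof(numChars));

        // Re-install a high surrogate that the previous call held back.
        int total = 0;
        if (buffer.PendingHighSurrogate != '\0')
        {
            buffer.Chars[0] = buffer.PendingHighSurrogate;
            buffer.PendingHighSurrogate = '\0';
            total = 1;
        }

        // TextReader.Read may return fewer chars than requested, so loop until
        // the request is satisfied or the reader reports end of input (0).
        while (total < numChars)
        {
            int read = reader.Read(buffer.Chars, total, numChars - total);
            if (read <= 0) break;
            total += read;
        }
        buffer.Length = total;

        bool filled = total == numChars;
        // Hold back a trailing high surrogate so the pair is not split,
        // which leaves only numChars - 1 chars visible for this call.
        if (filled && char.IsHighSurrogate(buffer.Chars[total - 1]))
        {
            buffer.PendingHighSurrogate = buffer.Chars[total - 1];
            buffer.Length = total - 1;
        }
        return filled;
    }
}
```

Reading "ab\uD83D\uDE00c" with numChars set to 3, the first call returns true with Length 2, the second returns true with Length 3 and the complete pair at the start of the buffer, and the third returns false with Length 0.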
GetInstance(LuceneVersion)
Returns a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance.
Declaration
public static CharacterUtils GetInstance(LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.LuceneVersion | matchVersion | a version instance |
Returns
Type | Description |
---|---|
CharacterUtils | a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance. |
GetJava4Instance(LuceneVersion)
Returns a CharacterUtils instance compatible with Java 1.4.
Declaration
public static CharacterUtils GetJava4Instance(LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
Lucene.Net.Util.LuceneVersion | matchVersion |
Returns
Type | Description |
---|---|
CharacterUtils |
NewCharacterBuffer(Int32)
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
Declaration
public static CharacterUtils.CharacterBuffer NewCharacterBuffer(int bufferSize)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | bufferSize | the internal char buffer size, must be >= 2 |
Returns
Type | Description |
---|---|
CharacterUtils.CharacterBuffer | a new CharacterUtils.CharacterBuffer instance. |
OffsetByCodePoints(Char[], Int32, Int32, Int32, Int32)
Returns the index within buf[start:start+count] that is offset code points away from index.
Declaration
public abstract int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | buf | |
System.Int32 | start | |
System.Int32 | count | |
System.Int32 | index | |
System.Int32 | offset |
Returns
Type | Description |
---|---|
System.Int32 |
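The index arithmetic can be sketched with the base class library alone. The class name OffsetByCodePointsSketch and the bounds checks are assumptions for illustration, loosely modeled on Java's Character.offsetByCodePoints. A negative offset walks backwards.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class OffsetByCodePointsSketch
{
    public static int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
    {
        if (buf == null) throw new ArgumentNullException(nameof(buf));
        if (start < 0 || count < 0 || count > buf.Length - start)
            throw new ArgumentOutOfRangeException(nameof(count));
        int end = start + count;
        if (index < start || index > end)
            throw new ArgumentOutOfRangeException(nameof(index));

        int i = index;
        if (offset >= 0)
        {
            for (int n = 0; n < offset; n++)
            {
                if (i >= end) throw new ArgumentOutOfRangeException(nameof(offset));
                // Step over a full surrogate pair as one code point.
                if (char.IsHighSurrogate(buf[i]) && i + 1 < end && char.IsLowSurrogate(buf[i + 1])) i++;
                i++;
            }
        }
        else
        {
            for (int n = offset; n < 0; n++)
            {
                if (i <= start) throw new ArgumentOutOfRangeException(nameof(offset));
                i--;
                // Landing on a low surrogate: also step over its high surrogate.
                if (char.IsLowSurrogate(buf[i]) && i > start && char.IsHighSurrogate(buf[i - 1])) i--;
            }
        }
        return i;
    }
}
```

For the array of "a\uD83D\uDE00bc", moving 2 code points forward from index 0 gives 3, and moving 1 code point back from index 3 gives 1.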
ToChars(Int32[], Int32, Int32, Char[], Int32)
Converts a sequence of Unicode code points to a sequence of .NET characters.
Declaration
public int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
Parameters
Type | Name | Description |
---|---|---|
System.Int32[] | src | |
System.Int32 | srcOff | |
System.Int32 | srcLen | |
System.Char[] | dest | |
System.Int32 | destOff |
Returns
Type | Description |
---|---|
System.Int32 | the number of chars written to the destination buffer |
ToCodePoints(Char[], Int32, Int32, Int32[], Int32)
Converts a sequence of .NET characters to a sequence of Unicode code points.
Declaration
public int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | src | |
System.Int32 | srcOff | |
System.Int32 | srcLen | |
System.Int32[] | dest | |
System.Int32 | destOff |
Returns
Type | Description |
---|---|
System.Int32 | The number of code points written to the destination buffer. |
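The two conversions are inverses of each other, which a small round-trip sketch can make concrete. The class name CodePointConversionSketch is hypothetical, and the code uses plain UTF-16 arithmetic rather than anything taken from the Lucene.NET sources. Both methods return the number of elements written, matching the documented return values.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class CodePointConversionSketch
{
    // chars -> code points; returns the number of code points written.
    public static int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    {
        if (srcLen < 0) throw new ArgumentOutOfRangeException(nameof(srcLen));
        int end = srcOff + srcLen;
        int written = 0;
        for (int i = srcOff; i < end; )
        {
            int cp;
            if (char.IsHighSurrogate(src[i]) && i + 1 < end && char.IsLowSurrogate(src[i + 1]))
            {
                cp = char.ConvertToUtf32(src[i], src[i + 1]);
                i += 2;
            }
            else
            {
                cp = src[i]; // BMP char or unpaired surrogate, copied as is
                i++;
            }
            dest[destOff + written++] = cp;
        }
        return written;
    }

    // code points -> chars; returns the number of chars written.
    public static int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    {
        if (srcLen < 0) throw new ArgumentOutOfRangeException(nameof(srcLen));
        int written = 0;
        for (int i = 0; i < srcLen; i++)
        {
            int cp = src[srcOff + i];
            if (cp < 0 || cp > 0x10FFFF) throw new ArgumentException("not a code point", nameof(src));
            if (cp < 0x10000)
            {
                dest[destOff + written++] = (char)cp;
            }
            else
            {
                // Encode a supplementary code point as a UTF-16 surrogate pair.
                cp -= 0x10000;
                dest[destOff + written++] = (char)(0xD800 + (cp >> 10));
                dest[destOff + written++] = (char)(0xDC00 + (cp & 0x3FF));
            }
        }
        return written;
    }
}
```

Converting the 3 chars of "a\uD83D\uDE00" yields the 2 code points 0x61 and 0x1F600, and converting those back reproduces the original 3 chars.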
ToLower(Char[], Int32, Int32)
Converts each Unicode code point to lowercase via System.Globalization.TextInfo.ToLower(System.String) in the invariant culture, starting at the given offset.
Declaration
public virtual void ToLower(char[] buffer, int offset, int length)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | buffer | the char buffer to lowercase |
System.Int32 | offset | the offset to start at |
System.Int32 | length | the number of characters in the buffer to lower case |
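A minimal sketch of in-place lowercasing of a buffer slice with the invariant culture might look like the following. The class name LowerCaseSketch is hypothetical, and the sketch assumes that invariant-culture case mapping preserves the number of chars. It is not taken from the Lucene.NET sources.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of Lucene.NET.
public static class LowerCaseSketch
{
    public static void ToLower(char[] buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));

        // Lowercase only the requested slice using invariant-culture rules.
        string slice = new string(buffer, offset, length);
        string lowered = CultureInfo.InvariantCulture.TextInfo.ToLower(slice);

        // Assumption: the mapping keeps the char count unchanged.
        if (lowered.Length != length)
            throw new InvalidOperationException("case mapping changed the length");

        // Write the result back over the same slice; chars outside it are untouched.
        lowered.CopyTo(0, buffer, offset, length);
    }
}
```

Applied to the array of "xxHeLLoyy" with offset 2 and length 5, the buffer becomes "xxhelloyy".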
ToUpper(Char[], Int32, Int32)
Converts each Unicode code point to uppercase via System.Globalization.TextInfo.ToUpper(System.String) in the invariant culture, starting at the given offset.
Declaration
public virtual void ToUpper(char[] buffer, int offset, int length)
Parameters
Type | Name | Description |
---|---|---|
System.Char[] | buffer | the char buffer to uppercase |
System.Int32 | offset | the offset to start at |
System.Int32 | length | the number of characters in the buffer to uppercase |