Class CharacterUtils

CharacterUtils provides a unified interface for character-related operations, implementing backwards-compatible behavior based on a Lucene.Net.Util.LuceneVersion instance.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Inheritance

object

CharacterUtils

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Util

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public abstract class CharacterUtils

Methods

CodePointAt(ICharSequence, int)

Returns the code point at the given index of the J2N.Text.ICharSequence. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

Declaration

public abstract int CodePointAt(ICharSequence seq, int offset)

Parameters

Type	Name	Description
ICharSequence	seq	a character sequence
int	offset	the index of the char value in the sequence to be converted

Returns

Type	Description
int	the Unicode code point at the given index

Exceptions

Type	Condition
ArgumentNullException	if the sequence is null.
ArgumentOutOfRangeException	if the value offset is negative or not less than the length of the character sequence.

CodePointAt(char[], int, int)

Returns the code point at the given index of the char array where only elements with index less than the limit are used. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

Declaration

public abstract int CodePointAt(char[] chars, int offset, int limit)

Parameters

Type	Name	Description
char[]	chars	a character array
int	offset	the offset to the char values in the chars array to be converted
int	limit	the index after the last element that should be used to calculate the code point.

Returns

Type	Description
int	the Unicode code point at the given index

Exceptions

Type	Condition
ArgumentNullException	if the array is null.
ArgumentOutOfRangeException	if the value offset is negative or not less than the length of the char array.
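The role of limit can be illustrated with a minimal sketch that uses only the .NET base class library. The class CodePointAtSketch and its bounds checks are assumptions for illustration (they follow the contract of Java's Character.codePointAt(char[], int, int)); this is not the Lucene.NET implementation.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class CodePointAtSketch
{
    // Decodes the code point that starts at chars[offset], ignoring
    // every element at index >= limit.
    public static int CodePointAt(char[] chars, int offset, int limit)
    {
        if (chars == null) throw new ArgumentNullException(nameof(chars));
        if (offset < 0 || offset >= limit || limit > chars.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        char high = chars[offset];
        if (char.IsHighSurrogate(high) && offset + 1 < limit)
        {
            char low = chars[offset + 1];
            if (char.IsLowSurrogate(low))
                return char.ConvertToUtf32(high, low); // complete surrogate pair
        }
        // BMP char, or a surrogate whose partner is not inside the limit.
        return high;
    }
}
```

With the array { 'a', '\uD83D', '\uDE00' }, a limit of 3 lets the pair at index 1 decode to U+1F600, while a limit of 2 cuts off the low surrogate, so only the high surrogate value is returned.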

CodePointAt(string, int)

Returns the code point at the given index of the string. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

Declaration

public abstract int CodePointAt(string seq, int offset)

Parameters

Type	Name	Description
string	seq	a character sequence
int	offset	the index of the char value in the string to be converted

Returns

Type	Description
int	the Unicode code point at the given index

Exceptions

Type	Condition
ArgumentNullException	if the sequence is null.
ArgumentOutOfRangeException	if the value offset is negative or not less than the length of the character sequence.
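The version-dependent behavior described above can be pictured as two modes. The sketch below is an assumption-laden illustration, not the library code: the supplementaryAware flag is hypothetical and stands in for the choice made by GetInstance(LuceneVersion), and the Java 1.4-style mode is assumed to treat every char as its own code point.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class VersionedCodePointSketch
{
    public static int CodePointAt(string seq, int offset, bool supplementaryAware)
    {
        if (seq == null) throw new ArgumentNullException(nameof(seq));
        if (offset < 0 || offset >= seq.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        // Assumed Java 1.4-style mode: no surrogate decoding at all.
        if (!supplementaryAware) return seq[offset];

        // Later-JVM-style mode: decode a valid surrogate pair.
        // char.ConvertToUtf32 throws on unpaired surrogates, so check first.
        if (char.IsHighSurrogate(seq[offset])
            && offset + 1 < seq.Length
            && char.IsLowSurrogate(seq[offset + 1]))
        {
            return char.ConvertToUtf32(seq[offset], seq[offset + 1]);
        }
        return seq[offset];
    }
}
```

For the string "a\uD83D\uDE00" and offset 1, the supplementary-aware mode yields 0x1F600, while the other mode yields 0xD83D.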

CodePointCount(ICharSequence)

Returns the number of characters (code points) in seq.

Declaration

public abstract int CodePointCount(ICharSequence seq)

Parameters

Type	Name	Description
ICharSequence	seq

Returns

Type	Description
int

CodePointCount(char[])

Returns the number of characters (code points) in seq.

Declaration

public abstract int CodePointCount(char[] seq)

Parameters

Type	Name	Description
char[]	seq

Returns

Type	Description
int

CodePointCount(string)

Returns the number of characters (code points) in seq.

Declaration

public abstract int CodePointCount(string seq)

Parameters

Type	Name	Description
string	seq

Returns

Type	Description
int

CodePointCount(StringBuilder)

Returns the number of characters (code points) in seq.

Declaration

public abstract int CodePointCount(StringBuilder seq)

Parameters

Type	Name	Description
StringBuilder	seq

Returns

Type	Description
int
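Counting code points rather than char values matters only when surrogate pairs are present. A minimal sketch of supplementary-aware counting, using a hypothetical CodePointCountSketch class and only base class library calls, might look like this:

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class CodePointCountSketch
{
    public static int CodePointCount(string seq)
    {
        if (seq == null) throw new ArgumentNullException(nameof(seq));

        int count = 0;
        for (int i = 0; i < seq.Length; i++)
        {
            // A valid surrogate pair counts once: skip its low half.
            if (char.IsHighSurrogate(seq[i])
                && i + 1 < seq.Length
                && char.IsLowSurrogate(seq[i + 1]))
            {
                i++;
            }
            count++;
        }
        return count;
    }
}
```

Here "a\uD83D\uDE00b" has a Length of 4 but counts as 3 code points; an unpaired surrogate counts as one.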

Fill(CharacterBuffer, TextReader)

Convenience method which calls Fill(buffer, reader, buffer.Buffer.Length).

Declaration

public virtual bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader)

Parameters

Type	Name	Description
CharacterUtils.CharacterBuffer	buffer
TextReader	reader

Returns

Type	Description
bool

Fill(CharacterBuffer, TextReader, int)

Fills the CharacterUtils.CharacterBuffer with characters read from the given TextReader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer; each call to Fill starts filling the buffer from offset 0 up to numChars. Where code points can span two char values, this method may fill only numChars - 1 characters so that a surrogate pair is not split in the middle, even if characters remain in the TextReader.

Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method implements supplementary character awareness when filling the given buffer. For every Lucene.Net.Util.LuceneVersion later than 3.0, Fill(CharacterBuffer, TextReader, int) guarantees that the given CharacterUtils.CharacterBuffer never contains a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs are always preserved across buffer borders.

A return value of false means that this method call exhausted the reader, but some characters may still have been read; check whether buffer.Length > 0 to find out.

Declaration

public abstract bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader, int numChars)

Parameters

Type	Name	Description
CharacterUtils.CharacterBuffer	buffer	the buffer to fill.
TextReader	reader	the reader to read characters from.
int	numChars	the number of chars to read

Returns

Type	Description
bool	`false` if and only if the reader reached its end while trying to fill the buffer

Exceptions

Type	Condition
IOException	if the reader throws an IOException.
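The surrogate-preserving behavior can be sketched as follows. SketchBuffer and FillSketch are hypothetical stand-ins written for illustration; they are not CharacterUtils.CharacterBuffer or the library's Fill, and the carry-over field is an assumption about one way to keep a pair together.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for CharacterUtils.CharacterBuffer.
public sealed class SketchBuffer
{
    public readonly char[] Data;
    public int Length;
    public char PendingHighSurrogate; // '\0' when nothing is carried over

    public SketchBuffer(int size) { Data = new char[size]; }
}

public static class FillSketch
{
    public static bool Fill(SketchBuffer buffer, TextReader reader, int numChars)
    {
        if (numChars < 2 || numChars > buffer.Data.Length)
            throw new ArgumentOutOfRangeException(nameof(numChars));

        int offset = 0;
        if (buffer.PendingHighSurrogate != '\0')
        {
            // Re-install the high surrogate held back by the previous call.
            buffer.Data[0] = buffer.PendingHighSurrogate;
            buffer.PendingHighSurrogate = '\0';
            offset = 1;
        }

        // TextReader.Read may return fewer chars than requested,
        // so keep reading until the buffer is full or the reader ends.
        int read = 0;
        while (offset + read < numChars)
        {
            int r = reader.Read(buffer.Data, offset + read, numChars - offset - read);
            if (r <= 0) break;
            read += r;
        }
        buffer.Length = offset + read;

        bool filled = buffer.Length == numChars;
        if (filled && char.IsHighSurrogate(buffer.Data[buffer.Length - 1]))
        {
            // Hold back the trailing high surrogate so the pair stays together.
            buffer.PendingHighSurrogate = buffer.Data[--buffer.Length];
        }
        return filled;
    }
}
```

Reading "ab\uD83D\uDE00c" through a StringReader with numChars = 3, the first call returns true with Length 2 (the high surrogate is held back), the second returns true with Length 3 (the complete pair plus 'c'), and the third returns false with Length 0.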

GetInstance(LuceneVersion)

Returns a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance.

Declaration

public static CharacterUtils GetInstance(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	a version instance

Returns

Type	Description
CharacterUtils	a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance.

GetJava4Instance(LuceneVersion)

Returns a CharacterUtils instance compatible with Java 1.4.

Declaration

public static CharacterUtils GetJava4Instance(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion

Returns

Type	Description
CharacterUtils

NewCharacterBuffer(int)

Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.

Declaration

public static CharacterUtils.CharacterBuffer NewCharacterBuffer(int bufferSize)

Parameters

Type	Name	Description
int	bufferSize	the internal char buffer size, must be `>= 2`

Returns

Type	Description
CharacterUtils.CharacterBuffer	a new CharacterUtils.CharacterBuffer instance.

OffsetByCodePoints(char[], int, int, int, int)

Returns the index within buf[start:start+count] that is offset from index by offset code points.

Declaration

public abstract int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)

Parameters

Type	Name	Description
char[]	buf
int	start
int	count
int	index
int	offset

Returns

Type	Description
int
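As an illustration of what moving by code points means, the sketch below walks forward or backward through the window and steps over surrogate pairs as single units. The class name and the exact exception behavior are assumptions modeled on Java's Character.offsetByCodePoints; the library may differ.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class OffsetSketch
{
    public static int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
    {
        if (buf == null) throw new ArgumentNullException(nameof(buf));
        int end = start + count;
        if (start < 0 || count < 0 || end > buf.Length || index < start || index > end)
            throw new ArgumentOutOfRangeException(nameof(index));

        int i = index;
        if (offset >= 0)
        {
            for (int n = 0; n < offset; n++)
            {
                if (i >= end) throw new ArgumentOutOfRangeException(nameof(offset));
                // Step over a complete pair as one code point.
                if (char.IsHighSurrogate(buf[i]) && i + 1 < end && char.IsLowSurrogate(buf[i + 1]))
                    i++;
                i++;
            }
        }
        else
        {
            for (int n = offset; n < 0; n++)
            {
                if (i <= start) throw new ArgumentOutOfRangeException(nameof(offset));
                i--;
                // Moving backward: land on the high half of a complete pair.
                if (char.IsLowSurrogate(buf[i]) && i > start && char.IsHighSurrogate(buf[i - 1]))
                    i--;
            }
        }
        return i;
    }
}
```

For { 'a', '\uD83D', '\uDE00', 'b' }, moving 2 code points forward from index 0 gives index 3, and moving 1 code point backward from index 3 gives index 1.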

ToChars(int[], int, int, char[], int)

Converts a sequence of Unicode code points to a sequence of .NET characters.

Declaration

public int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)

Parameters

Type	Name	Description
int[]	src
int	srcOff
int	srcLen
char[]	dest
int	destOff

Returns

Type	Description
int	the number of chars written to the destination buffer

ToCodePoints(char[], int, int, int[], int)

Converts a sequence of .NET characters to a sequence of Unicode code points.

Declaration

public int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)

Parameters

Type	Name	Description
char[]	src
int	srcOff
int	srcLen
int[]	dest
int	destOff

Returns

Type	Description
int	The number of code points written to the destination buffer.
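The relationship between ToCodePoints and ToChars is a round trip between UTF-16 char values and 32-bit code points. The following sketch shows both directions with plain arithmetic; ConversionSketch is a hypothetical class, and handling of unpaired surrogates (passed through unchanged) is an assumption.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class ConversionSketch
{
    public static int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    {
        int written = 0;
        int end = srcOff + srcLen;
        for (int i = srcOff; i < end; )
        {
            int cp;
            if (char.IsHighSurrogate(src[i]) && i + 1 < end && char.IsLowSurrogate(src[i + 1]))
            {
                cp = char.ConvertToUtf32(src[i], src[i + 1]); // pair -> one code point
                i += 2;
            }
            else
            {
                cp = src[i]; // BMP char or unpaired surrogate, copied as-is
                i++;
            }
            dest[destOff + written++] = cp;
        }
        return written;
    }

    public static int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    {
        int written = 0;
        for (int i = 0; i < srcLen; i++)
        {
            int cp = src[srcOff + i];
            if (cp >= 0x10000)
            {
                // Supplementary code point: split into a surrogate pair.
                cp -= 0x10000;
                dest[destOff + written++] = (char)(0xD800 + (cp >> 10));
                dest[destOff + written++] = (char)(0xDC00 + (cp & 0x3FF));
            }
            else
            {
                dest[destOff + written++] = (char)cp;
            }
        }
        return written;
    }
}
```

Converting { 'a', '\uD83D', '\uDE00' } yields the two code points 97 and 0x1F600; converting those back yields the original three chars. The destination array must be large enough: up to srcLen ints for ToCodePoints and up to 2 * srcLen chars for ToChars.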

ToLower(char[], int, int)

Converts each Unicode code point to lowercase via ToLower(string) in the invariant culture, starting at the given offset.

Declaration

public virtual void ToLower(char[] buffer, int offset, int length)

Parameters

Type	Name	Description
char[]	buffer	the char buffer to lowercase
int	offset	the offset to start at
int	length	the number of characters in the buffer to lower case
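Lowercasing per code point, rather than per char, keeps surrogate pairs intact. A minimal sketch under that assumption follows; LowerSketch is hypothetical, it allocates a short string per code point for simplicity, and it is not the library's implementation.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET.
public static class LowerSketch
{
    public static void ToLower(char[] buffer, int offset, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        int end = offset + length;
        if (offset < 0 || length < 0 || end > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        for (int i = offset; i < end; )
        {
            // A complete surrogate pair is lowercased as one unit.
            int width = (char.IsHighSurrogate(buffer[i])
                         && i + 1 < end
                         && char.IsLowSurrogate(buffer[i + 1])) ? 2 : 1;

            string lowered = new string(buffer, i, width).ToLowerInvariant();

            // Write back in place only when the mapping keeps the same width.
            if (lowered.Length == width)
                lowered.CopyTo(0, buffer, i, width);

            i += width;
        }
    }
}
```

Calling LowerSketch.ToLower on { 'A', 'B', 'C' } with offset 0 and length 2 leaves the array as { 'a', 'b', 'C' }; elements outside the given range are not touched.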

ToUpper(char[], int, int)

Converts each Unicode code point to uppercase via ToUpper(string) in the invariant culture, starting at the given offset.

Declaration

public virtual void ToUpper(char[] buffer, int offset, int length)

Parameters

Type	Name	Description
char[]	buffer	the char buffer to uppercase
int	offset	the offset to start at
int	length	the number of characters in the buffer to upper case