Class CharacterUtils

CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations based on a LuceneVersion instance.

This is a Lucene.NET INTERNAL API; use at your own risk.

Inheritance

System.Object

CharacterUtils

Namespace: Lucene.Net.Analysis.Util

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public abstract class CharacterUtils : object

Methods

CodePointAt(ICharSequence, Int32)

Declaration

public abstract int CodePointAt(ICharSequence seq, int offset)

Parameters

Type	Name	Description
ICharSequence	seq
System.Int32	offset

Returns

Type	Description
System.Int32

CodePointAt(Char[], Int32, Int32)

Returns the code point at the given index of the char array, where only elements with an index less than the limit are used. Depending on the LuceneVersion passed to GetInstance(LuceneVersion), this method mimics the behavior of Java's Character.codePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

Declaration

public abstract int CodePointAt(char[] chars, int offset, int limit)

Parameters

Type	Name	Description
System.Char[]	chars	a character array
System.Int32	offset	the offset to the char values in the chars array to be converted
System.Int32	limit	the index after the last element that should be used to calculate the code point

Returns

Type	Description
System.Int32	the Unicode code point at the given index
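
The role of limit is easiest to see with a surrogate pair that straddles it. The following is a minimal sketch of the supplementary-aware behavior only, written against the .NET base class library; the class name CodePointAtSketch is hypothetical and this is not the Lucene.NET implementation.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Lucene.NET).
public static class CodePointAtSketch
{
    // Returns the code point that starts at chars[offset], looking only at
    // elements whose index is less than limit.
    public static int CodePointAt(char[] chars, int offset, int limit)
    {
        if (limit > chars.Length || offset < 0 || offset >= limit)
            throw new ArgumentOutOfRangeException(nameof(offset));

        char first = chars[offset];

        // Combine two chars only when a complete surrogate pair lies below the limit.
        if (char.IsHighSurrogate(first)
            && offset + 1 < limit
            && char.IsLowSurrogate(chars[offset + 1]))
        {
            return char.ConvertToUtf32(first, chars[offset + 1]);
        }

        // BMP char, or a surrogate whose partner is missing or beyond the limit.
        return first;
    }
}
```

With the array { 'a', '\uD83D', '\uDE00' }, calling the sketch with offset 1 and limit 3 yields U+1F600, while limit 2 hides the low surrogate and yields the lone high surrogate 0xD83D.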

CodePointAt(String, Int32)

Returns the code point at the given index of the string. Depending on the LuceneVersion passed to GetInstance(LuceneVersion), this method mimics the behavior of Java's Character.codePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

Declaration

public abstract int CodePointAt(string seq, int offset)

Parameters

Type	Name	Description
System.String	seq	a character sequence
System.Int32	offset	the offset of the char value in seq to be converted

Returns

Type	Description
System.Int32	the Unicode code point at the given index

CodePointCount(String)

Returns the number of code points in seq.

Declaration

public abstract int CodePointCount(string seq)

Parameters

Type	Name	Description
System.String	seq

Returns

Type	Description
System.Int32
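
A code point count differs from string.Length whenever the text contains surrogate pairs. The sketch below counts code points the supplementary-aware way, using only the .NET base class library; the class name CodePointCountSketch is hypothetical and the code is not taken from Lucene.NET.

```csharp
// Hypothetical helper, for illustration only (not part of Lucene.NET).
public static class CodePointCountSketch
{
    // Counts code points: a well-formed surrogate pair counts once,
    // every other char (including an unpaired surrogate) counts once.
    public static int CodePointCount(string seq)
    {
        int count = 0;
        for (int i = 0; i < seq.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(seq[i])
                && i + 1 < seq.Length
                && char.IsLowSurrogate(seq[i + 1]);
            if (isPair)
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }
}
```

For "a\uD83D\uDE00b" the sketch returns 3, whereas string.Length is 4.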

Fill(CharacterUtils.CharacterBuffer, TextReader)

Convenience method which calls Fill(buffer, reader, buffer.Buffer.Length).

Declaration

public virtual bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader)

Parameters

Type	Name	Description
CharacterUtils.CharacterBuffer	buffer
TextReader	reader

Returns

Type	Description
System.Boolean

Fill(CharacterUtils.CharacterBuffer, TextReader, Int32)

Fills the CharacterUtils.CharacterBuffer with characters read from the given reader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer; each call to Fill starts filling the buffer at offset 0 and fills up to numChars. When code points can span two chars (a surrogate pair), this method may fill only numChars - 1 characters so as not to split a surrogate pair, even if characters remain in the reader.

Depending on the LuceneVersion passed to GetInstance(LuceneVersion), this method implements supplementary character awareness when filling the given buffer. For all LuceneVersion > 3.0, Fill(CharacterUtils.CharacterBuffer, TextReader, Int32) guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.

A return value of false means that this method call exhausted the reader, but some characters may still have been read, which can be verified by checking whether buffer.Length > 0.

Declaration

public abstract bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader, int numChars)

Parameters

Type	Name	Description
CharacterUtils.CharacterBuffer	buffer	the buffer to fill.
TextReader	reader	the reader to read characters from.
System.Int32	numChars	the number of chars to read

Returns

Type	Description
System.Boolean	`false` if and only if the reader reached the end of its input while trying to fill the buffer
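
The surrogate-preserving contract described above can be sketched without the library. The class below is a hypothetical stand-in (SurrogateAwareFiller is not a Lucene.NET type, and it works on a plain char[] rather than a CharacterUtils.CharacterBuffer); it assumes one reasonable strategy, namely holding back a trailing high surrogate and prepending it on the next call.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for illustration only (not the Lucene.NET implementation).
public sealed class SurrogateAwareFiller
{
    private char pendingHigh;   // high surrogate held back by the previous call
    private bool hasPending;

    // Number of valid chars in the buffer after the most recent Fill call.
    public int Length { get; private set; }

    // Returns false once the reader is exhausted; Length may still be > 0.
    public bool Fill(char[] buffer, TextReader reader, int numChars)
    {
        if (numChars < 2 || numChars > buffer.Length)
            throw new ArgumentOutOfRangeException(nameof(numChars));

        int offset = 0;
        if (hasPending)
        {
            buffer[offset++] = pendingHigh; // restore the held-back surrogate first
            hasPending = false;
        }

        // TextReader.Read may return fewer chars than requested, so loop.
        bool exhausted = false;
        while (offset < numChars)
        {
            int read = reader.Read(buffer, offset, numChars - offset);
            if (read <= 0) { exhausted = true; break; }
            offset += read;
        }

        Length = offset;

        // Never end on a high surrogate unless it is the last char of the input.
        if (!exhausted && Length > 0 && char.IsHighSurrogate(buffer[Length - 1]))
        {
            Length--;
            pendingHigh = buffer[Length];
            hasPending = true;
        }
        return !exhausted;
    }
}
```

Filling a 3-char buffer from "ab\uD83D\uDE00c" with this sketch produces 2 chars on the first call (the high surrogate is held back) and 3 chars on the second, so the pair is never split across calls.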

GetInstance(LuceneVersion)

Returns a CharacterUtils implementation according to the given LuceneVersion instance.

Declaration

public static CharacterUtils GetInstance(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion	a version instance

Returns

Type	Description
CharacterUtils	a CharacterUtils implementation according to the given LuceneVersion instance.

GetJava4Instance(LuceneVersion)

Returns a CharacterUtils instance compatible with Java 1.4.

Declaration

public static CharacterUtils GetJava4Instance(LuceneVersion matchVersion)

Parameters

Type	Name	Description
LuceneVersion	matchVersion

Returns

Type	Description
CharacterUtils

NewCharacterBuffer(Int32)

Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.

Declaration

public static CharacterUtils.CharacterBuffer NewCharacterBuffer(int bufferSize)

Parameters

Type	Name	Description
System.Int32	bufferSize	the internal char buffer size, must be `>= 2`

Returns

Type	Description
CharacterUtils.CharacterBuffer	a new CharacterUtils.CharacterBuffer instance.

OffsetByCodePoints(Char[], Int32, Int32, Int32, Int32)

Returns the index within buf[start:start+count] that is offset from index by offset code points.

Declaration

public abstract int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)

Parameters

Type	Name	Description
System.Char[]	buf
System.Int32	start
System.Int32	count
System.Int32	index
System.Int32	offset

Returns

Type	Description
System.Int32
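
Because the parameters are undocumented here, the following sketch spells out one plausible reading, modeled on Java's Character.offsetByCodePoints(char[], int, int, int, int): index is a char position inside the window, and offset may be negative to move backwards. The class name OffsetSketch is hypothetical and the code is not the Lucene.NET implementation.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Lucene.NET).
public static class OffsetSketch
{
    // Moves from index by offset code points inside buf[start .. start + count).
    public static int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
    {
        int end = start + count;
        if (start < 0 || count < 0 || end > buf.Length || index < start || index > end)
            throw new ArgumentOutOfRangeException(nameof(index));

        int i = index;

        // Forward: step over one code point (1 or 2 chars) per iteration.
        for (int n = offset; n > 0; n--)
        {
            if (i >= end) throw new ArgumentOutOfRangeException(nameof(offset));
            bool pair = char.IsHighSurrogate(buf[i]) && i + 1 < end && char.IsLowSurrogate(buf[i + 1]);
            i += pair ? 2 : 1;
        }

        // Backward: step back over one code point per iteration.
        for (int n = offset; n < 0; n++)
        {
            if (i <= start) throw new ArgumentOutOfRangeException(nameof(offset));
            bool pair = char.IsLowSurrogate(buf[i - 1]) && i - 2 >= start && char.IsHighSurrogate(buf[i - 2]);
            i -= pair ? 2 : 1;
        }

        return i;
    }
}
```

For the chars of "a\uD83D\uDE00b", moving 2 code points forward from index 0 lands on index 3, and moving 2 code points back from index 4 lands on index 1.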

ToChars(Int32[], Int32, Int32, Char[], Int32)

Converts a sequence of Unicode code points to a sequence of .NET characters.

Declaration

public int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)

Parameters

Type	Name	Description
System.Int32[]	src
System.Int32	srcOff
System.Int32	srcLen
System.Char[]	dest
System.Int32	destOff

Returns

Type	Description
System.Int32	the number of chars written to the destination buffer

ToCodePoints(Char[], Int32, Int32, Int32[], Int32)

Converts a sequence of .NET characters to a sequence of Unicode code points.

Declaration

public int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)

Parameters

Type	Name	Description
System.Char[]	src
System.Int32	srcOff
System.Int32	srcLen
System.Int32[]	dest
System.Int32	destOff

Returns

Type	Description
System.Int32	the number of code points written to the destination buffer
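
ToCodePoints and ToChars are inverse conversions, and the return values tell the caller how much of dest is valid. The sketch below shows a round trip with the same parameter shapes, using only the .NET base class library; the class name ConversionSketch is hypothetical, the code is not the Lucene.NET implementation, and it assumes dest is large enough and every code point is at most U+10FFFF.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Lucene.NET).
public static class ConversionSketch
{
    // Decodes srcLen chars starting at srcOff; returns the number of code points written.
    public static int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    {
        int written = 0;
        int end = srcOff + srcLen;
        for (int i = srcOff; i < end; )
        {
            int cp;
            if (char.IsHighSurrogate(src[i]) && i + 1 < end && char.IsLowSurrogate(src[i + 1]))
            {
                cp = char.ConvertToUtf32(src[i], src[i + 1]);
                i += 2;
            }
            else
            {
                cp = src[i]; // BMP char or unpaired surrogate, kept as-is
                i++;
            }
            dest[destOff + written++] = cp;
        }
        return written;
    }

    // Encodes srcLen code points starting at srcOff; returns the number of chars written.
    public static int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    {
        int written = 0;
        for (int i = 0; i < srcLen; i++)
        {
            int cp = src[srcOff + i];
            if (cp >= 0x10000)
            {
                // Supplementary code point: split into a surrogate pair.
                cp -= 0x10000;
                dest[destOff + written++] = (char)(0xD800 + (cp >> 10));
                dest[destOff + written++] = (char)(0xDC00 + (cp & 0x3FF));
            }
            else
            {
                dest[destOff + written++] = (char)cp;
            }
        }
        return written;
    }
}
```

Decoding the 3 chars of "a\uD83D\uDE00" yields 2 code points (0x61 and 0x1F600); encoding those back yields the original 3 chars.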

ToLower(Char[], Int32, Int32)

Converts each Unicode code point to lowercase, starting at the given offset.

Declaration

public virtual void ToLower(char[] buffer, int offset, int limit)

Parameters

Type	Name	Description
System.Char[]	buffer	the char buffer to lowercase
System.Int32	offset	the offset to start at
System.Int32	limit	the index in the buffer at which to stop lowercasing

ToUpper(Char[], Int32, Int32)

Converts each Unicode code point to uppercase, starting at the given offset.

Declaration

public virtual void ToUpper(char[] buffer, int offset, int limit)

Parameters

Type	Name	Description
System.Char[]	buffer	the char buffer to uppercase
System.Int32	offset	the offset to start at
System.Int32	limit	the index in the buffer at which to stop uppercasing