    Class CharacterUtils

    CharacterUtils provides a unified interface to character-related operations, so that backwards-compatible character handling can be selected based on a Lucene.Net.Util.LuceneVersion instance.

    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Inheritance
    object
    CharacterUtils
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Util
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public abstract class CharacterUtils

    Methods

    CodePointAt(ICharSequence, int)

    Returns the code point at the given index of the J2N.Text.ICharSequence. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

    Declaration
    public abstract int CodePointAt(ICharSequence seq, int offset)
    Parameters
    Type Name Description
    ICharSequence seq

    a character sequence

    int offset

    the offset of the char value in the sequence to be converted

    Returns
    Type Description
    int

    the Unicode code point at the given index

    Exceptions
    Type Condition
    ArgumentNullException
    • if the sequence is null.
    ArgumentOutOfRangeException
    • if the value offset is negative or not less than the length of the character sequence.

    CodePointAt(char[], int, int)

    Returns the code point at the given index of the char array where only elements with index less than the limit are used. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

    Declaration
    public abstract int CodePointAt(char[] chars, int offset, int limit)
    Parameters
    Type Name Description
    char[] chars

    a character array

    int offset

    the offset to the char values in the chars array to be converted

    int limit

    the index after the last element that should be used to calculate the code point.

    Returns
    Type Description
    int

    the Unicode code point at the given index

    Exceptions
    Type Condition
    ArgumentNullException
    • if the array is null.
    ArgumentOutOfRangeException
    • if the value offset is negative or not less than the length of the char array.
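
The effect of limit is easiest to see on a surrogate pair that straddles it. The following is a minimal sketch of the documented behavior using only .NET base class library calls. The class LimitedCodePointSketch and its exact bounds checks are assumptions for illustration, not Lucene.NET's implementation.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class LimitedCodePointSketch
{
    public static int CodePointAt(char[] chars, int offset, int limit)
    {
        if (chars is null) throw new ArgumentNullException(nameof(chars));
        // Assumed bounds check: offset must address an element below limit.
        if (offset < 0 || limit > chars.Length || offset >= limit)
            throw new ArgumentOutOfRangeException(nameof(offset));

        char first = chars[offset];
        // Only combine a pair when the low surrogate also lies below limit.
        if (char.IsHighSurrogate(first)
            && offset + 1 < limit
            && char.IsLowSurrogate(chars[offset + 1]))
        {
            return char.ConvertToUtf32(first, chars[offset + 1]);
        }
        return first;
    }
}
```

For the array { '\uD83D', '\uDE00' } (U+1F600), a limit of 2 yields 0x1F600 in this sketch, while a limit of 1 hides the low surrogate and yields 0xD83D.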

    CodePointAt(string, int)

    Returns the code point at the given index of the string. Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion) this method mimics the behavior of Character.CodePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.

    Declaration
    public abstract int CodePointAt(string seq, int offset)
    Parameters
    Type Name Description
    string seq

    a character sequence

    int offset

    the offset of the char value in the string to be converted

    Returns
    Type Description
    int

    the Unicode code point at the given index

    Exceptions
    Type Condition
    ArgumentNullException
    • if the sequence is null.
    ArgumentOutOfRangeException
    • if the value offset is negative or not less than the length of the character sequence.
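
The version-dependent behavior described for the CodePointAt overloads can be sketched as a single function with a flag. This is an illustration under stated assumptions: the class CodePointAtSketch and the supplementaryAware parameter are hypothetical, and the flag stands in for the choice that GetInstance(LuceneVersion) makes internally.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class CodePointAtSketch
{
    public static int CodePointAt(string seq, int offset, bool supplementaryAware)
    {
        if (seq is null) throw new ArgumentNullException(nameof(seq));
        if (offset < 0 || offset >= seq.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        char first = seq[offset];
        // Java 1.4 style: every char is treated as its own code point.
        if (!supplementaryAware) return first;

        // Later style: a high surrogate followed by a low surrogate
        // is combined into one supplementary code point.
        if (char.IsHighSurrogate(first)
            && offset + 1 < seq.Length
            && char.IsLowSurrogate(seq[offset + 1]))
        {
            return char.ConvertToUtf32(first, seq[offset + 1]);
        }
        return first;
    }
}
```

With the input "a\uD83D\uDE00" and offset 1, the supplementary-aware path returns 0x1F600 and the Java 1.4 style path returns 0xD83D.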

    CodePointCount(ICharSequence)

    Returns the number of characters (Unicode code points) in seq.

    Declaration
    public abstract int CodePointCount(ICharSequence seq)
    Parameters
    Type Name Description
    ICharSequence seq
    Returns
    Type Description
    int

    CodePointCount(char[])

    Returns the number of characters (Unicode code points) in seq.

    Declaration
    public abstract int CodePointCount(char[] seq)
    Parameters
    Type Name Description
    char[] seq
    Returns
    Type Description
    int

    CodePointCount(string)

    Returns the number of characters (Unicode code points) in seq.

    Declaration
    public abstract int CodePointCount(string seq)
    Parameters
    Type Name Description
    string seq
    Returns
    Type Description
    int

    CodePointCount(StringBuilder)

    Returns the number of characters (Unicode code points) in seq.

    Declaration
    public abstract int CodePointCount(StringBuilder seq)
    Parameters
    Type Name Description
    StringBuilder seq
    Returns
    Type Description
    int
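
To show how a code point count can differ from the char length, here is a small sketch that counts code points in a string with base class library calls only. The class CodePointCountSketch is hypothetical, and the sketch assumes the supplementary-aware behavior; it is not taken from Lucene.NET.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class CodePointCountSketch
{
    public static int Count(string seq)
    {
        if (seq is null) throw new ArgumentNullException(nameof(seq));

        int count = 0;
        for (int i = 0; i < seq.Length; i++)
        {
            // A well-formed surrogate pair counts once: skip its low half.
            if (char.IsHighSurrogate(seq[i])
                && i + 1 < seq.Length
                && char.IsLowSurrogate(seq[i + 1]))
            {
                i++;
            }
            count++;
        }
        return count;
    }
}
```

For "a\uD83D\uDE00b" the string length is 4 chars, while this sketch counts 3 code points.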

    Fill(CharacterBuffer, TextReader)

    Convenience method which calls Fill(buffer, reader, buffer.Buffer.Length).

    Declaration
    public virtual bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader)
    Parameters
    Type Name Description
    CharacterUtils.CharacterBuffer buffer
    TextReader reader
    Returns
    Type Description
    bool

    Fill(CharacterBuffer, TextReader, int)

    Fills the CharacterUtils.CharacterBuffer with characters read from the given TextReader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer; each call to Fill starts filling the buffer from offset 0 up to numChars. Where code points can span two chars, this method may fill only numChars - 1 characters so as not to split a surrogate pair, even if characters remain in the TextReader.

    Depending on the Lucene.Net.Util.LuceneVersion passed to GetInstance(LuceneVersion), this method implements supplementary character awareness when filling the given buffer. For all Lucene.Net.Util.LuceneVersion > 3.0, Fill(CharacterBuffer, TextReader, int) guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.

    A return value of false means that this method call exhausted the reader, but some chars may still have been read; verify this by checking whether buffer.Length > 0.

    Declaration
    public abstract bool Fill(CharacterUtils.CharacterBuffer buffer, TextReader reader, int numChars)
    Parameters
    Type Name Description
    CharacterUtils.CharacterBuffer buffer

    the buffer to fill.

    TextReader reader

    the reader to read characters from.

    int numChars

    the number of chars to read

    Returns
    Type Description
    bool
    false if and only if the reader reached its end while trying to fill the buffer
    Exceptions
    Type Condition
    IOException

    if the reader throws an IOException.
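
Because a false return can still leave chars in the buffer, a consumer loop has to process the buffer before checking the return value. The sketch below shows one way to write that loop with the members documented on this page. The class FillUsageSketch and its ReadChunks method are hypothetical, and the version constant LuceneVersion.LUCENE_48 is an assumption about the version in use.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical usage helper for illustration.
public static class FillUsageSketch
{
    public static List<string> ReadChunks(string text, int bufferSize)
    {
        // Assumed version constant; pick the one matching your index.
        CharacterUtils utils = CharacterUtils.GetInstance(LuceneVersion.LUCENE_48);
        CharacterUtils.CharacterBuffer buffer =
            CharacterUtils.NewCharacterBuffer(bufferSize);
        var chunks = new List<string>();

        using (var reader = new StringReader(text))
        {
            bool more;
            do
            {
                more = utils.Fill(buffer, reader);
                // Even when Fill returns false, chars may have been read.
                if (buffer.Length > 0)
                {
                    chunks.Add(new string(buffer.Buffer, 0, buffer.Length));
                }
            } while (more);
        }
        return chunks;
    }
}
```

Concatenating the returned chunks should reproduce the input text, and with a supplementary-aware instance no chunk should end in the high half of a surrogate pair unless the input itself ends there.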

    GetInstance(LuceneVersion)

    Returns a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance.

    Declaration
    public static CharacterUtils GetInstance(LuceneVersion matchVersion)
    Parameters
    Type Name Description
    LuceneVersion matchVersion

    a version instance

    Returns
    Type Description
    CharacterUtils

    a CharacterUtils implementation according to the given Lucene.Net.Util.LuceneVersion instance.

    GetJava4Instance(LuceneVersion)

    Returns a CharacterUtils instance compatible with Java 1.4.

    Declaration
    public static CharacterUtils GetJava4Instance(LuceneVersion matchVersion)
    Parameters
    Type Name Description
    LuceneVersion matchVersion
    Returns
    Type Description
    CharacterUtils

    NewCharacterBuffer(int)

    Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.

    Declaration
    public static CharacterUtils.CharacterBuffer NewCharacterBuffer(int bufferSize)
    Parameters
    Type Name Description
    int bufferSize

    the internal char buffer size, must be >= 2

    Returns
    Type Description
    CharacterUtils.CharacterBuffer

    a new CharacterUtils.CharacterBuffer instance.

    OffsetByCodePoints(char[], int, int, int, int)

    Returns the index within buf[start:start+count] that is offset code points away from index.

    Declaration
    public abstract int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
    Parameters
    Type Name Description
    char[] buf
    int start
    int count
    int index
    int offset
    Returns
    Type Description
    int
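
The index arithmetic can be sketched as a walk over the array that steps two chars for each well-formed surrogate pair and one char otherwise. This is a minimal illustration assuming the supplementary-aware behavior; the class OffsetByCodePointsSketch and its choice of exception are assumptions, not Lucene.NET's implementation.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class OffsetByCodePointsSketch
{
    public static int OffsetByCodePoints(char[] buf, int start, int count, int index, int offset)
    {
        if (buf is null) throw new ArgumentNullException(nameof(buf));
        int end = start + count;
        int i = index;

        // Move forward one code point at a time.
        for (; offset > 0; offset--)
        {
            if (i >= end) throw new ArgumentOutOfRangeException(nameof(offset));
            bool pair = char.IsHighSurrogate(buf[i])
                && i + 1 < end
                && char.IsLowSurrogate(buf[i + 1]);
            i += pair ? 2 : 1;
        }

        // Move backward one code point at a time.
        for (; offset < 0; offset++)
        {
            if (i <= start) throw new ArgumentOutOfRangeException(nameof(offset));
            bool pair = char.IsLowSurrogate(buf[i - 1])
                && i - 2 >= start
                && char.IsHighSurrogate(buf[i - 2]);
            i -= pair ? 2 : 1;
        }
        return i;
    }
}
```

For the array 'a', '\uD83D', '\uDE00', 'b' with start 0 and count 4, moving 2 code points forward from index 0 lands on index 3, and moving 1 code point back from index 3 lands on index 1.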

    ToChars(int[], int, int, char[], int)

    Converts a sequence of Unicode code points to a sequence of .NET characters.

    Declaration
    public int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    Parameters
    Type Name Description
    int[] src
    int srcOff
    int srcLen
    char[] dest
    int destOff
    Returns
    Type Description
    int

    the number of chars written to the destination buffer

    ToCodePoints(char[], int, int, int[], int)

    Converts a sequence of .NET characters to a sequence of Unicode code points.

    Declaration
    public int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    Parameters
    Type Name Description
    char[] src
    int srcOff
    int srcLen
    int[] dest
    int destOff
    Returns
    Type Description
    int

    The number of code points written to the destination buffer.
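
The two conversions above are inverses of each other for well-formed input. The following sketch mirrors their signatures using base class library calls only. The class ConversionSketch is hypothetical, and the handling of unpaired surrogates (passed through as single code points when decoding) is an assumption.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class ConversionSketch
{
    public static int ToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    {
        int written = 0;
        for (int i = srcOff; i < srcOff + srcLen; i++)
        {
            // ConvertFromUtf32 yields one char, or two for supplementary
            // code points; it throws for values in the surrogate range.
            string s = char.ConvertFromUtf32(src[i]);
            s.CopyTo(0, dest, destOff + written, s.Length);
            written += s.Length;
        }
        return written;
    }

    public static int ToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    {
        int count = 0;
        int end = srcOff + srcLen;
        for (int i = srcOff; i < end; )
        {
            int codePoint;
            if (char.IsHighSurrogate(src[i]) && i + 1 < end && char.IsLowSurrogate(src[i + 1]))
            {
                codePoint = char.ConvertToUtf32(src[i], src[i + 1]);
                i += 2;
            }
            else
            {
                codePoint = src[i]; // BMP char or unpaired surrogate
                i++;
            }
            dest[destOff + count++] = codePoint;
        }
        return count;
    }
}
```

Converting the code points 0x61 and 0x1F600 with this sketch writes 3 chars, and converting those 3 chars back writes 2 code points.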

    ToLower(char[], int, int)

    Converts each Unicode code point to lowercase via ToLower(string) in the invariant culture, starting at the given offset.

    Declaration
    public virtual void ToLower(char[] buffer, int offset, int length)
    Parameters
    Type Name Description
    char[] buffer

    the char buffer to lowercase

    int offset

    the offset to start at

    int length

    the number of characters in the buffer to lower case
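
In-place lowercasing of a buffer range can be sketched with string.ToLowerInvariant. The class ToLowerSketch is hypothetical, and the sketch assumes that invariant-culture case mapping preserves the char count; it guards that assumption with a check rather than relying on it silently.

```csharp
using System;

// Hypothetical helper for illustration; not part of Lucene.NET.
public static class ToLowerSketch
{
    public static void ToLower(char[] buffer, int offset, int length)
    {
        if (buffer is null) throw new ArgumentNullException(nameof(buffer));

        // Lowercase only the requested range using the invariant culture.
        string lowered = new string(buffer, offset, length).ToLowerInvariant();

        // Assumption: the mapping keeps the char count unchanged.
        if (lowered.Length != length)
            throw new InvalidOperationException("Case mapping changed the length.");

        lowered.CopyTo(0, buffer, offset, length);
    }
}
```

Calling the sketch on the chars of "xHELLOx" with offset 1 and length 5 leaves the first and last chars untouched and lowercases the middle five.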

    ToUpper(char[], int, int)

    Converts each Unicode code point to uppercase via ToUpper(string) in the invariant culture, starting at the given offset.

    Declaration
    public virtual void ToUpper(char[] buffer, int offset, int length)
    Parameters
    Type Name Description
    char[] buffer

    the char buffer to uppercase

    int offset

    the offset to start at

    int length

    the number of characters in the buffer to uppercase

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.