    Class JapaneseIterationMarkCharFilter

    Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

    Inheritance
    object
    MarshalByRefObject
    TextReader
    CharFilter
    JapaneseIterationMarkCharFilter
    Implements
    IDisposable
    Inherited Members
    CharFilter.m_input
    CharFilter.Dispose(bool)
    CharFilter.CorrectOffset(int)
    CharFilter.Skip(int)
    CharFilter.Reset()
    CharFilter.IsReady
    CharFilter.IsMarkSupported
    CharFilter.Mark(int)
    TextReader.Null
    TextReader.Close()
    TextReader.Dispose()
    TextReader.Peek()
    TextReader.Read(Span<char>)
    TextReader.ReadAsync(char[], int, int)
    TextReader.ReadAsync(Memory<char>, CancellationToken)
    TextReader.ReadBlock(char[], int, int)
    TextReader.ReadBlock(Span<char>)
    TextReader.ReadBlockAsync(char[], int, int)
    TextReader.ReadBlockAsync(Memory<char>, CancellationToken)
    TextReader.ReadLine()
    TextReader.ReadLineAsync()
    TextReader.ReadLineAsync(CancellationToken)
    TextReader.ReadToEnd()
    TextReader.ReadToEndAsync()
    TextReader.ReadToEndAsync(CancellationToken)
    TextReader.Synchronized(TextReader)
    MarshalByRefObject.GetLifetimeService()
    MarshalByRefObject.InitializeLifetimeService()
    MarshalByRefObject.MemberwiseClone(bool)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Ja
    Assembly: Lucene.Net.Analysis.Kuromoji.dll
    Syntax
    public class JapaneseIterationMarkCharFilter : CharFilter, IDisposable
    Remarks

    Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though "?" isn't hiragana.

    Note that a full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves are emitted unchanged when they are illegal, that is, when they would refer back past the beginning of the character stream.

    The implementation buffers input only until a full stop punctuation character (U+3002) or EOF is reached, so that it does not have to keep a copy of the whole character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
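
    The behavior described in these remarks can be tried out with a minimal sketch like the one below. It wraps a StringReader in the filter and drains it one character at a time with Read(). The IterationMarkDemo class and its Normalize helper are hypothetical names introduced for this example; only the constructor and Read() are taken from this page, and the Lucene.Net.Analysis.Kuromoji package is assumed to be referenced.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class IterationMarkDemo
{
    // Hypothetical helper (not part of Lucene.Net): returns the text
    // produced by the filter with its default settings.
    public static string Normalize(string text)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(new StringReader(text)))
        {
            var sb = new StringBuilder();
            int c;
            // Read() returns -1 once no more characters are available.
            while ((c = filter.Read()) != -1)
            {
                sb.Append((char)c);
            }
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // The kanji iteration mark repeats the preceding kanji: 時々 becomes 時時.
        Console.WriteLine(Normalize("時々"));

        // The example from the remarks: the source character is copied
        // as-is even though "?" is not hiragana, giving "??".
        Console.WriteLine(Normalize("?ゝ"));
    }
}
```

    Reading with Read() in a loop keeps the sketch independent of how the buffered overload signals end of input.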

    Constructors

    JapaneseIterationMarkCharFilter(TextReader)

    Constructor. Normalizes both kanji and kana iteration marks by default.

    Declaration
    public JapaneseIterationMarkCharFilter(TextReader input)
    Parameters
    Type Name Description
    TextReader input

    Char stream.


    JapaneseIterationMarkCharFilter(TextReader, bool, bool)

    Constructor. Normalizes kanji and kana iteration marks according to the given flags.

    Declaration
    public JapaneseIterationMarkCharFilter(TextReader input, bool normalizeKanji, bool normalizeKana)
    Parameters
    Type Name Description
    TextReader input

    Char stream.

    bool normalizeKanji

    Indicates whether kanji iteration marks should be normalized.

    bool normalizeKana

    Indicates whether kana iteration marks should be normalized.
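
    To see the effect of the two flags, a small sketch such as the following can be used. FlagDemo and its Normalize helper are hypothetical names for illustration; the constructor signature is the one declared above, and the Lucene.Net.Analysis.Kuromoji package is assumed to be referenced.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class FlagDemo
{
    // Hypothetical helper (not part of Lucene.Net): runs the filter over
    // `text` with the given flags and returns everything it emits.
    public static string Normalize(string text, bool normalizeKanji, bool normalizeKana)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(
            new StringReader(text), normalizeKanji, normalizeKana))
        {
            var sb = new StringBuilder();
            int c;
            while ((c = filter.Read()) != -1) // -1 signals end of input
            {
                sb.Append((char)c);
            }
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // Contains one kanji iteration mark (々) and one kana iteration mark (ゝ).
        const string input = "時々、おゝの";

        Console.WriteLine(Normalize(input, true, false));  // only 々 is expanded
        Console.WriteLine(Normalize(input, false, true));  // only ゝ is expanded
        Console.WriteLine(Normalize(input, true, true));   // both are expanded
        Console.WriteLine(Normalize(input, false, false)); // input passes through
    }
}
```

    Passing true for both flags should match the behavior of the single-argument constructor, which normalizes both kinds of marks by default.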


    Fields

    NORMALIZE_KANA_DEFAULT

    Normalize kana iteration marks by default

    Declaration
    public static readonly bool NORMALIZE_KANA_DEFAULT
    Field Value
    Type Description
    bool

    NORMALIZE_KANJI_DEFAULT

    Normalize kanji iteration marks by default

    Declaration
    public static readonly bool NORMALIZE_KANJI_DEFAULT
    Field Value
    Type Description
    bool

    Methods

    Correct(int)

    Subclasses override to correct the current offset.

    Declaration
    protected override int Correct(int currentOff)
    Parameters
    Type Name Description
    int currentOff

    current offset

    Returns
    Type Description
    int

    corrected offset

    Overrides
    CharFilter.Correct(int)

    Read()

    Reads the next character from the text reader and advances the character position by one character.

    Declaration
    public override int Read()
    Returns
    Type Description
    int

    The next character from the text reader, or -1 if no more characters are available.

    Overrides
    CharFilter.Read()

    Read(char[], int, int)

    Reads a specified maximum number of characters from the current reader and writes the data to a buffer, beginning at the specified index.

    Declaration
    public override int Read(char[] buffer, int offset, int length)
    Parameters
    Type Name Description
    char[] buffer

    When this method returns, contains the specified character array with the values between offset and (offset + length - 1) replaced by the characters read from the current source.

    int offset

    The position in buffer at which to begin writing.

    int length

    The maximum number of characters to read. If the end of the reader is reached before the specified number of characters is read into the buffer, the method returns.

    Returns
    Type Description
    int

    The number of characters that have been read. The number will be less than or equal to length, depending on whether the data is available within the reader. This method returns 0 (zero) if it is called when no more characters are left to read.

    Overrides
    CharFilter.Read(char[], int, int)
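
    A chunked read loop over this overload might look like the sketch below. ChunkedReadDemo and ReadInChunks are hypothetical names for illustration, and the Lucene.Net.Analysis.Kuromoji package is assumed to be referenced. The loop stops on any non-positive return value, so it does not rely on the exact end-of-input value.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class ChunkedReadDemo
{
    // Hypothetical helper (not part of Lucene.Net): drains any TextReader
    // through Read(char[], int, int) using a fixed-size buffer.
    public static string ReadInChunks(TextReader reader, int chunkSize)
    {
        var buffer = new char[chunkSize];
        var sb = new StringBuilder();
        int read;
        // Write at offset 0 and request at most buffer.Length characters.
        // Fewer characters than requested may be returned on any call.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            sb.Append(buffer, 0, read);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        using (var filter = new JapaneseIterationMarkCharFilter(new StringReader("時々")))
        {
            Console.WriteLine(ReadInChunks(filter, 4));
        }
    }
}
```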

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.