Class JapaneseIterationMarkCharFilter

Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

Inheritance

object

MarshalByRefObject

TextReader

CharFilter

JapaneseIterationMarkCharFilter

Implements

IDisposable

Inherited Members

CharFilter.m_input

CharFilter.Dispose(bool)

CharFilter.CorrectOffset(int)

CharFilter.Skip(int)

CharFilter.Reset()

CharFilter.IsReady

CharFilter.IsMarkSupported

CharFilter.Mark(int)

TextReader.Null

TextReader.Close()

TextReader.Dispose()

TextReader.Peek()

TextReader.Read(Span<char>)

TextReader.ReadAsync(char[], int, int)

TextReader.ReadAsync(Memory<char>, CancellationToken)

TextReader.ReadBlock(char[], int, int)

TextReader.ReadBlock(Span<char>)

TextReader.ReadBlockAsync(char[], int, int)

TextReader.ReadBlockAsync(Memory<char>, CancellationToken)

TextReader.ReadLine()

TextReader.ReadLineAsync()

TextReader.ReadLineAsync(CancellationToken)

TextReader.ReadToEnd()

TextReader.ReadToEndAsync()

TextReader.ReadToEndAsync(CancellationToken)

TextReader.Synchronized(TextReader)

MarshalByRefObject.GetLifetimeService()

MarshalByRefObject.InitializeLifetimeService()

MarshalByRefObject.MemberwiseClone(bool)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Ja

Assembly: Lucene.Net.Analysis.Kuromoji.dll

Syntax

public class JapaneseIterationMarkCharFilter : CharFilter, IDisposable

Remarks

Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though "?" isn't hiragana.

Note that a full stop punctuation character "。" (U+3002) cannot be iterated (see below). An iteration mark that is itself illegal, i.e. one that refers back past the beginning of the character stream, can be emitted unchanged.

The implementation buffers input until a full stop punctuation character (U+3002) or EOF is reached, so that it does not need to keep a copy of the whole character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
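
The idea of expanding a run of iteration marks can be sketched in plain C#. This is a simplified illustration and not the library's implementation: it handles only the kanji iteration mark "々" (U+3005), ignores kana voicing and surrogate pairs, and the names IterationMarkSketch and ExpandKanjiMarks are hypothetical. It assumes that each mark in a run of N marks repeats the character N positions earlier, and that a mark reaching back past the start of the text or past a full stop is emitted unchanged.

```csharp
using System;
using System.Text;

public static class IterationMarkSketch
{
    private const char KanjiIterationMark = '\u3005'; // 々
    private const char FullStop = '\u3002';           // 。

    // Simplified sketch (assumption, not the library's code): expands runs of
    // the kanji iteration mark only. Output and input always have equal length.
    public static string ExpandKanjiMarks(string text)
    {
        var output = new StringBuilder(text.Length);
        int segmentStart = 0; // earliest index a mark may refer back to
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (c != KanjiIterationMark)
            {
                output.Append(c);
                i++;
                // A full stop ends the segment; later marks cannot reach past it.
                if (c == FullStop) segmentStart = i;
                continue;
            }

            // Measure the length of the run of consecutive marks starting at i.
            int runLength = 0;
            while (i + runLength < text.Length && text[i + runLength] == KanjiIterationMark)
            {
                runLength++;
            }

            // Each mark copies the already-emitted character runLength positions back.
            for (int k = 0; k < runLength; k++)
            {
                int source = i + k - runLength;
                output.Append(source >= segmentStart ? output[source] : KanjiIterationMark);
            }
            i += runLength;
        }
        return output.ToString();
    }
}
```

Under these assumptions, a single mark repeats the previous character and a run of two marks repeats the previous two characters, while a mark with nothing legal to refer back to passes through as-is.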

Constructors

JapaneseIterationMarkCharFilter(TextReader)

Constructor. Normalizes both kanji and kana iteration marks by default.

Declaration

public JapaneseIterationMarkCharFilter(TextReader input)

Parameters

Type	Name	Description
TextReader	input	Char stream.
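
A minimal usage sketch for this constructor, assuming the Lucene.Net.Analysis.Kuromoji package is referenced. The class DefaultNormalizationExample and its Normalize helper are hypothetical names introduced for illustration. The helper wraps a StringReader in the filter and drains it with Read(), which returns -1 when no more characters are available.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class DefaultNormalizationExample
{
    // Hypothetical helper: runs text through the filter with its default settings
    // (both kanji and kana iteration marks are normalized).
    public static string Normalize(string text)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(new StringReader(text)))
        {
            var result = new StringBuilder();
            int c;
            // Read() returns -1 once no more characters are available.
            while ((c = filter.Read()) != -1)
            {
                result.Append((char)c);
            }
            return result.ToString();
        }
    }
}
```

With the defaults, an input such as "時々" would be expected to come back as "時時", and text without iteration marks should pass through unchanged.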

JapaneseIterationMarkCharFilter(TextReader, bool, bool)

Constructor. Allows kanji and kana iteration mark normalization to be enabled or disabled independently.

Declaration

public JapaneseIterationMarkCharFilter(TextReader input, bool normalizeKanji, bool normalizeKana)

Parameters

Type	Name	Description
TextReader	input	Char stream.
bool	normalizeKanji	Indicates whether kanji iteration marks should be normalized.
bool	normalizeKana	Indicates whether kana iteration marks should be normalized.
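
The two flags can be used to restrict normalization to one kind of mark. The sketch below assumes the Lucene.Net.Analysis.Kuromoji package is referenced. SelectiveNormalizationExample and its Normalize helper are hypothetical names, not part of the library.

```csharp
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class SelectiveNormalizationExample
{
    // Hypothetical helper: forwards the two flags to the three-argument constructor.
    public static string Normalize(string text, bool normalizeKanji, bool normalizeKana)
    {
        var reader = new StringReader(text);
        using (var filter = new JapaneseIterationMarkCharFilter(reader, normalizeKanji, normalizeKana))
        {
            var result = new StringBuilder();
            int c;
            // Read() returns -1 once no more characters are available.
            while ((c = filter.Read()) != -1)
            {
                result.Append((char)c);
            }
            return result.ToString();
        }
    }
}
```

Calling Normalize(text, true, false) would be expected to expand kanji iteration marks while leaving kana iteration marks in place. Passing false for both flags should leave the text as it was.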

Fields

NORMALIZE_KANA_DEFAULT

Normalize kana iteration marks by default.

Declaration

public static readonly bool NORMALIZE_KANA_DEFAULT

Field Value

Type	Description
bool

NORMALIZE_KANJI_DEFAULT

Normalize kanji iteration marks by default.

Declaration

public static readonly bool NORMALIZE_KANJI_DEFAULT

Field Value

Type	Description
bool

Methods

Correct(int)

Subclasses override to correct the current offset.

Declaration

protected override int Correct(int currentOff)

Parameters

Type	Name	Description
int	currentOff	current offset

Returns

Type	Description
int	corrected offset

Overrides

CharFilter.Correct(int)

Read()

Reads the next character from the text reader and advances the character position by one character.

Declaration

public override int Read()

Returns

Type	Description
int	The next character from the text reader, or -1 if no more characters are available.

Overrides

CharFilter.Read()

Read(char[], int, int)

Reads a specified maximum number of characters from the current reader and writes the data to a buffer, beginning at the specified index.

Declaration

public override int Read(char[] buffer, int offset, int length)

Parameters

Type	Name	Description
char[]	buffer	When this method returns, contains the specified character array with the values between offset and (offset + length - 1) replaced by the characters read from the current source.
int	offset	The position in buffer at which to begin writing.
int	length	The maximum number of characters to read. If the end of the reader is reached before the specified number of characters is read into the buffer, the method returns.

Returns

Type	Description
int	The number of characters that have been read. The number will be less than or equal to length, depending on whether the data is available within the reader. This method returns 0 (zero) if it is called when no more characters are left to read.
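
A buffered read loop over this overload might look like the sketch below. It is written against TextReader so it is self-contained, and a JapaneseIterationMarkCharFilter can be passed in as the reader. ReaderDraining and ReadAllBuffered are hypothetical names. The loop condition "> 0" is a defensive assumption so the loop stops on any non-positive return value at the end of the stream.

```csharp
using System.IO;
using System.Text;

public static class ReaderDraining
{
    // Hypothetical helper: reads everything from a TextReader in fixed-size chunks.
    public static string ReadAllBuffered(TextReader reader, int bufferSize = 256)
    {
        var buffer = new char[bufferSize];
        var result = new StringBuilder();
        int read;
        // Each call fills at most buffer.Length characters starting at index 0.
        // The loop stops on any non-positive return value (end of stream).
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Append only the characters actually read in this call.
            result.Append(buffer, 0, read);
        }
        return result.ToString();
    }
}
```

Because only the first read characters of the buffer are appended on each pass, a short final chunk does not leak stale data from an earlier pass.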