Class JapaneseIterationMarkCharFilter
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
Implements
Inherited Members
Namespace: Lucene.Net.Analysis.Ja
Assembly: Lucene.Net.Analysis.Kuromoji.dll
Syntax
public class JapaneseIterationMarkCharFilter : CharFilter, IDisposable
Remarks
Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though "?" isn't hiragana.
Note that a full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves can be emitted when they are illegal, i.e. when they go back past the beginning of the character stream. The implementation buffers input until a full stop punctuation character (U+3002) or EOF is reached, so that it does not need to keep a copy of the whole character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
Constructors
JapaneseIterationMarkCharFilter(TextReader)
Constructor. Normalizes both kanji and kana iteration marks by default.
Declaration
public JapaneseIterationMarkCharFilter(TextReader input)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | Char stream. |
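As a minimal usage sketch (assuming the Lucene.Net.Analysis.Kuromoji package is referenced; `IterationMarkDefaultsExample` and `NormalizeAll` are hypothetical names, not part of Lucene.NET), the following wraps a `StringReader` in the filter and drains it with `Read()` until it returns -1. The second input is the illegal-sequence example from the class remarks.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class IterationMarkDefaultsExample
{
    // Hypothetical helper: runs text through the filter with its default settings.
    public static string NormalizeAll(string text)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(new StringReader(text)))
        {
            var result = new StringBuilder();
            int c;
            // Read() returns -1 once no more characters are available.
            while ((c = filter.Read()) != -1)
            {
                result.Append((char)c);
            }
            return result.ToString();
        }
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // The kanji iteration mark repeats the preceding kanji: 時時
        Console.WriteLine(NormalizeAll("時々"));
        // Per the class remarks, the source character is repeated as-is: ??
        Console.WriteLine(NormalizeAll("?ゝ"));
    }
}
```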
JapaneseIterationMarkCharFilter(TextReader, bool, bool)
Constructor. Normalizes kanji and kana iteration marks according to the given flags.
Declaration
public JapaneseIterationMarkCharFilter(TextReader input, bool normalizeKanji, bool normalizeKana)
Parameters
Type | Name | Description |
---|---|---|
TextReader | input | Char stream. |
bool | normalizeKanji | Indicates whether kanji iteration marks should be normalized. |
bool | normalizeKana | Indicates whether kana iteration marks should be normalized. |
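The two flags can be exercised independently. The sketch below is illustrative only (the class and helper names are hypothetical, and it assumes the Lucene.Net.Analysis.Kuromoji package is referenced); it runs the same kanji input once with kanji normalization enabled and once with it disabled.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class IterationMarkFlagsExample
{
    // Hypothetical helper: drains a filter configured with the given flags.
    public static string Normalize(string text, bool normalizeKanji, bool normalizeKana)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(
            new StringReader(text), normalizeKanji, normalizeKana))
        {
            var result = new StringBuilder();
            int c;
            while ((c = filter.Read()) != -1)
            {
                result.Append((char)c);
            }
            return result.ToString();
        }
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // Kanji marks normalized, kana marks left alone: 時時
        Console.WriteLine(Normalize("時々", true, false));
        // Kanji normalization disabled, so the mark passes through: 時々
        Console.WriteLine(Normalize("時々", false, true));
    }
}
```

Disabling one flag is useful when a downstream component should still see that kind of iteration mark in its original form.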
Fields
NORMALIZE_KANA_DEFAULT
Whether kana iteration marks are normalized by default.
Declaration
public static readonly bool NORMALIZE_KANA_DEFAULT
Field Value
Type | Description |
---|---|
bool |
NORMALIZE_KANJI_DEFAULT
Whether kanji iteration marks are normalized by default.
Declaration
public static readonly bool NORMALIZE_KANJI_DEFAULT
Field Value
Type | Description |
---|---|
bool |
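The default fields can also be passed explicitly to the three-argument constructor. The sketch below (hypothetical class and helper names; it assumes the single-argument constructor applies these same defaults, which the page implies but does not state outright) compares the two ways of constructing the filter.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class IterationMarkDefaultFieldsExample
{
    // Hypothetical helper: reads every character from an already constructed filter.
    public static string Drain(JapaneseIterationMarkCharFilter filter)
    {
        using (filter)
        {
            var result = new StringBuilder();
            int c;
            while ((c = filter.Read()) != -1)
            {
                result.Append((char)c);
            }
            return result.ToString();
        }
    }

    public static string WithImplicitDefaults(string text)
    {
        return Drain(new JapaneseIterationMarkCharFilter(new StringReader(text)));
    }

    public static string WithExplicitDefaults(string text)
    {
        return Drain(new JapaneseIterationMarkCharFilter(
            new StringReader(text),
            JapaneseIterationMarkCharFilter.NORMALIZE_KANJI_DEFAULT,
            JapaneseIterationMarkCharFilter.NORMALIZE_KANA_DEFAULT));
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // Under the stated assumption, both lines print the same text.
        Console.WriteLine(WithImplicitDefaults("時々"));
        Console.WriteLine(WithExplicitDefaults("時々"));
    }
}
```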
Methods
Correct(int)
Subclasses override to correct the current offset.
Declaration
protected override int Correct(int currentOff)
Parameters
Type | Name | Description |
---|---|---|
int | currentOff | current offset |
Returns
Type | Description |
---|---|
int | corrected offset |
Overrides
Read()
Reads the next character from the text reader and advances the character position by one character.
Declaration
public override int Read()
Returns
Type | Description |
---|---|
int | The next character from the text reader, or -1 if no more characters are available. |
Overrides
Read(char[], int, int)
Reads a specified maximum number of characters from the current reader and writes the data to a buffer, beginning at the specified index.
Declaration
public override int Read(char[] buffer, int offset, int length)
Parameters
Type | Name | Description |
---|---|---|
char[] | buffer | When this method returns, contains the specified character array with the values between offset and (offset + length - 1) replaced by the characters read from the current source. |
int | offset | The position in buffer at which to begin writing. |
int | length | The maximum number of characters to read. If the end of the reader is reached before the specified number of characters is read into the buffer, the method returns. |
Returns
Type | Description |
---|---|
int | The number of characters that have been read. The number will be less than or equal to length, depending on whether the data is available within the reader. This method returns 0 (zero) if it is called when no more characters are left to read. |
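A chunked read loop over this overload might look like the sketch below (hypothetical class and helper names; assumes the Lucene.Net.Analysis.Kuromoji package is referenced). The loop stops on any non-positive return value, so it does not depend on the exact end-of-input value.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;

public static class IterationMarkBufferedReadExample
{
    // Hypothetical helper: drains the filter through Read(char[], int, int).
    public static string NormalizeBuffered(string text, int chunkSize)
    {
        using (var filter = new JapaneseIterationMarkCharFilter(new StringReader(text)))
        {
            var buffer = new char[chunkSize];
            var result = new StringBuilder();
            int read;
            // Each call fills at most buffer.Length characters starting at offset 0.
            while ((read = filter.Read(buffer, 0, buffer.Length)) > 0)
            {
                result.Append(buffer, 0, read);
            }
            return result.ToString();
        }
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // The kanji iteration mark repeats the preceding kanji: 時時
        Console.WriteLine(NormalizeBuffered("時々", 4));
    }
}
```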
Overrides