
    Enum JapaneseTokenizerMode

    Tokenization mode: this determines how the tokenizer handles compound and unknown words.

    Namespace: Lucene.Net.Analysis.Ja
    Assembly: Lucene.Net.Analysis.Kuromoji.dll
    Syntax
    public enum JapaneseTokenizerMode : int

    Fields

    Name Description
    EXTENDED

    Extended mode outputs unigrams for unknown words.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
    NORMAL

    Ordinary segmentation: no decomposition for compounds.

    SEARCH

    Segmentation geared towards search: long nouns are decompounded, and the full compound token is also emitted as a synonym.
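
To compare the three modes, the same text can be run through a tokenizer once per mode. The following is a minimal sketch, assuming the Lucene.Net.Analysis.Kuromoji package (4.8.x) is referenced and that JapaneseTokenizer exposes the four-argument constructor (reader, user dictionary, discardPunctuation, mode). The class name TokenizerModeDemo, the helper Tokenize, and the sample text are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Ja;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; not part of Lucene.NET.
public static class TokenizerModeDemo
{
    // Tokenizes the text in the given mode and returns the term of each token.
    public static List<string> Tokenize(string text, JapaneseTokenizerMode mode)
    {
        var tokens = new List<string>();

        // Assumed arguments: no user dictionary (null), punctuation discarded (true).
        using (var tokenizer = new JapaneseTokenizer(new StringReader(text), null, true, mode))
        {
            ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

            tokenizer.Reset();                  // required before the first IncrementToken()
            while (tokenizer.IncrementToken())
            {
                tokens.Add(term.ToString());
            }
            tokenizer.End();
        }

        return tokens;
    }

    public static void Main()
    {
        // Sample input: a long compound noun ("Kansai International Airport").
        const string text = "関西国際空港";

        // Enumerate every declared mode (EXTENDED, NORMAL, SEARCH).
        foreach (JapaneseTokenizerMode mode in Enum.GetValues(typeof(JapaneseTokenizerMode)))
        {
            List<string> tokens = Tokenize(text, mode);
            Console.WriteLine(mode + ": " + string.Join(" | ", tokens));
        }
    }
}
```

Based on the field descriptions above, SEARCH should emit the parts of a long compound together with the full compound, while NORMAL should leave the compound intact. The exact tokens depend on the bundled dictionary, so the output is best checked by running the sketch.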

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)