    Enum JapaneseTokenizerMode

    Tokenization mode: this determines how the tokenizer handles compound and unknown words.

    Namespace: Lucene.Net.Analysis.Ja
    Assembly: Lucene.Net.Analysis.Kuromoji.dll
    Syntax
    public enum JapaneseTokenizerMode

    Fields

    Name Description
    EXTENDED

    Extended mode outputs unigrams for unknown words.

    This is a Lucene.NET EXPERIMENTAL API; use at your own risk.
    NORMAL

    Ordinary segmentation: no decomposition for compounds.

    SEARCH

    Segmentation geared towards search: long nouns are decompounded, and the full compound token is also emitted as a synonym.
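
    Example

    The following is a minimal sketch of how the three modes might be compared on the same input. It assumes the JapaneseTokenizer constructor overload taking (TextReader, UserDictionary, bool, JapaneseTokenizerMode) as found in Lucene.NET 4.8, with a null user dictionary and punctuation discarded. The sample text and the class name TokenizerModeDemo are illustrative only; check the constructor signature against the version you use.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Ja;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical demo class; not part of Lucene.NET.
public static class TokenizerModeDemo
{
    public static void Main()
    {
        // Make sure Japanese characters print correctly on the console.
        Console.OutputEncoding = Encoding.UTF8;

        // Illustrative input: a long compound noun.
        const string text = "関西国際空港";

        var modes = new[]
        {
            JapaneseTokenizerMode.NORMAL,
            JapaneseTokenizerMode.SEARCH,
            JapaneseTokenizerMode.EXTENDED
        };

        foreach (JapaneseTokenizerMode mode in modes)
        {
            // Assumed overload: (input, userDictionary, discardPunctuation, mode).
            using (var tokenizer = new JapaneseTokenizer(
                new StringReader(text), null, true, mode))
            {
                ICharTermAttribute term =
                    tokenizer.AddAttribute<ICharTermAttribute>();

                tokenizer.Reset();
                Console.Write(mode + ":");
                while (tokenizer.IncrementToken())
                {
                    // Print each emitted token in brackets.
                    Console.Write(" [" + term.ToString() + "]");
                }
                tokenizer.End();
                Console.WriteLine();
            }
        }
    }
}
```

    Printing the tokens per mode makes the differences described above visible side by side: whether a compound is kept whole, decompounded with the full compound retained, or broken into unigrams when a word is unknown to the dictionary.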

    Copyright © 2021 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.