
    Enum JapaneseTokenizerMode

    Tokenization mode: this determines how the tokenizer handles compound and unknown words.

    Namespace: Lucene.Net.Analysis.Ja
    Assembly: Lucene.Net.Analysis.Kuromoji.dll
    Syntax
    public enum JapaneseTokenizerMode

    Fields

    Name Description
    EXTENDED

    Extended mode outputs unigrams for unknown words.

    Note

    This API is experimental and might change in incompatible ways in the next release.

    NORMAL

    Ordinary segmentation: no decomposition for compounds.

    SEARCH

    Segmentation geared towards search: this includes a decompounding process for long nouns, also including the full compound token as a synonym.
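
    Example

    The following is a minimal sketch, not taken from the original page, of how the three modes might be compared on the same input. It assumes the Lucene.NET 4.8 constructor JapaneseTokenizer(TextReader, UserDictionary, bool, JapaneseTokenizerMode) and a reference to Lucene.Net.Analysis.Kuromoji; the class name TokenizerModeDemo and the sample text are illustrative only.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using Lucene.Net.Analysis.Ja;
    using Lucene.Net.Analysis.TokenAttributes;

    // Hypothetical demo class: prints the tokens produced under each mode.
    public static class TokenizerModeDemo
    {
        public static void Main()
        {
            // Needed on some consoles to display Japanese text correctly.
            Console.OutputEncoding = Encoding.UTF8;

            // Sample input chosen for illustration: a long compound noun.
            const string text = "関西国際空港";

            foreach (JapaneseTokenizerMode mode in Enum.GetValues(typeof(JapaneseTokenizerMode)))
            {
                // Assumed arguments: no user dictionary (null), discard punctuation (true).
                using (var tokenizer = new JapaneseTokenizer(new StringReader(text), null, true, mode))
                {
                    ICharTermAttribute term = tokenizer.AddAttribute<ICharTermAttribute>();

                    tokenizer.Reset(); // must be called before the first IncrementToken()
                    Console.Write(mode + ": ");
                    while (tokenizer.IncrementToken())
                    {
                        Console.Write(term.ToString() + " ");
                    }
                    tokenizer.End();
                    Console.WriteLine();
                }
            }
        }
    }
    ```

    Going by the descriptions above, NORMAL should keep a compound as a single token, SEARCH should additionally emit its parts, and EXTENDED should break unknown words into unigrams. The exact tokens depend on the bundled dictionary, so no output is shown here.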

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.