    list-high-freq-terms

    Name

    index-list-high-freq-terms - Lists the top N most frequent terms by document frequency.

    Synopsis

    lucene index list-high-freq-terms [<INDEX_DIRECTORY>] [-t|--total-term-frequency] [-n|--number-of-terms <NUMBER>] [-f|--field <FIELD>] [?|-h|--help]

    Description

    Extracts the top N most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency.
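
    To make "document frequency" concrete, here is a minimal Python sketch on an in-memory toy corpus. It is not how the command is implemented and it does not read a Lucene index; the corpus and the helper name top_terms_by_doc_freq are hypothetical, chosen only for illustration. A term's document frequency is the number of documents that contain it at least once, so repeats within a single document do not raise the count.

```python
from collections import Counter

# Hypothetical toy corpus: each string stands in for the text of one document.
docs = [
    "red apple red apple",
    "green apple",
    "red pear",
]


def top_terms_by_doc_freq(documents, n):
    """Return the n terms that occur in the most documents (illustrative helper)."""
    doc_freq = Counter()
    for text in documents:
        # set() makes each term count once per document, however often it repeats.
        doc_freq.update(set(text.split()))
    # Sort by descending document frequency, then alphabetically to break ties.
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


top = top_terms_by_doc_freq(docs, 2)
print(top)
```

    Here "red" appears three times in total but in only two documents, so it ties with "apple" at a document frequency of 2.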

    Arguments

    INDEX_DIRECTORY

    The directory of the index. If omitted, it defaults to the current working directory.

    Options

    ?|-h|--help

    Prints out a short help for the command.

    -t|--total-term-frequency

    Specifies that both the document frequency and the total term frequency are reported, ordered by descending total term frequency.

    -n|--number-of-terms <NUMBER>

    The number of terms to consider. If omitted, defaults to 100.

    -f|--field <FIELD>

    The field to consider. If omitted, considers all fields.
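
    The options above change what is counted and how results are ranked. The following Python sketch illustrates that difference on hypothetical in-memory data; it is an assumption-laden illustration, not the command's implementation. The names term_stats and top_terms, and the parameters field and by_total, are invented here to mirror the roles of --field and --total-term-frequency. The sketch also assumes that terms from different fields are tracked separately.

```python
from collections import Counter

# Hypothetical documents, each a mapping of field name to field text.
docs = [
    {"name": "apple", "description": "red red red red fruit"},
    {"name": "pear", "description": "green fruit"},
    {"name": "plum", "description": "purple fruit"},
]


def term_stats(documents, field=None):
    """Map (field, term) to (document frequency, total term frequency)."""
    doc_freq, total_freq = Counter(), Counter()
    for doc in documents:
        for name, text in doc.items():
            if field is not None and name != field:
                continue  # restrict counting to the requested field
            tokens = text.split()
            total_freq.update((name, t) for t in tokens)      # every occurrence
            doc_freq.update((name, t) for t in set(tokens))   # once per document
    return {key: (doc_freq[key], total_freq[key]) for key in total_freq}


def top_terms(documents, n, field=None, by_total=False):
    """Rank terms by document frequency, or by total term frequency if by_total is set."""
    stats = term_stats(documents, field)
    idx = 1 if by_total else 0
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1][idx], kv[0]))
    return ranked[:n]


by_doc = top_terms(docs, 1, field="description")
by_total = top_terms(docs, 1, field="description", by_total=True)
print(by_doc)
print(by_total)
```

    Ranked by document frequency, "fruit" comes first because it occurs in all three documents. Ranked by total term frequency, "red" comes first because it occurs four times, even though only one document contains it.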

    Examples

    List the high frequency terms in the index located at F:\product-index\ on the description field, reporting both document frequency and term frequency:

    lucene index list-high-freq-terms F:\product-index --total-term-frequency --field description

    List the high frequency terms in the index located at C:\lucene-index\ on the name field, tracking 30 terms:

    lucene index list-high-freq-terms C:\lucene-index -f name -n 30

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)