Fork me on GitHub
Search Results for

    Show / Hide Table of Contents

    list-high-freq-terms

    Name

    index-list-high-freq-terms - Lists the top N most frequent terms by document frequency.

    Synopsis

    lucene index list-high-freq-terms [<INDEX_DIRECTORY>] [-t|--total-term-frequency] [-n|--number-of-terms] [-f|--field] [?|-h|--help]
    

    Description

    Extracts the top N most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency.

    Arguments

    INDEX_DIRECTORY

    The directory of the index. If omitted, it defaults to the current working directory.

    Options

    ?|-h|--help

    Prints out a short help for the command.

    -t|--total-term-frequency

    Specifies that both the document frequency and term frequency are reported, ordered by descending total term frequency.

    -n|--number-of-terms <NUMBER>

    The number of terms to consider. If omitted, defaults to 100.

    -f|--field <FIELD>

    The field to consider. If omitted, considers all fields.

    Examples

    List the high frequency terms in the index located at F:\product-index\ on the description field, reporting both document frequency and term frequency:

    lucene index list-high-freq-terms F:\product-index --total-term-frequency --field description
    

    List the high frequency terms in the index located at C:\lucene-index\ on the name field, tracking 30 terms:

    lucene index list-high-freq-terms C:\lucene-index --f name -n 30
    
    • Improve this Doc
    In This Article
    Back to top Copyright © 2021 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.