    Class WordlistLoader

    Loader for text files that represent a list of stopwords.

    See IOUtils to obtain TextReader instances.

    This is a Lucene.NET INTERNAL API, use at your own risk
    Inheritance
    System.Object
    WordlistLoader
    Namespace: Lucene.Net.Analysis.Util
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public class WordlistLoader : object

    Methods

    GetLines(Stream, Encoding)

    Reads the given stream and returns the (non-comment) lines containing data, decoded using the given character encoding.

    A comment line is any line that starts with the character "#".

    Declaration
    public static IList<string> GetLines(Stream stream, Encoding encoding)
    Parameters
    Type Name Description
    Stream stream
    Encoding encoding
    Returns
    Type Description
    IList<System.String>

    a list of non-blank non-comment lines with whitespace trimmed
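
    The following is a minimal usage sketch, assuming the Lucene.Net.Analysis.Common 4.8.0 beta package is referenced. The class name GetLinesExample, the Load helper, and the sample text are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    // Hypothetical helper: builds an in-memory resource and loads its data lines.
    public static IList<string> Load()
    {
        // One comment line, one blank line, and two data lines (one padded with spaces).
        string text = "# comment\nalpha\n\n  beta  \n";
        using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            return WordlistLoader.GetLines(stream, Encoding.UTF8);
        }
    }

    public static void Main()
    {
        // Per the description above, only the trimmed data lines should be returned.
        foreach (string line in Load())
        {
            Console.WriteLine(line);
        }
    }
}
```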

    GetSnowballWordSet(TextReader, CharArraySet)

    Reads stopwords from a stopword list in Snowball format.

    The Snowball format is as follows:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.

    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    containing a Snowball stopword list

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words

    GetSnowballWordSet(TextReader, LuceneVersion)

    Reads stopwords from a stopword list in Snowball format.

    The Snowball format is as follows:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.

    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    containing a Snowball stopword list

    LuceneVersion matchVersion

    the LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words
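
    The Snowball format rules listed above can be illustrated with a short sketch. It assumes the Lucene.Net.Analysis.Common 4.8.0 beta package and LuceneVersion.LUCENE_48; the class name, the Load helper, and the sample stopword text are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballWordSetExample
{
    // Hypothetical helper: parses a small made-up Snowball-style list.
    public static CharArraySet Load()
    {
        string text =
            " | a full-line comment\n" +
            "i me my   | several words, then a trailing comment\n" +
            "we our\n";
        // The reader is handed to the loader and not used again afterwards.
        return WordlistLoader.GetSnowballWordSet(new StringReader(text), LuceneVersion.LUCENE_48);
    }

    public static void Main()
    {
        CharArraySet stopwords = Load();
        // Words before the vertical line are entries; text after it is ignored.
        Console.WriteLine(stopwords.Contains("my"));
        Console.WriteLine(stopwords.Contains("comment"));
    }
}
```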

    GetStemDict(TextReader, CharArrayMap<String>)

    Reads a stem dictionary. Each line contains:

    word\tstem

    (i.e., two tab-separated words)

    Declaration
    public static CharArrayMap<string> GetStemDict(TextReader reader, CharArrayMap<string> result)
    Parameters
    Type Name Description
    TextReader reader
    CharArrayMap<System.String> result
    Returns
    Type Description
    CharArrayMap<System.String>

    stem dictionary that overrules the stemming algorithm
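
    A minimal sketch of loading a stem dictionary, assuming the Lucene.Net.Analysis.Common 4.8.0 beta package, in which the map type is named CharArrayMap as documented here. The class name, the Load helper, the initial capacity of 16, and the sample entries are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    // Hypothetical helper: loads two made-up word/stem pairs.
    public static CharArrayMap<string> Load()
    {
        // Each line holds a word and its stem, separated by a single tab.
        string text = "running\trun\nmice\tmouse\n";
        var result = new CharArrayMap<string>(LuceneVersion.LUCENE_48, 16, false);
        return WordlistLoader.GetStemDict(new StringReader(text), result);
    }

    public static void Main()
    {
        CharArrayMap<string> dict = Load();
        if (dict.TryGetValue("mice", out string stem))
        {
            Console.WriteLine(stem);
        }
    }
}
```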

    GetWordSet(TextReader, CharArraySet)

    Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    containing the wordlist

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words
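
    Because this overload fills a set supplied by the caller, it can be used to merge several word lists into one set. The sketch below assumes the Lucene.Net.Analysis.Common 4.8.0 beta package; the class name, the Load helper, the initial capacity, and the word lists are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class MergeWordListsExample
{
    // Hypothetical helper: merges two made-up lists into one case-insensitive set.
    public static CharArraySet Load()
    {
        // ignoreCase = true, so lookups do not depend on letter case.
        var stopwords = new CharArraySet(LuceneVersion.LUCENE_48, 16, true);
        WordlistLoader.GetWordSet(new StringReader("the\nand\n"), stopwords);
        // The second list repeats "the" and pads "of" with spaces.
        WordlistLoader.GetWordSet(new StringReader("  of  \nthe\n"), stopwords);
        return stopwords;
    }

    public static void Main()
    {
        CharArraySet stopwords = Load();
        Console.WriteLine(stopwords.Count);
        Console.WriteLine(stopwords.Contains("THE"));
    }
}
```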

    GetWordSet(TextReader, LuceneVersion)

    Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    containing the wordlist

    LuceneVersion matchVersion

    the LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words
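
    A common case is loading a one-word-per-line file from disk. The sketch below assumes the Lucene.Net.Analysis.Common 4.8.0 beta package and writes a temporary file so that it is self-contained; the class name, the Load helper, and the file contents are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class FileWordSetExample
{
    // Hypothetical helper: writes a small word list to a temp file and loads it.
    public static CharArraySet Load()
    {
        string path = Path.GetTempFileName();
        try
        {
            // One lowercase word per line; the second line carries extra whitespace.
            File.WriteAllText(path, "the\n  and  \nof\n");
            using (TextReader reader = new StreamReader(path, Encoding.UTF8))
            {
                return WordlistLoader.GetWordSet(reader, LuceneVersion.LUCENE_48);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        CharArraySet words = Load();
        Console.WriteLine(words.Contains("and"));
    }
}
```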

    GetWordSet(TextReader, String, CharArraySet)

    Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    containing the wordlist

    System.String comment

    The string representing a comment.

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words

    GetWordSet(TextReader, String, LuceneVersion)

    Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use an Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    containing the wordlist

    System.String comment

    The string representing a comment.

    LuceneVersion matchVersion

    the LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words
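
    The comment parameter lets a word list carry annotations. A minimal sketch, assuming the Lucene.Net.Analysis.Common 4.8.0 beta package and "#" as the comment string; the class name, the Load helper, and the sample list are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CommentedWordSetExample
{
    // Hypothetical helper: loads a made-up list whose comment lines start with "#".
    public static CharArraySet Load()
    {
        string text = "# English stopwords\nthe\n# conjunctions\nand\n";
        return WordlistLoader.GetWordSet(new StringReader(text), "#", LuceneVersion.LUCENE_48);
    }

    public static void Main()
    {
        CharArraySet words = Load();
        // Lines that start with the comment string are not added as entries.
        Console.WriteLine(words.Contains("the"));
        Console.WriteLine(words.Contains("# conjunctions"));
    }
}
```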

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)