    Class WordlistLoader

    Loader for text files that represent a list of stopwords.

    See Lucene.Net.Util.IOUtils to obtain TextReader instances.
    Note

    This API is for internal purposes only and might change in incompatible ways in the next release.

    Inheritance
    object
    WordlistLoader
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Analysis.Util
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public static class WordlistLoader

    Methods

    GetLines(Stream, Encoding)

    Reads the given stream and returns the non-comment lines that contain data, decoded using the given character encoding.

    A comment line is any line that starts with the character "#".

    Declaration
    public static IList<string> GetLines(Stream stream, Encoding encoding)
    Parameters
    Type Name Description
    Stream stream
    Encoding encoding
    Returns
    Type Description
    IList<string>

    A list of non-blank, non-comment lines with whitespace trimmed

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.
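    A minimal sketch of calling GetLines, assuming the Lucene.Net.Analysis.Common package is referenced. The class GetLinesExample and the helper ReadDataLines are hypothetical names introduced only for illustration; the in-memory stream stands in for a real file stream.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    // Hypothetical helper: wraps a string in a MemoryStream and hands it
    // to WordlistLoader.GetLines, decoding the bytes as UTF-8.
    public static IList<string> ReadDataLines(string content)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
        {
            return WordlistLoader.GetLines(stream, Encoding.UTF8);
        }
    }
}
```

    Going by the description above, ReadDataLines("# header\n  alpha  \n\nbeta\n") should return the two entries "alpha" and "beta": the "#" line and the blank line are dropped, and surrounding whitespace is trimmed.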

    GetSnowballWordSet(TextReader, CharArraySet)

    Reads stopwords from a stopword list in Snowball format.

    The Snowball format is as follows:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.
    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing a Snowball stopword list

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words

    GetSnowballWordSet(TextReader, LuceneVersion)

    Reads stopwords from a stopword list in Snowball format.

    The Snowball format is as follows:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.
    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing a Snowball stopword list

    LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words
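    The Snowball rules listed above can be tried out with a sketch like the following. It assumes the Lucene.Net.Analysis.Common package is referenced and uses LuceneVersion.LUCENE_48 as the match version; SnowballExample and Load are hypothetical names, and the sample text is made up for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballExample
{
    // Hypothetical helper: parses Snowball-formatted stopword text
    // held in a string and returns the resulting word set.
    public static CharArraySet Load(string snowballText)
    {
        using (var reader = new StringReader(snowballText))
        {
            return WordlistLoader.GetSnowballWordSet(reader, LuceneVersion.LUCENE_48);
        }
    }
}
```

    For input such as "i me my | first person\n | a comment-only line\nwe our\n", the set is expected to contain "i", "me", "my", "we" and "our", while words that appear only after the "|" character (for example "person") should be absent.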

    GetStemDict(TextReader, CharArrayDictionary<string>)

    Reads a stem dictionary. Each line contains:

    word\tstem

    (i.e. two tab-separated words)

    Declaration
    public static CharArrayDictionary<string> GetStemDict(TextReader reader, CharArrayDictionary<string> result)
    Parameters
    Type Name Description
    TextReader reader
    CharArrayDictionary<string> result
    Returns
    Type Description
    CharArrayDictionary<string>

    stem dictionary that overrules the stemming algorithm

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.
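    A sketch of loading a small stem dictionary from a string, assuming the Lucene.Net.Analysis.Common package is referenced. StemDictExample and Load are hypothetical names. The CharArrayDictionary constructor arguments (match version, initial capacity, ignoreCase) and the TryGetValue lookup shown in the usage note are assumptions about that type, which is not documented on this page.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    // Hypothetical helper: each input line is expected to look like "word<TAB>stem".
    public static CharArrayDictionary<string> Load(string text)
    {
        // Assumed constructor: (matchVersion, capacity, ignoreCase).
        var result = new CharArrayDictionary<string>(LuceneVersion.LUCENE_48, 16, false);
        using (var reader = new StringReader(text))
        {
            // GetStemDict fills and returns the dictionary passed in.
            return WordlistLoader.GetStemDict(reader, result);
        }
    }
}
```

    With the input "running\trun\nmice\tmouse\n", a lookup such as dict.TryGetValue("mice", out string stem) should yield "mouse".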

    GetWordSet(TextReader, CharArraySet)

    Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that uses LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing the wordlist

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words
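    Because this overload fills and returns the set it is given, several wordlists can be accumulated into one set. The sketch below assumes the Lucene.Net.Analysis.Common package is referenced; MergeWordlistsExample and LoadAll are hypothetical names, and the CharArraySet constructor arguments (match version, start size, ignoreCase) are an assumption about that type.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class MergeWordlistsExample
{
    // Hypothetical helper: loads every wordlist (one word per line)
    // into a single shared CharArraySet.
    public static CharArraySet LoadAll(params string[] wordlists)
    {
        // Assumed constructor: (matchVersion, startSize, ignoreCase).
        var result = new CharArraySet(LuceneVersion.LUCENE_48, 16, false);
        foreach (string text in wordlists)
        {
            using (var reader = new StringReader(text))
            {
                // Each call adds this reader's words to the same set.
                WordlistLoader.GetWordSet(reader, result);
            }
        }
        return result;
    }
}
```

    For example, LoadAll("the\nand\n", "  of  \n") should produce a set containing "the", "and" and "of", with the padding around "of" trimmed.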

    GetWordSet(TextReader, LuceneVersion)

    Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that uses LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing the wordlist

    LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words

    GetWordSet(TextReader, string, CharArraySet)

    Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that uses LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing the wordlist

    string comment

    The string representing a comment.

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words

    GetWordSet(TextReader, string, LuceneVersion)

    Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that uses LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    TextReader reader

    TextReader containing the wordlist

    string comment

    The string representing a comment.

    LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words
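    A sketch of the comment-aware overload, assuming the Lucene.Net.Analysis.Common package is referenced and LuceneVersion.LUCENE_48 as the match version. CommentedWordlistExample, Load and the commentPrefix parameter are hypothetical names used only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CommentedWordlistExample
{
    // Hypothetical helper: loads a one-word-per-line list, skipping
    // lines that begin with the given comment prefix.
    public static CharArraySet Load(string text, string commentPrefix)
    {
        using (var reader = new StringReader(text))
        {
            return WordlistLoader.GetWordSet(reader, commentPrefix, LuceneVersion.LUCENE_48);
        }
    }
}
```

    Calling Load("# English stopwords\nthe\nand\n", "#") is expected to return a set containing "the" and "and" but not the header line.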

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.