    Class WordlistLoader

    Loader for text files that represent a list of stopwords.

    See IOUtils to obtain System.IO.TextReader instances.

    This is a Lucene.NET INTERNAL API, use at your own risk
    Inheritance
    System.Object
    WordlistLoader
    Inherited Members
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Analysis.Util
    Assembly: Lucene.Net.Analysis.Common.dll
    Syntax
    public static class WordlistLoader

    Methods


    GetLines(Stream, Encoding)

    Reads the given stream and returns the (non-comment) lines containing data, decoded using the given character encoding.

    A comment line is any line that starts with the character "#".

    Declaration
    public static IList<string> GetLines(Stream stream, Encoding encoding)
    Parameters
    Type Name Description
    System.IO.Stream stream
    System.Text.Encoding encoding
    Returns
    Type Description
    System.Collections.Generic.IList<System.String>

    a list of non-blank non-comment lines with whitespace trimmed

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.
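
The filtering rules described above can be illustrated with a minimal sketch that uses only the .NET base class library. `GetLinesSketch` and `ReadDataLines` are hypothetical names introduced for illustration. This is not the library's implementation, only one way to apply the documented rules: drop lines starting with "#", trim whitespace, and drop blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class GetLinesSketch
{
    // Hypothetical helper that applies the documented rules; not Lucene.NET code.
    public static IList<string> ReadDataLines(Stream stream, Encoding encoding)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // A comment line is any line that starts with '#'.
                if (line.StartsWith("#", StringComparison.Ordinal)) continue;

                // Trim surrounding whitespace, then skip lines that are blank.
                line = line.Trim();
                if (line.Length == 0) continue;

                lines.Add(line);
            }
        }
        return lines;
    }
}
```

With real code, the equivalent call would be `WordlistLoader.GetLines(stream, encoding)` on a stream you have already opened.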


    GetSnowballWordSet(TextReader, CharArraySet)

    Reads stopwords from a stopword list in Snowball format.

    The snowball format is the following:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.

    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing a Snowball stopword list

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words
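
To make the Snowball format rules concrete, here is a small sketch that parses such a list with plain .NET types. `SnowballFormatSketch` and `ReadSnowballWords` are hypothetical names, and a standard `ISet<string>` stands in for CharArraySet so the example has no dependency on Lucene.NET. It illustrates the three documented rules and is not the library's own code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SnowballFormatSketch
{
    // Hypothetical helper; ISet<string> stands in for CharArraySet.
    public static ISet<string> ReadSnowballWords(TextReader reader, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Everything from the first '|' onward is a (possibly trailing) comment.
            int comment = line.IndexOf('|');
            if (comment >= 0) line = line.Substring(0, comment);

            // A line may hold several words separated by whitespace.
            // A null separator array makes Split break on whitespace.
            foreach (string word in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                result.Add(word);
            }
        }
        return result;
    }
}
```

For example, feeding the sketch a `StringReader` over the three lines ` | A Snowball stopword list`, `i  | first person`, and `me my` fills the set with `i`, `me`, and `my`.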


    GetSnowballWordSet(TextReader, LuceneVersion)

    Reads stopwords from a stopword list in Snowball format.

    The snowball format is the following:

    • Lines may contain multiple words separated by whitespace.
    • The comment character is the vertical line (|).
    • Lines may contain trailing comments.

    Declaration
    public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing a Snowball stopword list

    Lucene.Net.Util.LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words


    GetStemDict(TextReader, CharArrayMap<String>)

    Reads a stem dictionary. Each line contains:

    word\tstem

    (i.e. two tab-separated words)

    Declaration
    public static CharArrayMap<string> GetStemDict(TextReader reader, CharArrayMap<string> result)
    Parameters
    Type Name Description
    System.IO.TextReader reader
    CharArrayMap<System.String> result
    Returns
    Type Description
    CharArrayMap<System.String>

    stem dictionary that overrules the stemming algorithm

    Exceptions
    Type Condition
    System.IO.IOException

    If there is a low-level I/O error.
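
The `word\tstem` line format can be sketched as follows. `StemDictSketch` and `ReadStemDict` are hypothetical names, and a standard `IDictionary<string, string>` stands in for CharArrayMap<string>. Skipping lines that contain no tab is a choice made for this sketch only. The documentation above does not say how malformed lines are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StemDictSketch
{
    // Hypothetical helper; IDictionary stands in for CharArrayMap<string>.
    public static IDictionary<string, string> ReadStemDict(
        TextReader reader, IDictionary<string, string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Split into at most two parts at the first tab: word, then stem.
            string[] parts = line.Split(new[] { '\t' }, 2);

            // Assumption of this sketch: ignore lines without a tab.
            if (parts.Length < 2) continue;

            result[parts[0]] = parts[1];
        }
        return result;
    }
}
```

A dictionary built this way maps each surface form to the stem that should be used instead of the stemming algorithm's output, for example `mice` to `mouse`.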


    GetWordSet(TextReader, CharArraySet)

    Reads lines from a System.IO.TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing the wordlist

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words


    GetWordSet(TextReader, LuceneVersion)

    Reads lines from a System.IO.TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing the wordlist

    Lucene.Net.Util.LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words


    GetWordSet(TextReader, String, CharArraySet)

    Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing the wordlist

    System.String comment

    The string representing a comment.

    CharArraySet result

    the CharArraySet to fill with the reader's words

    Returns
    Type Description
    CharArraySet

    the given CharArraySet with the reader's words
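
The role of the `comment` parameter can be shown with a short sketch. `CommentedWordListSketch` and `ReadWords` are hypothetical names, and `ISet<string>` stands in for CharArraySet. The sketch follows the description literally: any line that starts with the comment string is skipped, and every other line is trimmed and added. It is an illustration under those assumptions, not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CommentedWordListSketch
{
    // Hypothetical helper; ISet<string> stands in for CharArraySet.
    public static ISet<string> ReadWords(TextReader reader, string comment, ISet<string> result)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip lines that begin with the caller-supplied comment marker.
            if (line.StartsWith(comment, StringComparison.Ordinal)) continue;

            // One word per line; leading and trailing whitespace is dropped.
            result.Add(line.Trim());
        }
        return result;
    }
}
```

Because the comment marker is a string rather than a single character, a list can use markers such as `//` or `#` as needed.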

    | Improve this Doc View Source

    GetWordSet(TextReader, String, LuceneVersion)

    Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).

    Declaration
    public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
    Parameters
    Type Name Description
    System.IO.TextReader reader

    System.IO.TextReader containing the wordlist

    System.String comment

    The string representing a comment.

    Lucene.Net.Util.LuceneVersion matchVersion

    the Lucene.Net.Util.LuceneVersion

    Returns
    Type Description
    CharArraySet

    A CharArraySet with the reader's words

    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.