Class WordlistLoader

Loader for text files that represent a list of stopwords.

See Lucene.Net.Util.IOUtils to obtain TextReader instances.

Note

This API is for internal purposes only and might change in incompatible ways in the next release.

Inheritance

object

WordlistLoader

Inherited Members

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Namespace: Lucene.Net.Analysis.Util

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public static class WordlistLoader

Methods

GetLines(Stream, Encoding)

Reads the given stream and returns its non-comment lines that contain data, decoded with the given character encoding.

A comment line is any line that starts with the character "#".
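
The filtering rules described here can be sketched with plain .NET types. This is an illustration only, not Lucene.NET's implementation: the class LineFilterSketch and the method ReadDataLines are hypothetical names, and the sketch assumes the "#" check is applied before trimming.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineFilterSketch
{
    // Hypothetical helper that mirrors the documented rules; not Lucene.NET code.
    public static IList<string> ReadDataLines(Stream stream, Encoding encoding)
    {
        var lines = new List<string>();
        // Disposing the StreamReader also disposes the underlying stream.
        using (var reader = new StreamReader(stream, encoding))
        {
            while (reader.ReadLine() is string line)
            {
                // A comment line is any line that starts with '#'.
                if (line.StartsWith("#", StringComparison.Ordinal)) continue;

                // Trim surrounding whitespace and drop lines that end up empty.
                string trimmed = line.Trim();
                if (trimmed.Length == 0) continue;

                lines.Add(trimmed);
            }
        }
        return lines;
    }
}
```

Under these assumptions, the UTF-8 input "# header\n  the \n\nand\n" yields the two entries "the" and "and".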

Declaration

public static IList<string> GetLines(Stream stream, Encoding encoding)

Parameters

Type	Name	Description
Stream	stream	The stream to read the lines from
Encoding	encoding	The character encoding used to decode the stream

Returns

Type	Description
IList<string>	a list of non-blank non-comment lines with whitespace trimmed

Exceptions

Type	Condition
IOException	If there is a low-level I/O error.

GetSnowballWordSet(TextReader, CharArraySet)

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

Lines may contain multiple words separated by whitespace.
The comment character is the vertical line (|).
Lines may contain trailing comments.
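
The three format rules above can be sketched with plain .NET types. This is a minimal illustration of the file format, not the library's own parser: SnowballFormatSketch and ParseWords are hypothetical names, and the result is an ordinary HashSet<string> rather than a CharArraySet.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SnowballFormatSketch
{
    // Hypothetical helper that applies the Snowball stopword-list rules.
    public static ISet<string> ParseWords(TextReader reader)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        while (reader.ReadLine() is string line)
        {
            // Everything from the first '|' onward is a comment.
            int bar = line.IndexOf('|');
            string data = bar >= 0 ? line.Substring(0, bar) : line;

            // An empty separator array makes Split use whitespace as the delimiter,
            // so one line may contribute several words.
            foreach (string word in data.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries))
            {
                words.Add(word);
            }
        }
        return words;
    }
}
```

For the input "i me my | pronouns\n| full comment line\nthe\n", this sketch collects the four words "i", "me", "my", and "the"; the text after each vertical line is discarded.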

Declaration

public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing a Snowball stopword list
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words

GetSnowballWordSet(TextReader, LuceneVersion)

Reads stopwords from a stopword list in Snowball format.

The Snowball format is as follows:

Lines may contain multiple words separated by whitespace.
The comment character is the vertical line (|).
Lines may contain trailing comments.

Declaration

public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing a Snowball stopword list
LuceneVersion	matchVersion	the Lucene.Net.Util.LuceneVersion

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words

GetStemDict(TextReader, CharArrayDictionary<string>)

Reads a stem dictionary. Each line contains:

word\tstem

(i.e., two tab-separated words)
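
The line format can be illustrated with a short sketch built on plain .NET types. It is not Lucene.NET's implementation: StemDictSketch and Parse are hypothetical names, the result is an ordinary Dictionary<string, string>, and skipping lines that contain no tab is an assumption of this sketch rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StemDictSketch
{
    // Hypothetical helper that reads "word<TAB>stem" lines into a dictionary.
    public static IDictionary<string, string> Parse(TextReader reader)
    {
        var stems = new Dictionary<string, string>(StringComparer.Ordinal);
        while (reader.ReadLine() is string line)
        {
            // Split into at most two fields at the first tab: word, then stem.
            string[] fields = line.Split(new[] { '\t' }, 2);

            // Assumption of this sketch: lines without a tab are skipped.
            if (fields.Length < 2) continue;

            // A later line for the same word replaces the earlier stem.
            stems[fields[0]] = fields[1];
        }
        return stems;
    }
}
```

With the input "running\trun\nmice\tmouse\n" (where \t is a tab character), the sketch maps "running" to "run" and "mice" to "mouse".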

Declaration

public static CharArrayDictionary<string> GetStemDict(TextReader reader, CharArrayDictionary<string> result)

Parameters

Type	Name	Description
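TextReader	reader	TextReader containing the stem dictionary
CharArrayDictionary<string>	result	the CharArrayDictionary<string> to fill with the reader's word/stem pairs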

Returns

Type	Description
CharArrayDictionary<string>	stem dictionary that overrules the stemming algorithm

Exceptions

Type	Condition
IOException	If there is a low-level I/O error.

GetWordSet(TextReader, CharArraySet)

Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).
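
The lowercase requirement can be illustrated with a small stand-in for an analysis chain. The sketch below uses only plain .NET types and hypothetical names (LowercaseStopwordSketch, ReadWords, LowercaseThenFilter); it assumes case-sensitive set lookups and models the lower-casing step with ToLowerInvariant, so it approximates the idea rather than reproducing Lucene.NET's filters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LowercaseStopwordSketch
{
    // One trimmed entry per line, as in the description above.
    public static ISet<string> ReadWords(TextReader reader)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        while (reader.ReadLine() is string line)
        {
            words.Add(line.Trim());
        }
        return words;
    }

    // Stand-in for a chain that lower-cases tokens first and removes stopwords second.
    public static List<string> LowercaseThenFilter(IEnumerable<string> tokens, ISet<string> stopwords)
    {
        return tokens
            .Select(t => t.ToLowerInvariant())
            .Where(t => !stopwords.Contains(t))
            .ToList();
    }
}
```

In this sketch, a list containing the entry "The" never removes anything, because every token has already been lower-cased to "the" by the time it is looked up; a list containing "the" removes it as intended.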

Declaration

public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing the wordlist
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words

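GetWordSet(TextReader, LuceneVersion)

Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).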

Declaration

public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing the wordlist
LuceneVersion	matchVersion	the Lucene.Net.Util.LuceneVersion

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words

GetWordSet(TextReader, string, CharArraySet)

Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).
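
The effect of the comment string can be sketched with plain .NET types. CommentedWordlistSketch and ReadWords are hypothetical names, and the sketch assumes a line counts as a comment only when it begins with the comment string; it illustrates the description above and is not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CommentedWordlistSketch
{
    // Hypothetical helper: skips comment lines and keeps every other line, trimmed.
    public static ISet<string> ReadWords(TextReader reader, string comment)
    {
        var words = new HashSet<string>(StringComparer.Ordinal);
        while (reader.ReadLine() is string line)
        {
            // Assumption: only lines that begin with the comment string are skipped.
            if (line.StartsWith(comment, StringComparison.Ordinal)) continue;

            words.Add(line.Trim());
        }
        return words;
    }
}
```

Called with the comment string "//" on the input "// stopwords\nthe\n  and  \n", the sketch returns a set holding "the" and "and". Because the comment marker is an arbitrary string, multi-character prefixes such as "//" work the same way as a single "#".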

Declaration

public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing the wordlist
string	comment	The string representing a comment.
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words

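GetWordSet(TextReader, string, LuceneVersion)

Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (like StandardAnalyzer).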

Declaration

public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)

Parameters

Type	Name	Description
TextReader	reader	TextReader containing the wordlist
string	comment	The string representing a comment.
LuceneVersion	matchVersion	the Lucene.Net.Util.LuceneVersion

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words