Class WordlistLoader

Loader for text files that represent a list of stopwords.

See IOUtils to obtain System.IO.TextReader instances.

This is a Lucene.NET INTERNAL API; use at your own risk.

Inheritance

System.Object

WordlistLoader

Inherited Members

System.Object.Equals(System.Object)

System.Object.Equals(System.Object, System.Object)

System.Object.GetHashCode()

System.Object.GetType()

System.Object.MemberwiseClone()

System.Object.ReferenceEquals(System.Object, System.Object)

System.Object.ToString()

Namespace: Lucene.Net.Analysis.Util

Assembly: Lucene.Net.Analysis.Common.dll

Syntax

public class WordlistLoader

Methods

GetLines(Stream, Encoding)

Reads the given stream and returns the non-comment lines containing data, decoded with the given character encoding.

A comment line is any line that starts with the character "#".

Declaration

public static IList<string> GetLines(Stream stream, Encoding encoding)

Parameters

Type	Name	Description
System.IO.Stream	stream	the stream to read the lines from
System.Text.Encoding	encoding	the character encoding used to decode the stream

Returns

Type	Description
System.Collections.Generic.IList<System.String>	a list of non-blank non-comment lines with whitespace trimmed

Exceptions

Type	Condition
System.IO.IOException	If there is a low-level I/O error.
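
A minimal usage sketch for the comment and blank-line filtering described above. It assumes the Lucene.Net.Analysis.Common package is referenced; the class GetLinesExample, its Load helper, and the sample text are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    // Hypothetical helper: wraps a string in a MemoryStream so that
    // WordlistLoader.GetLines can read it as UTF-8 text.
    public static IList<string> Load(string content)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(content)))
        {
            return WordlistLoader.GetLines(stream, Encoding.UTF8);
        }
    }

    public static void Main()
    {
        // One comment line, one padded entry, one blank line, one plain entry.
        IList<string> lines = Load("# fruit list\n  apple  \n\nbanana\n");

        // Per the documented contract, only the trimmed, non-blank,
        // non-comment lines should remain.
        Console.WriteLine(string.Join(",", lines));
    }
}
```

With the sample input, the documented behavior implies that the comment line and the blank line are dropped and "apple" is returned without its surrounding spaces.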

GetSnowballWordSet(TextReader, CharArraySet)

Reads stopwords from a stopword list in Snowball format.

The Snowball format is the following:

Lines may contain multiple words separated by whitespace.
The comment character is the vertical line (|).
Lines may contain trailing comments.

Declaration

public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing a Snowball stopword list
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words

GetSnowballWordSet(TextReader, LuceneVersion)

Reads stopwords from a stopword list in Snowball format.

The Snowball format is the following:

Lines may contain multiple words separated by whitespace.
The comment character is the vertical line (|).
Lines may contain trailing comments.

Declaration

public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing a Snowball stopword list
LuceneVersion	matchVersion	the LuceneVersion to match

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words
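
The three Snowball format rules listed above can be exercised with a small in-memory list. This is a sketch that assumes the Lucene.Net.Analysis.Common package and the LuceneVersion.LUCENE_48 compatibility value; SnowballExample, Load, and the sample stopwords are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballExample
{
    // Hypothetical helper: parses Snowball-formatted text held in a string.
    public static CharArraySet Load(string content)
    {
        using (var reader = new StringReader(content))
        {
            return WordlistLoader.GetSnowballWordSet(reader, LuceneVersion.LUCENE_48);
        }
    }

    public static void Main()
    {
        string list =
            " | a full-line comment\n" +                       // comment only
            "the and | two words, then a trailing comment\n" + // multiple words per line
            "of\n";                                            // single word

        CharArraySet stopwords = Load(list);

        // Words before the vertical line are expected in the set;
        // text after it is treated as a comment.
        Console.WriteLine(stopwords.Contains("the"));
        Console.WriteLine(stopwords.Contains("and"));
        Console.WriteLine(stopwords.Contains("comment"));
    }
}
```

Under the format rules as documented, "the", "and", and "of" should be members of the returned set, while words that appear only after a "|" should not.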

GetStemDict(TextReader, CharArrayMap<String>)

Reads a stem dictionary. Each line contains:

word\tstem

(i.e. two tab separated words)

Declaration

public static CharArrayMap<string> GetStemDict(TextReader reader, CharArrayMap<string> result)

Parameters

Type	Name	Description
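System.IO.TextReader	reader	System.IO.TextReader containing the stem dictionary
CharArrayMap<System.String>	result	the CharArrayMap<System.String> to fill with the word and stem pairs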

Returns

Type	Description
CharArrayMap<System.String>	stem dictionary that overrules the stemming algorithm

Exceptions

Type	Condition
System.IO.IOException	If there is a low-level I/O error.
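
A sketch of loading a tab-separated stem dictionary, assuming the version documented here (where the map type is CharArrayMap<string>) and its (LuceneVersion, int, bool) constructor. StemDictExample, Load, and the sample word pairs are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    // Hypothetical helper: fills a new case-sensitive map from "word\tstem" lines.
    public static CharArrayMap<string> Load(string content)
    {
        // Assumed constructor arguments: match version, initial capacity, ignoreCase.
        var dict = new CharArrayMap<string>(LuceneVersion.LUCENE_48, 16, false);
        using (var reader = new StringReader(content))
        {
            return WordlistLoader.GetStemDict(reader, dict);
        }
    }

    public static void Main()
    {
        // Each line holds a word and its stem, separated by a single tab.
        CharArrayMap<string> dict = Load("running\trun\nmice\tmouse\n");

        string stem;
        if (dict.TryGetValue("mice", out stem))
        {
            Console.WriteLine(stem);
        }
    }
}
```

The returned map is the same instance that was passed in, so a caller can pre-populate it or choose case-insensitive matching before loading.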

GetWordSet(TextReader, CharArraySet)

Reads lines from a System.IO.TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Declaration

public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing the wordlist
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words
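
Because this overload fills a set supplied by the caller, the caller decides how the set matches words. The sketch below assumes the CharArraySet(LuceneVersion, int, bool) constructor with ignoreCase set to true; WordSetExample, Load, and the sample words are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class WordSetExample
{
    // Hypothetical helper: loads one word per line into a case-insensitive set.
    public static CharArraySet Load(string content)
    {
        // Assumed constructor arguments: match version, initial capacity, ignoreCase.
        var target = new CharArraySet(LuceneVersion.LUCENE_48, 16, true);
        using (var reader = new StringReader(content))
        {
            // The method returns the same set instance it was given.
            return WordlistLoader.GetWordSet(reader, target);
        }
    }

    public static void Main()
    {
        // Leading and trailing whitespace on each line is omitted on load.
        CharArraySet words = Load("  the  \nand\n");

        Console.WriteLine(words.Contains("the"));
        Console.WriteLine(words.Contains("AND")); // matches only because ignoreCase is true
    }
}
```

A case-sensitive set (ignoreCase set to false) would instead require the file's words to already be in the case the analyzer produces, which is why the description above recommends lowercase entries when LowerCaseFilter is in use.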

GetWordSet(TextReader, LuceneVersion)

Reads lines from a System.IO.TextReader and adds every line as an entry to a new CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word.

Declaration

public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing the wordlist
LuceneVersion	matchVersion	the LuceneVersion

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words

GetWordSet(TextReader, String, CharArraySet)

Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer).

Declaration

public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing the wordlist
System.String	comment	The string representing a comment.
CharArraySet	result	the CharArraySet to fill with the reader's words

Returns

Type	Description
CharArraySet	the given CharArraySet with the reader's words

GetWordSet(TextReader, String, LuceneVersion)

Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a new CharArraySet (omitting leading and trailing whitespace). Every line of the System.IO.TextReader should contain only one word.

Declaration

public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)

Parameters

Type	Name	Description
System.IO.TextReader	reader	System.IO.TextReader containing the wordlist
System.String	comment	The string representing a comment.
LuceneVersion	matchVersion	the LuceneVersion

Returns

Type	Description
CharArraySet	A CharArraySet with the reader's words
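
A sketch of the comment-aware overload with a caller-chosen comment marker. It assumes the Lucene.Net.Analysis.Common package and LuceneVersion.LUCENE_48; CommentedWordSetExample, Load, the "//" marker, and the sample words are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CommentedWordSetExample
{
    // Hypothetical helper: loads a word list whose comment lines start with the given marker.
    public static CharArraySet Load(string content, string commentMarker)
    {
        using (var reader = new StringReader(content))
        {
            return WordlistLoader.GetWordSet(reader, commentMarker, LuceneVersion.LUCENE_48);
        }
    }

    public static void Main()
    {
        // The first line starts with the marker and is expected to be skipped.
        CharArraySet words = Load("// custom stopwords\nfoo\nbar\n", "//");

        Console.WriteLine(words.Contains("foo"));
        Console.WriteLine(words.Contains("// custom stopwords"));
    }
}
```

Passing the marker as a string rather than a single character lets a word list reuse whatever comment convention its source file already follows.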