Class WordlistLoader
Loader for text files that represent a list of stopwords.
See IOUtils to obtain TextReader instances.
Inheritance
System.Object
WordlistLoader
Namespace: Lucene.Net.Analysis.Util
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public class WordlistLoader : object
Methods
GetLines(Stream, Encoding)
Reads the given stream and returns the non-comment lines that contain data, decoded with the given character encoding.
A comment line is any line that starts with the character "#".
Declaration
public static IList<string> GetLines(Stream stream, Encoding encoding)
Parameters
Type | Name | Description |
---|---|---|
Stream | stream | the stream to read the lines from |
Encoding | encoding | the character encoding used to decode the stream |
Returns
Type | Description |
---|---|
IList<System.String> | a list of non-blank non-comment lines with whitespace trimmed |
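A minimal usage sketch, assuming the Lucene.Net 4.8 packages (Lucene.Net and Lucene.Net.Analysis.Common) are referenced. The in-memory sample text and the class name are illustrative only; a file or resource stream would be passed the same way.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    public static void Main()
    {
        // Illustrative input: one comment line, one blank line, two data lines.
        string text = "# this line is a comment\nfoo\n\n  bar  \n";

        using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            // Comment and blank lines are skipped; surrounding whitespace is trimmed.
            foreach (string line in WordlistLoader.GetLines(stream, Encoding.UTF8))
            {
                Console.WriteLine("[" + line + "]");
            }
        }
    }
}
```

Given the behavior described above, this should print `[foo]` followed by `[bar]`.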
GetSnowballWordSet(TextReader, CharArraySet)
Reads stopwords from a stopword list in Snowball format.
The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing a Snowball stopword list |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
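The following sketch shows the three Snowball format rules on a small sample, assuming the Lucene.Net 4.8 packages are referenced. The sample list, the pre-existing entry, and the class name are made up for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballIntoExistingSetExample
{
    public static void Main()
    {
        // A case-insensitive set that already holds one entry.
        var stopWords = new CharArraySet(LuceneVersion.LUCENE_48, 16, true);
        stopWords.Add("the");

        string snowball =
            "| a line that is only a comment\n" +
            "i me my    | several words, then a trailing comment\n" +
            "myself\n";

        CharArraySet returned =
            WordlistLoader.GetSnowballWordSet(new StringReader(snowball), stopWords);

        Console.WriteLine(ReferenceEquals(returned, stopWords)); // the given set is returned
        Console.WriteLine(stopWords.Contains("the"));            // earlier entry is kept
        Console.WriteLine(stopWords.Contains("me"));             // word from a multi-word line
        Console.WriteLine(stopWords.Contains("several"));        // text after '|' is ignored
    }
}
```

The first three lines of output are expected to be `True` and the last `False`, since everything after the vertical line is treated as a comment.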
GetSnowballWordSet(TextReader, LuceneVersion)
Reads stopwords from a stopword list in Snowball format.
The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing a Snowball stopword list |
LuceneVersion | matchVersion | the LuceneVersion to match |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
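As a sketch of reading a Snowball list from disk, the example below writes a small list to a temporary file and loads it into a new set. It assumes the Lucene.Net 4.8 packages are referenced; the file contents and class name are illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballFromFileExample
{
    public static void Main()
    {
        // Write a small Snowball-style list to a temporary file (illustrative data).
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "a an the | articles\nof | a preposition\n");
        try
        {
            CharArraySet stopWords;
            using (TextReader reader = new StreamReader(path, Encoding.UTF8))
            {
                stopWords = WordlistLoader.GetSnowballWordSet(reader, LuceneVersion.LUCENE_48);
            }

            Console.WriteLine(stopWords.Count);           // words before the '|' on each line
            Console.WriteLine(stopWords.Contains("the"));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With this input the set should hold four words (`a`, `an`, `the`, `of`), because the trailing comments are not added.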
GetStemDict(TextReader, CharArrayMap<String>)
Reads a stem dictionary. Each line contains:
word\tstem
(i.e., two tab-separated words)
Declaration
public static CharArrayMap<string> GetStemDict(TextReader reader, CharArrayMap<string> result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the stem dictionary |
CharArrayMap<System.String> | result | the CharArrayMap to fill with the word-to-stem entries |
Returns
Type | Description |
---|---|
CharArrayMap<System.String> | stem dictionary that overrules the stemming algorithm |
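A short sketch of loading a stem dictionary, assuming the Lucene.Net 4.8 packages are referenced. The word pairs and the class name are invented for illustration, and every input line is assumed to contain exactly one tab.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    public static void Main()
    {
        // Each line is "word<TAB>stem".
        string text = "running\trun\nmice\tmouse\n";

        // Case-sensitive map that receives the entries.
        var dict = new CharArrayMap<string>(LuceneVersion.LUCENE_48, 16, false);
        WordlistLoader.GetStemDict(new StringReader(text), dict);

        string stem;
        if (dict.TryGetValue("mice", out stem))
        {
            Console.WriteLine("mice -> " + stem);
        }
    }
}
```

This is expected to print `mice -> mouse`.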
GetWordSet(TextReader, CharArraySet)
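Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word.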
Declaration
public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the word list |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
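Because this overload fills a set supplied by the caller, one possible use is merging several word lists into a single set. The sketch below assumes the Lucene.Net 4.8 packages are referenced; both sample lists and the class name are illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class MergeWordListsExample
{
    public static void Main()
    {
        // One set shared by both calls.
        var words = new CharArraySet(LuceneVersion.LUCENE_48, 16, true);

        WordlistLoader.GetWordSet(new StringReader("and\nor\n"), words);
        WordlistLoader.GetWordSet(new StringReader("but\n  not  \n"), words);

        Console.WriteLine(words.Count);           // entries from both lists
        Console.WriteLine(words.Contains("not")); // stored without the surrounding spaces
    }
}
```

The output should be `4` and `True`.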
GetWordSet(TextReader, LuceneVersion)
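Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word.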
Declaration
public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the word list |
LuceneVersion | matchVersion | the LuceneVersion |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
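A minimal sketch of the one-word-per-line case, assuming the Lucene.Net 4.8 packages are referenced. The word list and the class name are illustrative.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class GetWordSetExample
{
    public static void Main()
    {
        // One word per line; no comment handling in this overload.
        var reader = new StringReader("and\nor\nbut\n");

        CharArraySet words = WordlistLoader.GetWordSet(reader, LuceneVersion.LUCENE_48);

        Console.WriteLine(words.Count);
        Console.WriteLine(words.Contains("or"));
        Console.WriteLine(words.Contains("xor"));
    }
}
```

For this input the expected output is `3`, `True`, `False`.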
GetWordSet(TextReader, String, CharArraySet)
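Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word.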
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the word list |
System.String | comment | The string representing a comment. |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
GetWordSet(TextReader, String, LuceneVersion)
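Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word.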
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the word list |
System.String | comment | The string representing a comment. |
LuceneVersion | matchVersion | the LuceneVersion |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
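The sketch below illustrates the `comment` parameter with "#" as the comment string, assuming the Lucene.Net 4.8 packages are referenced. The sample list and the class name are illustrative; any other prefix string could be passed instead.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class GetWordSetWithCommentsExample
{
    public static void Main()
    {
        // Lines that start with the comment string are skipped.
        string text = "# stopwords\nand\n  or  \n# end of list\n";

        CharArraySet words =
            WordlistLoader.GetWordSet(new StringReader(text), "#", LuceneVersion.LUCENE_48);

        Console.WriteLine(words.Count);
        Console.WriteLine(words.Contains("or"));
        Console.WriteLine(words.Contains("# stopwords"));
    }
}
```

The expected output is `2`, `True`, `False`: only the two non-comment lines are added, each with its whitespace trimmed.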