Class WordlistLoader
Loader for text files that represent a list of stopwords.
See Lucene.Net.Util.IOUtils to obtain TextReader instances.
Note
This API is for internal purposes only and might change in incompatible ways in the next release.
Namespace: Lucene.Net.Analysis.Util
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public static class WordlistLoader
Methods
GetLines(Stream, Encoding)
Reads the given stream using the given character encoding and returns the non-comment lines that contain data.
A comment line is any line that starts with the character "#".
Declaration
public static IList<string> GetLines(Stream stream, Encoding encoding)
Parameters
Type | Name | Description |
---|---|---|
Stream | stream | Stream to read the lines from |
Encoding | encoding | Character encoding used to decode the stream |
Returns
Type | Description |
---|---|
IList<string> | a list of non-blank non-comment lines with whitespace trimmed |
Exceptions
Type | Condition |
---|---|
IOException | If there is a low-level I/O error. |
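The following is a minimal usage sketch, not taken from the library's own documentation. It assumes the Lucene.Net.Analysis.Common package is referenced; the class `GetLinesExample` and the sample text are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    // Hypothetical helper: wraps sample text in a MemoryStream and reads it back.
    public static IList<string> ReadSample()
    {
        // One comment line, one padded word, one blank line, one plain word.
        string text = "# header comment\n  apple  \n\nbanana\n";

        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            // Per the description above, comment and blank lines are skipped
            // and the remaining lines are returned with whitespace trimmed.
            return WordlistLoader.GetLines(stream, Encoding.UTF8);
        }
    }
}
```

Given the documented behavior, the returned list should hold only the two data lines, "apple" and "banana".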
GetSnowballWordSet(TextReader, CharArraySet)
Reads stopwords from a stopword list in Snowball format.
The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing a Snowball stopword list |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
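As an illustration of the three format rules listed above, here is a small sketch that loads an in-memory Snowball-style list into an existing set. It assumes Lucene.Net 4.8 (so that `LuceneVersion.LUCENE_48` is available) and the `CharArraySet(LuceneVersion, int, bool)` constructor; the class `SnowballExample` and the sample text are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballExample
{
    // Hypothetical helper: parses a tiny Snowball-format list from a string.
    public static CharArraySet LoadSample()
    {
        string text =
            " | A sample stopword list\n" +   // whole-line comment
            "i me my | first person\n" +      // several words plus a trailing comment
            "you your\n";                     // several words, no comment

        // Case-sensitive target set; the initial size of 16 is an arbitrary choice.
        var target = new CharArraySet(LuceneVersion.LUCENE_48, 16, false);

        using (var reader = new StringReader(text))
        {
            // The same instance that was passed in is filled and returned.
            return WordlistLoader.GetSnowballWordSet(reader, target);
        }
    }
}
```

With this input, the set is expected to contain "i", "me", "my", "you", and "your", while text after the vertical line, such as "first", is ignored.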
GetSnowballWordSet(TextReader, LuceneVersion)
Reads stopwords from a stopword list in Snowball format.
The Snowball format is as follows:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing a Snowball stopword list |
LuceneVersion | matchVersion | the Lucene.Net.Util.LuceneVersion |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
GetStemDict(TextReader, CharArrayDictionary<string>)
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab separated words)
Declaration
public static CharArrayDictionary<string> GetStemDict(TextReader reader, CharArrayDictionary<string> result)
Parameters
Type | Name | Description |
---|---|---|
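TextReader | reader | TextReader containing the stem dictionary |
CharArrayDictionary<string> | result | the CharArrayDictionary<string> to fill with the reader's word and stem pairs |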
Returns
Type | Description |
---|---|
CharArrayDictionary<string> | stem dictionary that overrules the stemming algorithm |
Exceptions
Type | Condition |
---|---|
IOException | If there is a low-level I/O error. |
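A short sketch of the tab-separated format in use follows. It assumes Lucene.Net 4.8, the `CharArrayDictionary<string>(LuceneVersion, int, bool)` constructor, and a `TryGetValue(string, out string)` lookup; the class `StemDictExample` and the sample entries are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    // Hypothetical helper: loads two word/stem pairs from an in-memory string.
    public static CharArrayDictionary<string> LoadSample()
    {
        // Each line is "word<TAB>stem".
        string text = "running\trun\nmice\tmouse\n";

        // Case-sensitive target dictionary; the capacity of 16 is an arbitrary choice.
        var target = new CharArrayDictionary<string>(LuceneVersion.LUCENE_48, 16, false);

        using (var reader = new StringReader(text))
        {
            return WordlistLoader.GetStemDict(reader, target);
        }
    }

    // Hypothetical helper: returns the stem for a word, or null if none is listed.
    public static string Lookup(CharArrayDictionary<string> dict, string word)
    {
        return dict.TryGetValue(word, out string stem) ? stem : null;
    }
}
```

Here `Lookup(LoadSample(), "mice")` should yield "mouse", and a word that is not in the file should yield null.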
GetWordSet(TextReader, CharArraySet)
Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Declaration
public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the wordlist |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
GetWordSet(TextReader, LuceneVersion)
Reads lines from a TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Declaration
public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the wordlist |
LuceneVersion | matchVersion | the Lucene.Net.Util.LuceneVersion |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
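The one-word-per-line format can be exercised with a sketch like the one below. It assumes Lucene.Net 4.8 (`LuceneVersion.LUCENE_48`); the class `WordSetExample` and the sample words are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class WordSetExample
{
    // Hypothetical helper: builds a new set from a one-word-per-line string.
    public static CharArraySet LoadSample()
    {
        // Words are already lowercase, as recommended above; the first
        // line carries extra whitespace that the loader trims.
        string text = "  the \nand\nof\n";

        using (var reader = new StringReader(text))
        {
            // This overload creates and returns a new CharArraySet.
            return WordlistLoader.GetWordSet(reader, LuceneVersion.LUCENE_48);
        }
    }
}
```

The resulting set should contain "the", "and", and "of", and could then be passed to an analyzer as its stopword set.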
GetWordSet(TextReader, string, CharArraySet)
Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the wordlist |
string | comment | The string representing a comment. |
CharArraySet | result | the CharArraySet to fill with the reader's words |
Returns
Type | Description |
---|---|
CharArraySet | the given CharArraySet with the reader's words |
GetWordSet(TextReader, string, LuceneVersion)
Reads lines from a TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line of the TextReader should contain only one word. The words need to be in lowercase if you use a Lucene.Net.Analysis.Analyzer that applies LowerCaseFilter (such as StandardAnalyzer).
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
TextReader | reader | TextReader containing the wordlist |
string | comment | The string representing a comment. |
LuceneVersion | matchVersion | the Lucene.Net.Util.LuceneVersion |
Returns
Type | Description |
---|---|
CharArraySet | A CharArraySet with the reader's words |
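To show the comment parameter in use, the sketch below loads a list whose comment lines begin with "//". It assumes Lucene.Net 4.8 (`LuceneVersion.LUCENE_48`); the class `CommentedWordSetExample`, the "//" marker, and the sample words are hypothetical choices for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class CommentedWordSetExample
{
    // Hypothetical helper: loads a word list that uses "//" for comment lines.
    public static CharArraySet LoadSample()
    {
        string text =
            "// custom stopwords\n" +
            "foo\n" +
            "// another comment\n" +
            "bar\n";

        using (var reader = new StringReader(text))
        {
            // The second argument is the string that marks a comment line.
            return WordlistLoader.GetWordSet(reader, "//", LuceneVersion.LUCENE_48);
        }
    }
}
```

With this input, the set is expected to hold only "foo" and "bar". The comment lines in the sample start at the first column, which is the safest layout for a comment marker.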