Class WordlistLoader
Loader for text files that represent a list of stopwords.
See IOUtils to obtain System.IO.TextReader instances.
Namespace: Lucene.Net.Analysis.Util
Assembly: Lucene.Net.Analysis.Common.dll
Syntax
public static class WordlistLoader
Methods
GetLines(Stream, Encoding)
Reads the given stream and returns the non-comment lines containing data, decoded with the given character encoding.
A comment line is any line that starts with the character "#".
Declaration
public static IList<string> GetLines(Stream stream, Encoding encoding)
Parameters
Type | Name | Description |
---|---|---|
System.IO.Stream | stream | |
System.Text.Encoding | encoding | |
Returns
Type | Description |
---|---|
System.Collections.Generic.IList<System.String> | A list of non-blank, non-comment lines with whitespace trimmed. |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | If there is a low-level I/O error. |
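The filtering described for GetLines can be tried on an in-memory stream. The following is a minimal sketch, assuming the Lucene.Net.Analysis.Common package is referenced; the class name `GetLinesExample`, the helper `LoadLines`, and the sample text are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Lucene.Net.Analysis.Util;

public static class GetLinesExample
{
    // Hypothetical helper: feeds a string to GetLines through a MemoryStream.
    public static IList<string> LoadLines(string text)
    {
        // A FileStream opened on a word list file would be passed the same way.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
        {
            return WordlistLoader.GetLines(stream, Encoding.UTF8);
        }
    }

    public static void Main()
    {
        // One "#" comment line, one blank line, and two data lines,
        // the first of which carries surrounding whitespace.
        IList<string> lines = LoadLines("# sample list\n\n  alpha  \nbeta\n");

        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

Going by the description above, only the two data lines should come back, with the whitespace around the first one trimmed.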
GetSnowballWordSet(TextReader, CharArraySet)
Reads stopwords from a stopword list in Snowball format.
The snowball format is the following:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
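System.IO.TextReader | reader | System.IO.TextReader containing a Snowball stopword list |
Lucene.Net.Analysis.Util.CharArraySet | result | the CharArraySet to fill with the reader's words |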
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | the given CharArraySet with the reader's words |
GetSnowballWordSet(TextReader, LuceneVersion)
Reads stopwords from a stopword list in Snowball format.
The snowball format is the following:
- Lines may contain multiple words separated by whitespace.
- The comment character is the vertical line (|).
- Lines may contain trailing comments.
Declaration
public static CharArraySet GetSnowballWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | System.IO.TextReader containing a Snowball stopword list |
Lucene.Net.Util.LuceneVersion | matchVersion | the Lucene LuceneVersion |
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | A CharArraySet with the reader's words |
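The three rules of the Snowball format listed above can be exercised with a short in-memory list. This is a sketch under stated assumptions: the Lucene.Net.Analysis.Common package is referenced, `LuceneVersion.LUCENE_48` is the version constant in use, and the class name `SnowballExample`, the helper `Load`, and the sample list are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class SnowballExample
{
    // Hypothetical helper: parses Snowball-format text into a new CharArraySet.
    public static CharArraySet Load(string text)
    {
        using (var reader = new StringReader(text))
        {
            // Assumption: LUCENE_48 matches the Lucene.Net version in use.
            return WordlistLoader.GetSnowballWordSet(reader, LuceneVersion.LUCENE_48);
        }
    }

    public static void Main()
    {
        string list =
            " | a line holding only a comment\n" +
            "i me my   | several words, then a trailing comment\n" +
            "we\n";

        CharArraySet stopwords = Load(list);

        // Words from a multi-word line and from a single-word line.
        Console.WriteLine(stopwords.Contains("me"));
        Console.WriteLine(stopwords.Contains("we"));
        // Text after the vertical line should not be treated as a stopword.
        Console.WriteLine(stopwords.Contains("comment"));
    }
}
```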
GetStemDict(TextReader, CharArrayMap<String>)
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Declaration
public static CharArrayMap<string> GetStemDict(TextReader reader, CharArrayMap<string> result)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | |
Lucene.Net.Analysis.Util.CharArrayMap<System.String> | result | |
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArrayMap<System.String> | Stem dictionary that overrules the stemming algorithm. |
Exceptions
Type | Condition |
---|---|
System.IO.IOException | If there is a low-level I/O error. |
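A stem dictionary in the word\tstem layout can be loaded as sketched below. The sketch assumes the Lucene.Net.Analysis.Common package is referenced, that `CharArrayMap<string>` offers a `(LuceneVersion, int, bool)` constructor, and that it exposes the usual dictionary-style `TryGetValue`; the class name `StemDictExample`, the helper `Load`, and the sample entries are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class StemDictExample
{
    // Hypothetical helper: reads "word<TAB>stem" lines into a fresh map.
    public static CharArrayMap<string> Load(string text)
    {
        // Assumed constructor arguments: match version, initial size, ignoreCase.
        var result = new CharArrayMap<string>(LuceneVersion.LUCENE_48, 16, false);

        using (var reader = new StringReader(text))
        {
            // The method fills the map passed in and returns it.
            return WordlistLoader.GetStemDict(reader, result);
        }
    }

    public static void Main()
    {
        CharArrayMap<string> dict = Load("mice\tmouse\ngeese\tgoose\n");

        // Look up the stem registered for an irregular plural.
        if (dict.TryGetValue("mice", out string stem))
        {
            Console.WriteLine(stem);
        }
    }
}
```

A map built this way is the kind of override table a stemmer-override filter consults before falling back to the stemming algorithm.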
GetWordSet(TextReader, CharArraySet)
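Reads lines from a System.IO.TextReader and adds every line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line should contain only one word.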
Declaration
public static CharArraySet GetWordSet(TextReader reader, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
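System.IO.TextReader | reader | System.IO.TextReader containing the word list |
Lucene.Net.Analysis.Util.CharArraySet | result | the CharArraySet to fill with the reader's words |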
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | the given CharArraySet with the reader's words |
GetWordSet(TextReader, LuceneVersion)
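Reads lines from a System.IO.TextReader and adds every line as an entry to a new CharArraySet (omitting leading and trailing whitespace). Every line should contain only one word.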
Declaration
public static CharArraySet GetWordSet(TextReader reader, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | System.IO.TextReader containing the word list |
Lucene.Net.Util.LuceneVersion | matchVersion | the LuceneVersion |
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | A CharArraySet with the reader's words |
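For a plain one-word-per-line list, the overload that takes a LuceneVersion creates the set itself. A minimal sketch, assuming the Lucene.Net.Analysis.Common package is referenced and `LuceneVersion.LUCENE_48` is available; the class name `WordSetExample`, the helper `Load`, and the sample words are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class WordSetExample
{
    // Hypothetical helper: loads one word per line into a new CharArraySet.
    public static CharArraySet Load(string text)
    {
        using (var reader = new StringReader(text))
        {
            return WordlistLoader.GetWordSet(reader, LuceneVersion.LUCENE_48);
        }
    }

    public static void Main()
    {
        // The second line carries extra whitespace around the word.
        CharArraySet words = Load("the\n  and  \nof\n");

        Console.WriteLine(words.Contains("the"));
        Console.WriteLine(words.Contains("and"));
    }
}
```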
GetWordSet(TextReader, String, CharArraySet)
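Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a CharArraySet (omitting leading and trailing whitespace). Every line should contain only one word.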
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, CharArraySet result)
Parameters
Type | Name | Description |
---|---|---|
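System.IO.TextReader | reader | System.IO.TextReader containing the word list |
System.String | comment | The string representing a comment. |
Lucene.Net.Analysis.Util.CharArraySet | result | the CharArraySet to fill with the reader's words |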
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | the given CharArraySet with the reader's words |
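Because this overload fills a set supplied by the caller, several lists with different comment markers can be merged into one set. The sketch below assumes the Lucene.Net.Analysis.Common package is referenced and that `CharArraySet` offers a `(LuceneVersion, int, bool)` constructor; the class name `MergedWordSetExample`, the helper `Merge`, and both sample lists are hypothetical.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

public static class MergedWordSetExample
{
    // Hypothetical helper: loads two lists, each with its own comment marker,
    // into a single shared CharArraySet.
    public static CharArraySet Merge(string hashList, string slashList)
    {
        // Assumed constructor arguments: match version, initial size, ignoreCase.
        var result = new CharArraySet(LuceneVersion.LUCENE_48, 16, false);

        using (var first = new StringReader(hashList))
        {
            WordlistLoader.GetWordSet(first, "#", result);
        }
        using (var second = new StringReader(slashList))
        {
            WordlistLoader.GetWordSet(second, "//", result);
        }
        return result;
    }

    public static void Main()
    {
        CharArraySet words = Merge("# articles\nthe\na\n", "// conjunctions\nand\nor\n");

        Console.WriteLine(words.Contains("the"));
        Console.WriteLine(words.Contains("or"));
        // Lines starting with the comment string should have been skipped.
        Console.WriteLine(words.Contains("# articles"));
    }
}
```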
GetWordSet(TextReader, String, LuceneVersion)
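Reads lines from a System.IO.TextReader and adds every non-comment line as an entry to a new CharArraySet (omitting leading and trailing whitespace). Every line should contain only one word.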
Declaration
public static CharArraySet GetWordSet(TextReader reader, string comment, LuceneVersion matchVersion)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | reader | System.IO.TextReader containing the word list |
System.String | comment | The string representing a comment. |
Lucene.Net.Util.LuceneVersion | matchVersion | the LuceneVersion |
Returns
Type | Description |
---|---|
Lucene.Net.Analysis.Util.CharArraySet | A CharArraySet with the reader's words |