Class Lang
Language guessing utility.
Namespace: Lucene.Net.Analysis.Phonetic.Language.Bm
Assembly: Lucene.Net.Analysis.Phonetic.dll
Syntax
public class Lang
Remarks
This class encapsulates rules used to guess the possible languages that a word originates from. This is done by reference to a whole series of rules distributed in resource files.
Instances of this class are typically managed through the static factory method GetInstance(NameType). Unless you are developing your own language guessing rules, you will not need to interact with this class directly.
This class is intended to be immutable and thread-safe.
Lang resources
Language guessing rules are typically loaded from resource files. These are UTF-8 encoded text files. They are systematically named following the pattern: Lucene.Net.Analysis.Phonetic.Language.Bm.lang.txt
The format of these resources is the following:
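Rules: | Whitespace-separated strings. There should be 3 columns in each row, interpreted as: (1) pattern: a regular expression; (2) languages: a '+'-separated list of languages; (3) acceptOnMatch: 'true' or 'false', indicating whether a match rules the language in or out. |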
End-of-line comments: | Any occurrence of '//' will cause all text following on that line to be discarded as a comment. |
Multi-line comments: | Any line starting with '/*' will start multi-line commenting mode. This will skip all content until a line ending in '*/' is found. |
Blank lines: | All blank lines will be skipped. |
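A minimal sketch of a reader for this format, assuming the three columns are a regular-expression pattern, a '+'-separated list of language names, and a true/false flag (the layout used by the Apache Commons Codec rules this class derives from). LangRuleLine and LangRuleParser are hypothetical names for illustration and are not part of Lucene.Net; the real loader may differ in details, for example in how it treats a '/*' line that also ends in '*/'.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only; not part of Lucene.Net.
public sealed record LangRuleLine(string Pattern, string[] Languages, bool AcceptOnMatch);

// Hypothetical parser that follows the comment and blank-line rules described above.
public static class LangRuleParser
{
    public static List<LangRuleLine> Parse(IEnumerable<string> lines)
    {
        var rules = new List<LangRuleLine>();
        bool inBlockComment = false;

        foreach (string rawLine in lines)
        {
            if (inBlockComment)
            {
                // Skip all content until a line ending in "*/" is found.
                if (rawLine.TrimEnd().EndsWith("*/", StringComparison.Ordinal))
                {
                    inBlockComment = false;
                }
                continue;
            }

            // Assumption: the opening line itself is not checked for a closing "*/".
            if (rawLine.StartsWith("/*", StringComparison.Ordinal))
            {
                inBlockComment = true;
                continue;
            }

            // Discard everything after the first "//" on the line.
            string line = rawLine;
            int commentStart = line.IndexOf("//", StringComparison.Ordinal);
            if (commentStart >= 0)
            {
                line = line.Substring(0, commentStart);
            }

            line = line.Trim();
            if (line.Length == 0)
            {
                continue; // blank lines are skipped
            }

            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length != 3)
            {
                throw new FormatException(
                    $"Expected 3 columns but found {parts.Length}: '{rawLine}'");
            }

            rules.Add(new LangRuleLine(parts[0], parts[1].Split('+'), bool.Parse(parts[2])));
        }

        return rules;
    }
}
```

Because '//' is stripped before the columns are split, a pattern in this sketch cannot itself contain two consecutive slashes.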
Methods
GetInstance(NameType)
Gets a Lang instance for one of the supported NameTypes.
Declaration
public static Lang GetInstance(NameType nameType)
Parameters
Type | Name | Description |
---|---|---|
NameType | nameType | The NameType to look up. |
Returns
Type | Description |
---|---|
Lang | A Lang encapsulating the language guessing rules for that name type. |
GuessLanguage(string)
Guesses the language of a word.
Declaration
public virtual string GuessLanguage(string text)
Parameters
Type | Name | Description |
---|---|---|
string | text | The word. |
Returns
Type | Description |
---|---|
string | The language that the word originates from or ANY if there was no unique match. |
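As a usage sketch, a single best guess could be obtained as shown here. This assumes the Lucene.Net.Analysis.Phonetic package is referenced and that the NameType enumeration exposes a GENERIC member; GuessLanguageExample and the sample word are illustrative only.

```csharp
using System;
using Lucene.Net.Analysis.Phonetic.Language.Bm;

// Hypothetical wrapper for illustration; only Lang and NameType come from the library.
public static class GuessLanguageExample
{
    public static string Guess(string word)
    {
        // Obtain the shared, factory-managed instance (assumes NameType.GENERIC exists).
        Lang lang = Lang.GetInstance(NameType.GENERIC);

        // Returns one language name, or the ANY marker when there is no unique match.
        return lang.GuessLanguage(word);
    }
}
```

Since the return value may be the ANY marker rather than a concrete language, callers that need certainty should check for it before relying on the result.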
GuessLanguages(string)
Guesses the languages of a word.
Declaration
public virtual LanguageSet GuessLanguages(string input)
Parameters
Type | Name | Description |
---|---|---|
string | input | The word. |
Returns
Type | Description |
---|---|
LanguageSet | A LanguageSet containing the names of the languages that are potential matches for the input word. |
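When every candidate language is wanted rather than a single answer, a call could look like the sketch below. It assumes the Lucene.Net.Analysis.Phonetic package is referenced and that NameType exposes a GENERIC member; GuessLanguagesExample is a hypothetical name, and the set is only printed through ToString() so that no further LanguageSet members are assumed.

```csharp
using System;
using Lucene.Net.Analysis.Phonetic.Language.Bm;

// Hypothetical wrapper for illustration; only Lang, LanguageSet and NameType come from the library.
public static class GuessLanguagesExample
{
    public static LanguageSet Candidates(string word)
    {
        // Assumes NameType.GENERIC exists; other name types would use their own rules.
        Lang lang = Lang.GetInstance(NameType.GENERIC);

        // All languages that the rules do not exclude for this word.
        return lang.GuessLanguages(word);
    }

    public static void Print(string word)
    {
        // Relies only on ToString(), which every object provides.
        Console.WriteLine(Candidates(word));
    }
}
```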
LoadFromResource(string, Languages)
Loads language rules from a resource.
In normal use, you will obtain instances of Lang through the GetInstance(NameType) method. You will only need to call this yourself if you are developing custom language mapping rules.
Declaration
public static Lang LoadFromResource(string languageRulesResourceName, Languages languages)
Parameters
Type | Name | Description |
---|---|---|
string | languageRulesResourceName | The fully-qualified or partially-qualified resource name to load. |
Languages | languages | The languages that these rules will support. |
Returns
Type | Description |
---|---|
Lang | A Lang encapsulating the loaded language-guessing rules. |