Class TrecDocParser
Parser for TREC document content, invoked on document text excluding <DOC> and <DOCNO>, which are handled in TrecContentSource. Implementations are required to be stateless and hence thread-safe.
Inheritance
System.Object
TrecDocParser
Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
Assembly: Lucene.Net.Benchmark.dll
Syntax
public abstract class TrecDocParser : object
Fields
DEFAULT_PATH_TYPE
TREC parser type used for unknown extensions.
Declaration
public static readonly TrecDocParser.ParsePathType DEFAULT_PATH_TYPE
Field Value
Type | Description |
---|---|
TrecDocParser.ParsePathType | |
Methods
Extract(StringBuilder, String, String, Int32, String[])
Extract from buf the text of interest within the specified tags.
Declaration
public static string Extract(StringBuilder buf, string startTag, string endTag, int maxPos, string[] noisePrefixes)
Parameters
Type | Name | Description |
---|---|---|
StringBuilder | buf | Entire input text. |
System.String | startTag | Tag marking start of text of interest. |
System.String | endTag | Tag marking end of text of interest. |
System.Int32 | maxPos | If ≥ 0, sets a limit on the start of the text of interest. |
System.String[] | noisePrefixes | Optional noise prefixes to skip at the start of the text of interest, or null for none. |
Returns
Type | Description |
---|---|
System.String | Text of interest, or null if not found. |
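The contract described above can be illustrated with a small standalone sketch. This is not the library's implementation: `ExtractSketch` is a hypothetical stand-in that needs no Lucene.Net reference, and the handling of `noisePrefixes` (skipping a prefix found inside the tagged span) and the trimming of the result are assumptions, since this page does not spell them out.

```csharp
using System;
using System.Text;

// Hypothetical stand-in that mirrors the documented Extract contract.
public static class ExtractSketch
{
    public static string Extract(StringBuilder buf, string startTag, string endTag,
                                 int maxPos, string[] noisePrefixes)
    {
        string text = buf.ToString();

        // Locate the start tag; when maxPos >= 0 it must begin before maxPos.
        int start = text.IndexOf(startTag, StringComparison.Ordinal);
        if (start < 0 || (maxPos >= 0 && start >= maxPos))
        {
            return null;
        }
        start += startTag.Length;

        // Locate the matching end tag after the start tag.
        int end = text.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        // Assumption: a noise prefix found inside the span moves the start past it.
        if (noisePrefixes != null)
        {
            foreach (string noise in noisePrefixes)
            {
                int k = text.IndexOf(noise, start, StringComparison.Ordinal);
                if (k >= 0 && k < end)
                {
                    start = k + noise.Length;
                }
            }
        }

        // Assumption: surrounding whitespace is trimmed from the result.
        return text.Substring(start, end - start).Trim();
    }
}
```

With the input `<DATE>DATE: 1994-01-02 </DATE><TEXT>body</TEXT>`, the sketch returns `1994-01-02` for the tags `<DATE>` and `</DATE>` with the noise prefix `DATE:`, and returns null for `<TEXT>` when `maxPos` is 10, because the start tag lies beyond that limit. Against the real library, the corresponding call would be `TrecDocParser.Extract(buf, startTag, endTag, maxPos, noisePrefixes)`.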
Parse(DocData, String, TrecContentSource, StringBuilder, TrecDocParser.ParsePathType)
Parse the text prepared in docBuf into a result DocData. No synchronization is required.
Declaration
public abstract DocData Parse(DocData docData, string name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
Parameters
Type | Name | Description |
---|---|---|
DocData | docData | Reusable result. |
System.String | name | Name that should be set to the result. |
TrecContentSource | trecSrc | Calling TREC content source. |
StringBuilder | docBuf | Text to parse. |
TrecDocParser.ParsePathType | pathType | Type of parsed file, or UNKNOWN if unknown; may be used by parsers to alter their behavior according to the file path type. |
Returns
Type | Description |
---|---|
DocData | |
PathType(FileInfo)
Compute the path type of a file by inspecting the name of the file and its parents.
Declaration
public static TrecDocParser.ParsePathType PathType(FileInfo f)
Parameters
Type | Name | Description |
---|---|---|
FileInfo | f | |
Returns
Type | Description |
---|---|
TrecDocParser.ParsePathType | |
StripTags(StringBuilder, Int32)
Strip tags from buf: each tag is replaced by a single blank.
Declaration
public static string StripTags(StringBuilder buf, int start)
Parameters
Type | Name | Description |
---|---|---|
StringBuilder | buf | |
System.Int32 | start | |
Returns
Type | Description |
---|---|
System.String | Text obtained when stripping all tags from buf. |
StripTags(String, Int32)
Strip tags from input.
Declaration
public static string StripTags(string buf, int start)
Parameters
Type | Name | Description |
---|---|---|
System.String | buf | |
System.Int32 | start | |
Returns
Type | Description |
---|---|
System.String | |
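The behavior of the two StripTags overloads, each tag replaced by a single blank, can be sketched without the library as follows. `StripTagsSketch` is a hypothetical stand-in, not the actual implementation. Two points are assumptions because this page does not document them: a tag is taken to be anything between `<` and the next `>`, and `start` is taken to mean that the first `start` characters are skipped before stripping.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical stand-in that mirrors the documented StripTags behavior.
public static class StripTagsSketch
{
    // Assumption: a tag is '<', any run of characters other than '>', then '>'.
    private static readonly Regex Tag = new Regex("<[^>]*>");

    public static string StripTags(string buf, int start)
    {
        // Assumption: 'start' is an offset; text before it is dropped.
        if (start > 0)
        {
            buf = buf.Substring(start);
        }
        // Replace each tag with exactly one blank.
        return Tag.Replace(buf, " ");
    }

    public static string StripTags(StringBuilder buf, int start)
    {
        // The builder is only read; its contents are left unchanged.
        return StripTags(buf.ToString(), start);
    }
}
```

Under these assumptions, `StripTagsSketch.StripTags("<b>Hello</b> world", 0)` yields `" Hello  world"`: one blank for each of the two tags, plus the space that was already present before `world`. Against the real library, the corresponding calls would be `TrecDocParser.StripTags(buf, start)` with either a `string` or a `StringBuilder` argument.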