Class TrecDocParser
Parser for TREC document content, invoked on document text excluding <DOC> and <DOCNO>, which are handled in TrecContentSource. Implementations are required to be stateless and hence thread safe.
Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
Assembly: Lucene.Net.Benchmark.dll
Syntax
public abstract class TrecDocParser
Fields
DEFAULT_PATH_TYPE
TREC parser type used for unknown extensions.
Declaration
public static readonly TrecDocParser.ParsePathType DEFAULT_PATH_TYPE
Field Value
| Type | Description |
|---|---|
| TrecDocParser.ParsePathType | |
Methods
Extract(StringBuilder, String, String, Int32, String[])
Extract from buf the text of interest within specified tags.
Declaration
public static string Extract(StringBuilder buf, string startTag, string endTag, int maxPos, string[] noisePrefixes)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Text.StringBuilder | buf | Entire input text. |
| System.String | startTag | Tag marking start of text of interest. |
| System.String | endTag | Tag marking end of text of interest. |
| System.Int32 | maxPos | If ≥ 0, sets a limit on the start of the text of interest. |
| System.String[] | noisePrefixes | |
Returns
| Type | Description |
|---|---|
| System.String | Text of interest or null if not found. |
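The behavior described above can be illustrated with a small stand-alone sketch. It is not the Lucene.NET implementation: `ExtractSketch` is a hypothetical helper, it ignores `noisePrefixes` (whose handling is not described here), and it assumes that `maxPos` limits only the position of the start tag and that the result is trimmed.

```csharp
using System;
using System.Text;

// Hypothetical helper illustrating tag-delimited extraction.
// Not the library's code; assumptions are noted inline.
public static class ExtractSketch
{
    public static string Extract(StringBuilder buf, string startTag, string endTag, int maxPos)
    {
        string text = buf.ToString();

        // Locate the start tag; give up if it is missing or lies at or
        // beyond maxPos (assumption: the limit applies to the start tag).
        int start = text.IndexOf(startTag, StringComparison.Ordinal);
        if (start < 0 || (maxPos >= 0 && start >= maxPos))
        {
            return null;
        }
        start += startTag.Length;

        // Locate the end tag after the start tag.
        int end = text.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return null;
        }

        // Assumption: surrounding whitespace is trimmed from the result.
        return text.Substring(start, end - start).Trim();
    }
}
```

With this sketch, extracting between `<TITLE>` and `</TITLE>` from `<DOC><TITLE> Hello </TITLE></DOC>` with `maxPos` set to -1 yields `Hello`, while a `maxPos` of 3 yields null because the start tag begins at position 5.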
Parse(DocData, String, TrecContentSource, StringBuilder, TrecDocParser.ParsePathType)
Parse the text prepared in docBuf into a result DocData. No synchronization is required.
Declaration
public abstract DocData Parse(DocData docData, string name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
Parameters
| Type | Name | Description |
|---|---|---|
| DocData | docData | Reusable result. |
| System.String | name | Name that should be set to the result. |
| TrecContentSource | trecSrc | Calling trec content source. |
| System.Text.StringBuilder | docBuf | Text to parse. |
| TrecDocParser.ParsePathType | pathType | Type of parsed file, or UNKNOWN if unknown. May be used by parsers to alter their behavior according to the file path type. |
Returns
| Type | Description | 
|---|---|
| DocData | |
PathType(FileInfo)
Compute the path type of a file by inspecting the name of the file and its parents.
Declaration
public static TrecDocParser.ParsePathType PathType(FileInfo f)
Parameters
| Type | Name | Description |
|---|---|---|
| System.IO.FileInfo | f | |
Returns
| Type | Description | 
|---|---|
| TrecDocParser.ParsePathType | |
StripTags(String, Int32)
Strip tags from input.
Declaration
public static string StripTags(string buf, int start)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | buf | |
| System.Int32 | start | |
Returns
| Type | Description | 
|---|---|
| System.String | |
See Also
StripTags(StringBuilder, Int32)
Strip tags from buf: each tag is replaced by a single blank.
Declaration
public static string StripTags(StringBuilder buf, int start)
Parameters
| Type | Name | Description |
|---|---|---|
| System.Text.StringBuilder | buf | |
| System.Int32 | start | |
Returns
| Type | Description | 
|---|---|
| System.String | Text obtained when stripping all tags from buf. |
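The "each tag is replaced by a single blank" rule can be sketched with a regular expression. This is a minimal illustration, not the library's code: `StripTagsSketch` is a hypothetical helper, and it assumes that a tag is any run of text from `<` to the next `>` and that `start` is the index at which processing begins.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating tag stripping; not the library's code.
public static class StripTagsSketch
{
    public static string Strip(string buf, int start)
    {
        // Assumption: text before 'start' is dropped from the result.
        if (start > 0)
        {
            buf = buf.Substring(start);
        }

        // Replace every "<...>" run with exactly one blank.
        return Regex.Replace(buf, "<[^>]*>", " ");
    }
}
```

For example, `StripTagsSketch.Strip("<p>Hello</p> world", 0)` returns `" Hello  world"`: one blank for each of the two tags, plus the space that was already in the input.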