
    Class TrecDocParser

    Parser for TREC document content. It is invoked on the document text excluding <DOC> and <DOCNO>, which are handled in TrecContentSource. Implementations are required to be stateless and hence thread safe.

    Inheritance
    System.Object
    TrecDocParser
    Derived
    TrecFBISParser
    TrecFR94Parser
    TrecFTParser
    TrecGov2Parser
    TrecLATimesParser
    TrecParserByPath
    Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public abstract class TrecDocParser : object

    Fields


    DEFAULT_PATH_TYPE

    TREC parser type used for unknown extensions.

    Declaration
    public static readonly TrecDocParser.ParsePathType DEFAULT_PATH_TYPE
    Field Value
    Type Description
    TrecDocParser.ParsePathType

    Methods


    Extract(StringBuilder, String, String, Int32, String[])

    Extract from buf the text of interest within specified tags.

    Declaration
    public static string Extract(StringBuilder buf, string startTag, string endTag, int maxPos, string[] noisePrefixes)
    Parameters
    Type Name Description
    StringBuilder buf

    Entire input text.

    System.String startTag

    Tag marking start of text of interest.

    System.String endTag

    Tag marking end of text of interest.

    System.Int32 maxPos

    If ≥ 0, sets a limit on the start of the text of interest.

    System.String[] noisePrefixes

    Noise prefixes to skip past inside the tagged text, if any.

    Returns
    Type Description
    System.String

    Text of interest or null if not found.
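The extraction described above can be illustrated with a minimal standalone sketch. It does not call Lucene.Net: `ExtractSketch` is a hypothetical stand-in, it omits the `noisePrefixes` parameter, it applies `maxPos` only to the position of the start tag, and the trimming of the result is an assumption. The library's handling of these details may differ.

```csharp
using System;
using System.Text;

public static class ExtractSketch
{
    // Hypothetical stand-in for TrecDocParser.Extract, not the library code.
    // Simplifications (assumptions): no noisePrefixes, maxPos limits only the
    // start tag position, and the returned text is trimmed.
    public static string Extract(StringBuilder buf, string startTag, string endTag, int maxPos)
    {
        string text = buf.ToString();
        int start = text.IndexOf(startTag, StringComparison.Ordinal);
        // Start tag missing, or found at/after the limit when maxPos >= 0.
        if (start < 0 || (maxPos >= 0 && start >= maxPos)) return null;
        start += startTag.Length;
        int end = text.IndexOf(endTag, start, StringComparison.Ordinal);
        if (end < 0) return null; // no closing tag after the start tag
        return text.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        var buf = new StringBuilder("<DATE> 1994-02-03 </DATE><TEXT>body</TEXT>");
        Console.WriteLine(Extract(buf, "<DATE>", "</DATE>", -1));          // 1994-02-03
        Console.WriteLine(Extract(buf, "<TEXT>", "</TEXT>", 10) == null);  // True
    }
}
```

In the second call the `<TEXT>` tag starts at position 25, past the limit of 10, so the sketch returns null, mirroring the documented "null if not found" result.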

    Parse(DocData, String, TrecContentSource, StringBuilder, TrecDocParser.ParsePathType)

    Parse the text prepared in docBuf into a result DocData. No synchronization is required.

    Declaration
    public abstract DocData Parse(DocData docData, string name, TrecContentSource trecSrc, StringBuilder docBuf, TrecDocParser.ParsePathType pathType)
    Parameters
    Type Name Description
    DocData docData

    Reusable result.

    System.String name

    Name that should be set to the result.

    TrecContentSource trecSrc

    Calling trec content source.

    StringBuilder docBuf

    Text to parse.

    TrecDocParser.ParsePathType pathType

    Type of parsed file, or UNKNOWN if unknown. Parsers may use it to alter their behavior according to the file path type.

    Returns
    Type Description
    DocData

    PathType(FileInfo)

    Compute the path type of a file by inspecting the name of the file and the names of its parents.

    Declaration
    public static TrecDocParser.ParsePathType PathType(FileInfo f)
    Parameters
    Type Name Description
    FileInfo f
    Returns
    Type Description
    TrecDocParser.ParsePathType
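One way to picture "inspecting the name of the file and its parents" is the standalone sketch below. It is an illustration only: `SketchPathType`, its members, and the name table are hypothetical (this page does not list the members of `TrecDocParser.ParsePathType`), and the case-insensitive match on whole names is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PathTypeSketch
{
    // Hypothetical enum for illustration; not TrecDocParser.ParsePathType.
    public enum SketchPathType { Unknown, Gov2, Fbis }

    // Hypothetical table mapping a file or directory name to a path type.
    private static readonly Dictionary<string, SketchPathType> NameToType =
        new Dictionary<string, SketchPathType>(StringComparer.OrdinalIgnoreCase)
        {
            { "gov2", SketchPathType.Gov2 },
            { "fbis", SketchPathType.Fbis },
        };

    public static SketchPathType PathType(FileInfo f)
    {
        // Check the file name first.
        if (NameToType.TryGetValue(f.Name, out var type)) return type;
        // Then walk upward through the parent directories.
        for (DirectoryInfo d = f.Directory; d != null; d = d.Parent)
        {
            if (NameToType.TryGetValue(d.Name, out type)) return type;
        }
        return SketchPathType.Unknown; // nothing along the path matched
    }

    public static void Main()
    {
        var f = new FileInfo(Path.Combine("corpus", "gov2", "part1", "doc.gz"));
        Console.WriteLine(PathType(f)); // Gov2
    }
}
```

The file does not need to exist for this sketch to run; only the path components are inspected.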

    StripTags(StringBuilder, Int32)

    Strip tags from buf: each tag is replaced by a single blank.

    Declaration
    public static string StripTags(StringBuilder buf, int start)
    Parameters
    Type Name Description
    StringBuilder buf
    System.Int32 start
    Returns
    Type Description
    System.String

    Text obtained when stripping all tags from buf (input is unmodified).
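The documented behavior (each tag replaced by a single blank, input left unmodified) can be sketched in a few standalone lines. `StripTagsSketch` is a hypothetical stand-in that does not call Lucene.Net, and the pattern used to recognize a tag (`<`, any characters other than `>`, then `>`) is an assumption about what counts as a tag.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class StripTagsSketch
{
    // Hypothetical stand-in for TrecDocParser.StripTags, not the library code.
    public static string StripTags(StringBuilder buf, int start)
    {
        // Copy the text from 'start' onward; the input buffer is not modified.
        string text = buf.ToString(start, buf.Length - start);
        // Assumed tag shape: '<', then any characters except '>', then '>'.
        // Each such run is replaced by exactly one blank.
        return Regex.Replace(text, "<[^>]*>", " ");
    }

    public static void Main()
    {
        var buf = new StringBuilder("<P>Hello</P>");
        // Brackets make the leading and trailing blanks visible.
        Console.WriteLine("[" + StripTags(buf, 0) + "]"); // [ Hello ]
        Console.WriteLine(buf.ToString());                // <P>Hello</P>
    }
}
```

Because each tag becomes one blank rather than being deleted, words separated only by markup do not run together in the result.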


    StripTags(String, Int32)

    Strip tags from input.

    Declaration
    public static string StripTags(string buf, int start)
    Parameters
    Type Name Description
    System.String buf
    System.Int32 start
    Returns
    Type Description
    System.String
    See Also
    StripTags(StringBuilder, Int32)
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)