
    Class TrecContentSource

    Implements a ContentSource over the TREC collection.

    Inheritance
    System.Object
    ContentItemsSource
    ContentSource
    TrecContentSource
    Inherited Members
    ContentItemsSource.m_forever
    ContentItemsSource.m_logStep
    ContentItemsSource.m_verbose
    ContentItemsSource.m_encoding
    ContentItemsSource.AddBytes(Int64)
    ContentItemsSource.AddItem()
    ContentItemsSource.CollectFiles(DirectoryInfo, IList<FileInfo>)
    ContentItemsSource.ShouldLog()
    ContentItemsSource.Dispose()
    ContentItemsSource.BytesCount
    ContentItemsSource.ItemsCount
    ContentItemsSource.Config
    ContentItemsSource.TotalBytesCount
    ContentItemsSource.TotalItemsCount
    ContentItemsSource.PrintStatistics(String)
    Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class TrecContentSource : ContentSource
    Remarks

    Supports the following configuration parameters (on top of ContentSource):

    • work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
    • docs.dir: specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
    • trec.doc.parser: specifies the TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
    • html.parser: specifies the IHTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
    • content.source.encoding: if not specified, ISO-8859-1 is used.
    • content.source.excludeIteration: if true, do not append the iteration number to the docname.
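
    As an illustration of the parameters listed above, the fragment below is a minimal sketch of how they might appear in a benchmark algorithm (.alg) properties file. The directory path is a hypothetical placeholder, and the parser settings are left out so that the documented defaults apply. Because docs.dir is relative here, work.dir is set as well.

    ```properties
    # Hypothetical working directory; substitute a real path.
    work.dir=/data/benchmark-work

    # Relative path, so it depends on work.dir being specified above.
    docs.dir=trec

    # Optional; ISO-8859-1 is used when this line is omitted.
    content.source.encoding=ISO-8859-1
    ```

    With an absolute docs.dir, the work.dir line would not be required for locating the TREC files.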

    Fields


    DOC

    Declaration
    public static readonly string DOC
    Field Value
    Type Description
    System.String

    DOCNO

    Declaration
    public static readonly string DOCNO
    Field Value
    Type Description
    System.String

    NEW_LINE

    Separator between lines in the buffer.

    Declaration
    public static readonly string NEW_LINE
    Field Value
    Type Description
    System.String

    TERMINATING_DOC

    Declaration
    public static readonly string TERMINATING_DOC
    Field Value
    Type Description
    System.String

    TERMINATING_DOCNO

    Declaration
    public static readonly string TERMINATING_DOCNO
    Field Value
    Type Description
    System.String

    Methods


    Dispose(Boolean)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing
    Overrides
    ContentItemsSource.Dispose(Boolean)

    GetNextDocData(DocData)

    Declaration
    public override DocData GetNextDocData(DocData docData)
    Parameters
    Type Name Description
    DocData docData
    Returns
    Type Description
    DocData
    Overrides
    ContentSource.GetNextDocData(DocData)

    ParseDate(String)

    Declaration
    public virtual DateTime? ParseDate(string dateStr)
    Parameters
    Type Name Description
    System.String dateStr
    Returns
    Type Description
    System.Nullable<DateTime>

    ResetInputs()

    Declaration
    public override void ResetInputs()
    Overrides
    ContentItemsSource.ResetInputs()

    SetConfig(Config)

    Declaration
    public override void SetConfig(Config config)
    Parameters
    Type Name Description
    Config config
    Overrides
    ContentItemsSource.SetConfig(Config)
    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)