    Class TrecContentSource

    Implements a ContentSource over the TREC collection.

    Inheritance
    System.Object
    ContentItemsSource
    ContentSource
    TrecContentSource
    Implements
    System.IDisposable
    Inherited Members
    ContentItemsSource.m_forever
    ContentItemsSource.m_logStep
    ContentItemsSource.m_verbose
    ContentItemsSource.m_encoding
    ContentItemsSource.AddBytes(Int64)
    ContentItemsSource.AddItem()
    ContentItemsSource.CollectFiles(DirectoryInfo, IList<FileInfo>)
    ContentItemsSource.ShouldLog()
    ContentItemsSource.Dispose()
    ContentItemsSource.BytesCount
    ContentItemsSource.ItemsCount
    ContentItemsSource.Config
    ContentItemsSource.TotalBytesCount
    ContentItemsSource.TotalItemsCount
    ContentItemsSource.PrintStatistics(String)
    System.Object.Equals(System.Object)
    System.Object.Equals(System.Object, System.Object)
    System.Object.GetHashCode()
    System.Object.GetType()
    System.Object.MemberwiseClone()
    System.Object.ReferenceEquals(System.Object, System.Object)
    System.Object.ToString()
    Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class TrecContentSource : ContentSource, IDisposable
    Remarks

    Supports the following configuration parameters (on top of ContentSource):

    • work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
    • docs.dir: specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
    • trec.doc.parser: specifies the TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
    • html.parser: specifies the IHTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
    • content.source.encoding: if not specified, ISO-8859-1 is used.
    • content.source.excludeIteration: if true, do not append iteration number to docname.
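
As an illustration of how the parameters above might be set, here is a minimal sketch of a property block. It assumes the usual name=value syntax of benchmark algorithm (.alg) files, with # starting a comment. The values shown are simply the documented defaults, so listing them is optional.

```properties
# Sketch only: the values below repeat the documented defaults.
# Working directory; needed when docs.dir is a relative path.
work.dir=work
# Directory holding the TREC files, here relative to work.dir.
docs.dir=trec
# Encoding used when reading the TREC files.
content.source.encoding=ISO-8859-1
```

With these settings the TREC files would be looked up in the trec directory under work.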

    Fields

    DOC

    Declaration
    public static readonly string DOC
    Field Value
    Type Description
    System.String

    DOCNO

    Declaration
    public static readonly string DOCNO
    Field Value
    Type Description
    System.String

    NEW_LINE

    Separator between lines in the buffer.

    Declaration
    public static readonly string NEW_LINE
    Field Value
    Type Description
    System.String

    TERMINATING_DOC

    Declaration
    public static readonly string TERMINATING_DOC
    Field Value
    Type Description
    System.String

    TERMINATING_DOCNO

    Declaration
    public static readonly string TERMINATING_DOCNO
    Field Value
    Type Description
    System.String

    Methods

    Dispose(Boolean)

    Releases resources used by the TrecContentSource and, if overridden in a derived class, optionally releases unmanaged resources.

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    System.Boolean disposing true to release both managed and unmanaged resources; false to release only unmanaged resources.

    Overrides
    ContentItemsSource.Dispose(Boolean)
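
The role of the disposing flag can be sketched with a small, self-contained example of the standard .NET dispose pattern. SketchSource and SketchTrecSource are hypothetical stand-ins invented for this illustration; they are not the Lucene.Net types, and the actual TrecContentSource may release different resources.

```csharp
using System;
using System.IO;

// Hypothetical base class standing in for a content source (not a Lucene.Net type).
public class SketchSource : IDisposable
{
    public bool IsDisposed { get; private set; }

    // Public entry point: always releases managed and unmanaged resources.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        IsDisposed = true;
    }
}

// Hypothetical derived class that owns a managed reader.
public class SketchTrecSource : SketchSource
{
    private TextReader reader = new StringReader("<DOC>");

    public bool ReaderReleased => reader == null;

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Managed resources are released only when disposing is true.
            reader?.Dispose();
            reader = null;
        }
        // Let the base class release whatever it owns.
        base.Dispose(disposing);
    }
}
```

In this sketch, calling Dispose() on a SketchTrecSource (directly or through a using statement) reaches the override with disposing set to true, so the reader is closed before the base class cleanup runs.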

    GetNextDocData(DocData)

    Declaration
    public override DocData GetNextDocData(DocData docData)
    Parameters
    Type Name Description
    DocData docData
    Returns
    Type Description
    DocData
    Overrides
    ContentSource.GetNextDocData(DocData)

    ParseDate(String)

    Declaration
    public virtual DateTime? ParseDate(string dateStr)
    Parameters
    Type Name Description
    System.String dateStr
    Returns
    Type Description
    System.Nullable<System.DateTime>

    ResetInputs()

    Declaration
    public override void ResetInputs()
    Overrides
    ContentItemsSource.ResetInputs()

    SetConfig(Config)

    Declaration
    public override void SetConfig(Config config)
    Parameters
    Type Name Description
    Config config
    Overrides
    ContentItemsSource.SetConfig(Config)

    Implements

    System.IDisposable
    Copyright © 2020 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.