
    Class TrecContentSource

    Implements a ContentSource over the TREC collection.

    Inheritance
    object
    ContentItemsSource
    ContentSource
    TrecContentSource
    Implements
    IDisposable
    Inherited Members
    ContentItemsSource.m_forever
    ContentItemsSource.m_logStep
    ContentItemsSource.m_verbose
    ContentItemsSource.m_encoding
    ContentItemsSource.AddBytes(long)
    ContentItemsSource.AddItem()
    ContentItemsSource.CollectFiles(DirectoryInfo, IList<FileInfo>)
    ContentItemsSource.ShouldLog()
    ContentItemsSource.Dispose()
    ContentItemsSource.BytesCount
    ContentItemsSource.ItemsCount
    ContentItemsSource.Config
    ContentItemsSource.TotalBytesCount
    ContentItemsSource.TotalItemsCount
    ContentItemsSource.PrintStatistics(string)
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class TrecContentSource : ContentSource, IDisposable
    Remarks

    Supports the following configuration parameters (on top of ContentSource):

    • work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
    • docs.dir: specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
    • trec.doc.parser: specifies the TrecDocParser class to use for parsing the content of the TREC documents (default=TrecGov2Parser).
    • html.parser: specifies the IHTMLParser class to use for parsing the HTML parts of the TREC documents (default=DemoHTMLParser).
    • content.source.encoding: if not specified, ISO-8859-1 is used.
    • if true, do not append iteration number to docname
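
The interplay between "work.dir" and "docs.dir" can be illustrated with a minimal sketch. It assumes that a relative "docs.dir" is resolved against "work.dir" and that a rooted one is used unchanged, which is how the parameter descriptions above read. ResolveDocsDir is a hypothetical helper written for this illustration; it is not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Example settings; any key left out falls back to the documented default.
var props = new Dictionary<string, string>
{
    ["work.dir"] = "bench",
    ["docs.dir"] = "gov2",
};

// Prints "bench" and "gov2" joined by the platform's directory separator.
Console.WriteLine(ResolveDocsDir(props));

// Hypothetical helper: mirrors the documented rule, not the library's actual code.
static string ResolveDocsDir(IDictionary<string, string> settings)
{
    // Defaults taken from the parameter list above.
    string workDir = settings.TryGetValue("work.dir", out var w) ? w : "work";
    string docsDir = settings.TryGetValue("docs.dir", out var d) ? d : "trec";

    // A rooted docs.dir is used as-is; a relative one is resolved against work.dir.
    return Path.IsPathRooted(docsDir) ? docsDir : Path.Combine(workDir, docsDir);
}
```

With an empty dictionary the helper falls back to the documented defaults and returns "trec" under "work".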

    Fields

    DOC

    Declaration
    public static readonly string DOC
    Field Value
    Type Description
    string

    DOCNO

    Declaration
    public static readonly string DOCNO
    Field Value
    Type Description
    string

    NEW_LINE

    separator between lines in the buffer

    Declaration
    public static readonly string NEW_LINE
    Field Value
    Type Description
    string

    TERMINATING_DOC

    Declaration
    public static readonly string TERMINATING_DOC
    Field Value
    Type Description
    string

    TERMINATING_DOCNO

    Declaration
    public static readonly string TERMINATING_DOCNO
    Field Value
    Type Description
    string

    Methods

    Dispose(bool)

    Releases resources used by the TrecContentSource and, if overridden in a derived class, optionally releases unmanaged resources.

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing

    true to release both managed and unmanaged resources; false to release only unmanaged resources.

    Overrides
    ContentItemsSource.Dispose(bool)
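
For readers unfamiliar with the dispose pattern this override follows, here is a minimal, self-contained sketch. SketchItemsSource and SketchDisposableSource are hypothetical stand-ins invented for illustration; they only imitate the shape of the documented types and do not reproduce the library's code.

```csharp
using System;
using System.IO;

// The using statement calls Dispose(), which routes to Dispose(true).
var source = new SketchDisposableSource(new StringReader("<DOC>...</DOC>"));
using (source) { }
Console.WriteLine(source.IsDisposed); // True

// Hypothetical stand-in for the base class (not the library type).
abstract class SketchItemsSource : IDisposable
{
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing) { }
}

// Hypothetical derived source that owns a managed reader.
sealed class SketchDisposableSource : SketchItemsSource
{
    private readonly TextReader reader;

    public bool IsDisposed { get; private set; }

    public SketchDisposableSource(TextReader reader) => this.reader = reader;

    protected override void Dispose(bool disposing)
    {
        if (!IsDisposed)
        {
            // Managed resources are released only when disposing is true.
            if (disposing) reader.Dispose();
            IsDisposed = true;
        }
        base.Dispose(disposing);
    }
}
```

The IsDisposed guard makes repeated Dispose() calls harmless, which is the usual expectation for IDisposable implementations.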

    GetNextDocData(DocData)

    Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.

    Declaration
    public override DocData GetNextDocData(DocData docData)
    Parameters
    Type Name Description
    DocData docData
    Returns
    Type Description
    DocData
    Overrides
    ContentSource.GetNextDocData(DocData)
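
One common way to meet the multi-threading requirement is to guard the shared input with a lock, so that each document is handed to exactly one caller. The sketch below shows that idea in isolation. SketchDocSource and TryNext are hypothetical names, and the sketch does not claim to match how TrecContentSource is actually implemented.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var source = new SketchDocSource(new[] { "doc-a", "doc-b", "doc-c", "doc-d" });
var seen = new ConcurrentBag<string>();

// Several workers pull from the same source at once.
Parallel.For(0, 4, _ =>
{
    while (source.TryNext(out string doc)) seen.Add(doc);
});

Console.WriteLine(seen.Count); // 4: every document is handed out exactly once

// Hypothetical source over an in-memory queue (stand-in for a shared file reader).
sealed class SketchDocSource
{
    private readonly object syncRoot = new object();
    private readonly Queue<string> pending;

    public SketchDocSource(IEnumerable<string> docs) => pending = new Queue<string>(docs);

    // Returns false once the input is exhausted.
    public bool TryNext(out string doc)
    {
        // Only the read from the shared input needs the lock.
        lock (syncRoot)
        {
            if (pending.Count == 0)
            {
                doc = string.Empty;
                return false;
            }
            doc = pending.Dequeue();
            return true;
        }
    }
}
```

Keeping the locked region limited to the shared read lets any expensive per-document work, such as parsing, run in parallel outside the lock.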

    ParseDate(string)

    Declaration
    public virtual DateTime? ParseDate(string dateStr)
    Parameters
    Type Name Description
    string dateStr
    Returns
    Type Description
    DateTime?

    ResetInputs()

    Resets the inputs of this content source so that the test behaves, input-wise, as if it had just started.

    NOTE: the default implementation resets the number of bytes and items generated since the last reset, so be sure to call base.ResetInputs() if you override this method.
    Declaration
    public override void ResetInputs()
    Overrides
    ContentItemsSource.ResetInputs()
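
The note about calling base.ResetInputs() can be made concrete with a small sketch. SketchCountingBase and SketchCountingSource are hypothetical stand-ins; the counter names are borrowed from the inherited members listed above, but the bodies are assumptions made for illustration.

```csharp
using System;

var source = new SketchCountingSource();
source.Produce(120);
source.Produce(80);
source.ResetInputs();
Console.WriteLine($"{source.ItemsCount} {source.BytesCount} {source.Position}"); // 0 0 0

// Hypothetical stand-in for the base class: tracks counters since the last reset.
class SketchCountingBase
{
    public int ItemsCount { get; private set; }
    public long BytesCount { get; private set; }

    protected void AddItem() => ItemsCount++;
    protected void AddBytes(long numBytes) => BytesCount += numBytes;

    public virtual void ResetInputs()
    {
        ItemsCount = 0;
        BytesCount = 0;
    }
}

// Hypothetical derived source with its own input position to rewind.
sealed class SketchCountingSource : SketchCountingBase
{
    public int Position { get; private set; }

    public void Produce(long numBytes)
    {
        Position++;
        AddItem();
        AddBytes(numBytes);
    }

    public override void ResetInputs()
    {
        // Without this call the counters above would still read 2 and 200.
        base.ResetInputs();
        Position = 0;
    }
}
```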

    SetConfig(Config)

    Sets the Config for this content source. If you override this method, you must call base.SetConfig(config).

    Declaration
    public override void SetConfig(Config config)
    Parameters
    Type Name Description
    Config config
    Overrides
    ContentItemsSource.SetConfig(Config)

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.