Class TrecContentSource
Implements a ContentSource over the TREC collection.
Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class TrecContentSource : ContentSource, IDisposable
Remarks
Supports the following configuration parameters (on top of ContentSource):
- work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
- docs.dir: specifies the directory where the TREC files reside. Can be set to a relative path if "work.dir" is also specified (default=trec).
- trec.doc.parser: specifies the TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
- html.parser: specifies the IHTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
- content.source.encoding: if not specified, ISO-8859-1 is used.
- if true, do not append the iteration number to the docname.
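As an illustration, the parameters above could be set in a benchmark algorithm (.alg) file as plain `name=value` properties. This is a minimal sketch, not a complete algorithm file: the values shown are simply the documented defaults, and the layout assumes the usual `name=value` property syntax with `#` comments. The parser properties are left out so that their documented defaults (TrecGov2Parser, DemoHTMLParser) apply.

```
# Working directory; required when docs.dir is a relative path
work.dir=work

# Directory holding the TREC files, resolved against work.dir when relative
docs.dir=trec

# Encoding of the TREC files; ISO-8859-1 is used when this is omitted
content.source.encoding=ISO-8859-1
```

Setting docs.dir to an absolute path would make work.dir optional for locating the collection, per the parameter descriptions above.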
Fields
DOC
Declaration
public static readonly string DOC
Field Value
Type | Description |
---|---|
string |
DOCNO
Declaration
public static readonly string DOCNO
Field Value
Type | Description |
---|---|
string |
NEW_LINE
Separator between lines in the buffer.
Declaration
public static readonly string NEW_LINE
Field Value
Type | Description |
---|---|
string |
TERMINATING_DOC
Declaration
public static readonly string TERMINATING_DOC
Field Value
Type | Description |
---|---|
string |
TERMINATING_DOCNO
Declaration
public static readonly string TERMINATING_DOCNO
Field Value
Type | Description |
---|---|
string |
Methods
Dispose(bool)
Releases resources used by the TrecContentSource and, if overridden in a derived class, optionally releases unmanaged resources.
Declaration
protected override void Dispose(bool disposing)
Parameters
Type | Name | Description |
---|---|---|
bool | disposing |
Overrides
GetNextDocData(DocData)
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
Declaration
public override DocData GetNextDocData(DocData docData)
Parameters
Type | Name | Description |
---|---|---|
DocData | docData |
Returns
Type | Description |
---|---|
DocData |
Overrides
ParseDate(string)
Declaration
public virtual DateTime? ParseDate(string dateStr)
Parameters
Type | Name | Description |
---|---|---|
string | dateStr |
Returns
Type | Description |
---|---|
DateTime? |
ResetInputs()
Resets the input for this content source, so that the test behaves as if it had just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it is important to call base.ResetInputs() if you override this method.
Declaration
public override void ResetInputs()
Overrides
SetConfig(Config)
Sets the Config for this content source. If you override this method, you must call base.SetConfig(config).
Declaration
public override void SetConfig(Config config)
Parameters
Type | Name | Description |
---|---|---|
Config | config |
Overrides