Namespace Lucene.Net.Benchmarks.ByTask.Feeds
Sources for benchmark inputs: documents and queries.
Classes
AbstractQueryMaker
Abstract base query maker.
Each query maker should just implement the Prepare
ContentItemsSource
Base class for source of data for benchmarking.
ContentSource
Represents content from a specified source, such as TREC, Reuters etc. A
Content
For supported configuration parameters see Content
DemoHTMLParser
Simple HTML Parser extracting title, meta tags, and body text that is based on NekoHTML.
DemoHTMLParser.Parser
The actual parser to read HTML documents.
DirContentSource
A Content
- work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
- docs.dir: specifies the directory of the Dir collection. Can be set to a relative path if "work.dir" is also specified (default=dir-out).
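As an illustration, the two properties above could appear in a benchmark configuration as follows. This is a minimal sketch that only spells out the documented defaults. It assumes the usual `key=value` properties syntax with `#` comments, and it leaves out how the content source class itself is selected.

```properties
# Working directory; required when docs.dir is a relative path (documented default)
work.dir=work
# Directory holding the Dir collection, resolved relative to work.dir (documented default)
docs.dir=dir-out
```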
DirContentSource.Iterator
Iterator over the files in the directory.
DocData
Output of parsing (e.g. HTML parsing) of an input document.
DocMaker
Creates Document objects. Uses a Content
DocMaker.DocState
Document state, supports reuse of field instances across documents (see reuseFields parameter).
EnwikiContentSource
A Content.bz2
file directly (it will be decompressed on the fly). Config properties:
- keep.image.only.docs: false|true (default true).
- docs.file: <path to the file>
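A configuration using these properties might look like the sketch below. The file path is a hypothetical placeholder, not a location the library prescribes, and the `key=value` syntax with `#` comments is assumed.

```properties
# Documented default is true
keep.image.only.docs=true
# Hypothetical path to a dump; a .bz2 file can be given directly
docs.file=path/to/enwiki-dump.xml.bz2
```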
EnwikiQueryMaker
A QueryMaker that uses common and uncommon actual Wikipedia queries for searching the English Wikipedia collection. 90 queries total.
FacetSource
Source items for facets.
For supported configuration parameters see Content
FileBasedQueryMaker
Create queries from a
GeonamesLineParser
A line parser for Geonames.org data.
See 'geoname' table.
Requires Spatial
HeaderLineParser
Line
Int64ToEnglishContentSource
Creates documents whose content is a
.
Int64ToEnglishQueryMaker
Creates queries whose content is a spelled-out
.
LineDocSource
A Content
LineParser
Reader of a single input line into Doc
NoMoreDataException
Exception indicating there is no more data.
Thrown by Docs Makers if doc.maker.forever is false and the docs sources of that maker were exhausted.
This is useful for iterating over all documents of a source when the number of docs is not known in advance.
RandomFacetSource
Simple implementation of a random facet source.
ReutersContentSource
A Content
Config properties:
- work.dir: path to the root of docs and indexes dirs (default work).
- docs.dir: path to the docs dir (default reuters-out).
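For example, the documented defaults could be written out explicitly as shown here. This is a sketch assuming the usual `key=value` properties syntax with `#` comments; selecting the content source class is not shown.

```properties
# Root of the docs and indexes directories (documented default)
work.dir=work
# Directory containing the extracted Reuters docs (documented default)
docs.dir=reuters-out
```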
ReutersQueryMaker
A IQuery
SimpleLineParser
Line
SimpleQueryMaker
A IQuery
SimpleSloppyPhraseQueryMaker
Create sloppy phrase queries for performance test, in an index created using simple doc maker.
SingleDocSource
Creates the same document each time Get
SortableSingleDocSource
Adds fields appropriate for sorting: country, random_string and sort_field (int). Supports the following parameters:
- sort.rng: defines the range for sort-by-int field (default 20000).
- rand.seed: defines the seed to initialize Random with (default 13).
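A sketch of these two parameters with their documented defaults, assuming the usual `key=value` properties syntax with `#` comments:

```properties
# Range for the sort-by-int field (documented default)
sort.rng=20000
# Seed used to initialize Random (documented default); a fixed seed keeps runs repeatable
rand.seed=13
```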
SpatialDocMaker
Indexes spatial data according to a configured Spatial
SpatialFileQueryMaker
Reads spatial data from the body field docs from an internally created Line
TrecContentSource
Implements a Content
TrecDocParser
Parser for trec doc content, invoked on doc text excluding <DOC> and <DOCNO> which are handled in TrecContentSource. Required to be stateless and hence thread safe.
TrecFBISParser
Parser for the FBIS docs in trec disks 4+5 collection format
TrecFR94Parser
Parser for the FR94 docs in trec disks 4+5 collection format
TrecFTParser
Parser for the FT docs in trec disks 4+5 collection format
TrecGov2Parser
Parser for the GOV2 collection format
TrecLATimesParser
Parser for the LA Times docs in trec disks 4+5 collection format
TrecParserByPath
Parser for trec docs which selects the parser to apply according to the source file's path, defaulting to Trec
Interfaces
IHTMLParser
HTML Parsing Interface for test purposes.
IQueryMaker
Create queries for the test.
IShapeConverter
Converts one shape to another. Created by
Make
Enums
TrecDocParser.ParsePathType
Types of trec parse paths.