
    Namespace Lucene.Net.Benchmarks.ByTask.Feeds

    Sources for benchmark inputs: documents and queries.

    Classes

    AbstractQueryMaker

    Abstract base query maker. Each query maker should just implement the PrepareQueries() method.

    ContentItemsSource

    Base class for source of data for benchmarking.

    ContentSource

    Represents content from a specified source, such as TREC, Reuters etc. A ContentSource is responsible for creating DocData objects for its documents to be consumed by DocMaker. It also keeps track of various statistics, such as how many documents were generated, size in bytes etc.

    For supported configuration parameters see ContentItemsSource.

    DemoHTMLParser

    Simple HTML Parser extracting title, meta tags, and body text that is based on NekoHTML.

    DemoHTMLParser.Parser

    The actual parser to read HTML documents.

    DirContentSource

    A ContentSource using the Dir collection for its input. Supports the following configuration parameters (on top of ContentSource):

    • work.dir: specifies the working directory. Required if "docs.dir" denotes a relative path (default=work).
    • docs.dir: specifies the directory containing the Dir collection. Can be set to a relative path if "work.dir" is also specified (default=dir-out).
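
    As a rough illustration, the two parameters above might appear together in a benchmark algorithm (.alg) file as follows. This is a sketch, not a tested configuration: the values are simply the documented defaults, and the assembly-qualified form of the content.source value is an assumption.

    ```properties
    # Hypothetical .alg fragment; the type-name format below is an assumption.
    content.source=Lucene.Net.Benchmarks.ByTask.Feeds.DirContentSource, Lucene.Net.Benchmark

    # docs.dir is a relative path here, so work.dir is set as well.
    work.dir=work
    docs.dir=dir-out
    ```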

    DirContentSource.Iterator

    Iterator over the files in the directory.

    DocData

    Output of parsing (e.g. HTML parsing) of an input document.

    DocMaker

    Creates Document objects. Uses a ContentSource to generate DocData objects.

    DocMaker.DocState

    Document state, supports reuse of field instances across documents (see reuseFields parameter).

    EnwikiContentSource

    A ContentSource which reads the English Wikipedia dump. You can read the .bz2 file directly (it will be decompressed on the fly). Config properties:

    • keep.image.only.docs: false|true (default true).
    • docs.file: <path to the file>
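
    A minimal sketch of how these properties could be set in an .alg file. The dump path is a hypothetical placeholder, and the assembly-qualified form of the content.source value is an assumption.

    ```properties
    # Hypothetical .alg fragment; the type-name format below is an assumption.
    content.source=Lucene.Net.Benchmarks.ByTask.Feeds.EnwikiContentSource, Lucene.Net.Benchmark

    # Placeholder path; the .bz2 dump can be read directly without unpacking.
    docs.file=path/to/enwiki-pages-articles.xml.bz2

    # Override the documented default (true).
    keep.image.only.docs=false
    ```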

    EnwikiQueryMaker

    A QueryMaker that uses common and uncommon actual Wikipedia queries for searching the English Wikipedia collection. 90 queries total.

    FacetSource

    Source items for facets.

    For supported configuration parameters see ContentItemsSource.

    FileBasedQueryMaker

    Creates queries from a file, one per line, passing each line through the QueryParser. Lines beginning with # are treated as comments.

    GeonamesLineParser

    A line parser for Geonames.org data. See 'geoname' table. Requires SpatialDocMaker.

    HeaderLineParser

    LineParser which sets field names and order by the header - any header - of the lines file. It is less efficient than SimpleLineParser but more powerful.

    Int64ToEnglishContentSource

    Creates documents whose content is a long number starting from Int64.MinValue + 10.

    Int64ToEnglishQueryMaker

    Creates queries whose content is a spelled-out long number starting from Int64.MinValue + 10.

    LineDocSource

    A ContentSource reading one line at a time as a Document from a single file. This saves IO cost (over DirContentSource) of recursing through a directory and opening a new file for every document.

    LineParser

    Reader of a single input line into DocData.

    NoMoreDataException

    Exception indicating there is no more data. Thrown by doc makers if doc.maker.forever is false and the doc sources of that maker were exhausted. This is useful for iterating over all documents of a source when the number of docs is not known in advance.

    RandomFacetSource

    Simple implementation of a random facet source.

    ReutersContentSource

    A ContentSource reading from the Reuters collection.

    Config properties:

    • work.dir: path to the root of docs and indexes dirs (default work).
    • docs.dir: path to the docs dir (default reuters-out).

    ReutersQueryMaker

    An IQueryMaker that makes queries devised manually (by Grant Ingersoll) for searching in the Reuters collection.

    SimpleLineParser

    LineParser which ignores the header passed to its constructor and assumes simply that field names and their order are the same as in DEFAULT_FIELDS.

    SimpleQueryMaker

    An IQueryMaker that makes queries for a collection created using SingleDocSource.

    SimpleSloppyPhraseQueryMaker

    Create sloppy phrase queries for performance test, in an index created using simple doc maker.

    SingleDocSource

    Creates the same document each time GetNextDocData(DocData) is called.

    SortableSingleDocSource

    Adds fields appropriate for sorting: country, random_string and sort_field (int). Supports the following parameters:

    • sort.rng: defines the range for the sort-by-int field (default 20000).
    • rand.seed: defines the seed to initialize Random with (default 13).
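
    For illustration, the parameters above could be written in an .alg file like this. The values shown are the documented defaults, and the assembly-qualified form of the content.source value is an assumption.

    ```properties
    # Hypothetical .alg fragment; the type-name format below is an assumption.
    content.source=Lucene.Net.Benchmarks.ByTask.Feeds.SortableSingleDocSource, Lucene.Net.Benchmark

    # Range for the sort-by-int field (documented default).
    sort.rng=20000

    # Seed for Random; a fixed seed keeps generated values repeatable across runs.
    rand.seed=13
    ```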

    SpatialDocMaker

    Indexes spatial data according to a configured SpatialStrategy with optional shape transformation via a configured IShapeConverter. The converter can turn points into circles and bounding boxes, in order to vary the type of indexing performance tests. Unless it is subclassed to do otherwise, this class configures a SpatialContext, SpatialPrefixTree, and RecursivePrefixTreeStrategy. The strategy is made available to a query maker via the static method GetSpatialStrategy(Int32). See spatial.alg for a listing of spatial parameters, in particular those starting with "spatial." and "doc.spatial".

    SpatialFileQueryMaker

    Reads spatial data from the body field of docs from an internally created LineDocSource. The data is parsed into a shape and then further manipulated via a configurable IShapeConverter. When using point data, it's likely you'll want to configure the shape converter so that the query shapes actually cover a region. The queries are all created and cached in advance. This query maker works in conjunction with SpatialDocMaker. See spatial.alg for a listing of options, in particular the options starting with "query.".

    TrecContentSource

    Implements a ContentSource over the TREC collection.

    TrecDocParser

    Parser for trec doc content, invoked on doc text excluding <DOC> and <DOCNO> which are handled in TrecContentSource. Required to be stateless and hence thread safe.

    TrecFBISParser

    Parser for the FBIS docs in trec disks 4+5 collection format.

    TrecFR94Parser

    Parser for the FR94 docs in trec disks 4+5 collection format.

    TrecFTParser

    Parser for the FT docs in trec disks 4+5 collection format.

    TrecGov2Parser

    Parser for the GOV2 collection format.

    TrecLATimesParser

    Parser for the LA Times docs in trec disks 4+5 collection format.

    TrecParserByPath

    Parser for trec docs which selects the parser to apply according to the source files path, defaulting to TrecGov2Parser.

    Interfaces

    IHTMLParser

    HTML Parsing Interface for test purposes.

    IQueryMaker

    Create queries for the test.

    IShapeConverter

    Converts one shape to another. Created by MakeShapeConverter(SpatialStrategy, Config, String).

    Enums

    TrecDocParser.ParsePathType

    Types of trec parse paths.

    Copyright © 2020 Licensed to the Apache Software Foundation (ASF)