    Class WriteLineDocTask

    A task which writes documents, one line per document. Each line is in the following format: title <TAB> date <TAB> body. The output of this task can be consumed by LineDocSource and is intended to save the IO overhead of opening a file per document to be indexed.
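
The line layout described above can be sketched in a few lines of plain C#. This is only an illustration, not the task's actual implementation: the class `LineDocFormatSketch` and its methods are hypothetical names, and replacing tabs and line breaks inside a field with spaces is an assumption made so that every document stays on exactly one line.

```csharp
using System;

public static class LineDocFormatSketch
{
    // Same value as the documented WriteLineDocTask.SEP constant ('\t').
    private const char Sep = '\t';

    // Joins title, date and body into a single tab-separated line.
    public static string FormatLine(string title, string date, string body)
    {
        return Clean(title) + Sep + Clean(date) + Sep + Clean(body);
    }

    // Splits a line back into at most three fields: title, date, body.
    public static string[] ParseLine(string line)
    {
        return line.Split(new[] { Sep }, 3);
    }

    // Assumption of this sketch: separators and line breaks inside a field
    // are replaced with spaces so they cannot break the line structure.
    private static string Clean(string value)
    {
        return (value ?? string.Empty)
            .Replace('\t', ' ')
            .Replace('\r', ' ')
            .Replace('\n', ' ');
    }
}
```

Calling `LineDocFormatSketch.ParseLine(LineDocFormatSketch.FormatLine(title, date, body))` returns the three fields again, which is the kind of round trip a line-based reader such as LineDocSource relies on.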

    Inheritance
    object
    PerfTask
    WriteLineDocTask
    WriteEnwikiLineDocTask
    Implements
    IDisposable
    Inherited Members
    PerfTask.m_logStep
    PerfTask.m_params
    PerfTask.NEW_LINE
    PerfTask.SetRunInBackground(int)
    PerfTask.RunInBackground
    PerfTask.BackgroundDeltaPriority
    PerfTask.Stop
    PerfTask.StopNow()
    PerfTask.Clone()
    PerfTask.Dispose()
    PerfTask.RunAndMaybeStats(bool)
    PerfTask.GetName()
    PerfTask.SetName(string)
    PerfTask.RunData
    PerfTask.Depth
    PerfTask.ToString()
    PerfTask.ShouldNeverLogAtStart
    PerfTask.ShouldNotRecordStats
    PerfTask.Setup()
    PerfTask.TearDown()
    PerfTask.Params
    PerfTask.DisableCounting
    PerfTask.AlgLineNum
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    Namespace: Lucene.Net.Benchmarks.ByTask.Tasks
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class WriteLineDocTask : PerfTask, IDisposable
    Remarks

    The format of the output is set according to the output file extension. Compression is recommended when the output file is expected to be large. See info on file extensions in FileType.
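
As a rough illustration of choosing the output format from the file extension, the sketch below opens a gzip-compressed writer for names ending in `.gz` and a plain text writer otherwise. The helper `OutputByExtensionSketch.Open` is hypothetical and uses only the .NET base class library; it does not reproduce how FileType resolves extensions, and the set of extensions the task actually recognizes should be taken from FileType.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class OutputByExtensionSketch
{
    // Opens a writer for the given path. FileMode.Create re-creates the file,
    // mirroring the documented behavior of line.file.out.
    public static TextWriter Open(string path)
    {
        Stream stream = new FileStream(path, FileMode.Create, FileAccess.Write);

        // Assumption of this sketch: only ".gz" is treated as compressed.
        if (path.EndsWith(".gz", StringComparison.OrdinalIgnoreCase))
        {
            stream = new GZipStream(stream, CompressionMode.Compress);
        }

        // Disposing the writer also disposes the underlying stream(s).
        return new StreamWriter(stream, new UTF8Encoding(false));
    }
}
```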

    Supports the following parameters:
    • line.file.out: the name of the file to write the output to. This parameter is mandatory. NOTE: the file is re-created.
    • line.fields: which fields should be written in each line (optional, default: DEFAULT_FIELDS).
    • sufficient.fields: a comma-separated list of field names; a document is skipped if all of these fields are missing. For example, to require that at least one of f1,f2 is not empty, specify "f1,f2". To specify that no field is required, i.e. that even empty docs should be emitted, specify "," (optional, default: DEFAULT_SUFFICIENT_FIELDS).
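
The sufficient.fields rule can be sketched as follows. This is a minimal illustration of the semantics described in the list above, not the task's code: `SufficientFieldsSketch`, `ParseSetting` and `ShouldWrite` are hypothetical names, and a document is modeled here as a simple dictionary from field name to value.

```csharp
using System;
using System.Collections.Generic;

public static class SufficientFieldsSketch
{
    // Returns null when the setting is "," (no field is required),
    // otherwise the set of field names listed in the setting.
    public static HashSet<string> ParseSetting(string setting)
    {
        if (setting == ",")
        {
            return null;
        }
        return new HashSet<string>(setting.Split(','), StringComparer.Ordinal);
    }

    // A document is written if no field is required, or if at least one
    // of the sufficient fields is present and non-empty.
    public static bool ShouldWrite(IDictionary<string, string> doc, HashSet<string> sufficient)
    {
        if (sufficient == null)
        {
            return true;
        }
        foreach (string name in sufficient)
        {
            if (doc.TryGetValue(name, out string value) && !string.IsNullOrEmpty(value))
            {
                return true;
            }
        }
        return false;
    }
}
```

With the setting "f1,f2", a document whose f1 is empty but whose f2 has a value is written, while a document with neither field is skipped. With the setting ",", every document is written.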

    NOTE: this class is not thread-safe and if used by multiple threads the output is unspecified (as all will write to the same output file in a non-synchronized way).
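
To show what synchronized writing to a shared output would look like in general, the sketch below uses the standard `TextWriter.Synchronized` wrapper so that each complete line is written atomically. It does not make WriteLineDocTask itself thread-safe, and the class `SynchronizedLineWriterSketch` is a hypothetical name; it only illustrates the kind of coordination the task does not perform.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SynchronizedLineWriterSketch
{
    // Writes 'count' tab-separated lines from multiple threads into one buffer
    // and returns the combined text. Line order is not deterministic.
    public static string WriteLinesConcurrently(int count)
    {
        var buffer = new StringWriter();

        // TextWriter.Synchronized returns a thread-safe wrapper around the writer.
        TextWriter shared = TextWriter.Synchronized(buffer);

        Parallel.For(0, count, i =>
        {
            // Build the whole line first so it is written with a single call.
            string line = "title" + i + "\t" + "date" + i + "\t" + "body" + i;
            shared.WriteLine(line);
        });

        shared.Flush();
        return buffer.ToString();
    }
}
```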

    Constructors

    WriteLineDocTask(PerfRunData)

    Declaration
    public WriteLineDocTask(PerfRunData runData)
    Parameters
    Type Name Description
    PerfRunData runData

    WriteLineDocTask(PerfRunData, bool)

    Declaration
    public WriteLineDocTask(PerfRunData runData, bool performWriteHeader)
    Parameters
    Type Name Description
    PerfRunData runData
    bool performWriteHeader

    Fields

    DEFAULT_FIELDS

    Fields to be written by default

    Declaration
    public static readonly string[] DEFAULT_FIELDS
    Field Value
    Type Description
    string[]

    DEFAULT_SUFFICIENT_FIELDS

    Default sufficient fields: a document is written only if at least one of these fields is present.

    Declaration
    public static readonly string DEFAULT_SUFFICIENT_FIELDS
    Field Value
    Type Description
    string

    FIELDS_HEADER_INDICATOR

    Declaration
    public const string FIELDS_HEADER_INDICATOR = "FIELDS_HEADER_INDICATOR###"
    Field Value
    Type Description
    string

    SEP

    Declaration
    public const char SEP = '\t'
    Field Value
    Type Description
    char

    m_fname

    Declaration
    protected readonly string m_fname
    Field Value
    Type Description
    string

    m_lineFileOut

    Declaration
    protected readonly TextWriter m_lineFileOut
    Field Value
    Type Description
    TextWriter

    Properties

    SupportsParams

    Subclasses that support command-line parameters must override this property to return true.

    Declaration
    public override bool SupportsParams { get; }
    Property Value
    Type Description
    bool
    Overrides
    PerfTask.SupportsParams

    Methods

    Dispose(bool)

    Declaration
    protected override void Dispose(bool disposing)
    Parameters
    Type Name Description
    bool disposing
    Overrides
    PerfTask.Dispose(bool)

    DoLogic()

    Performs the task once (ignoring any repetition specification) and returns the number of work items done by this task. For indexing, that can be the number of docs added; for warming, the number of scanned items, and so on.

    Declaration
    public override int DoLogic()
    Returns
    Type Description
    int

    Number of work items done by this task.

    Overrides
    PerfTask.DoLogic()

    GetLogMessage(int)

    Declaration
    protected override string GetLogMessage(int recsCount)
    Parameters
    Type Name Description
    int recsCount
    Returns
    Type Description
    string
    Overrides
    PerfTask.GetLogMessage(int)

    LineFileOut(Document)

    Selects the output line file for the document being written. Default: the original output line file.

    Declaration
    protected virtual TextWriter LineFileOut(Document doc)
    Parameters
    Type Name Description
    Document doc
    Returns
    Type Description
    TextWriter

    SetParams(string)

    Set the params (docSize only)

    Declaration
    public override void SetParams(string @params)
    Parameters
    Type Name Description
    string params

    docSize, or 0 for no limit.

    Overrides
    PerfTask.SetParams(string)

    WriteHeader(TextWriter)

    Writes a header to the lines file, indicating how to read the file later.

    Declaration
    protected virtual void WriteHeader(TextWriter @out)
    Parameters
    Type Name Description
    TextWriter out
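
A header of this kind might look like the sketch below: the FIELDS_HEADER_INDICATOR value followed by the written field names, each preceded by SEP. The exact header layout is an assumption of this sketch rather than something this page states, and `LineFileHeaderSketch` with its two methods is a hypothetical helper, not part of Lucene.Net.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFileHeaderSketch
{
    // Values copied from the documented constants on this page.
    public const string FieldsHeaderIndicator = "FIELDS_HEADER_INDICATOR###";
    public const char Sep = '\t';

    // Writes one header line: the indicator, then each field name after a separator.
    // (Assumed layout; check the task's source for the real one.)
    public static void WriteHeader(TextWriter output, string[] fields)
    {
        var sb = new StringBuilder(FieldsHeaderIndicator);
        foreach (string field in fields)
        {
            sb.Append(Sep).Append(field);
        }
        output.WriteLine(sb.ToString());
    }

    // Returns the field names if the line is a header, or null otherwise.
    public static string[] ReadHeaderFields(string firstLine)
    {
        if (!firstLine.StartsWith(FieldsHeaderIndicator, StringComparison.Ordinal))
        {
            return null;
        }
        string rest = firstLine.Substring(FieldsHeaderIndicator.Length);
        return rest.Split(new[] { Sep }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

A reader can test the first line of the file with `ReadHeaderFields` and fall back to a default field order when it returns null.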

    Implements

    IDisposable
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.