    Class ExtractReuters

    Splits the Reuters SGML documents into simple text files containing: Title, Date, Dateline, Body.

    Inheritance
    object
    ExtractReuters
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Benchmarks.Utils
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class ExtractReuters

    Constructors

    ExtractReuters(DirectoryInfo, DirectoryInfo)

    Initializes a new instance that splits the Reuters SGML documents in reutersDir into simple text files in outputDir.

    Declaration
    public ExtractReuters(DirectoryInfo reutersDir, DirectoryInfo outputDir)
    Parameters
    Type Name Description
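    DirectoryInfo reutersDir The directory containing the Reuters SGML documents
    DirectoryInfo outputDir The directory that receives the extracted text files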

    Methods

    Extract()

    Splits the Reuters SGML documents into simple text files containing: Title, Date, Dateline, Body.

    Declaration
    public virtual void Extract()
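
    The constructor and Extract() are typically used together. The following is a minimal sketch, not an official sample: the ExtractReutersExample class and its Run helper are hypothetical names introduced for illustration, and the up-front directory check and output directory creation are precautions added here, since this page does not state how ExtractReuters handles missing directories.

```csharp
using System;
using System.IO;
using Lucene.Net.Benchmarks.Utils;

// Hypothetical helper class, not part of Lucene.NET.
public static class ExtractReutersExample
{
    public static void Run(string reutersPath, string outputPath)
    {
        var reutersDir = new DirectoryInfo(reutersPath);
        if (!reutersDir.Exists)
        {
            // Fail early with a clear message instead of relying on
            // undocumented behavior for a missing input directory.
            throw new DirectoryNotFoundException(
                $"Input directory not found: {reutersDir.FullName}");
        }

        var outputDir = new DirectoryInfo(outputPath);
        outputDir.Create(); // does nothing if the directory already exists

        // Constructor and Extract() signatures are as documented on this page.
        new ExtractReuters(reutersDir, outputDir).Extract();
    }
}
```

    A call such as ExtractReutersExample.Run("reuters-sgm", "reuters-out") would then process the SGML documents found in the first directory; both paths are placeholders.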

    ExtractFile(FileInfo)

    Override if you wish to change what is extracted

    Declaration
    protected virtual void ExtractFile(FileInfo sgmFile)
    Parameters
    Type Name Description
    FileInfo sgmFile The SGML file to extract
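
    Because ExtractFile(FileInfo) is protected and virtual, a subclass can wrap or replace the default extraction. The sketch below is illustrative only: RecordingExtractReuters and its ProcessedFiles property are hypothetical names, and it assumes that Extract() invokes ExtractFile for each input file, which this page implies but does not state.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Benchmarks.Utils;

// Hypothetical subclass, not part of Lucene.NET.
public class RecordingExtractReuters : ExtractReuters
{
    // Full paths of the files handed to ExtractFile, in call order.
    public List<string> ProcessedFiles { get; } = new List<string>();

    public RecordingExtractReuters(DirectoryInfo reutersDir, DirectoryInfo outputDir)
        : base(reutersDir, outputDir)
    {
    }

    protected override void ExtractFile(FileInfo sgmFile)
    {
        // Record the file, then defer to the default extraction.
        ProcessedFiles.Add(sgmFile.FullName);
        base.ExtractFile(sgmFile);
    }
}
```

    Omitting the call to base.ExtractFile(sgmFile) would replace the default behavior entirely, which is the route to take when different fields should be extracted.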

    Main(string[])

    LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. In .NET, however, a method within a DLL cannot be called directly from the command line, so a .NET tool, lucene-cli, provides a command that maps to this method: benchmark extract-reuters

    Declaration
    public static void Main(string[] args)
    Parameters
    Type Name Description
    string[] args The command line arguments

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.