Class ExtractReuters
Splits the Reuters SGML documents into simple text files containing the Title, Date, Dateline, and Body.
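To make the splitting concrete, here is a minimal, hypothetical sketch that uses only the .NET base class library. It is not the Lucene.Net implementation of ExtractReuters. ReutersSplitSketch and SplitDocuments are invented names. The sketch assumes that each article is wrapped in <REUTERS ...> ... </REUTERS> and that its fields appear in <TITLE>, <DATE>, <DATELINE>, and <BODY> tags, as in the Reuters-21578 SGML files.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, simplified illustration of the splitting step; this is NOT the
// Lucene.Net implementation, and the class and method names are invented here.
public static class ReutersSplitSketch
{
    // Fields written to each output document, in this order.
    private static readonly string[] FieldNames = { "TITLE", "DATE", "DATELINE", "BODY" };

    // One match per <REUTERS ...> ... </REUTERS> article; Singleline lets '.' span line breaks.
    private static readonly Regex ArticlePattern =
        new Regex(@"<REUTERS\b.*?</REUTERS>", RegexOptions.Singleline);

    // Returns one plain-text document per article: each field that is present,
    // followed by a blank line.
    public static List<string> SplitDocuments(string sgml)
    {
        var documents = new List<string>();
        foreach (Match article in ArticlePattern.Matches(sgml))
        {
            var text = new StringBuilder();
            foreach (string name in FieldNames)
            {
                // "<DATE>" cannot match "<DATELINE>" because '>' must follow the name directly.
                Match field = Regex.Match(article.Value, $"<{name}>(.*?)</{name}>", RegexOptions.Singleline);
                if (!field.Success)
                {
                    continue; // not every article has every field
                }
                // Decode SGML character entities such as &lt; and &amp;.
                text.Append(WebUtility.HtmlDecode(field.Groups[1].Value.Trim()));
                text.Append("\n\n");
            }
            documents.Add(text.ToString());
        }
        return documents;
    }
}
```

Calling SplitDocuments on the contents of one .sgm file yields one string per article, and each string could then be written to its own file. A regular expression is adequate here because the four tags of interest are never nested inside one another. A real SGML parser would be more robust for arbitrary input.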
Namespace: Lucene.Net.Benchmarks.Utils
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class ExtractReuters
Constructors
ExtractReuters(DirectoryInfo, DirectoryInfo)
Initializes a new instance that reads the Reuters SGML documents from reutersDir and writes the extracted simple text files (Title, Date, Dateline, Body) to outputDir.
Declaration
public ExtractReuters(DirectoryInfo reutersDir, DirectoryInfo outputDir)
Parameters
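Type | Name | Description |
---|---|---|
DirectoryInfo | reutersDir | The directory containing the Reuters SGML files to split. |
DirectoryInfo | outputDir | The directory to which the extracted text files are written. |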
Methods
Extract()
Splits the Reuters SGML documents in the input directory into simple text files containing the Title, Date, Dateline, and Body, and writes them to the output directory.
Declaration
public virtual void Extract()
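A typical run constructs the extractor with the input and output directories and then calls Extract(). This is a minimal sketch that assumes the project references the Lucene.Net.Benchmark assembly. ExtractReutersUsage and Run are hypothetical names, and the directory paths are placeholders. The tool may clear or overwrite files in outputDir, so point it at a dedicated directory.

```csharp
using System.IO;
using Lucene.Net.Benchmarks.Utils;

// Hypothetical helper showing typical use of the documented constructor and Extract().
public static class ExtractReutersUsage
{
    public static void Run(string sgmlPath, string outputPath)
    {
        var reutersDir = new DirectoryInfo(sgmlPath);
        var outputDir = new DirectoryInfo(outputPath);

        // Create the output directory up front; this is a no-op if it already exists.
        outputDir.Create();

        var extractor = new ExtractReuters(reutersDir, outputDir);
        extractor.Extract(); // processes the Reuters SGML files found in reutersDir
    }
}
```

The sketch creates outputDir explicitly so that it does not depend on whether the constructor or Extract() creates the directory.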
ExtractFile(FileInfo)
Extracts the Title, Date, Dateline, and Body from a single Reuters SGML file. Override this method to change what is extracted.
Declaration
protected virtual void ExtractFile(FileInfo sgmFile)
Parameters
Type | Name | Description |
---|---|---|
FileInfo | sgmFile | The Reuters SGML file to process. |
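Because ExtractFile is protected virtual, a subclass can wrap or replace the per-file step. The following is a hypothetical sketch (LoggingExtractReuters and FilesProcessed are invented names) that logs each file, skips empty ones, and otherwise defers to the base implementation. It assumes a reference to the Lucene.Net.Benchmark assembly.

```csharp
using System;
using System.IO;
using Lucene.Net.Benchmarks.Utils;

// Hypothetical subclass: logs each file and skips empty ones, then defers to the
// base implementation for the actual splitting.
public class LoggingExtractReuters : ExtractReuters
{
    // Number of files handed to the base ExtractFile implementation.
    public int FilesProcessed { get; private set; }

    public LoggingExtractReuters(DirectoryInfo reutersDir, DirectoryInfo outputDir)
        : base(reutersDir, outputDir)
    {
    }

    protected override void ExtractFile(FileInfo sgmFile)
    {
        if (sgmFile.Length == 0)
        {
            Console.WriteLine($"Skipping empty file {sgmFile.Name}");
            return;
        }
        Console.WriteLine($"Extracting {sgmFile.Name}");
        base.ExtractFile(sgmFile);
        FilesProcessed++;
    }
}
```

Calling base.ExtractFile keeps the standard Title, Date, Dateline, and Body output. To change which fields are written, replace that call with custom parsing.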
Main(string[])
LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. In .NET, however, a method within a DLL can't be called directly from the command line, so the .NET tool lucene-cli provides a command that maps to this method: benchmark extract-reuters.
Declaration
public static void Main(string[] args)
Parameters
Type | Name | Description |
---|---|---|
string[] | args | The command-line arguments. |
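Since Main is public static, it can also be invoked from code rather than through the lucene-cli benchmark extract-reuters command. This is a hypothetical sketch (ExtractReutersMainUsage and Run are invented names). It assumes that args holds the input SGML directory followed by the output directory, as in the Java original's command-line usage. Verify that order against the version you use.

```csharp
using Lucene.Net.Benchmarks.Utils;

// Hypothetical wrapper that forwards to the documented Main entry point.
public static class ExtractReutersMainUsage
{
    public static void Run(string sgmlPath, string outputPath)
    {
        // Assumed argument order: input (SGML) directory first, output directory second.
        ExtractReuters.Main(new[] { sgmlPath, outputPath });
    }
}
```

The sketch does not create the output directory itself. Main may handle directory creation, and possibly a rename into the final location, on its own.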