Class DemoHTMLParser
A simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text.
Implements
IHTMLParser
Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class DemoHTMLParser : IHTMLParser
Methods
Parse(DocData, string, DateTime?, InputSource, TrecContentSource)
Parses the HTML text read from the given InputSource and returns the populated DocData. The parameters have the same meaning as in the TextReader overload.
Declaration
public virtual DocData Parse(DocData docData, string name, DateTime? date, InputSource source, TrecContentSource trecSrc)
Parameters
Type | Name | Description |
---|---|---|
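DocData | docData | Result reused. |
string | name | Name of the result doc data. |
DateTime? | date | Date of the result doc data. If null, the parser attempts to set it from the parsed data. |
InputSource | source | Input source of the HTML text to parse. |
TrecContentSource | trecSrc | The TrecContentSource used to parse dates. |
Returns
Type | Description |
---|---|
DocData | Parsed doc data. |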
Parse(DocData, string, DateTime?, TextReader, TrecContentSource)
Parses the HTML text read from the given TextReader and returns the populated DocData. The supplied name and date are used for the result unless they are null, in which case the parser attempts to set them from the parsed data.
Declaration
public virtual DocData Parse(DocData docData, string name, DateTime? date, TextReader reader, TrecContentSource trecSrc)
Parameters
Type | Name | Description |
---|---|---|
DocData | docData | The DocData instance to reuse for the result. |
string | name | Name of the result doc data. |
DateTime? | date | Date of the result doc data. If null, the parser attempts to set it from the parsed data. |
TextReader | reader | Reader of html text to parse. |
TrecContentSource | trecSrc | The TrecContentSource used to parse dates. |
Returns
Type | Description |
---|---|
DocData | Parsed doc data. |
Exceptions
Type | Condition |
---|---|
IOException | If there is a low-level I/O error. |
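The following is a minimal usage sketch of the TextReader overload, parsing a small in-memory HTML page. It assumes a project that references the Lucene.Net.Benchmark assembly, and it reads the result through the DocData `Title` and `Body` properties. The class name `DemoHtmlParserExample`, the document name `sample-1`, and the sample HTML are hypothetical. Passing `null` for `trecSrc` is an assumption: it relies on the TrecContentSource being consulted only when a date has to be parsed out of the HTML, and the sample page carries no date. In real use, pass the TrecContentSource that owns the parser.

```csharp
using System;
using System.IO;
using Lucene.Net.Benchmarks.ByTask.Feeds;

public static class DemoHtmlParserExample // hypothetical helper class
{
    // Parses a small in-memory HTML page via Parse(DocData, string, DateTime?, TextReader, TrecContentSource).
    public static DocData ParseSample()
    {
        var parser = new DemoHTMLParser();
        var docData = new DocData(); // instance reused for the result

        const string html =
            "<html><head><title>Hello</title></head>" +
            "<body><p>World</p></body></html>";

        using (var reader = new StringReader(html))
        {
            // name and date are supplied, so per the documentation they are used as-is.
            // trecSrc is null here on the ASSUMPTION that it is only needed to parse
            // a date found in the HTML; this sample page has none.
            return parser.Parse(docData, "sample-1", new DateTime(2024, 1, 1), reader, null);
        }
    }

    public static void Main()
    {
        DocData result = ParseSample();
        Console.WriteLine($"Title: {result.Title}");
        Console.WriteLine($"Body:  {result.Body}");
    }
}
```

Reusing one DocData instance across calls, as the `docData` parameter allows, avoids allocating a new result object for every document when a benchmark feeds many pages through the same parser.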