    Class DemoHTMLParser

    Simple HTML parser, based on NekoHTML, that extracts the title, meta tags, and body text.

    Inheritance
    object
    DemoHTMLParser
    Implements
    IHTMLParser
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: Lucene.Net.Benchmarks.ByTask.Feeds
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class DemoHTMLParser : IHTMLParser

    Methods

    Parse(DocData, string, DateTime?, InputSource, TrecContentSource)

    Parses HTML from the given InputSource, extracting the title, meta tags, and body text. The parser is based on NekoHTML.

    Declaration
    public virtual DocData Parse(DocData docData, string name, DateTime? date, InputSource source, TrecContentSource trecSrc)
    Parameters
    Type Name Description
    DocData docData
    string name
    DateTime? date
    InputSource source
    TrecContentSource trecSrc
    Returns
    Type Description
    DocData

    Parse(DocData, string, DateTime?, TextReader, TrecContentSource)

    Parses the input TextReader and returns a DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.

    Declaration
    public virtual DocData Parse(DocData docData, string name, DateTime? date, TextReader reader, TrecContentSource trecSrc)
    Parameters
    Type Name Description
    DocData docData

    The DocData instance that is reused for the result.

    string name

    Name of the result doc data.

    DateTime? date

    Date of the result doc data. If null, an attempt is made to set it from the parsed data.

    TextReader reader

    Reader of the HTML text to parse.

    TrecContentSource trecSrc

    The TrecContentSource used to parse dates.

    Returns
    Type Description
    DocData

    Parsed doc data.

    Exceptions
    Type Condition
    IOException

    If there is a low-level I/O error.
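
    The following is a minimal usage sketch of the TextReader overload, not an official example. It assumes that DocData and TrecContentSource have public parameterless constructors, that a default-constructed TrecContentSource is sufficient for date parsing, and that DocData exposes Name, Title, and Body properties. DemoHtmlParserSketch and ParseHtml are hypothetical names introduced only for illustration.

```csharp
using System;
using System.IO;
using Lucene.Net.Benchmarks.ByTask.Feeds;

// Hypothetical wrapper class; not part of Lucene.Net.
public static class DemoHtmlParserSketch
{
    // Hypothetical helper: parses an HTML string into a DocData.
    public static DocData ParseHtml(string name, string html)
    {
        var parser = new DemoHTMLParser();

        // The DocData instance passed in is reused for the result.
        // Assumption: DocData has a public parameterless constructor.
        var docData = new DocData();

        // Assumption: a default-constructed TrecContentSource is enough
        // for the parser to parse dates found in the HTML.
        var trecSrc = new TrecContentSource();

        // Declaring the reader as TextReader selects the TextReader overload.
        using (TextReader reader = new StringReader(html))
        {
            // A null date asks the parser to try to set it from the parsed data.
            return parser.Parse(docData, name, null, reader, trecSrc);
        }
    }

    public static void Main()
    {
        string html =
            "<html><head><title>Hello</title></head>" +
            "<body><p>Some body text.</p></body></html>";

        DocData doc = ParseHtml("doc-1", html);

        // Assumption: DocData exposes Name, Title, and Body properties.
        Console.WriteLine(doc.Name);
        Console.WriteLine(doc.Title);
        Console.WriteLine(doc.Body);
    }
}
```

    Since Parse can throw IOException on a low-level I/O error, callers reading from files or network streams may want to wrap the call in a try/catch block.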

    Implements

    IHTMLParser
    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.