    Class HTMLScanner

    This class implements a table-driven scanner for HTML that tolerates many defects in the input. It implements the IScanner interface, which accepts a TextReader to fetch characters from and an IScanHandler to report lexical events to.

    Inheritance
    object
    HTMLScanner
    Implements
    IScanner
    ILocator
    Inherited Members
    object.Equals(object)
    object.Equals(object, object)
    object.GetHashCode()
    object.GetType()
    object.MemberwiseClone()
    object.ReferenceEquals(object, object)
    object.ToString()
    Namespace: TagSoup
    Assembly: Lucene.Net.Benchmark.dll
    Syntax
    public class HTMLScanner : IScanner, ILocator

    Properties

    ColumnNumber

    Return the column number where the current document event ends. This is the one-based number of char values (UTF-16 code units) since the last line end.

    Warning: The return value from the method is intended only as an approximation for the sake of diagnostics; it is not intended to provide sufficient information to edit the character content of the original XML document. For example, when lines contain combining character sequences, wide characters, surrogate pairs, or bi-directional text, the value may not correspond to the column in a text editor's display.

    The return value is an approximation of the column number in the document entity or external parsed entity where the markup triggering the event appears.

    If possible, the SAX driver should provide the line position of the first character after the text associated with the document event. The first column in each line is column 1.

    Returns the column number, or -1 if none is available.
    Declaration
    public virtual int ColumnNumber { get; }
    Property Value
    Type Description
    int
    See Also
    LineNumber
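
The warning above can be made concrete with a small sketch. It is not part of HTMLScanner; the class and method names are hypothetical. It assumes the column is counted in char values (UTF-16 code units) and compares that count with a count of text elements, which is closer to what many editors display.

```csharp
using System;
using System.Globalization;

public static class ColumnDemo
{
    // One-based column of the position just past the end of `line`,
    // counted in char values (UTF-16 code units).
    public static int ColumnInChars(string line)
    {
        return line.Length + 1;
    }

    // The same position counted in text elements, so that a surrogate pair
    // or a base character plus combining mark counts once.
    // Hypothetical helper, shown only for comparison.
    public static int ColumnInTextElements(string line)
    {
        return new StringInfo(line).LengthInTextElements + 1;
    }
}
```

For a line such as "a\uD83D\uDE00b" (a surrogate pair between two letters), the two counts differ by one, which is why the property is suitable for diagnostics but not for editing the source text.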

    LineNumber

    Return the line number where the current document event ends. Lines are delimited by line ends, which are defined in the XML specification.

    Warning: The return value from the method is intended only as an approximation for the sake of diagnostics; it is not intended to provide sufficient information to edit the character content of the original XML document. In some cases, these "line" numbers match what would be displayed as columns, and in others they may not match the source text due to internal entity expansion.

    The return value is an approximation of the line number in the document entity or external parsed entity where the markup triggering the event appears.

    If possible, the SAX driver should provide the line position of the first character after the text associated with the document event. The first line is line 1.

    Returns the line number, or -1 if none is available.
    Declaration
    public virtual int LineNumber { get; }
    Property Value
    Type Description
    int
    See Also
    ColumnNumber
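
As an illustration of one-based line and column tracking, here is a minimal sketch. It is not the scanner's actual implementation, and PositionTracker is a hypothetical name. It assumes the XML rule that CR LF, a lone CR, and a lone LF each count as a single line end, and it reports the position of the first character after the text consumed so far.

```csharp
using System;

public sealed class PositionTracker
{
    private bool lastWasCR;

    // Both values are one-based, matching the Locator convention.
    public int Line { get; private set; } = 1;
    public int Column { get; private set; } = 1;

    // Consume one char and update the position.
    public void Advance(char c)
    {
        if (c == '\n')
        {
            // An LF directly after a CR belongs to the same line end.
            if (!lastWasCR)
            {
                Line++;
                Column = 1;
            }
            lastWasCR = false;
        }
        else if (c == '\r')
        {
            Line++;
            Column = 1;
            lastWasCR = true;
        }
        else
        {
            Column++;
            lastWasCR = false;
        }
    }

    // Convenience helper: feed a whole string through a fresh tracker.
    public static PositionTracker Track(string text)
    {
        var tracker = new PositionTracker();
        foreach (char c in text)
        {
            tracker.Advance(c);
        }
        return tracker;
    }
}
```

Calling PositionTracker.Track on a string and reading Line and Column gives the position where the next character would appear.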

    PublicId

    Gets the public identifier for the current document event.

    The return value is the public identifier of the document entity or of the external parsed entity in which the markup triggering the event appears.

    Returns a string containing the public identifier, or null if none is available.
    Declaration
    public virtual string PublicId { get; }
    Property Value
    Type Description
    string
    See Also
    SystemId

    SystemId

    Return the system identifier for the current document event.

    The return value is the system identifier of the document entity or of the external parsed entity in which the markup triggering the event appears.

    If the system identifier is a URL, the parser must resolve it fully before passing it to the application. For example, a file name must always be provided as a file:... URL, and other kinds of relative URI are also resolved against their bases.

    Returns a string containing the system identifier, or null if none is available.
    Declaration
    public virtual string SystemId { get; }
    Property Value
    Type Description
    string
    See Also
    PublicId
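
To show what a fully resolved system identifier looks like, the following sketch uses System.Uri. It is an illustration only and not code from this assembly; SystemIdDemo and its methods are hypothetical names.

```csharp
using System;
using System.IO;

public static class SystemIdDemo
{
    // Turn a local file name into an absolute file: URL.
    public static string FromFileName(string fileName)
    {
        return new Uri(Path.GetFullPath(fileName)).AbsoluteUri;
    }

    // Resolve a relative URI reference against its base URI.
    public static string Resolve(string baseUri, string relative)
    {
        return new Uri(new Uri(baseUri), relative).AbsoluteUri;
    }
}
```

For example, resolving "images/logo.png" against "http://example.com/docs/index.html" yields "http://example.com/docs/images/logo.png", and a bare file name becomes a file: URL rooted at the current directory.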

    Methods

    ResetDocumentLocator(string, string)

    Reset the document locator, supplying publicId and systemId.

    Declaration
    public virtual void ResetDocumentLocator(string publicId, string systemId)
    Parameters
    Type Name Description
    string publicId Public id
    string systemId System id

    Scan(TextReader, IScanHandler)

    Scan HTML source, reporting lexical events.

    Declaration
    public virtual void Scan(TextReader r, IScanHandler h)
    Parameters
    Type Name Description
    TextReader r Reader that provides characters
    IScanHandler h ScanHandler that accepts lexical events.

    StartCDATA()

    A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).

    Declaration
    public virtual void StartCDATA()
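
The effect of the CDATA state can be pictured with a simplified model. The sketch below is not how HTMLScanner is implemented, and CDataDemo.ReadRawContent is a hypothetical helper. It assumes that, once the state is forced, everything up to the end tag of the current element is treated as plain character content.

```csharp
using System;

public static class CDataDemo
{
    // Return the raw content that starts at `start` and runs up to, but not
    // including, the end tag of `element`. Characters such as '<' inside the
    // content are not interpreted as markup.
    public static string ReadRawContent(string input, int start, string element)
    {
        int end = input.IndexOf("</" + element, start, StringComparison.OrdinalIgnoreCase);
        return end < 0
            ? input.Substring(start)              // no end tag: the rest is content
            : input.Substring(start, end - start);
    }
}
```

With the input "<script>if (a < b) x();</script>" and a start index just past the opening tag, the helper returns the script text unchanged, including the "<" that would otherwise look like the start of a tag.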

    Copyright © 2024 The Apache Software Foundation, Licensed under the Apache License, Version 2.0
    Apache Lucene.Net, Lucene.Net, Apache, the Apache feather logo, and the Apache Lucene.Net project logo are trademarks of The Apache Software Foundation.
    All other marks mentioned may be trademarks or registered trademarks of their respective owners.