Class HTMLScanner

This class implements a table-driven scanner for HTML that tolerates many defects in the input. It implements the IScanner interface, whose Scan method accepts a TextReader to fetch characters from and an IScanHandler to report lexical events to.

Inheritance

System.Object

HTMLScanner

Implements

IScanner

ILocator

Namespace: TagSoup

Assembly: Lucene.Net.Benchmark.dll

Syntax

public class HTMLScanner : object, IScanner, ILocator

Constructors

HTMLScanner()

Declaration

public HTMLScanner()

Properties

ColumnNumber

Declaration

public virtual int ColumnNumber { get; }

Property Value

Type	Description
System.Int32

LineNumber

Declaration

public virtual int LineNumber { get; }

Property Value

Type	Description
System.Int32

PublicId

Declaration

public virtual string PublicId { get; }

Property Value

Type	Description
System.String

SystemId

Declaration

public virtual string SystemId { get; }

Property Value

Type	Description
System.String

Methods

ResetDocumentLocator(String, String)

Resets the document locator, supplying the public id and system id (in that order).

Declaration

public virtual void ResetDocumentLocator(string publicid, string systemid)

Parameters

Type	Name	Description
System.String	publicid	Public id
System.String	systemid	System id
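
As a minimal usage sketch, the locator members listed on this page could be exercised as shown here. The identifier strings are placeholder values chosen for illustration, and the idea that PublicId and SystemId report the values passed to ResetDocumentLocator is an assumption that this page does not state explicitly. The project needs a reference to Lucene.Net.Benchmark.dll.

```csharp
using System;
using TagSoup; // namespace given on this page

public static class LocatorExample
{
    // Formats the locator properties exposed by HTMLScanner.
    public static string Describe(HTMLScanner scanner)
    {
        return $"{scanner.SystemId} ({scanner.PublicId}) " +
               $"line {scanner.LineNumber}, column {scanner.ColumnNumber}";
    }

    public static void Main()
    {
        var scanner = new HTMLScanner();

        // Argument order is public id first, then system id.
        // Both strings are placeholder values, not taken from the library.
        scanner.ResetDocumentLocator(
            "-//W3C//DTD HTML 4.01//EN",
            "http://example.com/index.html");

        Console.WriteLine(Describe(scanner));
    }
}
```

Because both parameters are plain strings, swapping them compiles without complaint, so it is worth checking the order at each call site.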

Scan(TextReader, IScanHandler)

Scan HTML source, reporting lexical events.

Declaration

public virtual void Scan(TextReader r, IScanHandler h)

Parameters

Type	Name	Description
TextReader	r	Reader that provides characters
IScanHandler	h	ScanHandler that accepts lexical events.

StartCDATA()

A callback for the ScanHandler that allows it to force the lexer state to CDATA content (no markup is recognized except the end of the element).

Declaration

public virtual void StartCDATA()
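
The effect of forcing CDATA content can be illustrated with a deliberately tiny stand-in scanner. Everything in this sketch (MiniScanner, its Scan signature, the event strings) is hypothetical and is not part of TagSoup. It only shows the idea that, once the handler calls StartCDATA, a `<` is treated as character data until an end tag appears.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical miniature scanner; not the TagSoup implementation.
public sealed class MiniScanner
{
    private bool _cdata; // true while markup recognition is switched off

    // Called from the handler callback to force CDATA content mode.
    public void StartCDATA() => _cdata = true;

    // Returns events such as "start:p", "text:...", "end:p".
    public List<string> Scan(string html, Action<MiniScanner, string> onStartTag)
    {
        var events = new List<string>();
        var text = new StringBuilder();
        int i = 0;
        while (i < html.Length)
        {
            bool endTag = string.CompareOrdinal(html, i, "</", 0, 2) == 0;
            // In CDATA mode a lone '<' does not open a start tag.
            bool startTag = !_cdata && !endTag && html[i] == '<';
            int close = (endTag || startTag) ? html.IndexOf('>', i) : -1;
            if (close < 0)
            {
                text.Append(html[i++]); // character data or unterminated tag
                continue;
            }
            if (text.Length > 0) { events.Add("text:" + text); text.Clear(); }
            if (endTag)
            {
                _cdata = false; // an end tag always leaves CDATA mode
                events.Add("end:" + html.Substring(i + 2, close - i - 2));
            }
            else
            {
                string name = html.Substring(i + 1, close - i - 1);
                events.Add("start:" + name);
                onStartTag(this, name); // handler may call StartCDATA here
            }
            i = close + 1;
        }
        if (text.Length > 0) events.Add("text:" + text);
        return events;
    }
}

public static class CdataDemo
{
    public static List<string> Run()
    {
        var scanner = new MiniScanner();
        return scanner.Scan(
            "<script>if (a<b && c>d) go();</script>",
            (s, name) => { if (name == "script") s.StartCDATA(); });
    }

    public static void Main() => Console.WriteLine(string.Join(" | ", Run()));
}
```

With the callback in place the script body comes back as a single text event. If the callback never calls StartCDATA, the same input is split at `<b && c>`, which the sketch then reports as a start tag.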
