Class HTMLScanner
This class implements a table-driven scanner for HTML that tolerates many kinds of defects in the markup. It implements the IScanner interface, which accepts a TextReader to fetch characters from and an IScanHandler to report lexical events to.
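To illustrate what "table-driven" means here, the following is a minimal C# sketch, not HTMLScanner's actual code. The scanner's behaviour lives in a table of { state, input character, action, next state } rows instead of nested conditionals, and defects are recovered from rather than rejected: a stray `<` or an unterminated tag is reported as text. The class `TableDrivenScannerSketch`, its states, actions and table rows are all hypothetical and far smaller than a real HTML scanner's table.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, heavily simplified table-driven scanner. The states, actions and
// table rows are invented for illustration; they are not HTMLScanner's real table.
public static class TableDrivenScannerSketch
{
    const int S_TEXT = 0, S_LT = 1, S_NAME = 2;                              // states
    const int A_SAVE = 0, A_FLUSH_TEXT = 1, A_STRAY_LT = 2, A_EMIT_TAG = 3;  // actions
    const int ANY = -1;                                                      // wildcard input

    // Each row is { state, input char or ANY, action, next state }.
    // Rows are tried in order, so each state's ANY row must come last.
    static readonly int[] Table =
    {
        S_TEXT, '<', A_FLUSH_TEXT, S_TEXT == 0 ? S_LT : S_LT,
        S_TEXT, ANY, A_SAVE,       S_TEXT,
        S_LT,   ' ', A_STRAY_LT,   S_TEXT,  // defect: "a < b" keeps the '<' as text
        S_LT,   ANY, A_SAVE,       S_NAME,
        S_NAME, '>', A_EMIT_TAG,   S_TEXT,
        S_NAME, ANY, A_SAVE,       S_NAME,
    };

    public static List<string> Scan(string input)
    {
        var events = new List<string>();
        var buf = new StringBuilder();
        int state = S_TEXT;
        foreach (char ch in input)
        {
            // Find the first row matching (state, ch); each state's ANY row guarantees a match.
            int row = 0;
            while (!(Table[row] == state && (Table[row + 1] == ch || Table[row + 1] == ANY)))
                row += 4;

            switch (Table[row + 2])
            {
                case A_SAVE:
                    buf.Append(ch);
                    break;
                case A_FLUSH_TEXT:
                    if (buf.Length > 0) events.Add("text:" + buf);
                    buf.Clear();
                    break;
                case A_STRAY_LT:
                    buf.Append('<').Append(ch);
                    break;
                case A_EMIT_TAG:
                    events.Add("tag:" + buf);
                    buf.Clear();
                    break;
            }
            state = Table[row + 3];
        }

        // Defect: input ended inside a tag, so report the partial tag as text.
        if (state != S_TEXT) buf.Insert(0, '<');
        if (buf.Length > 0) events.Add("text:" + buf);
        return events;
    }
}
```

For example, scanning `<p>a < b</p>` yields the events `tag:p`, `text:a `, `text:< b`, `tag:/p`. Adding a new recovery rule means adding a table row rather than another branch in the loop.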
Inheritance
System.Object
HTMLScanner
Namespace: TagSoup
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class HTMLScanner : object, IScanner, ILocator
Constructors
HTMLScanner()
Declaration
public HTMLScanner()
Properties
ColumnNumber
Declaration
public virtual int ColumnNumber { get; }
Property Value
Type | Description |
---|---|
System.Int32 | |
LineNumber
Declaration
public virtual int LineNumber { get; }
Property Value
Type | Description |
---|---|
System.Int32 | |
PublicId
Declaration
public virtual string PublicId { get; }
Property Value
Type | Description |
---|---|
System.String | |
SystemId
Declaration
public virtual string SystemId { get; }
Property Value
Type | Description |
---|---|
System.String | |
Methods
ResetDocumentLocator(String, String)
Resets the document locator, supplying the public and system identifiers.
Declaration
public virtual void ResetDocumentLocator(string publicid, string systemid)
Parameters
Type | Name | Description |
---|---|---|
System.String | publicid | Public id |
System.String | systemid | System id |
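The locator members (PublicId, SystemId, LineNumber, ColumnNumber) and ResetDocumentLocator follow the SAX-style ILocator pattern: the scanner remembers which document it is reading and where it is in that document. The C# sketch below shows one plausible way such position tracking works. The class `PositionTrackerSketch` and its `Advance` method are hypothetical, and the starting values after a reset (line 1, column 0) are an assumption, not documented HTMLScanner behaviour.

```csharp
using System;

// Hypothetical sketch of ILocator-style position tracking; names are illustrative
// and do not describe HTMLScanner's internals.
public class PositionTrackerSketch
{
    public string PublicId { get; private set; }
    public string SystemId { get; private set; }
    public int LineNumber { get; private set; } = 1;  // assumption: lines counted from 1
    public int ColumnNumber { get; private set; }     // assumption: 0 at the start of a line

    // Analogous to ResetDocumentLocator: record the identifiers and rewind the position.
    public void Reset(string publicid, string systemid)
    {
        PublicId = publicid;
        SystemId = systemid;
        LineNumber = 1;
        ColumnNumber = 0;
    }

    // Called once for every character read from the input.
    public void Advance(char ch)
    {
        if (ch == '\n')
        {
            LineNumber++;
            ColumnNumber = 0;
        }
        else
        {
            ColumnNumber++;
        }
    }
}
```

After resetting and advancing over `"ab\ncde"`, the sketch reports line 2, column 3.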
Scan(TextReader, IScanHandler)
Scan HTML source, reporting lexical events.
Declaration
public virtual void Scan(TextReader r, IScanHandler h)
Parameters
Type | Name | Description |
---|---|---|
System.IO.TextReader | r | Reader that provides characters. |
IScanHandler | h | IScanHandler that accepts lexical events. |
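Scan pulls characters from the TextReader and pushes lexical events to the handler as it recognizes them. The C# sketch below shows that pull/push shape with a much smaller, hypothetical handler interface; `IMiniScanHandler`, `MiniScanner` and `RecordingHandler` are illustrative names only. In TagSoup the scan handler interface is considerably richer (it also covers attributes, comments and processing instructions), so this is not a template for implementing IScanHandler.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical handler interface standing in for IScanHandler.
public interface IMiniScanHandler
{
    void Text(string text);
    void Tag(string nameAndAttributes);
    void Eof();
}

public static class MiniScanner
{
    // Same shape as HTMLScanner.Scan: pull characters from r, push events to h.
    public static void Scan(TextReader r, IMiniScanHandler h)
    {
        var buf = new StringBuilder();
        bool inTag = false;
        int c;
        while ((c = r.Read()) != -1)  // TextReader.Read returns -1 at end of input
        {
            char ch = (char)c;
            if (!inTag && ch == '<') { Flush(buf, h.Text); inTag = true; }
            else if (inTag && ch == '>') { Flush(buf, h.Tag); inTag = false; }
            else buf.Append(ch);
        }
        if (inTag) buf.Insert(0, '<');  // unterminated tag: report it as text
        Flush(buf, h.Text);
        h.Eof();
    }

    // Reports any buffered characters through the given callback, then clears the buffer.
    static void Flush(StringBuilder buf, Action<string> report)
    {
        if (buf.Length > 0) report(buf.ToString());
        buf.Clear();
    }
}

// Records every event so it can be inspected after scanning.
public class RecordingHandler : IMiniScanHandler
{
    public readonly List<string> Events = new List<string>();
    public void Text(string text) => Events.Add("text:" + text);
    public void Tag(string nameAndAttributes) => Events.Add("tag:" + nameAndAttributes);
    public void Eof() => Events.Add("eof");
}
```

Scanning `new StringReader("Hi <b>there</b><i")` with a RecordingHandler records `text:Hi `, `tag:b`, `text:there`, `tag:/b`, `text:<i`, `eof`. Keeping the scanner free of any knowledge of what the events mean is what lets a parser sit behind the handler interface.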
StartCDATA()
A callback for the IScanHandler that allows it to force the lexer into CDATA content mode, in which no markup is recognized except the element's end tag.
Declaration
public virtual void StartCDATA()
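A CDATA mode matters for elements such as script and style, whose content is raw text in which `<` is not markup. The C# sketch below shows the effect: without CDATA mode, `a < b` inside a script would be scanned by this sketch as the start of a tag. `CDataScannerSketch` is a hypothetical class. For brevity, the decision to enter CDATA mode is made inside the scanner here, whereas with HTMLScanner it is the IScanHandler that calls StartCDATA; the sketch's `StartCData` also takes the element name explicitly, which the real method does not.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a scanner with a CDATA content mode; names are illustrative.
public class CDataScannerSketch
{
    bool cdata;           // true while markup recognition is switched off
    string cdataElement;  // element whose end tag leaves CDATA mode
    public readonly List<string> Events = new List<string>();

    // Loosely analogous to HTMLScanner.StartCDATA(), but takes the element name.
    public void StartCData(string elementName)
    {
        cdata = true;
        cdataElement = elementName;
    }

    public void Scan(string input)
    {
        int i = 0;
        while (i < input.Length)
        {
            if (cdata)
            {
                // Only "</name" ends CDATA content; any other '<' is ordinary text.
                int end = input.IndexOf("</" + cdataElement, i, StringComparison.OrdinalIgnoreCase);
                if (end < 0) end = input.Length;
                if (end > i) Events.Add("cdata:" + input.Substring(i, end - i));
                cdata = false;
                i = end;
                continue;
            }

            int lt = input.IndexOf('<', i);
            int gt = lt < 0 ? -1 : input.IndexOf('>', lt);
            if (gt < 0)
            {
                Events.Add("text:" + input.Substring(i));  // no complete tag left
                break;
            }
            if (lt > i) Events.Add("text:" + input.Substring(i, lt - i));
            string name = input.Substring(lt + 1, gt - lt - 1);
            Events.Add("tag:" + name);

            // Stand-in for the handler's decision: script and style have CDATA content.
            if (name == "script" || name == "style") StartCData(name);
            i = gt + 1;
        }
    }
}
```

Scanning `<script>if (a < b && c > d) go();</script><p>x</p>` produces a single `cdata:` event for the script body, followed by `tag:/script`, `tag:p`, `text:x`, `tag:/p`.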