Class HTMLScanner
This class implements a table-driven scanner for HTML that tolerates many markup defects. It implements the IScanner interface, which accepts a TextReader to fetch characters from and an IScanHandler to report lexical events to.
Namespace: TagSoup
Assembly: Lucene.Net.Benchmark.dll
Syntax
public class HTMLScanner : IScanner, ILocator
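The phrase "table-driven scanner ... allowing for lots of defects" describes a general technique: the scanner looks up (current state, character class) in a table to get an action and a next state, and malformed input is absorbed by ordinary table entries rather than by error handling. The sketch below is hypothetical and heavily simplified (MiniTableScanner, its states, and its (kind, value) events are illustrative names, not HTMLScanner's real table or the IScanHandler API); it handles only text, start tags, and end tags, with no attributes, comments, or entities.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, heavily simplified table-driven scanner. This is NOT TagSoup's
// state table: it knows only text, start tags and end tags, but it shows how
// defects (stray '<' or '>', unterminated tags) are absorbed by table entries.
public static class MiniTableScanner
{
    enum State { Text, TagOpen, StartName, EndName }
    enum CharClass { Lt, Gt, Slash, Other }
    enum Act { Save, Skip, FlushText, LtAsText, LtGtAsText, EmitStart, EmitEnd }

    // Transition table: (current state, character class) -> (action, next state).
    static readonly Dictionary<(State, CharClass), (Act, State)> Table =
        new Dictionary<(State, CharClass), (Act, State)>
        {
            [(State.Text, CharClass.Lt)] = (Act.FlushText, State.TagOpen),
            [(State.Text, CharClass.Gt)] = (Act.Save, State.Text),              // stray '>' is text
            [(State.Text, CharClass.Slash)] = (Act.Save, State.Text),
            [(State.Text, CharClass.Other)] = (Act.Save, State.Text),
            [(State.TagOpen, CharClass.Lt)] = (Act.LtAsText, State.TagOpen),    // "<<": first '<' is text
            [(State.TagOpen, CharClass.Gt)] = (Act.LtGtAsText, State.Text),     // "<>" is text
            [(State.TagOpen, CharClass.Slash)] = (Act.Skip, State.EndName),
            [(State.TagOpen, CharClass.Other)] = (Act.Save, State.StartName),
            [(State.StartName, CharClass.Lt)] = (Act.EmitStart, State.TagOpen), // unterminated start tag
            [(State.StartName, CharClass.Gt)] = (Act.EmitStart, State.Text),
            [(State.StartName, CharClass.Slash)] = (Act.Skip, State.StartName), // e.g. "<br/>"
            [(State.StartName, CharClass.Other)] = (Act.Save, State.StartName),
            [(State.EndName, CharClass.Lt)] = (Act.EmitEnd, State.TagOpen),     // unterminated end tag
            [(State.EndName, CharClass.Gt)] = (Act.EmitEnd, State.Text),
            [(State.EndName, CharClass.Slash)] = (Act.Skip, State.EndName),
            [(State.EndName, CharClass.Other)] = (Act.Save, State.EndName),
        };

    static CharClass Classify(char c) => c switch
    {
        '<' => CharClass.Lt,
        '>' => CharClass.Gt,
        '/' => CharClass.Slash,
        _ => CharClass.Other,
    };

    // Scans 'input' and reports (kind, value) events such as ("start", "b").
    public static void Scan(string input, Action<string, string> report)
    {
        var state = State.Text;
        var buf = new StringBuilder();
        foreach (char c in input)
        {
            var (act, next) = Table[(state, Classify(c))];
            switch (act)
            {
                case Act.Save: buf.Append(c); break;
                case Act.Skip: break;
                case Act.FlushText: Flush("text", buf, report); break;
                case Act.LtAsText: report("text", "<"); break;
                case Act.LtGtAsText: report("text", "<>"); break;
                case Act.EmitStart: Flush("start", buf, report); break;
                case Act.EmitEnd: Flush("end", buf, report); break;
            }
            state = next;
        }
        // End of input: report whatever is still pending for the final state.
        if (state == State.TagOpen) report("text", "<");
        else if (state == State.StartName) Flush("start", buf, report);
        else if (state == State.EndName) Flush("end", buf, report);
        else Flush("text", buf, report);
    }

    // Reports the buffered characters (if any) as one event, then clears the buffer.
    static void Flush(string kind, StringBuilder buf, Action<string, string> report)
    {
        if (buf.Length > 0) report(kind, buf.ToString());
        buf.Clear();
    }
}
```

For example, scanning "a<<b" reports text "a", text "<", and a start tag "b": the doubled '<' and the unterminated tag are handled by ordinary table entries instead of raising an error.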
Properties
ColumnNumber
Return the column number where the current document event ends. This is the one-based number of char (UTF-16 code unit) values since the last line end.
Warning: The return value from this property is intended only as an approximation for the sake of diagnostics; it is not intended to provide sufficient information to edit the character content of the original XML document. For example, when lines contain combining character sequences, wide characters, surrogate pairs, or bidirectional text, the value may not correspond to the column in a text editor's display.
The return value is an approximation of the column number in the document entity or external parsed entity where the markup triggering the event appears. If possible, the SAX driver should provide the position within the line of the first character after the text associated with the document event. The first column in each line is column 1.
Returns the column number, or -1 if none is available.
Declaration
public virtual int ColumnNumber { get; }
Property Value
Type | Description |
---|---|
int | The one-based column number, or -1 if none is available. |
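To make the counting rule concrete, here is a hypothetical helper (ColumnSketch.ColumnAfter is illustrative, not part of TagSoup) that computes the one-based column of the first character after a piece of text, counting UTF-16 char values since the last '\n'. It assumes '\n' is the only line end, which is a simplification, and HTMLScanner's own bookkeeping may differ. The surrogate-pair case shows why the warning applies: one emoji occupies two char values, so the column advances by two.

```csharp
using System;

// Hypothetical helper (not part of TagSoup) illustrating one-based column counting.
public static class ColumnSketch
{
    // Returns the one-based column of the first character after 'text',
    // counted in UTF-16 char values since the last '\n' (assumed sole line end).
    public static int ColumnAfter(string text)
    {
        int lastNewline = text.LastIndexOf('\n');          // -1 if there is no line end
        int charsSinceLineEnd = text.Length - (lastNewline + 1);
        return charsSinceLineEnd + 1;                       // column 1 is the first column
    }
}
```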
LineNumber
Return the line number where the current document event ends. Lines are delimited by line ends, which are defined in the XML specification.
Warning: The return value from this property is intended only as an approximation for the sake of diagnostics; it is not intended to provide sufficient information to edit the character content of the original XML document. In some cases these "line" numbers match what a text editor would display, and in others they may not match the source text because of internal entity expansion.
The return value is an approximation of the line number in the document entity or external parsed entity where the markup triggering the event appears. If possible, the SAX driver should provide the line position of the first character after the text associated with the document event. The first line is line 1.
Returns the line number, or -1 if none is available.
Declaration
public virtual int LineNumber { get; }
Property Value
Type | Description |
---|---|
int | The one-based line number, or -1 if none is available. |
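XML 1.0 (section 2.11, End-of-Line Handling) treats the two-character sequence "\r\n" and any lone "\r" as line ends, normalizing both to "\n". The hypothetical helper below (LineSketch.LineAfter is illustrative, not a TagSoup API) counts one-based lines under those rules; it sketches the line-end definition the description refers to, not HTMLScanner's actual implementation.

```csharp
using System;

// Hypothetical helper (not part of TagSoup) illustrating one-based line counting.
public static class LineSketch
{
    // Returns the one-based line number reached after reading all of 'text',
    // treating "\r\n", a lone "\r", and "\n" each as a single line end.
    public static int LineAfter(string text)
    {
        int line = 1;                                       // the first line is line 1
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                line++;
            }
            else if (text[i] == '\r')
            {
                line++;
                // Treat "\r\n" as one line end by skipping the following '\n'.
                if (i + 1 < text.Length && text[i + 1] == '\n') i++;
            }
        }
        return line;
    }
}
```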
PublicId
Gets the public identifier for the current document event.
The return value is the public identifier of the document entity or of the external parsed entity in which the markup triggering the event appears.
Returns a string containing the public identifier, or null if none is available.
Declaration
public virtual string PublicId { get; }
Property Value
Type | Description |
---|---|
string | The public identifier, or null if none is available. |
SystemId
Return the system identifier for the current document event.
The return value is the system identifier of the document entity or of the external parsed entity in which the markup triggering the event appears. If the system identifier is a URL, the parser must resolve it fully before passing it to the application. For example, a file name must always be provided as a file:... URL, and other kinds of relative URI are also resolved against their bases.
Returns a string containing the system identifier, or null if none is available.
Declaration
public virtual string SystemId { get; }
Property Value
Type | Description |
---|---|
string | The fully resolved system identifier, or null if none is available. |
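The requirement that relative URIs be "resolved against their bases" can be illustrated with the standard System.Uri(Uri, string) constructor, which applies RFC 3986 reference resolution. SystemIdSketch.Resolve is a hypothetical helper for illustration only; it is not how HTMLScanner or TagSoup is documented to resolve identifiers.

```csharp
using System;

// Hypothetical helper (not part of TagSoup): resolves a possibly relative
// system identifier against the system identifier of its base document.
public static class SystemIdSketch
{
    public static string Resolve(string baseSystemId, string systemId) =>
        // Uri(Uri, string) resolves 'systemId' relative to the base;
        // an already absolute 'systemId' is returned unchanged.
        new Uri(new Uri(baseSystemId), systemId).AbsoluteUri;
}
```

For instance, "../dtd/html.dtd" relative to "file:///docs/a/page.html" resolves to "file:///docs/dtd/html.dtd".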
Methods
ResetDocumentLocator(string, string)
Reset the document locator, supplying the public and system identifiers.
Declaration
public virtual void ResetDocumentLocator(string publicId, string systemId)
Parameters
Type | Name | Description |
---|---|---|
string | publicId | Public identifier |
string | systemId | System identifier |
Scan(TextReader, IScanHandler)
Scan HTML source, reporting lexical events.
Declaration
public virtual void Scan(TextReader r, IScanHandler h)
Parameters
Type | Name | Description |
---|---|---|
TextReader | r | TextReader that provides the characters to scan. |
IScanHandler | h | IScanHandler that receives the lexical events. |
StartCDATA()
A callback for the IScanHandler that forces the lexer state to CDATA content, in which no markup is recognized except the element's end tag.
Declaration
public virtual void StartCDATA()
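The effect of CDATA content mode can be sketched as follows: once a handler switches the lexer into it (for example after a script start tag), everything up to the matching end tag is content, even text that looks like markup. CdataSketch.ReadCdata is a hypothetical, simplified helper, not HTMLScanner's implementation; it assumes the end tag is recognized as "</" plus the element name (case-insensitive) followed by a non-alphanumeric character.

```csharp
using System;

// Hypothetical illustration (not part of TagSoup) of CDATA-mode scanning.
public static class CdataSketch
{
    // Starting at 'start', returns everything up to the end tag of 'elementName'.
    // Other markup such as "<b>" or "</p>" is treated as plain content.
    // 'endTagIndex' receives where the end tag begins, or input.Length if missing.
    public static string ReadCdata(string input, int start, string elementName, out int endTagIndex)
    {
        string endTag = "</" + elementName;
        int from = start;
        while (true)
        {
            int i = input.IndexOf(endTag, from, StringComparison.OrdinalIgnoreCase);
            if (i < 0) { endTagIndex = input.Length; break; }   // unterminated: take the rest
            int after = i + endTag.Length;
            // Only a complete name counts: "</scripts" does not end a script element.
            if (after == input.Length || !char.IsLetterOrDigit(input[after])) { endTagIndex = i; break; }
            from = i + 1;
        }
        return input.Substring(start, endTagIndex - start);
    }
}
```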