Interface | Description |
---|---|
CharStreamSource | Represents a character stream source. |
Config.CharacterReferenceEncodingBehaviour | Specifies the interface for defining character reference encoding behaviour. |
HTMLElementName | Contains static fields representing the names of all elements defined in the HTML 4.01 specification and the draft HTML 5 specification. |
Logger | Defines the interface for handling log messages. |
LoggerProvider | |
OutputSegment | Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text. |
ParseText | Represents the text from the source document that is to be parsed. |

Class | Description |
---|---|
Attribute | |
Attributes | |
BasicLogFormatter | Provides basic formatting for log messages. |
CharacterEntityReference | Represents an HTML Character Entity Reference. |
CharacterReference | Represents an HTML Character Reference, implemented by the subclasses CharacterEntityReference and NumericCharacterReference. |
CharStreamSourceUtil | Contains static utility methods for manipulating the way data is retrieved from a CharStreamSource object. |
Config | Encapsulates global configuration properties which determine the behaviour of various functions. |
Config.CompatibilityMode | Represents a set of configuration parameters that relate to user agent compatibility issues. |
Element | |
EndTag | |
EndTagType | Defines the syntax for an end tag type. |
EndTagTypeGenericImplementation | Provides a generic implementation of the abstract EndTagType class based on the most common end tag behaviour. |
FormControl | Represents an HTML form control. |
FormControlOutputStyle.ConfigDisplayValue | Contains static properties that configure the FormControlOutputStyle.DISPLAY_VALUE form control output style. |
FormField | Represents a field in an HTML form, a field being defined as the group of all form controls having the same name. |
FormFields | Represents a collection of FormField objects. |
HTMLElements | Contains static methods which group HTML element names by the characteristics of their associated elements. |
MasonTagTypes | |
MicrosoftConditionalCommentTagTypes | Contains tag types representing Microsoft® conditional comments. |
MicrosoftTagTypes | Deprecated. Use the tag types defined in MicrosoftConditionalCommentTagTypes instead. |
NumericCharacterReference | Represents an HTML Numeric Character Reference. |
OutputDocument | |
PHPTagTypes | |
Renderer | Performs a simple rendering of HTML markup into text. |
RowColumnVector | Represents the row and column number of a character position in the source document. |
Segment | Represents a segment of a Source document. |
Source | Represents a source HTML document. |
SourceCompactor | Compacts HTML source by removing all unnecessary white space. |
SourceFormatter | Formats HTML source by laying out each non-inline-level element on a new line with an appropriate indent. |
StartTag | |
StartTagType | Defines the syntax for a start tag type. |
StartTagTypeGenericImplementation | Provides a generic implementation of the abstract StartTagType class based on the most common start tag behaviour. |
StreamedSource | Represents a streamed source HTML document. |
Tag | |
TagType | Defines the syntax for a tag type that can be recognised by the parser. |
TextExtractor | Extracts the textual content from HTML markup. |
Util | Contains miscellaneous utility methods not directly associated with the HTML Parser library. |
WriterLogger | Provides an implementation of the Logger interface that sends output to the specified java.io.Writer. |

Enum | Description |
---|---|
FormControlOutputStyle | An enumerated type representing the three major output styles of a form control's output element. |
FormControlType | Represents the control type of a FormControl. |

A Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML. Also provides high-level HTML form manipulation functions.
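
The idea of changing selected parts of a document while reproducing everything else verbatim can be sketched in plain Java. The following is a minimal illustration of that idea only, not the library's implementation: `SegmentReplaceSketch`, `Replacement` and `apply` are hypothetical names that do not exist in the library, and the sketch assumes that replacements are given as non-overlapping character ranges of the source text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: none of these names are part of the library's API.
final class SegmentReplaceSketch {

    /** A half-open character range [begin, end) of the source, plus its replacement text. */
    static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            if (begin < 0 || end < begin) {
                throw new IllegalArgumentException("invalid range: " + begin + ".." + end);
            }
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Copies the source verbatim, except for the given ranges, which are replaced. */
    static String apply(String source, List<Replacement> replacements) {
        // Work on a sorted copy so the caller may register replacements in any order.
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt((Replacement r) -> r.begin));

        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        for (Replacement r : sorted) {
            if (r.begin < pos || r.end > source.length()) {
                throw new IllegalArgumentException("overlapping or out-of-range replacement");
            }
            out.append(source, pos, r.begin); // untouched text is copied as-is
            out.append(r.text);               // the range itself is swapped out
            pos = r.end;
        }
        out.append(source, pos, source.length()); // trailing text, also verbatim
        return out.toString();
    }

    public static void main(String[] args) {
        // Deliberately sloppy markup: it is never validated or normalised.
        String source = "<p>Hello <b>world</B></p><unknown attr=>";
        int begin = source.indexOf("world");
        List<Replacement> edits = new ArrayList<>();
        edits.add(new Replacement(begin, begin + "world".length(), "Jericho"));
        System.out.println(apply(source, edits));
    }
}
```

Because the sketch only ever copies or swaps character ranges, markup it does not understand (here the mismatched `</B>` and the malformed `<unknown attr=>`) passes through unchanged.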
For an introduction to the API, the documentation of the Source
class is the best place to start.
For a summary of features and sample applications, visit the homepage at http://jerichohtml.sourceforge.net
For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/
The Jericho HTML Parser is an open source library released under both the Eclipse Public License (EPL) and GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in either one of these licence documents.