public final class HTMLElements
extends java.lang.Object
An HTML element is a normal element with a name that matches one of the HTML element names (ignoring case). This type of element spans the logical HTML element as described in the HTML 4.01 specification section 3.2.1, which may be implicitly terminated if it specifies an optional end tag.
The term Non-HTML element refers to a normal element with a name that does not match one of the HTML element names. This type of element must be either a single tag element or explicitly terminated.
All of the sets returned by the methods in this class may be modified to customise the behaviour of the parser. Care must be taken however to ensure that the sets only contain tag names in lower case.
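Because the sets are expected to contain lower-case names only, any name a caller adds should be normalised first. The following is a minimal sketch of that idea rather than part of the library: `ElementNameSets`, `addNormalised` and `containsIgnoreCase` are hypothetical names, a plain `HashSet` stands in for a set returned by one of the methods in this class, and `FIGURE` is only an illustrative name.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class ElementNameSets {
    private ElementNameSets() {}

    // Adds the name in lower case; returns true if the set changed.
    static boolean addNormalised(Set<String> names, String name) {
        return names.add(name.toLowerCase(Locale.ROOT));
    }

    // Case-insensitive membership test against a set that holds lower-case names.
    static boolean containsIgnoreCase(Set<String> names, String name) {
        return names.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Stand-in for a modifiable set obtained from this class (assumption).
        Set<String> blockLevel = new HashSet<>();
        addNormalised(blockLevel, "FIGURE");
        System.out.println(containsIgnoreCase(blockLevel, "Figure"));
    }
}
```

`Locale.ROOT` is used so that the lower-casing does not depend on the default locale of the JVM.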
Below is a table summarising the default characteristics of each HTML element. See also the index of elements in the HTML 4.01 specification and draft HTML5 specification for the official tables containing similar information.
See Also: HTMLElementName, Element
Modifier and Type | Method and Description |
---|---|
static java.util.Set<java.lang.String> | getBlockLevelElementNames(): Returns a set containing the names of all the block-level elements. |
static java.util.Set<java.lang.String> | getDeprecatedElementNames(): Returns a set containing the names of all deprecated elements in HTML 4.01. |
static java.util.List<java.lang.String> | getElementNames(): Returns a list containing all of the HTML element names. |
static java.util.Set<java.lang.String> | getEndTagForbiddenElementNames() |
static java.util.Set<java.lang.String> | getEndTagOptionalElementNames() |
static java.util.Set<java.lang.String> | getEndTagRequiredElementNames() |
static java.util.Set<java.lang.String> | getInlineLevelElementNames(): Returns a set containing the names of all the inline-level elements. |
static java.util.Set<java.lang.String> | getNestingForbiddenElementNames(): Returns a set containing the names of all of the HTML elements which should never contain elements of the same name, either as direct or indirect descendants. |
static java.util.Set<java.lang.String> | getNonterminatingElementNames(java.lang.String endTagOptionalElementName): Returns the names of elements that do NOT implicitly terminate an HTML element with the specified name. |
static java.util.Set<java.lang.String> | getStartTagOptionalElementNames() |
static java.util.Set<java.lang.String> | getTerminatingEndTagNames(java.lang.String endTagOptionalElementName) |
static java.util.Set<java.lang.String> | getTerminatingStartTagNames(java.lang.String endTagOptionalElementName) |
public static final java.util.List<java.lang.String> getElementNames()
The returned list is in alphabetical order.
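Since the list is sorted, a name can be located with a binary search instead of a linear scan. The sketch below is illustrative only: `SortedNameLookup` and `indexOf` are hypothetical names, and it assumes that the list holds lower-case names in natural `String` order, which the statement above suggests but does not spell out. A short hand-written list stands in for the value returned by `getElementNames()`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the library.
final class SortedNameLookup {
    private SortedNameLookup() {}

    // Returns the index of the name in the sorted list, or -1 if it is absent.
    // Assumes the list is sorted in natural String order and holds lower-case names.
    static int indexOf(List<String> sortedLowerCaseNames, String name) {
        int i = Collections.binarySearch(sortedLowerCaseNames, name.toLowerCase(Locale.ROOT));
        return i >= 0 ? i : -1;
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by getElementNames() (assumption).
        List<String> names = Arrays.asList("a", "abbr", "body", "div", "p");
        System.out.println(indexOf(names, "DIV"));
        System.out.println(indexOf(names, "span"));
    }
}
```

If the ordering assumption does not hold for the real list, `contains` remains the safe choice.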
public static java.util.Set<java.lang.String> getBlockLevelElementNames()
The element names contained in this set are:
ADDRESS, article, aside, BLOCKQUOTE, CENTER, details, DIR, DIV, DL, FIELDSET, footer, FORM, H1, H2, H3, H4, H5, H6, header, hgroup, HR, ISINDEX, MENU, nav, NOFRAMES, NOSCRIPT, OL, P, PRE, section, TABLE, UL
This set is defined in the HTML 4.01 Transitional DTD, but more detailed information can be found in the HTML 4.01 specification section 7.5.3 - Block-level and inline elements and the CSS2 specification section 9.2.1 - Block-level elements and block boxes.
The CSS2 display property can be used to override the normal box type of an element.
See Also: getInlineLevelElementNames()
public static java.util.Set<java.lang.String> getInlineLevelElementNames()
The element names contained in this set are:
A, ABBR, ACRONYM, APPLET, B, BASEFONT, bdi, BDO, BIG, BR, BUTTON, CITE, CODE, DEL, DFN, EM, FONT, I, IFRAME, IMG, INPUT, INS, KBD, keygen, LABEL, MAP, mark, meter, OBJECT, output, progress, Q, rp, rt, ruby, S, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRIKE, STRONG, SUB, SUP, TEXTAREA, time, TT, U, VAR, wbr
This set is defined in the HTML 4.01 Transitional DTD, but more detailed information can be found in the HTML 4.01 specification section 7.5.3 - Block-level and inline elements and the CSS2 specification section 9.2.2 - Inline-level elements and inline boxes.
The CSS2 display property can be used to override the normal box type of an element.
The HTML Document Type Definitions forbid the presence of block-level elements inside inline-level elements, but this is tolerated by all popular browsers in various situations, even in XHTML documents. The most notorious example is the common inclusion of block-level elements inside FONT elements.
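A caller that wants to flag this kind of nesting could compare a parent and child name against the two sets. The following is a rough sketch under stated assumptions: `NestingCheck` and `isBlockInsideInline` are hypothetical names, and the two small constant sets are hand-picked subsets of the names listed above, standing in for the sets returned by `getBlockLevelElementNames()` and `getInlineLevelElementNames()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class NestingCheck {
    private NestingCheck() {}

    // Stand-in subsets of the documented block-level and inline-level names.
    static final Set<String> BLOCK = new HashSet<>(Arrays.asList("div", "p", "table", "ul"));
    static final Set<String> INLINE = new HashSet<>(Arrays.asList("a", "em", "font", "span"));

    // True if a block-level child sits directly inside an inline-level parent.
    static boolean isBlockInsideInline(String parentName, String childName) {
        return INLINE.contains(parentName.toLowerCase(Locale.ROOT))
            && BLOCK.contains(childName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isBlockInsideInline("FONT", "div"));
        System.out.println(isBlockInsideInline("div", "span"));
    }
}
```

The check is purely name based; it says nothing about how a browser would render the markup or how the CSS display property has been set.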
See Also: getBlockLevelElementNames()
public static java.util.Set<java.lang.String> getDeprecatedElementNames()
public static java.util.Set<java.lang.String> getEndTagForbiddenElementNames()
See the element parsing rules for HTML elements with forbidden end tags for more information.
The index of elements in the HTML 4.01 specification includes the letter 'F' in the "End Tag" column for elements whose end tag is forbidden.
See Also: getEndTagOptionalElementNames(), getEndTagRequiredElementNames()
public static java.util.Set<java.lang.String> getEndTagOptionalElementNames()
Elements with these names may be implicitly terminated by a subsequent terminating start tag or terminating end tag. A list of these terminating tags, and the names of non-terminating elements that can be nested within the element, can be found in the documentation of each relevant element in the HTMLElementName class.
See the element parsing rules for HTML elements with optional end tags for more information.
The index of elements in the HTML 4.01 specification includes the letter 'O' in the "End Tag" column for elements whose end tag is optional.
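To make implicit termination by a start tag concrete, the sketch below walks a sequence of start tag names and closes the innermost open element whenever the next start tag is one of its terminating start tags. It is a simplified illustration, not the parser's actual algorithm: `ImplicitTermination` and `process` are hypothetical names, the table of terminating start tags is a small hand-written assumption (an LI closed by a following LI, a DT or DD closed by a following DT or DD), and end tags and non-terminating elements are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the library's parsing algorithm.
final class ImplicitTermination {
    private ImplicitTermination() {}

    // Assumed table: element name -> start tag names that implicitly terminate it.
    static final Map<String, Set<String>> TERMINATING_START_TAGS = new HashMap<>();
    static {
        TERMINATING_START_TAGS.put("li", Collections.singleton("li"));
        TERMINATING_START_TAGS.put("dt", new HashSet<>(Arrays.asList("dt", "dd")));
        TERMINATING_START_TAGS.put("dd", new HashSet<>(Arrays.asList("dt", "dd")));
    }

    // Processes start tag names in document order and records what happens.
    static List<String> process(List<String> startTags) {
        Deque<String> open = new ArrayDeque<>();
        List<String> events = new ArrayList<>();
        for (String tag : startTags) {
            String current = open.peek(); // innermost open element, or null
            if (current != null) {
                Set<String> terminators = TERMINATING_START_TAGS.get(current);
                if (terminators != null && terminators.contains(tag)) {
                    open.pop();
                    events.add("implicit end of " + current);
                }
            }
            open.push(tag);
            events.add("start of " + tag);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("ul", "li", "li")));
    }
}
```

For the input `ul`, `li`, `li`, the second `li` start tag closes the first `li` before a new one is opened, which mirrors markup such as `<ul><li>one<li>two</ul>`.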
See Also: getEndTagForbiddenElementNames(), getEndTagRequiredElementNames()
public static java.util.Set<java.lang.String> getEndTagRequiredElementNames()
See the element parsing rules for HTML elements with required end tags for more information.
The index of elements in the HTML 4.01 specification leaves the "End Tag" column blank for elements whose end tag is required.
See Also: getEndTagForbiddenElementNames(), getEndTagOptionalElementNames()
public static java.util.Set<java.lang.String> getStartTagOptionalElementNames()
Elements with optional start tags must be present in the document object model (DOM) in certain locations, either forming part of the structure of the HTML document as a whole (e.g. the HTML, HEAD, and BODY elements), or forming part of the structure of a TABLE element (e.g. the TBODY element). The location of an omitted start tag in the document's object model can be inferred from the surrounding elements.
This library does not use this property in any way when parsing documents, and does not construct a document object model from the source, so no implied element is created where an optional start tag is omitted.
When the start tag has been omitted in the document text, the corresponding end tag should also be omitted.
The index of elements in the HTML 4.01 specification includes the letter 'O' in the "Start Tag" column for elements whose start tag is optional.
public static java.util.Set<java.lang.String> getTerminatingStartTagNames(java.lang.String endTagOptionalElementName)
This method is only relevant to HTML elements for which the end tag is optional. It returns null if getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.
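Because the method can return null, callers may prefer to wrap the call so that an unknown name yields an empty set. The sketch below illustrates that pattern only: `TerminatingTagLookup`, `lookup` and `lookupOrEmpty` are hypothetical names, and a `Map` stands in for the library method so that the example is self-contained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper, not part of the library.
final class TerminatingTagLookup {
    private TerminatingTagLookup() {}

    // Stand-in for a method such as getTerminatingStartTagNames(String):
    // returns null when the name has no entry.
    static Set<String> lookup(Map<String, Set<String>> table, String name) {
        return table.get(name.toLowerCase(Locale.ROOT));
    }

    // Null-safe variant: an unknown name yields an empty, unmodifiable set.
    static Set<String> lookupOrEmpty(Map<String, Set<String>> table, String name) {
        Set<String> result = lookup(table, name);
        return result != null ? result : Collections.<String>emptySet();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> table = new HashMap<>();
        table.put("li", Collections.singleton("li")); // illustrative entry
        System.out.println(lookupOrEmpty(table, "LI"));
        System.out.println(lookupOrEmpty(table, "div").isEmpty());
    }
}
```

Note that the empty set returned by the wrapper cannot be modified, so code that intends to customise the parser should still work with the set returned by the library itself.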
Parameters: endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns: null if the name does not identify an element for which the end tag is optional.
See Also: getTerminatingEndTagNames(String endTagOptionalElementName), getNonterminatingElementNames(String endTagOptionalElementName)
public static java.util.Set<java.lang.String> getTerminatingEndTagNames(java.lang.String endTagOptionalElementName)
This method is only relevant to HTML elements for which the end tag is optional. It returns null if getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.
Note that removing the tag name matching the specified element has no effect on the behaviour of the parser, as it is always assumed that a start tag is terminated by an end tag with a matching name.
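The rule in the note above can be pictured as a check in which a matching end tag name always terminates the element, whatever the set contains. This is a minimal sketch of that rule, not the parser's code: `EndTagTermination` and `isTerminatedByEndTag` are hypothetical names, and the set passed in stands in for the value returned by `getTerminatingEndTagNames(String)`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not part of the library.
final class EndTagTermination {
    private EndTagTermination() {}

    // A matching end tag always terminates the element; other end tags
    // terminate it only if they are in the supplied set of lower-case names.
    static boolean isTerminatedByEndTag(String elementName, String endTagName,
                                        Set<String> terminatingEndTagNames) {
        String element = elementName.toLowerCase(Locale.ROOT);
        String endTag = endTagName.toLowerCase(Locale.ROOT);
        return element.equals(endTag) || terminatingEndTagNames.contains(endTag);
    }

    public static void main(String[] args) {
        // Illustrative set for an LI element (assumption).
        Set<String> forLi = new HashSet<>(Arrays.asList("ul", "ol"));
        System.out.println(isTerminatedByEndTag("li", "UL", forLi));
        // Even with an empty set, the matching end tag still terminates.
        System.out.println(isTerminatedByEndTag("li", "li", Collections.<String>emptySet()));
    }
}
```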
Parameters: endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns: null if the name does not identify an element for which the end tag is optional.
See Also: getTerminatingStartTagNames(String endTagOptionalElementName), getNonterminatingElementNames(String endTagOptionalElementName)
public static java.util.Set<java.lang.String> getNonterminatingElementNames(java.lang.String endTagOptionalElementName)
This method is only relevant to HTML elements for which the end tag is optional. It returns null if getEndTagOptionalElementNames().contains(endTagOptionalElementName.toLowerCase())==false.
Parameters: endTagOptionalElementName - the name of an element for which the end tag is optional.
Returns: null if the name does not identify an element for which the end tag is optional.
See Also: getTerminatingStartTagNames(String endTagOptionalElementName), getTerminatingEndTagNames(String endTagOptionalElementName)
public static java.util.Set<java.lang.String> getNestingForbiddenElementNames()