public final class StartTag extends Tag
A start tag always has a type that is a subclass of StartTagType, meaning that any tag that does not start with the characters '</' is categorised as a start tag.
This includes many tags which stand alone, without a corresponding end tag, and would not intuitively be categorised as a "start tag". For example, an HTML comment is represented as a single start tag that spans the whole comment, and does not have an end tag at all.
See the static fields defined in the StartTagType class for a list of the standard start tag types.
StartTag instances are obtained using one of the following methods:
Element.getStartTag()
Tag.getNextTag()
Tag.getPreviousTag()
Source.getPreviousStartTag(int pos)
Source.getPreviousStartTag(int pos, String name)
Source.getPreviousTag(int pos)
Source.getPreviousTag(int pos, TagType)
Source.getNextStartTag(int pos)
Source.getNextStartTag(int pos, String name)
Source.getNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive)
Source.getNextTag(int pos)
Source.getNextTag(int pos, TagType)
Source.getEnclosingTag(int pos)
Source.getEnclosingTag(int pos, TagType)
Source.getTagAt(int pos)
Segment.getAllStartTags()
Segment.getAllStartTags(String name)
Segment.getAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
Segment.getAllTags()
Segment.getAllTags(TagType)
The methods above which accept a name parameter are categorised as named search methods.
In such methods dealing with start tags, specifying an argument to the name parameter that ends in a colon (:) searches for all start tags in the specified XML namespace.
The constants defined in the HTMLElementName interface can be used directly as arguments to these name parameters.
For example, source.getAllStartTags(HTMLElementName.A) is equivalent to source.getAllStartTags("a"), and gets all hyperlink start tags.
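The namespace rule for named searches can be modelled without the library. The following is a minimal sketch, not the library's implementation: the class NamedSearchSketch and its helpers are hypothetical, and it assumes that names are compared in lower case. A search name ending in a colon is treated as a namespace prefix; any other search name must equal the tag name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the documented name-matching rule; not part of the library.
final class NamedSearchSketch {

    // Returns true if a start tag named tagName would be selected by searchName.
    // Assumption: both names are compared in lower case.
    static boolean matches(String tagName, String searchName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        String search = searchName.toLowerCase(Locale.ROOT);
        if (search.endsWith(":")) {
            // A trailing colon selects every tag in that XML namespace.
            return tag.startsWith(search);
        }
        return tag.equals(search);
    }

    // Keeps only the tag names selected by searchName, preserving their order.
    static List<String> filter(List<String> tagNames, String searchName) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            if (matches(name, searchName)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "o:p", "o:smarttag", "abbr");
        System.out.println(filter(names, "a"));
        System.out.println(filter(names, "o:"));
    }
}
```

In this model, searching for "a" does not select "abbr", whereas searching for "o:" selects every name carrying the "o:" prefix.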
The Tag superclass defines a method called getName() to get the name of this start tag.
See also the XML 1.0 specification for start tags.
Modifier and Type | Method and Description |
---|---|
static java.lang.String | generateHTML(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributesMap, boolean emptyElementTag): Generates the HTML text of a normal start tag with the specified tag name and attributes map. |
Attributes | getAttributes(): Returns the attributes specified in this start tag. |
java.lang.String | getAttributeValue(java.lang.String attributeName): Returns the decoded value of the attribute with the specified name (case insensitive). |
java.lang.String | getDebugInfo(): Returns a string representation of this object useful for debugging purposes. |
Element | getElement(): Returns the element that is started by this start tag. |
FormControl | getFormControl(): Returns the FormControl defined by this start tag. |
StartTagType | getStartTagType(): Returns the type of this start tag. |
Segment | getTagContent(): Returns the segment between the end of the tag's name and the start of its end delimiter. |
TagType | getTagType(): Returns the type of this tag. |
boolean | isEmptyElementTag(): Indicates whether this start tag is an empty-element tag. |
boolean | isEndTagForbidden(): Indicates whether a matching end tag is forbidden. |
boolean | isEndTagRequired(): Indicates whether a matching end tag is required. |
boolean | isSyntacticalEmptyElementTag(): Indicates whether this start tag is syntactically an empty-element tag. |
boolean | isUnregistered(): Indicates whether this tag has a syntax that does not match any of the registered tag types. |
Attributes | parseAttributes(): Parses the attributes specified in this start tag, regardless of the type of start tag. |
Attributes | parseAttributes(int maxErrorCount): Parses the attributes specified in this start tag, regardless of the type of start tag. |
java.lang.String | tidy(): Returns an XML representation of this start tag. |
java.lang.String | tidy(boolean toXHTML): Returns an XML or XHTML representation of this start tag. |
Methods inherited from class Tag: getName, getNameSegment, getNextTag, getPreviousTag, getUserData, isXMLName, isXMLNameChar, isXMLNameStartChar, setUserData
charAt, compareTo, encloses, encloses, equals, getAllCharacterReferences, getAllElements, getAllElements, getAllElements, getAllElements, getAllElements, getAllElementsByClass, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTagsByClass, getAllTags, getAllTags, getBegin, getChildElements, getEnd, getFirstElement, getFirstElement, getFirstElement, getFirstElement, getFirstElementByClass, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTagByClass, getFormControls, getFormFields, getMaxDepthIndicator, getNodeIterator, getRenderer, getRowColumnVector, getSource, getStyleURISegments, getTextExtractor, getURIAttributes, hashCode, ignoreWhenParsing, isWhiteSpace, isWhiteSpace, length, subSequence, toString
public Element getElement()
null
.
1. <div>
2.   <div>
3.     <div>
4.       <div>This is line 4</div>
5.     </div>
6.     <div>This is line 6</div>
7.   </div>
<div> element is required, making the sample code invalid as all the end tags are matched with other start tags.
1. <ul>
2.   <li>item 1
3.   <li>item 2
4.     <ul>
5.       <li>subitem 1</li>
6.       <li>subitem 2
7.     </ul>
8.   <li>item 3</li>
9. </ul>
<li> start tag on line 3.
<li> start tag on line 8.
</ul> end tag on line 7.
getElement
in class Tag
public boolean isEmptyElementTag()
This property checks that the tag is syntactically an empty-element tag, and additionally that the name of the tag is not one that the HTML specification defines as having a required or an optional end tag. The major browsers do not recognise such tags as empty-element tags, even in an XHTML document.
This is equivalent to:
isSyntacticalEmptyElementTag() && !(HTMLElements.getEndTagOptionalElementNames().contains(getName()) || HTMLElements.getEndTagRequiredElementNames().contains(getName())).
You can set the static Config.IsHTMLEmptyElementTagRecognised property to true to force the parser to recognise all empty-element tags, making this method exactly equivalent to isSyntacticalEmptyElementTag().
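The decision described for isEmptyElementTag() can be modelled in plain Java. This is a minimal sketch under stated assumptions, not the library's code: the class EmptyElementTagSketch is hypothetical, the two name sets are small illustrative subsets standing in for HTMLElements.getEndTagOptionalElementNames() and HTMLElements.getEndTagRequiredElementNames(), and the recognised parameter stands in for Config.IsHTMLEmptyElementTagRecognised.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the documented isEmptyElementTag() logic; not part of the library.
final class EmptyElementTagSketch {

    // Illustrative subsets only, standing in for the library's full name sets.
    static final Set<String> END_TAG_OPTIONAL = new HashSet<>(Arrays.asList("p", "li", "td"));
    static final Set<String> END_TAG_REQUIRED = new HashSet<>(Arrays.asList("div", "script", "a"));

    // name: lower-case tag name
    // syntacticalEmpty: whether the tag text ends with "/>"
    // recognised: stand-in for Config.IsHTMLEmptyElementTagRecognised
    static boolean isEmptyElementTag(String name, boolean syntacticalEmpty, boolean recognised) {
        if (recognised) {
            // All syntactical empty-element tags are accepted.
            return syntacticalEmpty;
        }
        return syntacticalEmpty
                && !(END_TAG_OPTIONAL.contains(name) || END_TAG_REQUIRED.contains(name));
    }

    public static void main(String[] args) {
        System.out.println(isEmptyElementTag("br", true, false));
        System.out.println(isEmptyElementTag("div", true, false));
        System.out.println(isEmptyElementTag("div", true, true));
    }
}
```

Under this model a tag such as <div/> is syntactically an empty-element tag but is not treated as one unless the recognition flag is set.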
true if this start tag is an empty-element tag, otherwise false.
public boolean isSyntacticalEmptyElementTag()
This is signified by the characters "/>" at the end of the start tag.
Only a normal start tag can be syntactically an empty-element tag.
This property simply reports whether the syntax of the start tag is consistent with that of an empty-element tag; it does not guarantee that this start tag's element is actually empty.
This possible discrepancy reflects the way major browsers interpret illegal empty-element tags used in HTML elements, and is explained further in the documentation of the isEmptyElementTag() property.
true if this start tag is syntactically an empty-element tag, otherwise false.
isEmptyElementTag()
public StartTagType getStartTagType()
This is equivalent to (StartTagType)getTagType().
public TagType getTagType()
Tag
getTagType
in class Tag
public Attributes getAttributes()
The return value is not null if and only if getStartTagType().hasAttributes()==true.
To force the parsing of attributes in other start tag types, use the parseAttributes() method instead.
null if the type of this start tag does not have attributes.
parseAttributes(), Source.parseAttributes(int pos, int maxEnd)
public java.lang.String getAttributeValue(java.lang.String attributeName)
Returns null if this start tag does not have attributes, no attribute with the specified name exists or the attribute has no value.
This is equivalent to getAttributes().getValue(attributeName), except that it returns null if this start tag does not have attributes instead of throwing a NullPointerException.
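The null-safe behaviour described above can be illustrated with a plain map. This is only a sketch under stated assumptions, not the library's implementation: the class AttributeValueSketch is hypothetical, a null map models a start tag type without attributes, and attribute names are assumed to be stored in lower case to model the case-insensitive lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the documented null handling; not part of the library.
final class AttributeValueSketch {

    // attributes == null models a start tag whose type does not have attributes.
    // Assumption: keys are stored in lower case; a null value models an attribute without a value.
    static String getAttributeValue(Map<String, String> attributes, String attributeName) {
        if (attributes == null) {
            // Return null instead of failing with a NullPointerException.
            return null;
        }
        return attributes.get(attributeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("href", "index.html");
        attributes.put("disabled", null);
        System.out.println(getAttributeValue(attributes, "HREF"));
        System.out.println(getAttributeValue(attributes, "disabled"));
        System.out.println(getAttributeValue(null, "href"));
    }
}
```

All three "no value" cases (no attributes, no such attribute, attribute without a value) yield null, so a caller that needs to tell them apart has to inspect the attributes themselves.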
attributeName - the name of the attribute to get.
null if the attribute does not exist or has no value.
public Attributes parseAttributes()
This method returns the cached attributes from the getAttributes() method if its value is not null, otherwise the source is physically parsed with each call to this method.
This is equivalent to parseAttributes(Attributes.getDefaultMaxErrorCount()).
parseAttributes in class Segment
null if too many errors occur while parsing.
getAttributes(), Source.parseAttributes(int pos, int maxEnd)
public Attributes parseAttributes(int maxErrorCount)
See the documentation of the parseAttributes() method for more information.
maxErrorCount - the maximum number of minor errors allowed while parsing.
null if too many errors occur while parsing.
getAttributes()
public Segment getTagContent()
This method is normally only of use for start tags whose content is something other than attributes.
A new Segment object is created with each call to this method.
public FormControl getFormControl()
Returns the FormControl defined by this start tag.
This is equivalent to getElement().getFormControl().
The FormControl defined by this start tag, or null if it is not a control.
public boolean isEndTagForbidden()
This property returns true if one of the following conditions is met:
If this property returns true then this start tag's element will always be a single tag element.
true if a matching end tag is forbidden, otherwise false.
public boolean isEndTagRequired()
This property returns true if one of the following conditions is met:
StartTagType.NORMAL, but specifies a corresponding end tag type.
true if a matching end tag is required, otherwise false.
public boolean isUnregistered()
Tag
The only requirement of an unregistered tag type is that it starts with '<' and there is a closing '>' character at some position after it in the source document.
The absence or presence of a '/' character after the initial '<' determines whether an unregistered tag is respectively a StartTag with a type of StartTagType.UNREGISTERED or an EndTag with a type of EndTagType.UNREGISTERED.
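The classification rule for unregistered tags is simple enough to sketch directly. The following is a hypothetical model, not the parser's code: the class UnregisteredTagSketch, its Kind enum and the classify helper are names invented for illustration, and the input is assumed to be text beginning at a candidate '<' character.

```java
// Hypothetical model of the documented unregistered-tag rule; not part of the library.
final class UnregisteredTagSketch {

    enum Kind { START_TAG, END_TAG, NOT_A_TAG }

    // Classifies text that begins at a candidate '<' character.
    static Kind classify(String text) {
        // The only requirement: an initial '<' and a closing '>' somewhere after it.
        if (text.length() < 2 || text.charAt(0) != '<' || text.indexOf('>', 1) < 0) {
            return Kind.NOT_A_TAG;
        }
        // A '/' directly after the initial '<' marks an end tag; anything else is a start tag.
        return text.charAt(1) == '/' ? Kind.END_TAG : Kind.START_TAG;
    }

    public static void main(String[] args) {
        System.out.println(classify("<% custom >"));
        System.out.println(classify("</# custom >"));
        System.out.println(classify("< 5 and no closing delimiter"));
    }
}
```

Because nothing constrains the characters between the delimiters, even a stray less-than operator followed much later by a '>' satisfies this rule.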
There are no restrictions on the characters that might appear between these delimiters, including other '<' characters. This may result in a '>' character that is identified as the closing delimiter of two separate tags, one an unregistered tag, and the other a tag of any type that begins in the middle of the unregistered tag. As explained below, unregistered tags are usually only found when specifically looking for them, so it is up to the user to detect and deal with any such nonsensical results.
Unregistered tags are only returned by the Source.getTagAt(int pos) method, by named search methods, where the specified name matches the first characters inside the tag, and by tag type search methods, where the specified tagType is either StartTagType.UNREGISTERED or EndTagType.UNREGISTERED.
Open tag searches and other searches always ignore unregistered tags, although every discovery of an unregistered tag is logged by the parser.
The logic behind this design is that unregistered tag types are usually the result of a '<' character in the text that was mistakenly left unencoded, or a less-than operator inside a script, or some other occurrence which is of no interest to the user.
By returning unregistered tags in named and tag type search methods, the library allows the user to specifically search for tags with a certain syntax that does not match any existing TagType. This expediency feature avoids the need for the user to create a custom tag type to define the syntax before searching for these tags.
By not returning unregistered tags in the less specific search methods, it is providing only the information that most users are interested in.
isUnregistered in class Tag
true if this tag has a syntax that does not match any of the registered tag types, otherwise false.
public java.lang.String tidy()
This is equivalent to tidy(false), thereby keeping the name of the tag in its original case.
See the documentation of the tidy(boolean toXHTML) method for more details.
tidy
in class Tag
public java.lang.String tidy(boolean toXHTML)
The tidying of the tag is carried out as follows:
toXHTML argument is true and this is a normal start tag
The toXHTML parameter determines only whether the name is converted to lower case for normal tags.
In all other respects the generated tag is already valid XHTML.
The following source text:
<INPUT name=Company value='Günter O&#39;Reilly & Associés'>
produces the following regenerated HTML:
<input name="Company" value="Günter O'Reilly & Associés" />
toXHTML - specifies whether the output is XHTML.
public static java.lang.String generateHTML(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributesMap, boolean emptyElementTag)
The output of the attributes is as described in the Attributes.generateHTML(Map attributesMap) method.
The emptyElementTag parameter specifies whether the start tag should be an empty-element tag, in which case a slash is inserted before the closing angle bracket, separated from the name or last attribute by a single space.
The following code:
// requires java.util.LinkedHashMap, java.util.Map and the library's StartTag class
Map<String,String> attributesMap=new LinkedHashMap<String,String>();
attributesMap.put("name","Company");
attributesMap.put("value","G\u00fcnter O'Reilly & Associés");
System.out.println(StartTag.generateHTML("INPUT",attributesMap,true));
generates the following output:
<INPUT name="Company" value="Günter O'Reilly & Associés" />
tagName - the name of the start tag.
attributesMap - a map containing attribute name/value pairs.
emptyElementTag - specifies whether the start tag should be an empty-element tag.
EndTag.generateHTML(String tagName)
public java.lang.String getDebugInfo()
Segment
getDebugInfo
in class Segment