public final class Attribute extends Segment
Represents a single attribute name/value segment within a StartTag.
An instance of this class is a representation of a single attribute in the source document and is not modifiable.
The OutputDocument.replace(Attributes, Map) and OutputDocument.replace(Attributes, boolean convertNamesToLowerCase) methods provide the means to add, delete or modify attributes and their values in an OutputDocument.
An Attribute instance is obtained using the Attributes.get(String key) method.
See also the XML 1.0 specification for attributes.
See also: Attributes
Modifier and Type | Method and Description |
---|---|
java.lang.String | getDebugInfo(): Returns a string representation of this object useful for debugging purposes. |
java.lang.String | getKey(): Returns the name of this attribute in lower case. |
java.lang.String | getName(): Returns the name of this attribute in original case. |
Segment | getNameSegment(): Returns the segment spanning the name of this attribute. |
char | getQuoteChar(): Returns the character used to quote the value. |
StartTag | getStartTag(): Returns the start tag to which this attribute belongs. |
java.lang.String | getValue(): Returns the decoded value of this attribute, or null if it has no value. |
Segment | getValueSegment(): Returns the segment spanning the value of this attribute, or null if it has no value. |
Segment | getValueSegmentIncludingQuotes(): Returns the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value. |
boolean | hasValue(): Indicates whether this attribute has a value. |
Methods inherited from class Segment:
charAt, compareTo, encloses, encloses, equals, getAllCharacterReferences, getAllElements, getAllElements, getAllElements, getAllElements, getAllElements, getAllElementsByClass, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTagsByClass, getAllTags, getAllTags, getBegin, getChildElements, getEnd, getFirstElement, getFirstElement, getFirstElement, getFirstElement, getFirstElementByClass, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTagByClass, getFormControls, getFormFields, getMaxDepthIndicator, getNodeIterator, getRenderer, getRowColumnVector, getSource, getStyleURISegments, getTextExtractor, getURIAttributes, hashCode, ignoreWhenParsing, isWhiteSpace, isWhiteSpace, length, parseAttributes, subSequence, toString
public java.lang.String getKey()
Returns the name of this attribute in lower case.
This package treats all attribute names as case insensitive, consistent with HTML but not consistent with XHTML.
See also: getName()
public java.lang.String getName()
Returns the name of this attribute in original case.
This is exactly equivalent to getNameSegment().toString().
See also: getKey()
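The difference between a lower-case key and an original-case name can be pictured with a small stand-alone sketch. This is not the library's code: the class KeyLookupDemo and the helper toKey are hypothetical names, and lower-casing with Locale.ROOT is an assumption made for the illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class KeyLookupDemo {
    // Hypothetical helper: derives a lookup key from an attribute name by lower-casing it.
    public static String toKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Maps key -> name in original case, as written in a hypothetical start tag.
        Map<String, String> namesByKey = new LinkedHashMap<>();
        for (String name : new String[] {"HREF", "onClick"}) {
            namesByKey.put(toKey(name), name);
        }
        // Any casing of the requested name finds the same entry.
        System.out.println(namesByKey.get(toKey("Href")));
        System.out.println(namesByKey.get(toKey("ONCLICK")));
    }
}
```

Keeping the original-case name as the stored value mirrors the split between a case-insensitive key and a case-preserving name.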
public Segment getNameSegment()
Returns the segment spanning the name of this attribute.
public boolean hasValue()
Indicates whether this attribute has a value.
This method also returns true if this attribute has been assigned a zero-length value. It only returns false if this attribute appears in minimized form.
Returns: true if this attribute has a value, otherwise false.
public java.lang.String getValue()
Returns the decoded value of this attribute, or null if it has no value.
This is equivalent to CharacterReference.decode(getValueSegment(), true).
To obtain the raw value without decoding, use getValueSegment().toString().
Special attention should be given to attributes that contain URLs, such as the href attribute.
When such an attribute contains a URL with parameters (as described in the form-urlencoded media type), the ampersand (&) characters used to separate the parameters should be encoded as &amp; to prevent the parameter names from being unintentionally interpreted as character entity references.
This requirement is explicitly stated in the HTML 4.01 specification section 5.3.2.
For example, take the following element in the source document:
<a href="Report.jsp?chapt=2&sect=3">next</a>
Calling getAttributes().getValue("href") on this element returns the string "Report.jsp?chapt=2§=3", since the text "&sect" is interpreted as the rarely used character entity reference &sect; (U+00A7), despite the fact that it is missing the terminating semicolon (;).
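The effect can be reproduced without the library. The following is a minimal stand-alone sketch, not the library's decoder: the class AmpersandDemo and the methods naiveDecode and encodeAmpersands are hypothetical names, and the decoder deliberately knows only the two entity names sect and amp. It shows the unencoded URL being altered by decoding and the encoded URL surviving it.

```java
import java.util.Map;

public class AmpersandDemo {
    // Hypothetical two-entry entity table; a real decoder knows many more names.
    private static final Map<String, String> ENTITIES =
            Map.of("sect", "\u00A7", "amp", "&");

    // Decodes entity references, accepting them even when the terminating
    // semicolon is missing (the lenient behaviour described in the text).
    public static String naiveDecode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                String match = null;
                for (String name : ENTITIES.keySet()) {
                    if (text.startsWith(name, i + 1)) {
                        match = name;
                        break;
                    }
                }
                if (match != null) {
                    out.append(ENTITIES.get(match));
                    i += 1 + match.length();
                    // The semicolon is consumed if present but is not required.
                    if (i < text.length() && text.charAt(i) == ';') {
                        i++;
                    }
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Encodes each ampersand so parameter names cannot be read as entity names.
    public static String encodeAmpersands(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = "Report.jsp?chapt=2&sect=3";
        System.out.println(naiveDecode(url));                   // "&sect" is replaced
        System.out.println(naiveDecode(encodeAmpersands(url))); // original URL is recovered
    }
}
```

Encoding the ampersands before writing the attribute means the decoding step restores the URL exactly, whichever entity names the parameter names happen to resemble.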
Most browsers recognise unterminated character entity references in attribute values representing a codepoint of U+00FF or below, but ignore those representing codepoints above this value. One relatively popular browser only recognises those representing a codepoint of U+003E or below, meaning it would have interpreted the URL in the above example differently to most other browsers. Most browsers also use different rules depending on whether the unterminated character reference is inside or outside of an attribute value, with both of these possibilities further split into different rules for character entity references, decimal character references, and hexadecimal character references.
The behaviour of this library is determined by the current compatibility mode setting, which is set through the static Config.CurrentCompatibilityMode property.
Returns: the decoded value of this attribute, or null if it has no value.
public Segment getValueSegment()
Returns the segment spanning the value of this attribute, or null if it has no value.
See also: getValue()
public Segment getValueSegmentIncludingQuotes()
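Returns the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value.
If the value is not enclosed by quotation marks, this is the same as the value segment.
Returns: the segment spanning the value of this attribute, including quotation marks if any, or null if it has no value.
public char getQuoteChar()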
Returns the character used to quote the value.
The return value is either a double-quote ("), a single-quote ('), or a space.
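The relationship between the value segment, the segment including quotes, and the quote character can be pictured with plain string offsets. This is a stand-alone sketch and not the library's code: QuoteModel and its methods are hypothetical names, and it assumes the begin and end offsets of the value are already known.

```java
public class QuoteModel {
    // Returns the quote character around source[begin, end), or a space if
    // the value is unquoted, mirroring the convention described above.
    public static char quoteChar(String source, int begin, int end) {
        if (begin > 0 && end < source.length()) {
            char before = source.charAt(begin - 1);
            if ((before == '"' || before == '\'') && source.charAt(end) == before) {
                return before;
            }
        }
        return ' ';
    }

    // Returns the value text widened by one character on each side when it is
    // quoted; otherwise the result is identical to the value text.
    public static String valueIncludingQuotes(String source, int begin, int end) {
        return quoteChar(source, begin, end) == ' '
                ? source.substring(begin, end)
                : source.substring(begin - 1, end + 1);
    }

    public static void main(String[] args) {
        String quoted = "<a title='Hi'>"; // value Hi occupies offsets 10 (inclusive) to 12 (exclusive)
        String bare = "<a title=Hi>";     // value Hi occupies offsets 9 (inclusive) to 11 (exclusive)
        System.out.println(valueIncludingQuotes(quoted, 10, 12));
        System.out.println(valueIncludingQuotes(bare, 9, 11));
    }
}
```

Using a space as the "no quote" marker keeps the return type a plain char, which matches the three possible return values listed for getQuoteChar().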
public StartTag getStartTag()
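Returns the start tag to which this attribute belongs, or null if it is not within a start tag.
public java.lang.String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
Overrides: getDebugInfo in class Segment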