Jericho HTML Parser at SourceForge.net

net.htmlparser.jericho
Class Segment

java.lang.Object
  extended by Segment
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable<Segment>
Direct Known Subclasses:
Attribute, CharacterReference, Element, FormControl, net.htmlparser.jericho.nodoc.SequentialListSegment, Source, Tag

public class Segment
extends java.lang.Object
implements java.lang.Comparable<Segment>, java.lang.CharSequence

Represents a segment of a Source document.

Many of the tag search methods are defined in this class.

The span of a segment is defined by the combination of its begin and end character positions.


Constructor Summary
Segment(Source source, int begin, int end)
          Constructs a new Segment within the specified source document with the specified begin and end character positions.
 
Method Summary
 char charAt(int index)
          Returns the character at the specified index.
 int compareTo(Segment segment)
          Compares this Segment object to another object.
 boolean encloses(int pos)
          Indicates whether this segment encloses the specified character position in the source document.
 boolean encloses(Segment segment)
          Indicates whether this Segment encloses the specified Segment.
 boolean equals(java.lang.Object object)
          Compares the specified object with this Segment for equality.
 java.util.List<CharacterReference> getAllCharacterReferences()
          Returns a list of all CharacterReference objects that are enclosed by this segment.
 java.util.List<Element> getAllElements()
          Returns a list of all Element objects that are enclosed by this segment.
 java.util.List<Element> getAllElements(StartTagType startTagType)
          Returns a list of all Element objects with start tags of the specified type that are enclosed by this segment.
 java.util.List<Element> getAllElements(java.lang.String name)
          Returns a list of all Element objects with the specified name that are enclosed by this segment.
 java.util.List<Element> getAllElements(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
          Returns a list of all Element objects with the specified attribute name and value pattern that are enclosed by this segment.
 java.util.List<Element> getAllElements(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
          Returns a list of all Element objects with the specified attribute name/value pair that are enclosed by this segment.
 java.util.List<Element> getAllElementsByClass(java.lang.String className)
          Returns a list of all Element objects with the specified class that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTags()
          Returns a list of all StartTag objects that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTags(StartTagType startTagType)
          Returns a list of all StartTag objects of the specified type that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTags(java.lang.String name)
          Returns a list of all normal StartTag objects with the specified name that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
          Returns a list of all StartTag objects with the specified attribute name and value pattern that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
          Returns a list of all StartTag objects with the specified attribute name/value pair that are enclosed by this segment.
 java.util.List<StartTag> getAllStartTagsByClass(java.lang.String className)
          Returns a list of all StartTag objects with the specified class that are enclosed by this segment.
 java.util.List<Tag> getAllTags()
          Returns a list of all Tag objects that are enclosed by this segment.
 java.util.List<Tag> getAllTags(TagType tagType)
          Returns a list of all Tag objects of the specified type that are enclosed by this segment.
 int getBegin()
          Returns the character position in the Source document at which this segment begins, inclusive.
 java.util.List<Element> getChildElements()
          Returns a list of the immediate children of this segment in the document element hierarchy.
 java.lang.String getDebugInfo()
          Returns a string representation of this object useful for debugging purposes.
 int getEnd()
          Returns the character position in the Source document immediately after the end of this segment.
 Element getFirstElement()
          Returns the first Element enclosed by this segment.
 Element getFirstElement(java.lang.String name)
          Returns the first normal Element with the specified name enclosed by this segment.
 Element getFirstElement(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
          Returns the first Element with the specified attribute name and value pattern that is enclosed by this segment.
 Element getFirstElement(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
          Returns the first Element with the specified attribute name/value pair enclosed by this segment.
 Element getFirstElementByClass(java.lang.String className)
          Returns the first Element with the specified class that is enclosed by this segment.
 StartTag getFirstStartTag()
          Returns the first StartTag enclosed by this segment.
 StartTag getFirstStartTag(StartTagType startTagType)
          Returns the first StartTag of the specified type enclosed by this segment.
 StartTag getFirstStartTag(java.lang.String name)
          Returns the first normal StartTag with the specified name enclosed by this segment.
 StartTag getFirstStartTag(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
          Returns the first StartTag with the specified attribute name and value pattern that is enclosed by this segment.
 StartTag getFirstStartTag(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
          Returns the first StartTag with the specified attribute name/value pair enclosed by this segment.
 StartTag getFirstStartTagByClass(java.lang.String className)
          Returns the first StartTag with the specified class that is enclosed by this segment.
 java.util.List<FormControl> getFormControls()
          Returns a list of the FormControl objects that are enclosed by this segment.
 FormFields getFormFields()
          Returns the FormFields object representing all form fields that are enclosed by this segment.
 int getMaxDepthIndicator()
          Returns an indication of the maximum depth of nested elements within this segment.
 java.util.Iterator<Segment> getNodeIterator()
          Returns an iterator over every tag, character reference and plain text segment contained within this segment.
 Renderer getRenderer()
          Performs a simple rendering of the HTML markup in this segment into text.
 RowColumnVector getRowColumnVector()
          Returns a RowColumnVector object representing the row and column number of the start of this segment in the source document.
 Source getSource()
          Returns the Source document containing this segment.
 java.util.List<Segment> getStyleURISegments()
          Returns a list of all URI segments inside the CSS of STYLE elements and style attribute values enclosed by this segment.
 TextExtractor getTextExtractor()
          Extracts the textual content from the HTML markup of this segment.
 java.util.List<Attribute> getURIAttributes()
          Returns a list of all attributes enclosed by this segment that have URI values.
 int hashCode()
          Returns a hash code value for the segment.
 void ignoreWhenParsing()
          Causes this segment to be ignored when parsing.
 boolean isWhiteSpace()
          Indicates whether this segment consists entirely of white space.
static boolean isWhiteSpace(char ch)
          Indicates whether the specified character is white space.
 int length()
          Returns the length of the segment.
 Attributes parseAttributes()
          Parses any Attributes within this segment.
 java.lang.CharSequence subSequence(int beginIndex, int endIndex)
          Returns a new character sequence that is a subsequence of this sequence.
 java.lang.String toString()
          Returns the source text of this segment as a String.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Segment

public Segment(Source source,
               int begin,
               int end)
Constructs a new Segment within the specified source document with the specified begin and end character positions.

Parameters:
source - the Source document, must not be null.
begin - the character position in the source where this segment begins, inclusive.
end - the character position in the source where this segment ends, exclusive.
Method Detail

getSource

public final Source getSource()
Returns the Source document containing this segment.

If a StreamedSource is in use, this method throws an UnsupportedOperationException.

Returns:
the Source document containing this segment.

getBegin

public final int getBegin()
Returns the character position in the Source document at which this segment begins, inclusive.

Use the Source.getRowColumnVector(int pos) method to determine the row and column numbers corresponding to this character position.

Returns:
the character position in the Source document at which this segment begins, inclusive.

getEnd

public final int getEnd()
Returns the character position in the Source document immediately after the end of this segment.

The character at the position specified by this property is not included in the segment.

Returns:
the character position in the Source document immediately after the end of this segment.
See Also:
getBegin()

equals

public final boolean equals(java.lang.Object object)
Compares the specified object with this Segment for equality.

Returns true if and only if the specified object is also a Segment, and both segments have the same Source, and the same begin and end positions.

Overrides:
equals in class java.lang.Object
Parameters:
object - the object to be compared for equality with this Segment.
Returns:
true if the specified object is equal to this Segment, otherwise false.

hashCode

public int hashCode()
Returns a hash code value for the segment.

The current implementation returns the sum of the begin and end positions, although this is not guaranteed in future versions.

Overrides:
hashCode in class java.lang.Object
Returns:
a hash code value for the segment.
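
The equality and hash code rules described above can be illustrated with a minimal sketch. SpanKey is a hypothetical stand-in, not the Jericho Segment class; it assumes source documents are compared by identity, and it copies the documented (but not guaranteed) behaviour of summing the begin and end positions.

```java
// Hypothetical stand-in for a segment; it is NOT the Jericho Segment class.
final class SpanKey {
    final Object source; // stands in for the Source document
    final int begin;     // inclusive
    final int end;       // exclusive

    SpanKey(Object source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) return true;
        if (!(object instanceof SpanKey)) return false;
        SpanKey other = (SpanKey) object;
        // Assumption: "same Source" is modelled here as reference identity.
        return source == other.source && begin == other.begin && end == other.end;
    }

    @Override
    public int hashCode() {
        // Mirrors the documented current behaviour: sum of begin and end.
        return begin + end;
    }

    public static void main(String[] args) {
        Object doc = new Object();
        SpanKey a = new SpanKey(doc, 3, 10);
        SpanKey b = new SpanKey(doc, 3, 10);
        SpanKey c = new SpanKey(doc, 4, 9);
        System.out.println(a.equals(b));                  // same source and span
        System.out.println(a.equals(c));                  // different span
        System.out.println(a.hashCode() == c.hashCode()); // hash codes may still collide
    }
}
```

Two unequal spans such as (3, 10) and (4, 9) share a hash code under this scheme, which is permitted by the hashCode contract but worth knowing when segments are used as keys in hash-based collections.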

length

public int length()
Returns the length of the segment. This is defined as the number of characters between the begin and end positions.

Specified by:
length in interface java.lang.CharSequence
Returns:
the length of the segment.

encloses

public final boolean encloses(Segment segment)
Indicates whether this Segment encloses the specified Segment.

This is the case if getBegin()<=segment.getBegin() && getEnd()>=segment.getEnd().

Note that a segment encloses itself.

Parameters:
segment - the segment to be tested for being enclosed by this segment.
Returns:
true if this Segment encloses the specified Segment, otherwise false.

encloses

public final boolean encloses(int pos)
Indicates whether this segment encloses the specified character position in the source document.

This is the case if getBegin() <= pos < getEnd().

Parameters:
pos - the position in the Source document.
Returns:
true if this segment encloses the specified character position in the source document, otherwise false.
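
The two encloses conditions, together with the inclusive begin and exclusive end positions, amount to ordinary half-open interval arithmetic. The following is a minimal sketch using a hypothetical Span class that models only the positions, not the Jericho Segment class itself.

```java
// Hypothetical stand-in modelling only the begin/end arithmetic.
final class Span {
    final int begin; // inclusive
    final int end;   // exclusive

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // Number of characters between the begin and end positions.
    int length() {
        return end - begin;
    }

    // getBegin()<=segment.getBegin() && getEnd()>=segment.getEnd()
    boolean encloses(Span other) {
        return begin <= other.begin && end >= other.end;
    }

    // getBegin() <= pos < getEnd()
    boolean encloses(int pos) {
        return begin <= pos && pos < end;
    }

    public static void main(String[] args) {
        Span outer = new Span(5, 12);
        Span inner = new Span(7, 12);
        System.out.println(outer.length());        // 7
        System.out.println(outer.encloses(inner)); // true
        System.out.println(outer.encloses(outer)); // true, a span encloses itself
        System.out.println(outer.encloses(11));    // true, last enclosed position
        System.out.println(outer.encloses(12));    // false, end is exclusive
    }
}
```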

toString

public java.lang.String toString()
Returns the source text of this segment as a String.

The returned String is newly created with every call to this method, unless this segment is itself an instance of Source.

Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object
Returns:
the source text of this segment as a String.

getRenderer

public Renderer getRenderer()
Performs a simple rendering of the HTML markup in this segment into text.

The output can be configured by setting any number of properties on the returned Renderer instance before obtaining its output.

Returns:
an instance of Renderer based on this segment.
See Also:
getTextExtractor()

getTextExtractor

public TextExtractor getTextExtractor()
Extracts the textual content from the HTML markup of this segment.

The output can be configured by setting properties on the returned TextExtractor instance before obtaining its output.

Returns:
an instance of TextExtractor based on this segment.
See Also:
getRenderer()

getNodeIterator

public java.util.Iterator<Segment> getNodeIterator()
Returns an iterator over every tag, character reference and plain text segment contained within this segment.

See the Source.iterator() method for a detailed description.

Example:

The following code demonstrates the typical usage of this method to make an exact copy of this segment to writer (assuming no server tags are present):

 for (Iterator<Segment> nodeIterator=segment.getNodeIterator(); nodeIterator.hasNext();) {
   Segment nodeSegment=nodeIterator.next();
   if (nodeSegment instanceof Tag) {
     Tag tag=(Tag)nodeSegment;
     // HANDLE TAG
     // Uncomment the following line to ensure each tag is valid XML:
     // writer.write(tag.tidy()); continue;
   } else if (nodeSegment instanceof CharacterReference) {
     CharacterReference characterReference=(CharacterReference)nodeSegment;
     // HANDLE CHARACTER REFERENCE
     // Uncomment the following line to decode all character references instead of copying them verbatim:
     // characterReference.appendCharTo(writer); continue;
   } else {
     // HANDLE PLAIN TEXT
   }
   // unless specific handling has prevented getting to here, simply output the segment as is:
   writer.write(nodeSegment.toString());
 }

Returns:
an iterator over every tag, character reference and plain text segment contained within this segment.

getAllTags

public java.util.List<Tag> getAllTags()
Returns a list of all Tag objects that are enclosed by this segment.

The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.

See the Tag class documentation for more details about the behaviour of this method.

Returns:
a list of all Tag objects that are enclosed by this segment.

getAllTags

public java.util.List<Tag> getAllTags(TagType tagType)
Returns a list of all Tag objects of the specified type that are enclosed by this segment.

See the Tag class documentation for more details about the behaviour of this method.

Specifying a null argument to the tagType parameter is equivalent to getAllTags().

Parameters:
tagType - the type of tags to get.
Returns:
a list of all Tag objects of the specified type that are enclosed by this segment.
See Also:
getAllStartTags(StartTagType)

getAllStartTags

public java.util.List<StartTag> getAllStartTags()
Returns a list of all StartTag objects that are enclosed by this segment.

The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.

See the Tag class documentation for more details about the behaviour of this method.

Returns:
a list of all StartTag objects that are enclosed by this segment.

getAllStartTags

public java.util.List<StartTag> getAllStartTags(StartTagType startTagType)
Returns a list of all StartTag objects of the specified type that are enclosed by this segment.

See the Tag class documentation for more details about the behaviour of this method.

Specifying a null argument to the startTagType parameter is equivalent to getAllStartTags().

Parameters:
startTagType - the type of tags to get.
Returns:
a list of all StartTag objects of the specified type that are enclosed by this segment.

getAllStartTags

public java.util.List<StartTag> getAllStartTags(java.lang.String name)
Returns a list of all normal StartTag objects with the specified name that are enclosed by this segment.

See the Tag class documentation for more details about the behaviour of this method.

Specifying a null argument to the name parameter is equivalent to getAllStartTags(), which may include non-normal start tags.

This method also returns unregistered tags if the specified name is not a valid XML tag name.

Parameters:
name - the name of the start tags to get.
Returns:
a list of all normal StartTag objects with the specified name that are enclosed by this segment.

getAllStartTags

public java.util.List<StartTag> getAllStartTags(java.lang.String attributeName,
                                                java.lang.String value,
                                                boolean valueCaseSensitive)
Returns a list of all StartTag objects with the specified attribute name/value pair that are enclosed by this segment.

See the Tag class documentation for more details about the behaviour of this method.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns:
a list of all StartTag objects with the specified attribute name/value pair that are enclosed by this segment.
See Also:
getAllStartTags(String attributeName, Pattern valueRegexPattern)

getAllStartTags

public java.util.List<StartTag> getAllStartTags(java.lang.String attributeName,
                                                java.util.regex.Pattern valueRegexPattern)
Returns a list of all StartTag objects with the specified attribute name and value pattern that are enclosed by this segment.

Specifying a null argument to the valueRegexPattern parameter performs the search on the attribute name only, without regard to the attribute value. This will also match an attribute that has no value at all.

See the Tag class documentation for more details about the behaviour of this method.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
Returns:
a list of all StartTag objects with the specified attribute name and value pattern that are enclosed by this segment.
See Also:
getAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
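
The matching rules of the two attribute-based overloads can be sketched without the library. AttributeMatchSketch and its methods are hypothetical; attributes are modelled as a map from lower-case name to value, with null standing for an attribute that has no value. The sketch assumes the regular expression must match the whole attribute value and that a valueless attribute never matches a non-null pattern; the text above does not confirm either detail.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of the matching rules; not part of the Jericho API.
final class AttributeMatchSketch {

    // Name/value pair search: name is case insensitive, value sensitivity is a flag.
    static boolean matchesValue(Map<String, String> attrs, String name,
                                String value, boolean valueCaseSensitive) {
        String actual = attrs.get(name.toLowerCase(Locale.ROOT));
        if (actual == null) return false; // attribute missing or has no value
        return valueCaseSensitive ? actual.equals(value) : actual.equalsIgnoreCase(value);
    }

    // Pattern search: a null pattern matches on the attribute name alone.
    static boolean matchesPattern(Map<String, String> attrs, String name, Pattern pattern) {
        String key = name.toLowerCase(Locale.ROOT);
        if (!attrs.containsKey(key)) return false;
        if (pattern == null) return true; // also matches an attribute with no value
        String actual = attrs.get(key);
        // Assumption: the pattern must match the entire value.
        return actual != null && pattern.matcher(actual).matches();
    }

    public static void main(String[] args) {
        // Models <input type="Checkbox" checked>
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("type", "Checkbox");
        attrs.put("checked", null);

        System.out.println(matchesValue(attrs, "TYPE", "checkbox", false));
        System.out.println(matchesValue(attrs, "type", "checkbox", true));
        System.out.println(matchesPattern(attrs, "checked", null));
        System.out.println(matchesPattern(attrs, "type", Pattern.compile("(?i)check.*")));
    }
}
```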

getAllStartTagsByClass

public java.util.List<StartTag> getAllStartTagsByClass(java.lang.String className)
Returns a list of all StartTag objects with the specified class that are enclosed by this segment.

This matches start tags with a class attribute that contains the specified class name, either as an exact match or where the specified class name is one of multiple class names separated by white space in the attribute value.

See the Tag class documentation for more details about the behaviour of this method.

Parameters:
className - the class name (case sensitive) to search for, must not be null.
Returns:
a list of all StartTag objects with the specified class that are enclosed by this segment.
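
The class matching rule described above (an exact match, or one of several class names separated by white space) can be sketched as follows. ClassNameMatchSketch.hasClass is a hypothetical helper, and splitting on the regular expression \s+ is an assumption about what counts as white space.

```java
// Hypothetical helper illustrating the class matching rule; not the Jericho API.
final class ClassNameMatchSketch {

    // True if className equals one of the white-space separated tokens
    // in the value of a class attribute. The comparison is case sensitive.
    static boolean hasClass(String classAttributeValue, String className) {
        if (classAttributeValue == null) return false;
        for (String token : classAttributeValue.trim().split("\\s+")) {
            if (token.equals(className)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasClass("menu", "menu"));            // exact match
        System.out.println(hasClass("nav menu active", "menu")); // one of several names
        System.out.println(hasClass("submenu", "menu"));         // substring only, no match
        System.out.println(hasClass("Menu", "menu"));            // case differs, no match
    }
}
```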

getChildElements

public java.util.List<Element> getChildElements()
Returns a list of the immediate children of this segment in the document element hierarchy.

The returned list may include an element that extends beyond the end of this segment, as long as it begins within this segment.

An element found at the start of this segment is included in the list. Note however that if this segment is an Element, the overriding Element.getChildElements() method is called instead, which only returns the children of the element.

Calling getChildElements() on an Element is much more efficient than calling it on a Segment.

The objects in the list are all of type Element.

The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.

See the Source.getChildElements() method for more details.

Returns:
a list of the immediate children of this segment in the document element hierarchy, guaranteed not null.
See Also:
Element.getParentElement()

getAllElements

public java.util.List<Element> getAllElements()
Returns a list of all Element objects that are enclosed by this segment.

The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.

The elements returned correspond exactly with the start tags returned in the getAllStartTags() method.

If this segment is itself an Element, the result includes this element in the list.

Returns:
a list of all Element objects that are enclosed by this segment.

getAllElements

public java.util.List<Element> getAllElements(java.lang.String name)
Returns a list of all Element objects with the specified name that are enclosed by this segment.

The elements returned correspond with the start tags returned in the getAllStartTags(String name) method, except that elements which are not entirely enclosed by this segment are excluded.

Specifying a null argument to the name parameter is equivalent to getAllElements(), which may include elements of non-normal tags.

This method also returns elements consisting of unregistered tags if the specified name is not a valid XML tag name.

If this segment is itself an Element with the specified name, the result includes this element in the list.

Parameters:
name - the name of the elements to get.
Returns:
a list of all Element objects with the specified name that are enclosed by this segment.

getAllElements

public java.util.List<Element> getAllElements(StartTagType startTagType)
Returns a list of all Element objects with start tags of the specified type that are enclosed by this segment.

The elements returned correspond with the start tags returned in the getAllStartTags(StartTagType) method, except that elements which are not entirely enclosed by this segment are excluded.

If this segment is itself an Element with the specified type, the result includes this element in the list.

Parameters:
startTagType - the type of start tags to get, must not be null.
Returns:
a list of all Element objects with start tags of the specified type that are enclosed by this segment.

getAllElements

public java.util.List<Element> getAllElements(java.lang.String attributeName,
                                              java.lang.String value,
                                              boolean valueCaseSensitive)
Returns a list of all Element objects with the specified attribute name/value pair that are enclosed by this segment.

The elements returned correspond with the start tags returned in the getAllStartTags(String attributeName, String value, boolean valueCaseSensitive) method, except that elements which are not entirely enclosed by this segment are excluded.

If this segment is itself an Element with the specified name/value pair, the result includes this element in the list.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns:
a list of all Element objects with the specified attribute name/value pair that are enclosed by this segment.
See Also:
getAllElements(String attributeName, Pattern valueRegexPattern)

getAllElements

public java.util.List<Element> getAllElements(java.lang.String attributeName,
                                              java.util.regex.Pattern valueRegexPattern)
Returns a list of all Element objects with the specified attribute name and value pattern that are enclosed by this segment.

The elements returned correspond with the start tags returned in the getAllStartTags(String attributeName, Pattern valueRegexPattern) method, except that elements which are not entirely enclosed by this segment are excluded.

Specifying a null argument to the valueRegexPattern parameter performs the search on the attribute name only, without regard to the attribute value. This will also match an attribute that has no value at all.

If this segment is itself an Element with the specified attribute name and value pattern, the result includes this element in the list.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
Returns:
a list of all Element objects with the specified attribute name and value pattern that are enclosed by this segment.
See Also:
getAllElements(String attributeName, String value, boolean valueCaseSensitive)

getAllElementsByClass

public java.util.List<Element> getAllElementsByClass(java.lang.String className)
Returns a list of all Element objects with the specified class that are enclosed by this segment.

This matches elements with a class attribute that contains the specified class name, either as an exact match or where the specified class name is one of multiple class names separated by white space in the attribute value.

The elements returned correspond with the start tags returned in the getAllStartTagsByClass(String className) method, except that elements which are not entirely enclosed by this segment are excluded.

If this segment is itself an Element with the specified class, the result includes this element in the list.

Parameters:
className - the class name (case sensitive) to search for, must not be null.
Returns:
a list of all Element objects with the specified class that are enclosed by this segment.

getAllCharacterReferences

public java.util.List<CharacterReference> getAllCharacterReferences()
Returns a list of all CharacterReference objects that are enclosed by this segment.

Returns:
a list of all CharacterReference objects that are enclosed by this segment.

getURIAttributes

public java.util.List<Attribute> getURIAttributes()
Returns a list of all attributes enclosed by this segment that have URI values.

According to the HTML 4.01 specification, the following attributes have URI values:

HTML element name    Attribute name
A                    href
APPLET               codebase
APPLET               archive
AREA                 href
BASE                 href
BLOCKQUOTE           cite
BODY                 background
FORM                 action
FRAME                longdesc
FRAME                src
DEL                  cite
HEAD                 profile
IFRAME               longdesc
IFRAME               src
IMG                  longdesc
IMG                  src
IMG                  usemap
INPUT                src
INPUT                usemap
INS                  cite
LINK                 href
OBJECT               archive
OBJECT               classid
OBJECT               codebase
OBJECT               data
OBJECT               usemap
Q                    cite
SCRIPT               src

Attributes from other elements may also be returned if the attribute name matches one of those in the list above.

This method is often used in conjunction with the getStyleURISegments() method in order to find all URIs in a document.

The attributes are returned in order of appearance.

Returns:
a list of all attributes enclosed by this segment that have URI values.
See Also:
getStyleURISegments()
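
Because attributes from other elements are also returned when their name matches, the selection can be thought of as a test on the attribute name alone. The following is a minimal sketch of that idea; UriAttributeNames is a hypothetical helper built from the distinct attribute names in the table above, and the case-insensitive comparison is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the name set is taken from the table above.
final class UriAttributeNames {

    static final Set<String> NAMES = new HashSet<>(Arrays.asList(
        "href", "codebase", "archive", "cite", "background", "action",
        "longdesc", "src", "profile", "usemap", "classid", "data"));

    // Assumption: attribute names are compared without regard to case.
    static boolean isUriAttributeName(String attributeName) {
        return NAMES.contains(attributeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isUriAttributeName("HREF")); // listed for A, AREA, BASE, LINK
        System.out.println(isUriAttributeName("src"));  // listed for FRAME, IMG, SCRIPT and others
        System.out.println(isUriAttributeName("alt"));  // not a URI attribute
    }
}
```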

getStyleURISegments

public java.util.List<Segment> getStyleURISegments()
Returns a list of all URI segments inside the CSS of STYLE elements and style attribute values enclosed by this segment.

If this segment does not contain any tags, the entire segment is assumed to be CSS.

The URI segments are found by searching the CSS for the functional notation "url()" as described in section 4.3.4 of the CSS2 specification.

The segments are returned in order of appearance.

Returns:
a list of all URI segments inside STYLE elements and style attribute values enclosed by this segment.
See Also:
getURIAttributes()
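
The search for the "url()" functional notation can be approximated with a regular expression. This is a simplified sketch, not the library's implementation: StyleUriSketch is hypothetical, the returned spans are assumed to exclude any surrounding quotes, and CSS escapes and comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified finder for url(...) in a CSS string.
final class StyleUriSketch {

    // Group 1 is an optional quote character, group 2 is the URI text.
    private static final Pattern URL_FUNCTION =
        Pattern.compile("url\\(\\s*([\"']?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns {begin, end} pairs (end exclusive) in order of appearance.
    static List<int[]> findUriSpans(String css) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = URL_FUNCTION.matcher(css);
        while (m.find()) {
            spans.add(new int[] { m.start(2), m.end(2) });
        }
        return spans;
    }

    public static void main(String[] args) {
        String css = "body { background: url(\"img/bg.png\") } "
                   + "a { background: url( icons/a.gif ) }";
        for (int[] span : findUriSpans(css)) {
            System.out.println(css.substring(span[0], span[1]));
        }
    }
}
```

Returning positions rather than strings mirrors the way a segment is defined by its begin and end positions in the source text.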

getFirstStartTag

public final StartTag getFirstStartTag()
Returns the first StartTag enclosed by this segment.

This is functionally equivalent to getAllStartTags().iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Returns:
the first StartTag enclosed by this segment, or null if none exists.
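
The difference in behaviour on an empty result applies to all of the getFirst... methods and can be shown with plain collections. FirstOrNullSketch is a hypothetical helper; it only illustrates the null result and, unlike the methods documented here, still requires the full list to exist first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper contrasting iterator().next() with a first-or-null lookup.
final class FirstOrNullSketch {

    static <T> T firstOrNull(List<T> items) {
        return items.isEmpty() ? null : items.get(0);
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        try {
            none.iterator().next();
        } catch (NoSuchElementException e) {
            System.out.println("iterator().next() throws on an empty list");
        }
        System.out.println(firstOrNull(none)); // null instead of an exception

        List<String> some = new ArrayList<>();
        some.add("<p>");
        some.add("<a>");
        System.out.println(firstOrNull(some)); // first item, same as iterator().next()
    }
}
```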

getFirstStartTag

public final StartTag getFirstStartTag(StartTagType startTagType)
Returns the first StartTag of the specified type enclosed by this segment.

This is functionally equivalent to getAllStartTags(startTagType).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Parameters:
startTagType - the StartTagType to search for.
Returns:
the first StartTag of the specified type enclosed by this segment, or null if none exists.

getFirstStartTag

public final StartTag getFirstStartTag(java.lang.String name)
Returns the first normal StartTag with the specified name enclosed by this segment.

This is functionally equivalent to getAllStartTags(name).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Specifying a null argument to the name parameter is equivalent to getFirstStartTag().

Parameters:
name - the name of the start tag to search for, may be null.
Returns:
the first normal StartTag with the specified name enclosed by this segment, or null if none exists.

getFirstStartTag

public final StartTag getFirstStartTag(java.lang.String attributeName,
                                       java.lang.String value,
                                       boolean valueCaseSensitive)
Returns the first StartTag with the specified attribute name/value pair enclosed by this segment.

This is functionally equivalent to getAllStartTags(attributeName,value,valueCaseSensitive).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns:
the first StartTag with the specified attribute name/value pair enclosed by this segment, or null if none exists.
See Also:
getFirstStartTag(String attributeName, Pattern valueRegexPattern)

getFirstStartTag

public final StartTag getFirstStartTag(java.lang.String attributeName,
                                       java.util.regex.Pattern valueRegexPattern)
Returns the first StartTag with the specified attribute name and value pattern that is enclosed by this segment.

This is functionally equivalent to getAllStartTags(attributeName,valueRegexPattern).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
Returns:
the first StartTag with the specified attribute name and value pattern that is enclosed by this segment, or null if none exists.
See Also:
getFirstStartTag(String attributeName, String value, boolean valueCaseSensitive)

getFirstStartTagByClass

public final StartTag getFirstStartTagByClass(java.lang.String className)
Returns the first StartTag with the specified class that is enclosed by this segment.

This is functionally equivalent to getAllStartTagsByClass(className).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.

Parameters:
className - the class name (case sensitive) to search for, must not be null.
Returns:
the first StartTag with the specified class that is enclosed by this segment, or null if none exists.

getFirstElement

public final Element getFirstElement()
Returns the first Element enclosed by this segment.

This is functionally equivalent to getAllElements().iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.

If this segment is itself an Element, this element is returned, not the first child element.

Returns:
the first Element enclosed by this segment, or null if none exists.

getFirstElement

public final Element getFirstElement(java.lang.String name)
Returns the first normal Element with the specified name enclosed by this segment.

This is functionally equivalent to getAllElements(name).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.

Specifying a null argument to the name parameter is equivalent to getFirstElement().

If this segment is itself an Element with the specified name, this element is returned.

Parameters:
name - the name of the element to search for.
Returns:
the first normal Element with the specified name enclosed by this segment, or null if none exists.

getFirstElement

public final Element getFirstElement(java.lang.String attributeName,
                                     java.lang.String value,
                                     boolean valueCaseSensitive)
Returns the first Element with the specified attribute name/value pair enclosed by this segment.

This is functionally equivalent to getAllElements(attributeName,value,valueCaseSensitive).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.

If this segment is itself an Element with the specified attribute name/value pair, this element is returned.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns:
the first Element with the specified attribute name/value pair enclosed by this segment, or null if none exists.
See Also:
getFirstElement(String attributeName, Pattern valueRegexPattern)

getFirstElement

public final Element getFirstElement(java.lang.String attributeName,
                                     java.util.regex.Pattern valueRegexPattern)
Returns the first Element with the specified attribute name and value pattern that is enclosed by this segment.

This is functionally equivalent to getAllElements(attributeName,valueRegexPattern).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.

If this segment is itself an Element with the specified attribute name and value pattern, this element is returned.

Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
Returns:
the first Element with the specified attribute name and value pattern that is enclosed by this segment, or null if none exists.
See Also:
getFirstElement(String attributeName, String value, boolean valueCaseSensitive)

getFirstElementByClass

public final Element getFirstElementByClass(java.lang.String className)
Returns the first Element with the specified class that is enclosed by this segment.

This is functionally equivalent to getAllElementsByClass(className).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.

If this segment is itself an Element with the specified class, this element is returned.

Parameters:
className - the class name (case sensitive) to search for, must not be null.
Returns:
the first Element with the specified class that is enclosed by this segment, or null if none exists.
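
As an illustration of case sensitive class matching, the sketch below tests whether a class attribute value contains a given class name. It assumes that a class attribute may hold several class names separated by white space and that a match on any one of them counts; ClassNameMatchSketch and hasClass are hypothetical names, and this is not the library's own matching code.

```java
final class ClassNameMatchSketch {
    // Assumption: the attribute value is a white space separated list of
    // class names, and className must equal one of them exactly (case sensitive).
    static boolean hasClass(String classAttributeValue, String className) {
        if (classAttributeValue == null) return false;
        for (String candidate : classAttributeValue.trim().split("\\s+")) {
            if (candidate.equals(className)) return true;
        }
        return false;
    }
}
```

Under these assumptions, "nav item active" matches "item", but neither "Item" nor the substring "nav it" would match.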

getFormControls

public java.util.List<FormControl> getFormControls()
Returns a list of the FormControl objects that are enclosed by this segment.

Returns:
a list of the FormControl objects that are enclosed by this segment.

getFormFields

public FormFields getFormFields()
Returns the FormFields object representing all form fields that are enclosed by this segment.

This is equivalent to new FormFields(getFormControls()).

Returns:
the FormFields object representing all form fields that are enclosed by this segment.
See Also:
getFormControls()

parseAttributes

public Attributes parseAttributes()
Parses any Attributes within this segment. This method is only used in the unusual situation where attributes exist outside of a start tag. The StartTag.getAttributes() method should be used in normal situations.

This is equivalent to source.parseAttributes(getBegin(),getEnd()).

Returns:
the Attributes within this segment, or null if too many errors occur while parsing.

ignoreWhenParsing

public void ignoreWhenParsing()
Causes this segment to be ignored when parsing.

Ignored segments are treated as blank spaces by the parsing mechanism, but are included as normal text in all other functions.

This method was originally the only means of preventing server tags located inside normal tags from interfering with the parsing of the tags (such as where an attribute of a normal tag uses a server tag to dynamically set its value), as well as preventing non-server tags from being recognised inside server tags.

It is no longer necessary to use this method to ignore server tags located inside normal tags, as the attributes parser automatically ignores any server tags.

It is also unnecessary to use this method to ignore non-server tags inside server tags, or the contents of SCRIPT elements, as the parser does this automatically when performing a full sequential parse.

This leaves only very few scenarios where calling this method still provides a significant benefit.

One such case is where XML-style server tags are used inside normal tags. Here is an example using an XML-style JSP tag:

<a href="<i18n:resource path="/Portal"/>?BACK=TRUE">back</a>
The first double-quote of "/Portal" will be interpreted as the end quote for the href attribute, as there is no way for the parser to recognise the i18n:resource element as a server tag. Such use of XML-style server tags inside normal tags is generally seen as bad practice, but it is nevertheless valid JSP. The only way to ensure that this library is able to parse the normal tag surrounding it is to find these server tags first and call the ignoreWhenParsing method to ignore them before parsing the rest of the document.

It is important to understand the difference between ignoring the segment when parsing and removing the segment completely. Any text inside a segment that is ignored when parsing is treated by most functions as content, and as such is included in the output of tools such as TextExtractor and Renderer.

To remove segments completely, create an OutputDocument and call its remove(Segment) or replaceWithSpaces(int begin, int end) method for each segment. Then create a new source document using new Source(outputDocument.toString()) and perform the desired operations on this new source object.

Calling this method after the Source.fullSequentialParse() method has been called is not permitted and throws an IllegalStateException.

Any tags appearing in this segment that are found before this method is called will remain in the tag cache, and so will continue to be found by the tag search methods. If this is undesirable, the Source.clearCache() method can be called to remove them from the cache. Calling the Source.fullSequentialParse() method after this method clears the cache automatically.

For best performance, this method should be called on all segments that need to be ignored without calling any of the tag search methods in between.

See Also:
Source.ignoreWhenParsing(Collection segments)
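
Finding the XML-style server tags first can be done with a plain regular expression over the source text. The sketch below uses only the JDK and returns the begin and end positions of each match; the class name ServerTagSpans, the i18n: prefix and the assumption that every server tag is self-closing are illustrative choices taken from the JSP example above, not part of this library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerTagSpans {
    // Assumption: server tags start with "<i18n:" and end at the first "/>".
    private static final Pattern SERVER_TAG = Pattern.compile("<i18n:[^>]*/>");

    // Returns {begin, end} pairs (end exclusive) for every server tag found.
    static List<int[]> find(CharSequence text) {
        List<int[]> spans = new ArrayList<>();
        Matcher matcher = SERVER_TAG.matcher(text);
        while (matcher.find()) {
            spans.add(new int[] {matcher.start(), matcher.end()});
        }
        return spans;
    }
}
```

Each pair could then be wrapped with new Segment(source, begin, end) and ignored, either one at a time with ignoreWhenParsing() or together with Source.ignoreWhenParsing(Collection segments), before any tag search method is called.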

compareTo

public int compareTo(Segment segment)
Compares this Segment object to another object.

If the argument is not a Segment, a ClassCastException is thrown.

A segment is considered to be before another segment if its begin position is earlier, or in the case that both segments begin at the same position, its end position is earlier.

Segments that begin and end at the same position are considered equal for the purposes of this comparison, even if they relate to different source documents.

Note: this class has a natural ordering that is inconsistent with equals. This means that this method may return zero in some cases where calling the equals(Object) method with the same argument returns false.

Specified by:
compareTo in interface java.lang.Comparable<Segment>
Parameters:
segment - the segment to be compared
Returns:
a negative integer, zero, or a positive integer as this segment is before, equal to, or after the specified segment.
Throws:
java.lang.ClassCastException - if the argument is not a Segment
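
The ordering rule and its inconsistency with equals can be illustrated with a small stand-in class. SpanSketch is hypothetical and uses only the JDK; its equals method assumes that equality also takes the source document into account, which is what makes the two methods disagree.

```java
import java.util.Objects;

final class SpanSketch implements Comparable<SpanSketch> {
    final String document; // stand-in for the source document
    final int begin;
    final int end;

    SpanSketch(String document, int begin, int end) {
        this.document = document;
        this.begin = begin;
        this.end = end;
    }

    // Earlier begin sorts first; for equal begins, earlier end sorts first.
    // The document is deliberately not consulted.
    @Override
    public int compareTo(SpanSketch other) {
        if (begin != other.begin) return Integer.compare(begin, other.begin);
        return Integer.compare(end, other.end);
    }

    // Assumption: equality requires the same document as well as the same span.
    @Override
    public boolean equals(Object object) {
        if (!(object instanceof SpanSketch)) return false;
        SpanSketch other = (SpanSketch) object;
        return document.equals(other.document) && begin == other.begin && end == other.end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(document, begin, end);
    }
}
```

One consequence is that a sorted collection such as java.util.TreeSet, which relies on compareTo, would keep only one of two spans that cover the same positions in different documents, while a java.util.HashSet would keep both.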

isWhiteSpace

public final boolean isWhiteSpace()
Indicates whether this segment consists entirely of white space.

Returns:
true if this segment consists entirely of white space, otherwise false.

getMaxDepthIndicator

public int getMaxDepthIndicator()
Returns an indication of the maximum depth of nested elements within this segment.

A high return value can indicate that the segment contains a large number of incorrectly nested tags that could result in a StackOverflowError if its content is parsed.

The usefulness of this method is debatable as a StackOverflowError is a recoverable error that can be easily caught. The use of this method to pre-detect and avoid a stack overflow may save some memory and processing resources in certain circumstances, but the cost of calling this method to check every segment or document will very often exceed any benefit.

It is up to the application developer to determine what return value constitutes an unreasonable level of nesting given the stack space allocated to the application and other factors.

Note that the return value is an approximation only and is usually greater than the actual maximum element depth that would be reported by calling the Element.getDepth() method on the most nested element.

Returns:
an indication of the maximum depth of nested elements within this segment.

isWhiteSpace

public static final boolean isWhiteSpace(char ch)
Indicates whether the specified character is white space.

The HTML 4.01 specification section 9.1 specifies the following white space characters:

space (U+0020)
tab (U+0009)
form feed (U+000C)
line feed (U+000A)
carriage return (U+000D)
zero-width space (U+200B)

Despite the explicit inclusion of the zero-width space in the HTML specification, Microsoft IE6 does not recognise it as white space and renders it as an unprintable character (empty square). Even zero-width spaces included using the numeric character reference &#x200B; are rendered this way.

Parameters:
ch - the character to test.
Returns:
true if the specified character is white space, otherwise false.
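
The HTML notion of white space differs from the JDK's Character.isWhitespace(char), so the two should not be used interchangeably. The sketch below is a JDK-only stand-in, not this library's code; it assumes the white space set is space, tab, form feed, line feed, carriage return and zero-width space, and the names HtmlWhiteSpaceSketch and isAllWhiteSpace are hypothetical.

```java
final class HtmlWhiteSpaceSketch {
    // Assumed set: the HTML 4.01 section 9.1 characters plus line breaks.
    private static final char[] WHITE_SPACE = {' ', '\t', '\f', '\n', '\r', '\u200B'};

    static boolean isWhiteSpace(char ch) {
        for (char candidate : WHITE_SPACE) {
            if (ch == candidate) return true;
        }
        return false;
    }

    // True if every character is white space; an empty sequence yields true
    // in this sketch.
    static boolean isAllWhiteSpace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isWhiteSpace(text.charAt(i))) return false;
        }
        return true;
    }
}
```

With this set, the zero-width space counts as white space although Character.isWhitespace rejects it, and the vertical tab (U+000B) does not count although Character.isWhitespace accepts it.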

getRowColumnVector

public RowColumnVector getRowColumnVector()
Returns a RowColumnVector object representing the row and column number of the start of this segment in the source document.

Returns:
a RowColumnVector object representing the row and column number of the start of this segment in the source document.
See Also:
Source.getRowColumnVector(int pos)

getDebugInfo

public java.lang.String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.

Returns:
a string representation of this object useful for debugging purposes.

charAt

public char charAt(int index)
Returns the character at the specified index.

This is logically equivalent to toString().charAt(index) for valid argument values 0 <= index < length().

However because this implementation works directly on the underlying document source string, it should not be assumed that an IndexOutOfBoundsException is thrown for an invalid argument value.

Specified by:
charAt in interface java.lang.CharSequence
Parameters:
index - the index of the character.
Returns:
the character at the specified index.

subSequence

public java.lang.CharSequence subSequence(int beginIndex,
                                          int endIndex)
Returns a new character sequence that is a subsequence of this sequence.

This is logically equivalent to toString().subSequence(beginIndex,endIndex) for valid values of beginIndex and endIndex.

However because this implementation works directly on the underlying document source text, it should not be assumed that an IndexOutOfBoundsException is thrown for invalid argument values as described in the String.subSequence(int,int) method.

Specified by:
subSequence in interface java.lang.CharSequence
Parameters:
beginIndex - the begin index, inclusive.
endIndex - the end index, exclusive.
Returns:
a new character sequence that is a subsequence of this sequence.
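
The reason an out-of-range index may not throw can be seen in a view that delegates straight to the underlying text. The sketch below is a hypothetical JDK-only stand-in (SpanViewSketch and checkedCharAt are invented names), written under the assumption that the segment simply offsets the index by its begin position; it is not this library's actual code.

```java
final class SpanViewSketch implements CharSequence {
    private final String source;
    private final int begin;
    private final int end;

    SpanViewSketch(String source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public int length() {
        return end - begin;
    }

    // Delegates to the source without checking index against length(), so an
    // index past the end of the view can return a character from outside it.
    @Override
    public char charAt(int index) {
        return source.charAt(begin + index);
    }

    // Same approach: the indexes are only validated against the whole source.
    @Override
    public CharSequence subSequence(int beginIndex, int endIndex) {
        return source.subSequence(begin + beginIndex, begin + endIndex);
    }

    // Hypothetical bounds-checked variant for callers that need the exception.
    char checkedCharAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length());
        }
        return charAt(index);
    }

    @Override
    public String toString() {
        return source.substring(begin, end);
    }
}
```

Code that relies on IndexOutOfBoundsException for control flow should validate indexes against length() itself, as checkedCharAt does here.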

