public final class SourceFormatter extends java.lang.Object implements CharStreamSource
Any indentation present in the original source text is removed.
Use one of the following methods to obtain the output: writeTo(Writer), appendTo(Appendable), or toString().
The output text is functionally equivalent to the original source and should be rendered identically unless specified below.
The following points describe the process in general terms. Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.
The indent is formed by writing n repetitions of the string specified in the IndentString property, where n is the depth of the indentation.
The contents of PRE and TEXTAREA elements are not indented, nor is the white space modified in any way.
The content of SCRIPT elements is preserved, but with the indentation of new lines starting at a depth one greater than that of the SCRIPT element.
If the TidyTags property is set to true, every tag in the document is replaced with the output from its Tag.tidy() method. If this property is set to false, the tag from the original text is used, including all white space, but with any new lines indented at a depth one greater than that of the element.
If the CollapseWhiteSpace property is set to true, every string of one or more white space characters located outside of a tag is replaced with a single space in the output. White space located adjacent to a non-inline-level element tag (except server tags) may be removed.
If the IndentAllElements property is set to true, every element appears indented on a new line, including inline-level elements. This generates output that is a good representation of the actual document element hierarchy, but is very likely to introduce white space that compromises the functional equivalency of the document.
The NewLine property specifies the character sequence to use for each newline in the output document.
Formatting an entire Source object performs a full sequential parse automatically.
Constructor and Description |
---|
SourceFormatter(Segment segment): Constructs a new SourceFormatter based on the specified Segment. |
Modifier and Type | Method and Description |
---|---|
void | appendTo(java.lang.Appendable appendable): Appends the output to the specified Appendable object. |
boolean | getCollapseWhiteSpace(): Indicates whether white space in the text between the tags is to be collapsed. |
long | getEstimatedMaximumOutputLength(): Returns the estimated maximum number of characters in the output, or -1 if no estimate is available. |
boolean | getIndentAllElements(): Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents. |
java.lang.String | getIndentString(): Returns the string to be used for indentation. |
java.lang.String | getNewLine(): Returns the string to be used to represent a newline in the output. |
boolean | getTidyTags(): Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method. |
SourceFormatter | setCollapseWhiteSpace(boolean collapseWhiteSpace): Sets whether white space in the text between the tags is to be collapsed. |
SourceFormatter | setIndentAllElements(boolean indentAllElements): Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents. |
SourceFormatter | setIndentString(java.lang.String indentString): Sets the string to be used for indentation. |
SourceFormatter | setNewLine(java.lang.String newLine): Sets the string to be used to represent a newline in the output. |
SourceFormatter | setTidyTags(boolean tidyTags): Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method. |
java.lang.String | toString(): Returns the output as a string. |
void | writeTo(java.io.Writer writer): Writes the output to the specified Writer. |
public SourceFormatter(Segment segment)
Constructs a new SourceFormatter based on the specified Segment.
Parameters: segment - the segment containing the HTML to be formatted.
See Also: Source.getSourceFormatter()
public void writeTo(java.io.Writer writer) throws java.io.IOException
Description copied from interface: CharStreamSource
Writes the output to the specified Writer.
Specified by: writeTo in interface CharStreamSource
Parameters: writer - the destination java.io.Writer for the output.
Throws: java.io.IOException - if an I/O exception occurs.
public void appendTo(java.lang.Appendable appendable) throws java.io.IOException
Description copied from interface: CharStreamSource
Appends the output to the specified Appendable object.
Specified by: appendTo in interface CharStreamSource
Parameters: appendable - the destination java.lang.Appendable object for the output.
Throws: java.io.IOException - if an I/O exception occurs.
public long getEstimatedMaximumOutputLength()
Description copied from interface: CharStreamSource
Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
The returned value should be used as a guide for efficiency purposes only, for example to set an initial StringBuilder capacity.
There is no guarantee that the length of the output is indeed less than this value, as classes implementing this method often use assumptions based on typical usage to calculate the estimate.
Although implementations of this method should never return a value less than -1, users of this method must not assume that this will always be the case. Standard practice is to interpret any negative value as meaning that no estimate is available.
Specified by: getEstimatedMaximumOutputLength in interface CharStreamSource
Returns: the estimated maximum number of characters in the output, or -1 if no estimate is available.
public java.lang.String toString()
Description copied from interface: CharStreamSource
Returns the output as a string.
Specified by: toString in interface CharStreamSource
Overrides: toString in class java.lang.Object
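The guidance on getEstimatedMaximumOutputLength() can be sketched as follows. This is only an illustration: OutputSource is a hypothetical stand-in for the two CharStreamSource methods used here, and DEFAULT_CAPACITY and MAX_INITIAL_CAPACITY are arbitrary values chosen for the example, not library constants.

```java
import java.io.IOException;

public class EstimatedLengthSketch {

    /** Hypothetical stand-in for the two CharStreamSource methods used in this sketch. */
    public interface OutputSource {
        void appendTo(Appendable appendable) throws IOException;
        long getEstimatedMaximumOutputLength();
    }

    /** Capacity used when no estimate is available (arbitrary value for this example). */
    public static final int DEFAULT_CAPACITY = 1024;

    /** Upper bound on the initial allocation (arbitrary value for this example). */
    public static final int MAX_INITIAL_CAPACITY = 1 << 20;

    /** Converts an estimate into an initial StringBuilder capacity. */
    public static int initialCapacity(long estimate) {
        // Any negative value, not only -1, is treated as "no estimate available".
        if (estimate < 0) return DEFAULT_CAPACITY;
        // The estimate is only a guide, so cap it; the builder grows on demand anyway.
        return (int) Math.min(estimate, MAX_INITIAL_CAPACITY);
    }

    /** Collects the output of a source into a string using a pre-sized builder. */
    public static String render(OutputSource source) throws IOException {
        StringBuilder sb = new StringBuilder(initialCapacity(source.getEstimatedMaximumOutputLength()));
        source.appendTo(sb); // StringBuilder implements Appendable
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        OutputSource demo = new OutputSource() {
            public void appendTo(Appendable appendable) throws IOException {
                appendable.append("<p>demo</p>");
            }
            public long getEstimatedMaximumOutputLength() {
                return -5; // out-of-contract negative value, still handled as "no estimate"
            }
        };
        System.out.println(render(demo));
    }
}
```

Because the output may exceed the estimate, the sketch uses it only to size the buffer and never to truncate or validate the result.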
public SourceFormatter setIndentString(java.lang.String indentString)
Sets the string to be used for indentation.
The default value is a string containing a single tab character (U+0009).
The most commonly used indent strings are "\t" (single tab), " " (single space), "  " (2 spaces), and "    " (4 spaces).
Parameters: indentString - the string to be used for indentation, must not be null.
Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also: getIndentString()
public java.lang.String getIndentString()
Returns the string to be used for indentation.
See the setIndentString(String) method for a full description of this property.
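A minimal sketch of how an indent string and a depth combine, and of the chained-setter style the property setters use. The Options class and its indentFor method are hypothetical and exist only for this example; throwing IllegalArgumentException for a null indent string is likewise an assumption, since the description above says only that the value must not be null.

```java
public class IndentSketch {

    /** Hypothetical stand-in that mimics only the chained-setter style of the property setters. */
    public static final class Options {
        private String indentString = "\t"; // documented default: a single tab (U+0009)
        private boolean tidyTags = false;   // documented default: false

        public Options setIndentString(String indentString) {
            // Assumption: a null argument is rejected with IllegalArgumentException.
            if (indentString == null) throw new IllegalArgumentException("indentString must not be null");
            this.indentString = indentString;
            return this; // returning this is what makes chaining possible
        }

        public Options setTidyTags(boolean tidyTags) {
            this.tidyTags = tidyTags;
            return this;
        }

        public String getIndentString() { return indentString; }

        public boolean getTidyTags() { return tidyTags; }

        /** Builds the indent for a depth: that many repetitions of the indent string. */
        public String indentFor(int depth) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) sb.append(indentString);
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        // Two setters chained in a single statement.
        Options options = new Options().setTidyTags(true).setIndentString("  ");
        System.out.println("[" + options.indentFor(3) + "]");
    }
}
```

With a two-space indent string and a depth of 3, the indent is six spaces.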
public SourceFormatter setTidyTags(boolean tidyTags)
Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
The default value is false.
If this property is set to false, the tag from the original text is used, including all white space, but with any new lines indented at a depth one greater than that of the element.
Parameters: tidyTags - specifies whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also: getTidyTags()
public boolean getTidyTags()
Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
See the setTidyTags(boolean) method for a full description of this property.
Returns: true if the original text of each tag is to be replaced with the output from its Tag.tidy() method, otherwise false.
public SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace)
Sets whether white space in the text between the tags is to be collapsed.
The default value is false.
If this property is set to true, every string of one or more white space characters located outside of a tag is replaced with a single space in the output. White space located adjacent to a non-inline-level element tag (except server tags) may be removed.
Parameters: collapseWhiteSpace - specifies whether white space in the text between the tags is to be collapsed.
Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also: getCollapseWhiteSpace()
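The basic collapsing rule can be illustrated with a small, self-contained sketch. It is not the library's implementation: the tag pattern is deliberately naive (it does not cope with '>' inside attribute values, comments, or server tags), Java's \s character class is assumed to be a close enough definition of white space, and the optional removal of white space next to non-inline-level tags is not modelled. The names CollapseSketch and collapseOutsideTags are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollapseSketch {

    // Naive tag pattern, used only to separate tag text from the text between tags.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // One or more white space characters, as defined by Java's \s class (an assumption).
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s+");

    /** Replaces each run of white space outside of a tag with a single space. */
    public static String collapseOutsideTags(String html) {
        StringBuilder out = new StringBuilder(html.length());
        Matcher tags = TAG.matcher(html);
        int pos = 0;
        while (tags.find()) {
            out.append(collapse(html.substring(pos, tags.start()))); // text before the tag
            out.append(tags.group());                                // tag text copied unchanged
            pos = tags.end();
        }
        out.append(collapse(html.substring(pos))); // text after the last tag
        return out.toString();
    }

    private static String collapse(String text) {
        return WHITE_SPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "<p  class=\"a\">Hello \n\t world</p>  \n<p>x</p>";
        System.out.println(collapseOutsideTags(input));
    }
}
```

Note that the double space inside the first start tag is left alone, because only white space outside of tags is collapsed.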
public boolean getCollapseWhiteSpace()
Indicates whether white space in the text between the tags is to be collapsed.
See the setCollapseWhiteSpace(boolean) method for a full description of this property.
Returns: true if white space in the text between the tags is to be collapsed, otherwise false.
public SourceFormatter setIndentAllElements(boolean indentAllElements)
Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
The default value is false.
If this property is set to true, every element appears indented on a new line, including inline-level elements.
This generates output that is a good representation of the actual document element hierarchy, but is very likely to introduce white space that compromises the functional equivalency of the document.
Parameters: indentAllElements - specifies whether all elements are to be indented.
Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also: getIndentAllElements()
public boolean getIndentAllElements()
Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
See the setIndentAllElements(boolean) method for a full description of this property.
Returns: true if all elements are to be indented, otherwise false.
public SourceFormatter setNewLine(java.lang.String newLine)
Sets the string to be used to represent a newline in the output.
The default is to use the same new line string as is used in the source document, which is determined via the Source.getNewLine() method.
If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document, or using the value from the static Config.NewLine property.
Specifying a null argument resets the property to its default value, which is to use the same new line string as is used in the source document.
Parameters: newLine - the string to be used to represent a newline in the output, may be null.
Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
See Also: getNewLine()
public java.lang.String getNewLine()
Returns the string to be used to represent a newline in the output.
See the setNewLine(String) method for a full description of this property.
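The fallback order described for the NewLine property can be sketched as below. This is a simplified illustration under stated assumptions: detectNewLine and resolveNewLine are hypothetical helpers, taking the first newline sequence found in the text is an assumed detection strategy, and the two "best guess" sources (a previously parsed document and Config.NewLine) are folded into a single fallback argument.

```java
public class NewLineSketch {

    /** Returns the first newline sequence found in the text, or null if it contains none. */
    public static String detectNewLine(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == '\n') return "\n";
            if (ch == '\r') {
                boolean followedByLineFeed = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLineFeed ? "\r\n" : "\r";
            }
        }
        return null;
    }

    /**
     * Resolves the newline string to use for output.
     * An explicitly set value wins; null means "use the default", which is the
     * newline found in the source text, or the fallback if the source has none.
     */
    public static String resolveNewLine(String explicitNewLine, CharSequence sourceText, String fallback) {
        if (explicitNewLine != null) return explicitNewLine;
        String detected = detectNewLine(sourceText);
        return detected != null ? detected : fallback;
    }

    public static void main(String[] args) {
        // The source uses CRLF and no explicit value is set, so CRLF is chosen.
        String chosen = resolveNewLine(null, "<p>a</p>\r\n<p>b</p>", "\n");
        System.out.println(chosen.equals("\r\n"));
    }
}
```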