4 Using Writer2xhtml and Calc2xhtml
Writer2xhtml produces standards-compliant XHTML files; in particular, it can be used to put math on the web using the XHTML + MathML combination. Writer2xhtml can convert to any of these XHTML variants:
- XHTML 1.0 strict, which follows the guidelines for HTML compatibility, so that the output should be viewable with any browser that supports HTML 4.
- XHTML 1.1 + MathML 2.0, which is currently viewable only with the Mozilla and Amaya browsers.
- XHTML 1.1 + MathML 2.0 using XSL transformations from the W3C Math Working Group, which makes the file viewable also in some browsers that need a plugin to display MathML, e.g. Internet Explorer with the MathPlayer plugin. This is how the W3C Math Working Group recommends putting "math on the web".
Note that the default file extension and the recommended MIME type vary with the output format:
Output format | Default file extension | MIME type
XHTML 1.0 | .html | text/html
XHTML 1.1 + MathML 2.0 | .xhtml | application/xhtml+xml
XHTML 1.1 + MathML 2.0 (with xsl transformation) | .xml | application/xml
Writer2xhtml is quite flexible, in particular with respect to the handling of formatting:
- You can let Writer2xhtml convert the style information in the source document and thus get an XHTML document that has the same general appearance as the original, but with an online look and feel.
- You can use your own style sheet and let Writer2xhtml convert the content only. You can map styles in OOo to XHTML elements and CSS classes; see sections 4.3 and 4.4.
Calc2xhtml is a companion to Writer2xhtml that produces XHTML 1.0 strict from your Calc documents.
4.1 Converting to XHTML from the command line
To convert a file to XHTML use the command line
w2l -xhtml|-xhtml+mathml|-xhtml+mathml+xsl [-config <configfile>] <document to convert> [<output path and/or file name>]
The parts in square brackets are optional.
This will produce an XHTML file with the specified name. If no output file is specified, Writer2xhtml will use the same name as the original document, but a different file extension.
The option -xhtml+mathml produces XHTML 1.1 + MathML 2.0; the option -xhtml+mathml+xsl produces the variant using XSL transformations.
Examples:
w2l -xhtml+mathml+xsl mydocument.sxw
or
w2l -xhtml -config myconfig.xml mydocument.sxw
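When no output file is given, the default name can be worked out from the table of output formats in the introduction to this chapter. The following Python sketch is only an illustration and not part of Writer2xhtml: the function name default_output_name is hypothetical, and it assumes that -xhtml selects XHTML 1.0 strict and that only the file extension is replaced.

```python
from pathlib import PurePosixPath

# Default extensions per command-line option, taken from the table of
# output formats. Assumption: -xhtml selects XHTML 1.0 strict.
DEFAULT_EXTENSION = {
    "-xhtml": ".html",
    "-xhtml+mathml": ".xhtml",
    "-xhtml+mathml+xsl": ".xml",
}


def default_output_name(option: str, document: str) -> str:
    """Return the source name with its extension swapped for the default one."""
    return str(PurePosixPath(document).with_suffix(DEFAULT_EXTENSION[option]))


print(default_output_name("-xhtml+mathml+xsl", "mydocument.sxw"))  # mydocument.xml
print(default_output_name("-xhtml", "mydocument.sxw"))  # mydocument.html
```

So the first example above would be expected to produce mydocument.xml, and the second mydocument.html.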
If you specify the -config option, Writer2xhtml will load this configuration file before converting your document. You can read more about configuration in section 4.3.
The script w2l also provides a shorthand notation to use the sample configuration file included in writer2latex04beta2.zip. The command line is
w2l -cleanxhtml <writer document to convert> [<output path and/or file name>]
This configuration file produces a "clean" XHTML file (see section 4.4). For example:
w2l -cleanxhtml mydocument.sxw mypath/myoutputdoc.html
It is recommended that you extend w2l / w2l.bat to support your own configuration files.
4.2 Using Writer2xhtml as an export filter
If you choose File – Export in Writer you should be able to choose XHTML 1.0 strict, XHTML 1.1 + MathML 2.0 or XHTML 1.1 + MathML 2.0 (xsl) as file type. Using Calc2xhtml as an export filter is not yet supported.
Note: You have to use the export menu because Writer2xhtml does not provide an import filter for XHTML. You should always save in the native format of OOo as well!
Note: Currently embedded graphics are not converted when Writer2xhtml is used as an export filter. Also splitting at headings/sheets only works from the command line. This is because of an issue with the xmerge framework. A fix for this is planned for a later version of Writer2xhtml.
4.3 Configuration
XHTML export can be configured with a configuration file. The configuration is read from several sources:
- First Writer2xhtml/Calc2xhtml reads the file writer2latex.xml in the same directory as writer2latex.jar. This file is supposed to contain installation-wide configuration.
- Then it reads the file writer2latex.xml in your home directory (Unix, e.g. /home/username) or user profile (Windows, e.g. c:\documents and settings\username). This file is supposed to contain user-specific configuration. The installation-wide configuration may specify that this file should be generated automatically.
- Finally the configuration file you specify on the command line is read.
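The reading order can be pictured as a layered merge. The Python sketch below is an illustration only: it assumes that an option read from a later source replaces the value read from an earlier one, which this manual does not state explicitly, and the three file contents and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_options(xml_text: str) -> dict:
    """Collect the name/value pairs of all <option> elements in one file."""
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.get("value") for opt in root.iter("option")}


def effective_options(*sources: str) -> dict:
    """Merge configurations in reading order; later sources win (assumption)."""
    merged = {}
    for xml_text in sources:
        merged.update(read_options(xml_text))
    return merged


# Hypothetical file contents, listed in the order the files are read.
installation = (
    '<config>'
    '<option name="xhtml_scaling" value="100%" />'
    '<option name="xhtml_split_level" value="0" />'
    '</config>'
)
user = '<config><option name="xhtml_scaling" value="75%" /></config>'
command_line = '<config><option name="xhtml_split_level" value="1" /></config>'

print(effective_options(installation, user, command_line))
```

Under that assumption, the user file changes the scaling and the file given on the command line changes the split level, while any option not mentioned keeps its earlier value.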
The configuration file is an XML file; here are the default contents:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<option name="create_user_config" value="true" />
<option name="xhtml_no_doctype" value="false" />
<option name="xhtml_custom_stylesheet" value="" />
<option name="xhtml_ignore_styles" value="false" />
<option name="xhtml_use_dublin_core" value="true" />
<option name="xhtml_convert_to_px" value="true" />
<option name="xhtml_scaling" value="100%" />
<option name="xhtml_column_scaling" value="100%" />
<option name="xhtml_split_level" value="0" />
<option name="xhtml_calc_split" value="false" />
<option name="ignore_hard_line_breaks" value="false" />
<option name="ignore_empty_paragraphs" value="false" />
<option name="ignore_double_spaces" value="false" />
</config>
Options
- If the option create_user_config is set to true, the user-specific configuration file mentioned above will be created if it does not exist.
- The option xhtml_no_doctype can have the values true or false (default). When this option is true, Writer2xhtml will not include the !DOCTYPE declaration in the converted document. The !DOCTYPE is required for a valid XHTML document; this option should only be used if you need to process the document further.
- The option xhtml_custom_stylesheet is used to specify a URL to your own, external stylesheet. If the value is empty or the option is not specified, no external stylesheet will be used.
- The option xhtml_ignore_styles is used to specify if formatting should be exported. If the value is true, no style information will be exported (in this case you should specify a custom style sheet!).
- The option xhtml_use_dublin_core is used to specify if Dublin Core metadata should be exported (the format will be as specified in http://dublincore.org/documents/dcq-html/). If the value is false, it will not be exported.
- The option xhtml_convert_to_px can have the values true (default) or false. When this option is true, Writer2xhtml will convert all units to px; otherwise the original units are used. The resolution is assumed to be 96 ppi; you can change this with the xhtml_scaling option. For example, a scaling of 75% changes the resolution to 72 ppi.
- The option xhtml_scaling is used to specify a scaling of all formatting, i.e. to get a different text size than the original document. The value must be a percentage.
- The option xhtml_column_scaling is used to specify an additional scaling for table columns. The value must be a percentage.
- The option xhtml_split_level is used to specify that Writer documents should be split into several documents, and the outline level at which the splitting should happen (the default 0 means no split). This is convenient for long documents. Each output document will get a simple navigation panel in the header and the footer.
- The option xhtml_calc_split is used to specify that Calc documents should be split into several documents, one for each sheet. This is convenient for large spreadsheets. Each output document will get a simple navigation panel in the header and the footer.
- The option ignore_double_spaces can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore double spaces; otherwise they are converted to non-breaking spaces.
- The option ignore_empty_paragraphs can have the values true (default) or false. Setting the option to true will instruct Writer2xhtml to ignore empty paragraphs.
- The option ignore_hard_line_breaks can have the values true or false (default). Setting the option to true will instruct Writer2xhtml to ignore hard line breaks (Shift-Enter).
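The interaction between xhtml_convert_to_px and xhtml_scaling comes down to simple arithmetic. The following Python sketch works through it; it is not the converter's own code. The set of units, the function names and the absence of rounding are assumptions made for the illustration.

```python
# Units per inch for a few common length units (this set is an assumption).
UNITS_PER_INCH = {"in": 1.0, "cm": 2.54, "mm": 25.4, "pt": 72.0}


def effective_ppi(scaling_percent: float, base_ppi: float = 96.0) -> float:
    """Resolution after applying xhtml_scaling to the assumed 96 ppi."""
    return base_ppi * scaling_percent / 100.0


def to_px(value: float, unit: str, scaling_percent: float = 100.0) -> float:
    """Convert a length to pixels at the scaled resolution (no rounding)."""
    return value * effective_ppi(scaling_percent) / UNITS_PER_INCH[unit]


print(effective_ppi(75))    # 72.0
print(to_px(1, "in"))       # 96.0
print(to_px(12, "pt", 75))  # 12.0
```

With a scaling of 75%, one point corresponds to one pixel, so a 12pt font in the original document would come out as 12px.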
Style maps
In addition to the options, you can specify that certain styles in Writer should be mapped to specific XHTML elements and CSS style classes. Here are some examples showing how to use some of the built-in Writer styles to create XHTML elements:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<!-- map OOo paragraph styles to xhtml elements -->
<xhtml-style-map name="Text body" class="paragraph"
element="p" css="(none)" />
<xhtml-style-map name="Sender" class="paragraph"
element="address" css="(none)" />
<xhtml-style-map name="Quotations" class="paragraph"
block-element="blockquote" block-css="(none)"
element="p" css="(none)" />
<!-- map OOo text styles to xhtml elements -->
<xhtml-style-map name="Citation" class="text"
element="cite" css="(none)" />
<xhtml-style-map name="Emphasis" class="text"
element="em" css="(none)" />
<!-- map hard formatting attributes to xhtml elements -->
<xhtml-style-map name="bold" class="attribute"
element="b" css="(none)" />
<xhtml-style-map name="italics" class="attribute"
element="i" css="(none)" />
</config>
An extended version of this is distributed with Writer2LaTeX; see the file cleanxhtml.xml.
The attributes of the xhtml-style-map element are used as follows:
- name specifies the name of the Writer style.
- class specifies the style class in Writer; this can be text, paragraph, frame, list or attribute. The last value does not specify a real style, but refers to hard formatting attributes. The possible names in this case are bold, italics, fixed (for fixed pitch fonts), superscript and subscript.
- element specifies the XHTML element to use when converting this style. This is not used for frame and list styles.
- css specifies the CSS style class to use when converting this style. If it is not specified or the value is "(none)", no CSS class will be used.
- block-element only has an effect for paragraph styles. It is used to specify a block XHTML element that should surround several exported paragraphs with this style.
- block-css specifies the CSS style class to be used for this block element. If it is not specified or the value is "(none)", no CSS class will be used.
For example, the rules above produce code like this:
<p>This paragraph is Text body</p>
<address>This paragraph is Sender</address>
<blockquote>
<p>This paragraph is Quotations</p>
<p>This paragraph is also Quotations</p>
</blockquote>
<p>This paragraph is also Text body and has some <em>text with emphasis style</em> and uses some <b>hard formatting</b>.</p>
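The effect of block-element can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm Writer2xhtml uses: it assumes that consecutive paragraphs with the same style share one block element, it handles paragraph styles only, and it does not escape special characters in the text. The names STYLE_MAP and convert are hypothetical.

```python
from itertools import groupby

# Paragraph style maps from the example configuration above:
# style name -> (block element, element). None means no block-element.
STYLE_MAP = {
    "Text body": (None, "p"),
    "Sender": (None, "address"),
    "Quotations": ("blockquote", "p"),
}


def convert(paragraphs):
    """Convert (style, text) pairs to XHTML lines, wrapping runs of one style."""
    lines = []
    for style, run in groupby(paragraphs, key=lambda p: p[0]):
        block, element = STYLE_MAP[style]
        if block:
            lines.append(f"<{block}>")
        lines.extend(f"<{element}>{text}</{element}>" for _, text in run)
        if block:
            lines.append(f"</{block}>")
    return lines


document = [
    ("Text body", "This paragraph is Text body"),
    ("Sender", "This paragraph is Sender"),
    ("Quotations", "This paragraph is Quotations"),
    ("Quotations", "This paragraph is also Quotations"),
]
print("\n".join(convert(document)))
```

The two Quotations paragraphs end up inside a single blockquote, as in the first lines of the sample output above.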
You can use your own Writer styles together with your own CSS style sheet to create further style mappings, for example:
<xhtml-style-map name="Some OOo style" class="paragraph"
block-element="div" block-css="block_style"
element="p" css="par_style" />
to produce output like this:
<div class="block_style">
<p class="par_style">Paragraph with Some OOo style</p>
<p class="par_style">Yet another</p>
</div>
Note that the rules for hard formatting are only used when xhtml_ignore_styles is set to true. It is not recommended to rely on these rules; using real text styles is preferable. They are included because the use of hard character formatting is very common even in otherwise well-structured documents.
4.4 Using OpenOffice.org to create XHTML documents
The configuration file cleanxhtml.xml that is distributed with Writer2LaTeX can be used to create semantically rich XHTML content, which can be formatted with your own stylesheet (you should edit the file to add the URL to the stylesheet you want to use).
A subset of the built-in styles in Writer is mapped to XHTML elements (note that the style names are localized, so this is for the English version of OpenOffice.org):
OOo Writer style | OOo Writer class | XHTML element
Text body | paragraph style | p
Sender | paragraph style | address
Quotations | paragraph style | blockquote
Preformatted Text | paragraph style | pre
List Heading | paragraph style | dt (in dl)
List Contents | paragraph style | dd (in dl)
Horizontal Rule | paragraph style | hr
Citation | text style | cite
Definition | text style | dfn
Emphasis | text style | em
Example | text style | samp
Source Text | text style | code
Strong Emphasis | text style | strong
Teletype | text style | tt
User entry | text style | kbd
Variable | text style | var
bold | hard formatting attribute | b
italics | hard formatting attribute | i
fixed pitch font | hard formatting attribute | tt
superscript | hard formatting attribute | sup
subscript | hard formatting attribute | sub
By using only these styles, you will create well-structured XHTML documents. See the document sample-xhtml.sxw for an example of how to use this.
Warning: Some elements are not allowed inside pre, so this might in some cases lead to invalid documents. This will be fixed in a later version of Writer2xhtml.