4 Using Writer2xhtml and Calc2xhtml

Writer2xhtml produces standards-compliant XHTML files; in particular, it can be used to put math on the web using the XHTML + MathML combination. Writer2xhtml can convert into any of these XHTML variants: XHTML 1.0 strict, XHTML 1.1 + MathML 2.0, and XHTML 1.1 + MathML 2.0 with XSL transformations.

Note that the default file extension and the recommended MIME type vary with the output format:

Output format                                      Default file extension   MIME type
XHTML 1.0                                          .html                    text/html
XHTML 1.1 + MathML 2.0                             .xhtml                   application/xhtml+xml
XHTML 1.1 + MathML 2.0 (with XSL transformation)   .xml                     application/xml

Writer2xhtml is quite flexible, in particular with respect to the handling of formatting:

Calc2xhtml is a companion to Writer2xhtml that produces XHTML 1.0 strict from your Calc documents.

4.1 Converting to XHTML from the command line

To convert a file to XHTML, use the command line

w2l -xhtml|-xhtml+mathml|-xhtml+mathml+xsl [-config <configfile>] <document to convert> [<output path and/or file name>]

The parts in square brackets are optional.

This will produce an XHTML file with the specified name. If no output file is specified, Writer2xhtml will use the same name as the original document, but with the default file extension of the selected output format.

The option -xhtml produces XHTML 1.0 strict, -xhtml+mathml produces XHTML 1.1 + MathML 2.0, and -xhtml+mathml+xsl produces the XHTML 1.1 + MathML 2.0 variant that uses XSL transformations.

Examples:

w2l -xhtml+mathml+xsl mydocument.sxw

or

w2l -xhtml -config myconfig.xml mydocument.sxw

If you specify the -config option, Writer2xhtml will load this configuration file before converting your document. You can read more about configuration in section 4.3.
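The following is a sketch of what a file like myconfig.xml in the example above might contain. It assumes that a configuration file only needs to list the options you want to change, and that every other option keeps its default value. The option names come from the default configuration shown in section 4.3. The effect of each option described in the comments is inferred from its name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- assumed: empty paragraphs are left out of the XHTML output -->
  <option name="ignore_empty_paragraphs" value="true" />
  <!-- assumed: double spaces in the text are not preserved -->
  <option name="ignore_double_spaces" value="true" />
  <!-- assumed: no DOCTYPE declaration is written -->
  <option name="xhtml_no_doctype" value="true" />
</config>
```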

The script w2l also provides a shorthand notation to use the sample configuration file included in writer2latex04beta2.zip. The command line is

w2l -cleanxhtml <writer document to convert> [<output path and/or file name>]

This configuration file produces a "clean" XHTML file (see section 4.4). For example:

w2l -cleanxhtml mydocument.sxw mypath/myoutputdoc.html

It is recommended that you extend w2l / w2l.bat to support your own configuration files.

4.2 Using Writer2xhtml as an export filter

If you choose File – Export in Writer, you should be able to select XHTML 1.0 strict, XHTML 1.1 + MathML 2.0 or XHTML 1.1 + MathML 2.0 (xsl) as the file type. Using Calc2xhtml as an export filter is not yet supported.

Note: You have to use the export menu because Writer2xhtml does not provide an import filter for XHTML. You should always save in the native format of OOo as well!

Note: Currently, embedded graphics are not converted when Writer2xhtml is used as an export filter. Also, splitting at headings/sheets only works from the command line. This is due to an issue with the xmerge framework. A fix is planned for a later version of Writer2xhtml.
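Because splitting only works from the command line, you would request it in a configuration file passed with -config. The sketch below rests on two assumptions. The first is that xhtml_split_level gives the heading level at which a Writer document is split into several files, with the default of 0 meaning no splitting. The second is that xhtml_calc_split makes Calc2xhtml write one file per sheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- assumed: start a new XHTML file at every level 1 heading -->
  <option name="xhtml_split_level" value="1" />
  <!-- assumed: export each Calc sheet to its own file -->
  <option name="xhtml_calc_split" value="true" />
</config>
```

Saved as, for example, split.xml (a hypothetical name), it would be used as w2l -xhtml -config split.xml mydocument.sxw.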

4.3 Configuration

XHTML export can be configured with a configuration file. The configuration is read from several sources:

The configuration file is an XML file. Here are the default contents:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <option name="create_user_config" value="true" />
  <option name="xhtml_no_doctype" value="false" />
  <option name="xhtml_custom_stylesheet" value="" />
  <option name="xhtml_ignore_styles" value="false" />
  <option name="xhtml_use_dublin_core" value="true" />
  <option name="xhtml_convert_to_px" value="true" />
  <option name="xhtml_scaling" value="100%" />
  <option name="xhtml_column_scaling" value="100%" />
  <option name="xhtml_split_level" value="0" />
  <option name="xhtml_calc_split" value="false" />
  <option name="ignore_hard_line_breaks" value="false" />
  <option name="ignore_empty_paragraphs" value="false" />
  <option name="ignore_double_spaces" value="false" />
</config>

Options

Style maps

In addition to the options, you can specify that certain styles in Writer should be mapped to specific XHTML elements and CSS style classes. Here are some examples showing how to use some of the built-in Writer styles to create XHTML elements:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- map OOo paragraph styles to xhtml elements -->
  <xhtml-style-map name="Text body" class="paragraph"
           element="p" css="(none)" />
  <xhtml-style-map name="Sender" class="paragraph"
           element="address" css="(none)" />
  <xhtml-style-map name="Quotations" class="paragraph"
           block-element="blockquote" block-css="(none)"
           element="p" css="(none)" />

  <!-- map OOo text styles to xhtml elements -->
  <xhtml-style-map name="Citation" class="text"
           element="cite" css="(none)" />
  <xhtml-style-map name="Emphasis" class="text"
           element="em" css="(none)" />

  <!-- map hard formatting attributes to xhtml elements -->
  <xhtml-style-map name="bold" class="attribute"
           element="b" css="(none)" />
  <xhtml-style-map name="italics" class="attribute"
           element="i" css="(none)" />
</config>

An extended version of this configuration is distributed with Writer2LaTeX in the file cleanxhtml.xml.

The attributes of the xhtml-style-map element are used as follows:

For example, the rules above produce code like this:

<p>This paragraph is Text body</p>
<address>This paragraph is Sender</address>
<blockquote>
  <p>This paragraph is Quotations</p>
  <p>This paragraph is also Quotations</p>
</blockquote>
<p>This paragraph is also Text body and has some <em>text with emphasis style</em> and uses some <b>hard formatting</b>.</p>

You can use your own Writer styles together with your own CSS style sheet to create further style mappings, for example:

<xhtml-style-map name="Some OOo style" class="paragraph"
           block-element="div" block-css="block_style"
           element="p" css="par_style" />

to produce output like this:

<div class="block_style">
  <p class="par_style">Paragraph with Some OOo style</p>
  <p class="par_style">Yet another</p>
</div>
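Text styles can be mapped in the same way. The sketch below uses two hypothetical names: a Writer text style called Keyword and a CSS class called keyword, which you would define in your own style sheet. By analogy with the paragraph example above, this rule should wrap text in that style as <code class="keyword">...</code>.

```xml
<!-- "Keyword" is a hypothetical user-defined Writer text style -->
<!-- "keyword" is a hypothetical class defined in your own CSS style sheet -->
<xhtml-style-map name="Keyword" class="text"
         element="code" css="keyword" />
```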

Note that the rules for hard formatting are only used when xhtml_ignore_styles is set to true. It is not recommended to rely on these rules; using real text styles is preferable. They are included because hard character formatting is very common, even in otherwise well-structured documents.
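As a sketch, a configuration that actually puts the hard formatting rules to use could pair the option with the two attribute maps from the example above. Whether you also map text styles in the same file depends on your documents.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- the attribute maps below are only used when this option is true -->
  <option name="xhtml_ignore_styles" value="true" />
  <!-- hard bold formatting becomes <b>, hard italics become <i> -->
  <xhtml-style-map name="bold" class="attribute"
           element="b" css="(none)" />
  <xhtml-style-map name="italics" class="attribute"
           element="i" css="(none)" />
</config>
```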

4.4 Using OpenOffice.org to create XHTML documents

The configuration file cleanxhtml.xml, which is distributed with Writer2LaTeX, can be used to create semantically rich XHTML content that can be formatted with your own style sheet (you should edit the file to add the URL of the style sheet you want to use).
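The edit presumably comes down to setting the xhtml_custom_stylesheet option, which is empty by default (see section 4.3). In the sketch below, the URL is a hypothetical placeholder for the location of your own style sheet.

```xml
<config>
  <!-- hypothetical URL: replace it with the location of your own style sheet -->
  <option name="xhtml_custom_stylesheet" value="http://www.example.com/mystyle.css" />
  <!-- ... the remaining options and style maps of cleanxhtml.xml ... -->
</config>
```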

A subset of the built-in styles in Writer is mapped to XHTML elements (note that the style names are localized, so this table applies to the English version of OpenOffice.org):

OOo Writer style    OOo Writer class            XHTML element
Text body           paragraph style             p
Sender              paragraph style             address
Quotations          paragraph style             blockquote
Preformatted Text   paragraph style             pre
List Heading        paragraph style             dt (in dl)
List Contents       paragraph style             dd (in dl)
Horizontal Rule     paragraph style             hr
Citation            text style                  cite
Definition          text style                  dfn
Emphasis            text style                  em
Example             text style                  samp
Source Text         text style                  code
Strong Emphasis     text style                  strong
Teletype            text style                  tt
User entry          text style                  kbd
Variable            text style                  var
bold                hard formatting attribute   b
italics             hard formatting attribute   i
fixed pitch font    hard formatting attribute   tt
superscript         hard formatting attribute   sup
subscript           hard formatting attribute   sub

By using only these styles, you will create well-structured XHTML documents. See the document sample-xhtml.sxw for an example of how to use them.
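To illustrate how rows of this table translate into configuration, here is a sketch of style maps for four of the text styles above. It is written in the same form as the examples in section 4.3. The actual entries in cleanxhtml.xml may be written differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- map built-in OOo text styles (English style names) to xhtml elements -->
  <xhtml-style-map name="Source Text" class="text"
           element="code" css="(none)" />
  <xhtml-style-map name="Strong Emphasis" class="text"
           element="strong" css="(none)" />
  <xhtml-style-map name="User entry" class="text"
           element="kbd" css="(none)" />
  <xhtml-style-map name="Variable" class="text"
           element="var" css="(none)" />
</config>
```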

Warning: Some elements are not allowed inside pre, so this might in some cases lead to invalid documents. This will be fixed in a later version of Writer2xhtml.