We use the DocBook export available in OpenOffice.org (OOo) as a base. The result can be catastrophic unless the document is cleaned up a little first, so here is the process we came up with:
- Open the original document in OOo
- If it's very big, copy only the beginning of the document, keeping a few dozen representative pages (in terms of images, tables, section structure, etc.)
- Copy the document into a new, blank OOo text document, and save under a different name.
- Now comes the boring part: you have to fix the document structure and styles:
  - Make sure the headings are correctly styled (Heading 1, Heading 2, etc.)
  - Make sure those styles are assigned the correct level (in the style configuration window) and have no numbering associated with them.
  - Make sure the chapter numbering configuration (Tools menu) actually corresponds to the heading styles used.
- Once this is done, make sure the default styles are applied to the whole document: select all content (Ctrl+A), then right-click -> Default Formatting. Keep track of the steps you took to fix this sample document.
- Save to DocBook and check that the document contains the content and structure you expect. If not, go back to step 4 (fixing the structure and styles).
- Once you are satisfied with the sample document, apply all the steps (from step 3, copying into a new blank document) to the real document, and save it to DocBook.
- Remove all "anchor" elements: they proved to make fop fail.
- Check the value of the "cols" attribute of all tables. It must be equal to the maximum number of cells in a row. OOo writes wrong decimal values.
- Process the document through the db4-upgrade.xsl stylesheet to get a DocBook 5 document.
- Process the resulting DocBook 5 document through an XSL stylesheet that automatically makes modules (xi:include) out of it. We will provide one in a future article.
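The two manual clean-ups in this list (removing the "anchor" elements and correcting "cols") are tedious on a large document, so they may be worth scripting. Below is a minimal sketch in Python, not the tool we actually used. It assumes the exported DocBook 4 file has no namespace, uses CALS tables where "cols" sits on the tgroup element, has no nested tables, and contains no entities that need the DTD to resolve. The function names are ours and purely illustrative.

```python
import xml.etree.ElementTree as ET


def remove_anchors(root):
    """Drop every <anchor> element, keeping the text that follows it."""
    for parent in list(root.iter()):
        previous = None  # last sibling that was kept
        for child in list(parent):
            if child.tag == "anchor":
                tail = child.tail or ""
                # Re-attach the text after the anchor to what precedes it.
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child


def fix_cols(root):
    """Set tgroup/@cols to the largest number of <entry> cells in any row."""
    for tgroup in root.iter("tgroup"):
        rows = [row
                for part in ("thead", "tbody", "tfoot")
                for row in tgroup.findall(part + "/row")]
        widest = max((len(row.findall("entry")) for row in rows), default=0)
        if widest:
            tgroup.set("cols", str(widest))


def clean_docbook(xml_text):
    """Apply both fixes to a DocBook 4 string and return the cleaned XML."""
    root = ET.fromstring(xml_text)
    remove_anchors(root)
    fix_cols(root)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops the DOCTYPE declaration when it serializes, so you may need to put it back before running db4-upgrade.xsl if your toolchain relies on it.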
Limitations: Image files are not processed by OOo, though they can be recovered by unzipping the .odt. There should be a way to automatically reference the image files in the XML; it might be studied in a future article.
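Recovering the images by hand works because an .odt file is a plain zip archive, with embedded pictures stored under a Pictures/ folder. A small sketch in Python of that extraction step follows. The function name and output directory are hypothetical, and it only copies the files out: it does not solve the referencing problem mentioned above.

```python
import zipfile
from pathlib import Path


def extract_images(odt_path, out_dir):
    """Copy every file stored under Pictures/ in the .odt into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(odt_path) as archive:
        for name in archive.namelist():
            # Skip directory entries; keep only actual picture files.
            if name.startswith("Pictures/") and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

For example, `extract_images("manual.odt", "images")` would leave the pictures of a (hypothetical) manual.odt in an images directory next to your DocBook file.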
Feedback: Please comment with your successes and failures, and your tricks to fix the output.