RE: [xsl] split OpenOffice 1.1 documents (flat xml)

Subject: RE: [xsl] split OpenOffice 1.1 documents (flat xml)
From: cknell@xxxxxxxxxx
Date: Wed, 30 Jul 2003 11:26:02 -0400
It appears that the beginning and end of a chapter are not marked by an element; that is to say, no element contains a chapter. Is that correct?

If so, how can you determine where a chapter begins and ends? If you can answer that question, you have moved a long way toward solving the problem.

It appears that you can identify the beginning of a chapter with an XPath expression along these lines: "office:document-content/office:body/text:h[@text:level='1']". It also seems that all sibling nodes following a particular <text:h> element, up to but not including the next <text:h> sibling node, are part of the chapter. Is that correct?
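
If that reading is right, the grouping rule can be sketched outside XSLT as follows. This is only an illustration, assuming Python's standard xml.etree.ElementTree and a trimmed copy of the sample document from the original message; the helper name split_chapters is made up for the example.

```python
import xml.etree.ElementTree as ET

OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"

# Trimmed copy of the sample document (assumption: only the parts needed here).
SAMPLE = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">
<office:body>
<text:sequence-decls/>
<text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
<text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
<text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
<text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
</office:body>
</office:document-content>"""


def split_chapters(root):
    """Group the children of office:body into chapters.

    A level-1 text:h starts a new chapter; every following sibling
    belongs to it until the next level-1 text:h.
    """
    body = root.find(f"{{{OFFICE}}}body")
    chapters = []
    for node in body:
        is_heading = (node.tag == f"{{{TEXT}}}h"
                      and node.get(f"{{{TEXT}}}level") == "1")
        if is_heading:
            chapters.append([node])
        elif chapters:
            chapters[-1].append(node)
        # Nodes before the first heading (e.g. text:sequence-decls) belong
        # to no chapter and are skipped here.
    return chapters


chapters = split_chapters(ET.fromstring(SAMPLE))
for chapter in chapters:
    # Print each chapter's heading text and its number of nodes.
    print(chapter[0].text, len(chapter))
```

In XSLT 1.0 the same idea is usually expressed with the following-sibling axis or a key on the nearest preceding heading; the loop above only shows the rule itself.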
-- 
Charles Knell
cknell@xxxxxxxxxx - email



-----Original Message-----
From:     "Linnemann, Victor" <Linnemann@xxxxxxxxxxxxx>
Sent:     Wed, 30 Jul 2003 15:50:51 +0200
To:       XSL-List@xxxxxxxxxxxxxxxxxxxxxx
Subject:  [xsl] split OpenOffice 1.1 documents (flat xml)

Hello everybody,
my question is about splitting large OpenOffice 1.1 documents (the
content.xml that you see once you have unzipped the *.sxw) into single
chapters for translation purposes.
It's flat XML, and because of this I have already looked in the XSL-FAQ under
http://www.dpawson.co.uk/xsl/sect2/flatfile.htm "Convert a flat XML
document", but I was not able to apply the suggested solution to my problem.
Each of the split files has to be a valid OpenOffice document and must
contain exactly one chapter (begins with <text:h ...>bla</text:h> and ends
before the next <text:h ...>bla</text:h>).
***********************************************************
XML (sorry, very odd content):
***********************************************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE office:document-content PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
<office:document-content
	xmlns:office="http://openoffice.org/2000/office"
	xmlns:style="http://openoffice.org/2000/style"
	xmlns:text="http://openoffice.org/2000/text"
	(...)
	xmlns:script="http://openoffice.org/2000/script" office:class="text" office:version="1.0">
<office:script/>
<office:font-decls>
	<style:font-decl style:name="Arial" fo:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
</office:font-decls>
<office:automatic-styles/>
<office:body>
	<text:sequence-decls>
		<text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
		<text:sequence-decl text:display-outline-level="0" text:name="Table"/>
		<text:sequence-decl text:display-outline-level="0" text:name="Text"/>
		<text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
	</text:sequence-decls>
	<text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
	<text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
	<text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
	<text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
</office:body>
</office:document-content>
***********************************************************
desired result:
***********************************************************
The same document structure, but split file 1 has as its content

	<text:h text:style-name="Heading 1" text:level="1">Chapter 1</text:h>
	<text:p text:style-name="Standard">This is my content.</text:p>

whereas split file 2 has as its content

	<text:h text:style-name="Heading 1" text:level="1">Chapter 2</text:h>
	<text:p text:style-name="Standard">This is my other content.</text:p>
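
One way to picture the desired result, sketched in Python rather than XSLT: copy the whole document once per chapter and remove the other chapters from office:body, so each copy keeps the shared structure plus exactly one chapter. This is a hedged illustration assuming the standard xml.etree.ElementTree module and a trimmed sample; the names chapter_documents and is_chapter_heading are hypothetical. ElementTree does not preserve the DOCTYPE, so it would have to be written back separately.

```python
import copy
import xml.etree.ElementTree as ET

OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"
# Keep the original prefixes when serializing.
ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)

# Trimmed copy of the sample document (assumption: only the parts needed here).
SAMPLE = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text">
<office:body>
<text:sequence-decls/>
<text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
<text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
<text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
<text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
</office:body>
</office:document-content>"""


def is_chapter_heading(node):
    return node.tag == f"{{{TEXT}}}h" and node.get(f"{{{TEXT}}}level") == "1"


def chapter_documents(root):
    """Return one copy of the document per chapter, each holding one chapter."""
    body = root.find(f"{{{OFFICE}}}body")
    # Label each body child with its chapter index; None marks shared
    # nodes that come before the first heading.
    labels = []
    current = None
    for node in body:
        if is_chapter_heading(node):
            current = 0 if current is None else current + 1
        labels.append(current)
    count = 0 if current is None else current + 1

    documents = []
    for wanted in range(count):
        doc = copy.deepcopy(root)
        doc_body = doc.find(f"{{{OFFICE}}}body")
        # The copy has the same child order, so labels line up by position.
        for node, label in zip(list(doc_body), labels):
            if label is not None and label != wanted:
                doc_body.remove(node)
        documents.append(doc)
    return documents


docs = chapter_documents(ET.fromstring(SAMPLE))
for number, doc in enumerate(docs, start=1):
    # To write a file instead (hypothetical file name):
    # ET.ElementTree(doc).write(f"chapter{number}.xml", encoding="UTF-8",
    #                           xml_declaration=True)
    print(number, ET.tostring(doc, encoding="unicode").count("<text:h "))
```

Removing nodes from a full copy, rather than building a new document from scratch, keeps the font declarations, styles, and sequence declarations in every output file without listing them explicitly.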

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



