Subject: RE: [xsl] split OpenOffice 1.1 documents (flat xml)
From: cknell@xxxxxxxxxx
Date: Wed, 30 Jul 2003 11:26:02 -0400
It appears that no element marks the beginning and end of a chapter; that is to say, there is no element that contains a chapter. Is that correct? If so, how can you determine where a chapter begins and ends? If you can answer that question, you have moved a long way toward solving the problem.

It appears that you can identify the beginning of a chapter with an XPath expression along these lines:

    office:document-content/office:body/text:h[@text:level="1"]

It also seems that all sibling nodes following a particular <text:h> element, up to but not including the next <text:h> sibling node, are part of the chapter. Is that correct?

-- 
Charles Knell
cknell@xxxxxxxxxx - email

-----Original Message-----
From: "Linnemann, Victor" <Linnemann@xxxxxxxxxxxxx>
Sent: Wed, 30 Jul 2003 15:50:51 +0200
To: XSL-List@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] split OpenOffice 1.1 documents (flat xml)

Hello everybody,

my question is about splitting large OpenOffice 1.1 documents (the content.xml that you see once you have unzipped the *.sxw) into single chapters for translation purposes. It's flat XML, so I have already looked in the XSL-FAQ under http://www.dpawson.co.uk/xsl/sect2/flatfile.htm "Convert a flat XML document", but I was not able to apply the suggested solution to my problem.

Each of the split files has to be a valid OpenOffice document and must contain exactly one chapter (begins with <text:h ...>bla</text:h> and ends at the next <text:h ...>bla</text:h>).

***********************************************************
XML (sorry, very odd content):
***********************************************************

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE office:document-content PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "office.dtd">
    <office:document-content xmlns:office="http://openoffice.org/2000/office"
        xmlns:style="http://openoffice.org/2000/style"
        xmlns:text="http://openoffice.org/2000/text"
        (...)
        xmlns:script="http://openoffice.org/2000/script"
        office:class="text" office:version="1.0">
      <office:script/>
      <office:font-decls>
        <style:font-decl style:name="Arial" fo:font-family="Arial" style:font-family-generic="swiss" style:font-pitch="variable"/>
      </office:font-decls>
      <office:automatic-styles/>
      <office:body>
        <text:sequence-decls>
          <text:sequence-decl text:display-outline-level="0" text:name="Illustration"/>
          <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
          <text:sequence-decl text:display-outline-level="0" text:name="Text"/>
          <text:sequence-decl text:display-outline-level="0" text:name="Drawing"/>
        </text:sequence-decls>
        <text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
        <text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
        <text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
        <text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
      </office:body>
    </office:document-content>

***********************************************************
desired result:
***********************************************************

The same document structure, but split file 1 has as its content

    <text:h text:style-name="Heading 1" text:level="1">Chapter 1</text:h>
    <text:p text:style-name="Standard">This is my content.</text:p>

whereas split file 2 has as its content

    <text:h text:style-name="Heading 1" text:level="1">Chapter 2</text:h>
    <text:p text:style-name="Standard">This is my other content.</text:p>

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
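
The grouping rule discussed in this thread (a chapter starts at a <text:h> element with text:level="1" and runs up to, but not including, the next such sibling) can be sketched outside XSLT to check the logic. The following is a minimal Python illustration, not an XSLT solution and not anything posted by the participants. The names `split_chapters`, `is_chapter_heading` and `SAMPLE` are hypothetical, the sample is a trimmed version of the document above, and the sketch assumes that everything before the first heading (here <text:sequence-decls>) should be repeated in every split file. The DOCTYPE declaration is not carried over by this approach.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace URIs taken from the sample document in the original message.
OFFICE = "http://openoffice.org/2000/office"
TEXT = "http://openoffice.org/2000/text"

# Keep the familiar prefixes when a result tree is serialized.
ET.register_namespace("office", OFFICE)
ET.register_namespace("text", TEXT)

# Trimmed, hypothetical version of the content.xml shown in the thread.
SAMPLE = """<office:document-content
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:text="http://openoffice.org/2000/text"
    office:class="text" office:version="1.0">
  <office:body>
    <text:sequence-decls>
      <text:sequence-decl text:display-outline-level="0" text:name="Table"/>
    </text:sequence-decls>
    <text:h text:style-name="Heading 1" text:level="1">Kapitel 1</text:h>
    <text:p text:style-name="Standard">Dies ist mein Dokument.</text:p>
    <text:h text:style-name="Heading 1" text:level="1">Kapitel 2</text:h>
    <text:p text:style-name="Standard">Vor jedem neuen Kapitel soll gesplittet werden.</text:p>
  </office:body>
</office:document-content>"""


def is_chapter_heading(elem):
    """True for <text:h text:level="1">, the assumed chapter marker."""
    return elem.tag == f"{{{TEXT}}}h" and elem.get(f"{{{TEXT}}}level") == "1"


def split_chapters(root):
    """Return one copy of the document per chapter.

    Children of <office:body> before the first level-1 heading are treated
    as a shared preamble (an assumption) and repeated in every copy.
    """
    body = root.find(f"{{{OFFICE}}}body")
    preamble, chapters = [], []
    for child in body:
        if is_chapter_heading(child):
            chapters.append([child])        # a heading opens a new chapter
        elif chapters:
            chapters[-1].append(child)      # sibling belongs to the open chapter
        else:
            preamble.append(child)          # content before the first heading

    documents = []
    for chapter in chapters:
        doc = copy.deepcopy(root)           # keeps root attributes and other children
        new_body = doc.find(f"{{{OFFICE}}}body")
        for child in list(new_body):
            new_body.remove(child)
        new_body.extend([copy.deepcopy(e) for e in preamble + chapter])
        documents.append(doc)
    return documents


if __name__ == "__main__":
    for number, doc in enumerate(split_chapters(ET.fromstring(SAMPLE)), start=1):
        print(f"--- split file {number} ---")
        print(ET.tostring(doc, encoding="unicode"))
```

In XSLT 1.0 the same grouping is usually expressed by relating each node to its nearest preceding <text:h> sibling, and writing several output files needs a processor extension; the sketch above only demonstrates which nodes end up in which chapter.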