Subject: Re: [xsl] Generic stylesheet to flatten XML hierarchy
From: Sara Mitchell <samitchell6@xxxxxxxxx>
Date: Mon, 7 Dec 2009 10:49:01 -0800 (PST)

I know that this may not work in every case. Basically the rules are:

* every attribute on an element becomes a column in a row
* every element that has data content becomes a column in a row
* repeating elements define a row -- with the further restriction that
  if there are hierarchical levels of repeating elements (nested), the
  final lowest level of repeating elements defines a row and ancestor
  levels get repeated
* hierarchical relationships get flattened
* siblings at any level that don't repeat get repeated in each row

I'm going to try one last possible solution using keys and XPath, I
think, and if that does not work I may move on to Michael Kay's
suggestion of a meta-stylesheet.

Thanks to everyone for the ideas.

--- On Fri, 12/4/09, C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx> wrote:

> From: C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx>
> Subject: Re: [xsl] Generic stylesheet to flatten XML hierarchy
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Cc: "C. M. Sperberg-McQueen" <cmsmcq@xxxxxxxxxxxxxxxxx>
> Date: Friday, December 4, 2009, 6:35 PM
>
> On 4 Dec 2009, at 12:37 , Sara Mitchell wrote:
>
> > ...
> >
> > With input like this:
> > <rss ...some attributes>
> >   ...
> > </rss>
> >
> > I would like XML output like this:
> >
> > <root>
> >   <row>
> >     <rss-attr1>value</rss-attr1>
> >     ...
> >   </row>
> >   <row>...again rss attributes, channel attributes, non-repeating
> >   children of channel followed by fields for second item </row>
> >   ...more rows ...
> > </root>
>
> I'm having trouble seeing exactly what should be going on here,
> because I can't see anything in your sample input (elided here
> without loss of generality) that gives rise to the name
> 'rss-attr1'. It's hard to correlate input with output if
> all the values are spelled 'value' and some details in one
> half of the input / output pair correspond to ellipses in the
> other.
>
> > This example is for a single level of repeating descendants, but
> > my solution has to be able to handle any level of repeating
> > descendants. Moreover, the stylesheet has no knowledge of the
> > structure of the input document.
>
> My very strong gut reaction here is to suspect that such an
> absolutely generic transformation is unlikely to produce helpful
> (or: meaningful) output in some unknown but possibly large
> percentage of cases.
>
> Perhaps the transformation you have in mind is intended to
> work generically on all XML documents that follow certain
> conventions in structuring the information they represent?
> Can you say what those conventions are?
>
> Perhaps you have a very clear understanding of the transform you
> want, but so far this discussion has not elicited a clear
> description from you. The following questions are intended to
> try to elicit some more clarity.
>
> In a generic XML document, there are elements with parents,
> left and right siblings, children, descendants, and attributes.
>
> In a generic table, there are rows and columns. Each row but
> the first or last has a predecessor and a successor, and ditto
> each column but the first or last.
>
> What is the relationship between the elements, attributes,
> containment and sibling relations in the input, and the
> rows and columns and their sequence relations in the output?
>
> Given your output table, should I expect to have all the
> information present in the XML? Can I recreate the XML from
> your table?
>
> Do all your rows have the same number of columns? (I suppose
> they must, or it's not much of a table, but perhaps I'd
> better check?)
>
> When does an XML document give rise to a single row in the output
> table? When does it give rise to exactly three rows? When
> does the resulting table have exactly one column?
>
> What information do the labels of columns convey?
>
> What tables would you want to produce for the documents
>
>   (1) <e/>
>   (2) <e><e n="23"/><e n="45">Pax</e></e>
>   (3) <table>
>         <row a="1" b="2" c="34">998</row>
>         <row a="2" b="22" c="34">999</row>
>         <row a="3" b="2" c="3">1000</row>
>         <row a="4" b="24" c="">1001</row>
>         <row a="5" x="Viva Villa!" c="34">998</row>
>       </table>
>   (4) <p>This isn't mixed content, because the schema says I'm a string.</p>
>
> ?
>
> > I have a solution that works ok by traversing the input document
> > in doc order -- but it does not handle the siblings of repeating
> > nodes that are not themselves repeating.
> >
> > I have thought of doing this the opposite way, get a key of all
> > repeating nodes and process only those at the lowest depth to
> > generate rows. I haven't actually written the logic.
>
> I gather that the tables you want to generate have something
> to do with multiple occurrences of elements with the same name.
> Does adjacency matter, or would
>
>   <a><b/><b/><b/><c/><c/><c/></a>
>
> be treated differently from
>
>   <a><b/><c/><b/><c/><b/><c/></a>
>
> ? (Assume if you like, for purposes of discussion, that the b and c
> and a elements all have interesting attributes.)
>
> > Any better ideas would be welcome.
>
> Your example reminds me of the contortions I've seen people
> go to trying to represent structured information in RFC 822
> attribute-value pairs. So the best idea I have at the moment
> is: Save yourself! Don't do it!
>
> But probably you know exactly what you're doing, there is a perfectly
> reasonable algorithm for what you want, and I just haven't understood.
>
> hth
>
> --****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
>
> --~------------------------------------------------------------------
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
> or e-mail: <mailto:xsl-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
> --~--
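
The five rules at the top of this message can be read as a small recursive procedure. What follows is only a rough Python sketch of one possible reading of them; it is not XSLT and not the stylesheet discussed in this thread. Several things are assumptions made for illustration: the `element-attribute` column names, treating "repeating" as "more than one child with the same name under the same parent, adjacent or not", pairing every row of one repeating group with every row of another when two groups sit at the same level, and the hypothetical sample document.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import product

# Hypothetical input, loosely shaped like the rss/channel/item case.
SAMPLE = """
<rss version="2.0">
  <channel lang="en">
    <title>News</title>
    <item id="1"><headline>First</headline></item>
    <item id="2"><headline>Second</headline></item>
  </channel>
</rss>
"""


def own_columns(elem):
    """Columns contributed by the element itself: its attributes and its text."""
    # Assumed naming scheme: "<element>-<attribute>" for attribute columns.
    cols = {f"{elem.tag}-{name}": value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:  # only elements with data content become a column
        cols[elem.tag] = text
    return cols


def combine(left, right):
    """Repeat every row of `left` once for every row of `right`."""
    return [{**a, **b} for a, b in product(left, right)]


def flatten(elem):
    """Return the rows (dicts of column name -> value) produced by `elem`."""
    rows = [own_columns(elem)]
    counts = Counter(child.tag for child in elem)
    seen = set()
    for child in elem:
        if child.tag in seen:
            continue  # this group of same-named siblings is already handled
        seen.add(child.tag)
        if counts[child.tag] == 1:
            # Non-repeating child: its columns are repeated in each row.
            group_rows = flatten(child)
        else:
            # Repeating children: each one contributes its own row(s).
            siblings = [s for s in elem if s.tag == child.tag]
            group_rows = [row for s in siblings for row in flatten(s)]
        rows = combine(rows, group_rows)
    return rows


if __name__ == "__main__":
    for row in flatten(ET.fromstring(SAMPLE.strip())):
        print(row)
```

With the sample above, each `item` yields one row carrying the `rss` and `channel` attributes and the non-repeating `title`. The sketch also makes two of the open questions concrete: elements with the same name at different levels overwrite each other's columns, and nothing forces every row to have the same set of columns.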