Subject: [xsl] CALS table model - finding 'straddle' rows From: Feargal Hogan <feargal.hogan@xxxxxxxxx> Date: Wed, 26 Feb 2014 12:15:07 +0000 |
Hi all I am building an XSLT process to go through a large collection of XML files, looking for CALS model tables (most of the files contain at least one) and then store the tabular data back to a database. In the first instance, many of these tables will have dissimilar structures, but I want to use the database to analyse commonality of structure. I list below an (incomplete) extract from an example file. <table frame="none"> <tgroup cols="6" colsep="0" rowsep="0"> <colspec colname="1" colnum="1" colwidth="127pt" align="center"/> <colspec colname="2" colnum="2" colwidth="39pt" align="center"/> <colspec colname="3" colnum="3" colwidth="30pt" align="center"/> <colspec colname="4" colnum="4" colwidth="33pt" align="center"/> <colspec colname="5" colnum="5" colwidth="33pt" align="center"/> <colspec colname="6" colnum="6" colwidth="87pt"/> <thead> <row valign="bottom"> <entry align="center">Product</entry> <entry>SKU</entry> <entry>Length</entry> <entry>Depth</entry> <entry align="center">Weight</entry> <entry align="center">Remarks</entry> </row> <row valign="bottom"> <entry></entry> <entry></entry> <entry>(m)</entry> <entry>(m)</entry> <entry align="center">(kg) </entry> <entry align="center"> </entry> </row> </thead> <tbody> <row> <entry align="left" namest="1" nameend="6"><hd4>Whites</hd4></entry> </row> <row> <entry>Albion</entry> <entry>12345</entry> <entry>398</entry> <entry>15.5</entry> <entry> </entry> <entry>N/A </entry> </row> <row> <entry>Rotorua</entry> <entry>12346</entry> <entry>398</entry> <entry>15.5</entry> <entry> </entry> <entry> </entry> </row> <row> <entry>Quintep</entry> <entry>12347</entry> <entry>398</entry> <entry>15.5</entry> <entry> </entry> <entry> </entry> </row> ... Because of the dissimilar structures that I know I will encounter during the process, I am unable to create a table schema in the database that will hold all this data, other than to store each table cell as an entity with the following properties: doc_id table_id row_id col_id col_name col_units entry_value This will allow me to store both string and numeric values as strings. An additional property that is required in many instances is the 'category' as defined in the straddle row in the example above at tbody/row[1]. Here the original document creator has added a straddle to categorise the rows immediately following: <row> <entry align="left" namest="1" nameend="6"><hd4>Whites</hd4></entry> </row> These straddle rows are causing me some difficulties. Where they occur, they 'categorise' the rows following UNTIL the next straddle occurs. My initial document analysis has indicated that there are a number of possible 'types' for the table structure in relation to these straddles: Type 1. That no straddles occur in the tables - simple and easy to process Type 2. That the first row in the tbody is a straddle and that there are zero or more further straddles below this in the tbody Type 3. That the table contains straddles but NOT in the first row of the tbody Type 3 'could' be treated as 2 separate tables, one of type 1 (all the rows up to but excluding the 1st straddle row) and a Type 2 (all the rows from the 1st straddle forwards) So it seems that the key to solving this processing problem is to identify the position of the 1st straddle, treat everything (zero or more rows) before the straddle as Type1 and treat everything from the straddle forward as Type 2. But I am having some difficulty identifying the position of the 1st straddle. My definition of 1st straddle - in Xpath terms - is tbody/row[entry [@nameend > @namest]][1] This allows for the possibility that the straddle is not always keyed from column 1 and does not always extend into the last column. Both of these possibilities do exist in the real world data. There are many similar solutions listed on this page http://www.dpawson.co.uk/xsl/sect2/flatfile.html#d5010e13 But I am having difficulty applying them to my instances. Something like this may work <xsl:key name="straddles" match="row[entry[@nameend > @namest]]" use="??????"/> But I'm unsure what to use to define the @use attribute of the key? When I try to define a first-straddle variable, I don't have a defining value to pass to the key() function? <xsl:variable name="first-straddle" select="table/tgroup/tbody/row[generate-id() = generate-id(key('straddles',?????))]"/> How do I find the the location of the first straddle? What XPath statement accurately locates it? Thanks in advance Feargal
Current Thread |
---|
|
<- Previous | Index | Next -> |
---|---|---|
Re: [xsl] Seemingly incorrect examp, Vladimir Nesterovsky | Thread | Re: [xsl] CALS table model - findin, Michael Kay |
Re: [xsl] Seemingly incorrect examp, Vladimir Nesterovsky | Date | Re: [xsl] CALS table model - findin, Michael Kay |
Month |