[xsl] Disjunctive reasoning in XSLT: controlling presence of subtrees

Subject: [xsl] Disjunctive reasoning in XSLT: controlling presence of subtrees
From: "Kenneth Bowen" <kbowen@xxxxxxxxxxxxxx>
Date: Fri, 20 Apr 2007 13:44:57 -0400
Hi!

I'm looking for some advice or pointers on how to handle disjunctive
reasoning in XSLT.
I'm using XSLT 2.0 with Saxon.

The context is an XML --> XML transform, where the source is a flat
representation of (insurance) data extracted from spreadsheets.  The
data is processed a row at a time (each row represents one policy), and
so the source XML looks like:

<col1>val1</col1>
<col2>val2</col2>
<col3>val3</col3>
<col4>val4</col4>
<col5>val5</col5>
.... etc. ...

The tags are the column heads and the values are the column entries from
a single row of the spreadsheet.  The target is an insurance industry
ACORD standard representation of the policy.  This is defined by a
mapping taking the column headers into XPaths.  A small subset of the
mapping looks like this:

dr1_fname  -->
    PersAutoLineBusiness.PersDriver[id=<DriverID>].GeneralPartyInfo.NameInfo.PersonName.GivenName
dr1_middle  -->
    PersAutoLineBusiness.PersDriver[id=<DriverID>].GeneralPartyInfo.NameInfo.PersonName.OtherGivenName
dr1_lname  -->
    PersAutoLineBusiness.PersDriver[id=<DriverID>].GeneralPartyInfo.NameInfo.PersonName.Surname
dr1_sex  -->
    PersAutoLineBusiness.PersDriver[id=<DriverID>].DriverInfo.PersonInfo.GenderCd

There are slightly fewer than 200 entries in this mapping table (and
it's a moving target).  So I feed this table to a (Java) routine which
generates an XSLT transform (really a collection of stylesheets joined
by includes), and the row data is then passed through this transform
one row at a time.

This works fine.  The difficulty arises from the fact that the input
data (a given row) can be quite variable:  an auto policy might have one
driver or five, might cover one vehicle or seven, might have no prior
accidents or violations or several, and so on.  The naive transform
that I currently generate takes an expansive approach, setting up XML
subtrees for almost everything that might possibly come through in
the data.  This has the effect of leaving essentially empty subtrees
throughout the target ACORD tree, which is not acceptable.  I currently
perform a post-process tree walk to clean up.  However, I'd like to
understand if it is possible to more intelligently generate an XSLT
transform which can omit various subtrees when the data for them is not
present.
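
For reference, a cleanup pass of this kind can be sketched in XSLT 2.0
as an identity transform plus one pruning rule.  This is only an
illustration, not the actual post-process: it assumes that "empty" means
an element with no attributes and no non-blank text anywhere inside it.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes so they do not count as content -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity rule: copy every node and attribute by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: an element is "empty" when neither it nor any
       descendant has an attribute, and its string value is blank.
       Such elements are dropped together with their whole subtree. -->
  <xsl:template
      match="*[not(descendant-or-self::*/@*) and not(normalize-space())]"/>

</xsl:stylesheet>
```

A rule like this only removes subtrees that are blank all the way down,
so a subtree that still carries fixed text or attributes survives even
when none of the row data reached it.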

One simple version of the problem might be phrased this way.  Imagine
that the output XML tree has three subtrees:  T1, T2, and T3, and that
T2,T3 are subtrees of T1.  From the simple (left-side) input at the
beginning of this email,

    val1,val5 go into T1, but lie outside of T2,T3;
    val2,val4 go into T2;
    val3 goes into T3

Moreover:
    if val3 is absent, the entire T3 should not appear;
    if either val2 or val4 is present, T2 should appear, but not
      otherwise;
    if all of val1...val5 are absent, the entire T1 should not appear;
    even if val1,val5 are absent, if T2 or T3 should appear, then T1
      is built, etc.
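
Spelled out by hand, these rules might look like the sketch below
(XSLT 2.0).  It rests on several assumptions that are not part of the
real setup: each row is wrapped in a hypothetical <row> element, a value
counts as present when its column element has non-blank text, and the
element names Leaf and Kind are made up to stand in for the real ACORD
structure.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the row's columns are children of a <row> wrapper -->
  <xsl:template match="/row">
    <!-- T1 is built if any of val1..val5 is present -->
    <xsl:if test="(col1, col2, col3, col4, col5)[normalize-space()]">
      <T1>
        <!-- val1 and val5 go into T1 but lie outside T2 and T3 -->
        <xsl:apply-templates select="(col1, col5)[normalize-space()]"/>

        <!-- T2 is built if either val2 or val4 is present -->
        <xsl:if test="(col2, col4)[normalize-space()]">
          <T2>
            <xsl:apply-templates select="(col2, col4)[normalize-space()]"/>
          </T2>
        </xsl:if>

        <!-- T3 is built only if val3 is present; it also carries
             fixed structure (hypothetical <Kind>) besides val3 -->
        <xsl:if test="col3[normalize-space()]">
          <T3>
            <Kind>fixed</Kind>
            <xsl:apply-templates select="col3"/>
          </T3>
        </xsl:if>
      </T1>
    </xsl:if>
  </xsl:template>

  <!-- Hypothetical leaf mapping: one <Leaf> per column value -->
  <xsl:template match="row/*">
    <Leaf from="{name()}">
      <xsl:value-of select="normalize-space()"/>
    </Leaf>
  </xsl:template>

</xsl:stylesheet>
```

Note that the test guarding T1 has to repeat every column that is
tested further down, which is exactly what becomes painful with some
200 columns and deeper nesting.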

The ugliness arises from the need to govern every subtree by conditional
tests for all of the data which might occur in the subtree:  We must
test for val1...val5 above T1, for val2,val4 above T2, etc.  And it
isn't simply a matter of removing empty aggregates.  A subtree (e.g.,
T3) might incorporate only a single piece of incoming data (val3), but
the subtree (T3) might have several bits of structure apart from the
purely val3 part.

Of course, I'm hoping that there is some clever bit of XSLT trickery
that I've missed (but I'm not optimistic).  I'll welcome any thoughts
you have on this problem.
Thanks very much in advance.

Ken Bowen
