[xsl] XSL flat genealogical structure to hierarchical parent-child list

Subject: [xsl] XSL flat genealogical structure to hierarchical parent-child list
From: "Vadim Verenich" <vadimverenich@xxxxxxxxx>
Date: Mon, 25 Aug 2008 00:40:41 +0300
Dear XSLT Experts,
I am having problems converting a flat XML file into hierarchically
nested XML.
Last month I read Chapter 19 of Michael Kay's book (it deals with the
conversion of an unparsed GEDCOM text file into an XML structure) and
was very impressed.
Since then I have converted all my GEDCOM files into GedcomXML format;
however, some of the genealogical data in my digital archive is
organized in a more classical text format rather than the commonly
accepted GEDCOM format.
I will use a part of Paul Thereof's Hohenzollern genealogy scheme to
illustrate what this format looks like.
The text format is as follows:
//sample
1.Johann der Alchemist, renounced his rights of succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)
1.1.Rudolf, b.and d.1424
1.2.Barbara (1423-1481); m.1433 Luigi III Gonzaga, Margrave of Mantua (d.1478)
1.3.Elisabeth (1425-after 13 Jan 1465); m.1st 1437 Duke Joachim of Pomerania (d.1451); m.2d 1453 Duke Wratislaw X of Pomerania (d.1478)
1.4.Dorothea (1430-1495); m.1st 1445 King Christof III of Denmark (d.1448); m.2d 1449 King Christian I of Denmark (d.1481)
2.Friedrich II, Elector of Brandenburg (1413-1471); m.1441 Pss Katharina of Saxony (1421-1476)
2.1.Johann (1452-1454)
2.2.Erasmus, b.after 1452, d.1464/5
2.3.Dorothea (1446-1519); m.1464 Duke Johann V of Saxe-Lauenburg (d1507)
2.4.Margarete, d.1489; m.ca 1477 Duke Bogislaw X of Pomerania (d.1523)
3.Albrecht Achilles, Elector of Brandenburg (1414-1486); he laid down the family rule rare among German families, the key to its future success, that Brandenburg would never be divided, but always inherited by the eldest son, and that the territories of Ansbach and Bayreuth could be given to younger sons, but not further subdivided; he m.1st 1446 Mgvine Margarete of Baden (d.1457); m.2d 1458 Pss Anna of Saxony (1437-1512)
//
Analysis of the data structure:
As you can see, the data is organized as something akin to a flat
two-dimensional array (with two axes: vertical and horizontal).
Family relations are represented along the horizontal axis, while the
vertical axis declares parent-child relationships through classifiers
of the form 1.n., where the string "." is the delimiter between two
successive generations and n identifies an individual within a given
generation.
For example, the individual with classifier 1.1. is a child of 1., and
so on.
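
The classifier rule can be illustrated with a small XSLT 1.0 sketch
that derives a parent classifier by dropping the last dotted segment.
This is only an illustration under assumptions: the template name
parent-classifier and the stylesheet parameter classifier are
hypothetical names, and the stylesheet can be run against any XML
document because it ignores its input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical parameter: the classifier whose parent we want. -->
  <xsl:param name="classifier" select="'1.2.3'"/>

  <xsl:template match="/">
    <xsl:call-template name="parent-classifier">
      <xsl:with-param name="classifier" select="$classifier"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Emits every dotted segment except the last one.
       XSLT 1.0 has no "substring before last", hence the recursion. -->
  <xsl:template name="parent-classifier">
    <xsl:param name="classifier"/>
    <xsl:variable name="rest" select="substring-after($classifier, '.')"/>
    <!-- An empty $rest means only the last segment is left: stop. -->
    <xsl:if test="$rest != ''">
      <xsl:value-of select="substring-before($classifier, '.')"/>
      <!-- Emit a delimiter only if more segments will follow. -->
      <xsl:if test="contains($rest, '.')">
        <xsl:text>.</xsl:text>
      </xsl:if>
      <xsl:call-template name="parent-classifier">
        <xsl:with-param name="classifier" select="$rest"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With the default parameter value 1.2.3 the output is 1.2; a top-level
classifier such as 1 gives an empty string, meaning "no parent".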
I converted this plain text format into CSV data and then used Altova
MapForce to map some fields of this structure against Rob Mckinnon's
Genview XSD schema, with the following result:
<?xml version="1.0" encoding="UTF-8"?>
<genview>
 <individual id="@I1@">
  <name first="Johann der Alchemist" />
 </individual>
 <family id="@F1@">
  <father ref="@I1@" />
  <child />
 </family>
 <individual id="@I11@">
  <name first="Rudolf" />
 </individual>
 <family id="@F2@">
  <father ref="@I11@" />
  <child />
  <mother/>
 </family>
 <individual id="@I12@">
  <name first=" Barbara " />
 </individual>
 <family id="@F3@">
  <father />
  <child />
  <mother ref="@I12@"/>
 </family>
 <individual id="@I13@">
  <name first=" Elisabeth " />
 </individual>
 <family id="@F4@">
  <father />
  <child />
  <mother ref="@I13@"/>
 </family>
 <individual id="@I14@" >
  <name first="Dorothea" />
 </individual>
 <family id="@F5@">
  <father/>
  <child />
  <mother ref="@I14@"/>
 </family>
 <individual id="@I2@" >
  <name first="Friedrich II" />
 </individual>
 <family id="@F6@">
  <father ref="@I2@" />
  <child />
  <mother />
 </family>
 <individual id="@I21@">
  <name first="Johann" />
 </individual>
 <family id="@F7@">
  <father ref="@I21@" />
  <child />
  <mother />
 </family>
 <individual id="@I22@">
  <name first="Erasmus" />
 </individual>
 <family id="@F8@">
  <father ref="@I22@" />
  <child />
  <mother />
 </family>
 <individual id="@I23@">
  <name first="Dorothea" />
 </individual>
 <family id="@F9@">
  <father />
  <child />
  <mother ref="@I23@"/>
 </family>
 <individual id="@I24@">
  <name first="Margarete" />
 </individual>
 <family id="@F10@">
  <father />
  <child />
  <mother ref="@I24@"/>
 </family>
 <individual id="@I3@">
  <name first="Albrecht Achilles" />
 </individual>
 <family id="@F11@">
  <father ref="@I3@" />
  <child />
  <mother />
 </family>

</genview>
It appears that the genealogical data was mapped more or less
correctly, but the major issue with this format is that it lacks
parent-child relations.
I need some XSL techniques/methods to define the relations between
parents and children. I tried to use the Muenchian method, but it seems
illogical to apply it to defining hierarchical relations. What seems
more logical to me is to write XSL transformation routines that take
the unique ID of each individual as a variable or a key and compare
this value against the other ID classifiers, looking for IDs that
extend it by one further level (1.1 and 1.1.1; 1.2 and 1.2.1, 1.2.3,
etc.). When a child ID classifier is found, it should then be written
as a child reference (@ref) within the /genview/family/child node.
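
A rough XSLT 1.0 sketch of this idea could look as follows. It is not
a verified solution and rests on several assumptions: each digit of an
id stands for one generation level (so @I12@ is a child of @I1@, which
breaks down once a parent has more than nine children), every family
carries at most one father/@ref or mother/@ref, and the key name
children-by-parent is a hypothetical name chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Index every individual by its parent's classifier: the digits of
       its own id with the last digit removed ("@I12@" gives "1").
       Top-level individuals such as "@I1@" are indexed under "". -->
  <xsl:key name="children-by-parent" match="individual"
           use="substring(@id, 3, string-length(@id) - 4)"/>

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Replace the empty <child/> placeholder with one <child ref="..."/>
       per individual whose parent classifier matches this family's
       father or mother. -->
  <xsl:template match="family/child">
    <xsl:variable name="parent-ref"
                  select="(../father/@ref | ../mother/@ref)[1]"/>
    <!-- Strip the leading "@I" and the trailing "@" from the reference. -->
    <xsl:variable name="classifier"
                  select="substring($parent-ref, 3,
                                    string-length($parent-ref) - 3)"/>
    <xsl:variable name="children"
                  select="key('children-by-parent', $classifier)"/>
    <xsl:choose>
      <xsl:when test="$classifier != '' and $children">
        <xsl:for-each select="$children">
          <child ref="{@id}"/>
        </xsl:for-each>
      </xsl:when>
      <!-- No parent reference or no children: keep the placeholder. -->
      <xsl:otherwise>
        <xsl:copy/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Using a key here avoids rescanning all individuals for every family;
the lookup value is computed once per family from its parent reference.
Applied to the MapForce result above, family @F1@ should receive child
references @I11@ to @I14@, while families whose parent has no
descendants keep their empty child element.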
The required output is the following:


<genview>
 <individual id="@I1@">
  <name first="Johann der Alchemist" />
 </individual>
 <family id="@F1@">
  <father ref="@I1@" />
  <child ref="@I11@"/>
  <child ref="@I12@"/>
  <child ref="@I13@"/>
  <child ref="@I14@"/>
 </family>
 <individual id="@I11@">
  <name first="Rudolf" />
 </individual>
 <family id="@F2@">
  <father ref="@I11@" />
  <child />
  <mother/>
 </family>
 <individual id="@I12@">
  <name first=" Barbara " />
 </individual>
 <family id="@F3@">
  <father />
  <child />
  <mother ref="@I12@"/>
 </family>
 <individual id="@I13@">
  <name first=" Elisabeth " />
 </individual>
 <family id="@F4@">
  <father />
  <child />
  <mother ref="@I13@"/>
 </family>
 <individual id="@I14@" >
  <name first="Dorothea" />
 </individual>
 <family id="@F5@">
  <father/>
  <child />
  <mother ref="@I14@"/>
 </family>
 <individual id="@I2@" >
  <name first="Friedrich II" />
 </individual>
 <family id="@F6@">
  <father ref="@I2@" />
  <child ref="@I21@"/>
  <child ref="@I22@"/>
  <child ref="@I23@"/>
  <mother />
 </family>
 <individual id="@I21@">
  <name first="Johann" />
 </individual>
 <family id="@F7@">
  <father ref="@I21@" />
  <child />
  <mother />
 </family>
 <individual id="@I22@">
  <name first="Erasmus" />
 </individual>
 <family id="@F8@">
  <father ref="@I22@" />
  <child />
  <mother />
 </family>
 <individual id="@I23@">
  <name first="Dorothea" />
 </individual>
 <family id="@F9@">
  <father />
  <child />
  <mother ref="@I23@"/>
 </family>

</genview>
I am not sure whether this result can be achieved by means of XSLT
transformations.
Thank you for your patience and support,
Sincerely,
Vadim Verenich
