RE: [xsl] Flat genealogical structure to organized parent-child relationships

From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Mon, 25 Aug 2008 10:05:08 +0100

The general structure of the problem seems very similar to that of my paper

http://www.idealliance.org/proceedings/xml04/papers/111/mhk-paper.html

and I think it is best tackled using a similar 2-stage approach: first parse
the records using regular expressions, then group them, recursively. Of
course both stages are much easier using XSLT 2.0.

The first stage is to parse the text into elements that retain the
hierarchical numbering, for example:

<p id="I1" nr="1" details="Johann der Alchemist, renounced his rights of
succession (1406-146); m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)"/>
<p id="I2" nr="1.1" details="Rudolf, b.and d.1424"/>

Of course you can do more parsing at this stage if you want, but I don't
think that part is critical to the problem.
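
As a rough sketch of stage 1 (not taken from the paper), the parsing could look something like the stylesheet below. It assumes XSLT 2.0, that the text has first been cleaned up so that each record occupies exactly one line, and that it lives in a file named by the hypothetical parameter input-uri; the wrapper element name "records" and the entry template name "main" are also illustrative choices.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumption: a plain text file with one genealogical record per line -->
  <xsl:param name="input-uri" as="xs:string" select="'hohenzollern.txt'"/>

  <!-- Entry point: invoke as the initial named template -->
  <xsl:template name="main">
    <records>
      <!-- Split the file into lines and skip blank ones -->
      <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')[normalize-space()]">
        <!-- Remember the record position; position() changes inside analyze-string -->
        <xsl:variable name="pos" select="position()"/>
        <!-- group 1: hierarchic number such as 1.2 (without the trailing dot)
             group 3: the remaining text of the record -->
        <xsl:analyze-string select="normalize-space(.)"
                            regex="^(\d+(\.\d+)*)\.\s*(.*)$">
          <xsl:matching-substring>
            <p id="I{$pos}" nr="{regex-group(1)}" details="{regex-group(3)}"/>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

Lines that do not start with a hierarchic number are silently dropped by this sketch; a real stylesheet would probably want to report them or append them to the previous record.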

Then in stage 2 you need to do some grouping to create the family elements. 

In your first grouping phase you want to group by the value of tokenize(@nr,
'\.')[1]. Then each of these groups is (recursively) grouped using the key
tokenize(@nr, '\.')[2]; and so on, until the groups are empty.
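
The recursive grouping might be sketched as follows. This is only an illustration under assumptions: the input is a "records" element containing the p elements shown above, the function name f:group, its namespace, and the output element name "person" are all made up for the example, and it simply nests each person inside the parent rather than emitting the family and child reference elements of the Genview format.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:local"
    exclude-result-prefixes="xs f">

  <xsl:output indent="yes"/>

  <!-- Assumption: stage 1 produced <records> with one <p> per person -->
  <xsl:template match="/records">
    <genview>
      <xsl:sequence select="f:group(p, 1)"/>
    </genview>
  </xsl:template>

  <!-- Group $people on the $level-th component of @nr, then recurse -->
  <xsl:function name="f:group" as="element(person)*">
    <xsl:param name="people" as="element(p)*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$people"
                        group-by="tokenize(@nr, '\.')[$level]">
      <!-- The head of a group is the record whose number has exactly
           $level components; the others are its descendants -->
      <xsl:variable name="head"
          select="current-group()[count(tokenize(@nr, '\.')) = $level]"/>
      <xsl:variable name="descendants"
          select="current-group() except $head"/>
      <person>
        <xsl:copy-of select="$head/@*"/>
        <!-- Recursion stops when no descendants are left to group -->
        <xsl:sequence select="f:group($descendants, $level + 1)"/>
      </person>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>
```

Within each person element the direct children are then available as child elements, so family and child reference elements could be generated at the same point instead of (or as well as) the nesting.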

Michael Kay
http://www.saxonica.com/ 

> -----Original Message-----
> From: Vadim Verenich [mailto:vadimverenich@xxxxxxxxx] 
> Sent: 25 August 2008 08:18
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: [xsl] Flat genealogical structure to organized 
> parent-child relationships
> 
> Dear XSLT Experts,
> I am having problems converting a flat structured XML file
> into hierarchically nested XML.
> Last month I read Chapter 19 of Michael Kay's book (it
> deals with the conversion of an unparsed GEDCOM text file into an XML
> structure) and was very impressed.
> Since then I have converted all my GEDCOM files into
> GedcomXML format; however, some of the genealogical data in
> my digital archive is organized in a more classical text
> format rather than the commonly accepted GEDCOM format.
> I will use a part of Paul Thereof's Hohenzollern genealogy
> scheme to illustrate what this format looks like.
> The text format is as follows:
> // sample
> 1.Johann der Alchemist, renounced his rights of succession (1406-146);
> m.1412 Pss Barbara of Saxe-Wittenberg (1405-1465)
> 1.1.Rudolf, b.and d.1424
> 1.2.Barbara (1423-1481); m.1433 Luigi III Gonzaga, Margrave of Mantua
> (d.1478)
> 1.3.Elisabeth (1425-after 13 Jan 1465); m.1st 1437 Duke Joachim of
> Pomerania (d.1451); m.2d 1453 Duke Wratislaw X of Pomerania (d.1478)
> 1.4.Dorothea (1430-1495); m.1st 1445 King Christof III of Denmark
> (d.1448); m.2d 1449 King Christian I of Denmark (d.1481)
> 2.Friedrich II, Elector of Brandenburg (1413-1471); m.1441 Pss Katharina
> of Saxony (1421-1476)
> 2.1.Johann (1452-1454)
> 2.2.Erasmus, b.after 1452, d.1464/5
> 2.3.Dorothea (1446-1519); m.1464 Duke Johann V of Saxe-Lauenburg (d1507)
> 2.4.Margarete, d.1489; m.ca 1477 Duke Bogislaw X of Pomerania (d.1523)
> 3.Albrecht Achilles, Elector of Brandenburg (1414-1486); he laid down
> the family rule rare among German families, the key to its future
> success, that Brandenburg would never be divided, but always inherited
> by the eldest son, and that the territories of Ansbach and Bayreuth
> could be given to younger sons, but not further subdivided; he m.1st
> 1446 Mgvine Margarete of Baden (d.1457); m.2d 1458 Pss Anna of Saxony
> (1437-1512)
> //
> The analysis of the data structure:
> As you can see, the data is organized rather like a
> two-dimensional flat array (with two axes: vertical and horizontal).
> The family relations are represented along the horizontal axis,
> while the vertical axis declares parent-child
> relationships as strings of the form 1.n., where the string "."
> is used as a delimiter between two successive generations,
> and n identifies an individual within a given generation.
> For example, an individual with classifier 1.1. is a child of 1, etc.
> I converted this plain text format into CSV data and then
> used Altova MapForce to map some fields of this structure
> against Rob Mckinnon's Genview XSD schema, which gave the
> following result:
> <?xml version="1.0" encoding="UTF-8"?>
> <genview>
>  <individual id="@I1@">
>  <name first="Johann der Alchemist" />
>  </individual>
>  <family id="@F1@">
>  <father ref="@I1@" />
>  <child />
>  </family>
>  <individual id="@I11@">
>  <name first="Rudolf" />
>  </individual>
>  <family id="@F2@">
>  <father ref="@I11@" />
>  <child />
>  <mother/>
>  </family>
>  <individual id="@I12@">
>  <name first=" Barbara " />
>  </individual>
>  <family id="@F3@">
>  <father />
>  <child />
>  <mother ref="@I12@"/>
>  </family>
>  <individual id="@I13@">
>  <name first=" Elisabeth " />
>  </individual>
>  <family id="@F4@">
>  <father />
>  <child />
>  <mother ref="@I13@"/>
>  </family>
>  <individual id="@I14@" >
>  <name first="Dorothea" />
>  </individual>
>  <family id="@F5@">
>  <father/>
>  <child />
>  <mother ref="@I14@"/>
>  </family>
>  <individual id="@I2@" >
>  <name first="Friedrich II" />
>  </individual>
>  <family id="@F6@">
>  <father ref="@I2@" />
>  <child />
>  <mother />
>  </family>
>  <individual id="@I21@">
>  <name first="Johann" />
>  </individual>
>  <family id="@F7@">
>  <father ref="@I21@" />
>  <child />
>  <mother />
>  </family>
>  <individual id="@I22@">
>  <name first="Erasmus" />
>  </individual>
>  <family id="@F8@">
>  <father ref="@I22@" />
>  <child />
>  <mother />
>  </family>
>  <individual id="@I23@">
>  <name first="Dorothea" />
>  </individual>
>  <family id="@F9@">
>  <father />
>  <child />
>  <mother ref="@I23@"/>
>  </family>
>  <individual id="@I24@">
>  <name first="Margarete" />
>  </individual>
>  <family id="@F10@">
>  <father />
>  <child />
>  <mother ref="@I24@"/>
>  </family>
>  <individual id="@I3@">
>  <name first="Albrecht Achilles" />
>  </individual>
>  <family id="@F11@">
>  <father ref="@I3@" />
>  <child />
>  <mother />
>  </family>
> 
> </genview>
> It appears that the genealogical data was mapped more or less
> correctly, but the major issue with this format is that it lacks
> parent-child relations. I need some XSL techniques/methods
> to define the relations between parents and children. I tried to
> use the Muenchian method, but it seems illogical to apply it
> to defining hierarchical relations. What seems more logical
> to me is to write XSL transformation routines which define the
> unique ID of each individual in this context as a variable or
> a key and compare this value against the other ID
> classifiers, looking for string values that extend the checked
> value (1.1 and 1.1.1; 1.2 and
> 1.2.1, 1.2.3 etc.). When a child ID classifier is found, it
> should then be written as a child reference (@ref) within the
> /genview/family/child node.
> The required output is the following:
> 
> 
> <genview>
>  <individual id="@I1@">
>  <name first="Johann der Alchemist" />
>  </individual>
>  <family id="@F1@">
>  <father ref="@I1@" />
>  <child ref="@I11@"/>
>  <child ref="@I12@"/>
>  <child ref="@I13@"/>
>  <child ref="@I14@"/>
>  </family>
>  <individual id="@I11@">
>  <name first="Rudolf" />
>  </individual>
>  <family id="@F2@">
>  <father ref="@I11@" />
>  <child />
>  <mother/>
>  </family>
>  <individual id="@I12@">
>  <name first=" Barbara " />
>  </individual>
>  <family id="@F3@">
>  <father />
>  <child />
>  <mother ref="@I12@"/>
>  </family>
>  <individual id="@I13@">
>  <name first=" Elisabeth " />
>  </individual>
>  <family id="@F4@">
>  <father />
>  <child />
>  <mother ref="@I13@"/>
>  </family>
>  <individual id="@I14@" >
>  <name first="Dorothea" />
>  </individual>
>  <family id="@F5@">
>  <father/>
>  <child />
>  <mother ref="@I14@"/>
>  </family>
>  <individual id="@I2@" >
>  <name first="Friedrich II" />
>  </individual>
>  <family id="@F6@">
>  <father ref="@I2@" />
>  <child ref="@I21@"/>
>  <child ref="@I22@"/>
>  <child ref="@I23@"/>
>  <mother />
>  </family>
>  <individual id="@I21@">
>  <name first="Johann" />
>  </individual>
>  <family id="@F7@">
>  <father ref="@I21@" />
>  <child />
>  <mother />
>  </family>
>  <individual id="@I22@">
>  <name first="Erasmus" />
>  </individual>
>  <family id="@F8@">
>  <father ref="@I22@" />
>  <child />
>  <mother />
>  </family>
>  <individual id="@I23@">
>  <name first="Dorothea" />
>  </individual>
>  <family id="@F9@">
>  <father />
>  <child />
>  <mother ref="@I23@"/>
>  </family>
> 
> </genview>
> I am not sure whether this result can be achieved by means of
> XSLT transformations.
> Thank you for your patience and support. Sincerely,
> 
> Vadim Verenich
