Re: [xsl] Re: Re-arranging an XML file

Subject: Re: [xsl] Re: Re-arranging an XML file
From: Syd Bauman <Syd_Bauman@xxxxxxxxx>
Date: Wed, 21 Jan 2009 10:44:21 -0500
Hmmm ... I posted this a few hours ago (09:30-05), but it was
rejected by the list (because I had included the sample XSLT as a
text/xml attachment, rather than cut-and-pasted plain text).

I'm re-posting it mostly just to prove the point that, although great
minds may think alike, Wendell says it so much better :-)

---------

> As I mentioned in an earlier post: I'm new to XSL (I've got about 3
> hours experience) and am still trying to get my head around it,

Indeed. I'm very sympathetic. XSLT is weird for us old 3rd generation
procedural folks (I learned on PASCAL, but same idea), but the
benefits outweigh the learning curve, IMHO. If you can find one, a
tutorial (either a hands-on, web-based activity or an introductory
book) may be a much more efficient way of learning the big picture
stuff than posting here.

In any case, I don't understand exactly what you're trying to do, but
I may be able to help you down the right track anyway.


> - how can I output in CSV format?

XSLT has no native "output in CSV" capability, nor, I think, does
XPath (someone will correct me if I'm wrong, I'm sure :-). XSLT
outputs XML by default. However, XSLT can output plain text if you
ask it to:
   <xsl:output method="text"/>
Then you can use the <xsl:text> element to output the values of
interest. 
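
As a minimal sketch of that approach: the stylesheet below writes one
comma-separated line per record. The input structure is an assumption,
since the whole file was not posted: <Album> and <Title> are
hypothetical element names, and only <AddedAlbums> and <LastChangedBy>
come from the question. No quoting is done here, so a comma inside a
value would break the record.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- emit plain text rather than XML -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- ASSUMPTION: <Album> and <Title> are hypothetical names -->
    <xsl:for-each select="//Album">
      <xsl:value-of select="normalize-space(Title)"/>
      <!-- field separator -->
      <xsl:text>,</xsl:text>
      <xsl:value-of select="normalize-space(LastChangedBy)"/>
      <!-- record separator (newline) -->
      <xsl:text>&#x0A;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```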


> - how do I remove the "<AddedAlbums> tag?

The tags or the element? To ignore the tags, but retain the (text)
content as text output:
   <xsl:template match="AddedAlbums">
     <xsl:value-of select="."/>
   </xsl:template>
(Note that this takes only the textual content of <AddedAlbums> and
its descendants.) To ignore the entire element (including attributes,
child elements, text content, PIs, comments):
   <xsl:template match="AddedAlbums"/>
This says "when you match an <AddedAlbums>, output nothing (not even
the default behavior)".
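
A third possible reading of "remove the tag" is to drop the wrapper
but keep its child elements intact. That only makes sense for XML
output, not CSV. A sketch, assuming that is what is wanted, combines
the usual identity template with a template that skips the element
itself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- for <AddedAlbums>, emit no tags but still process the children -->
  <xsl:template match="AddedAlbums">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Two caveats: any attributes on <AddedAlbums> are dropped along with
the tags, and if <AddedAlbums> is the root element the result may end
up with several top-level elements, which is not a well-formed XML
document.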


> - how can I include (for example) "LastChangedBy" labeling it
>   something else? (eg "User")

I don't understand what you mean by "labeling" in the context of CSV
output. 
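
If "labeling" just means the column heading in the first line of the
CSV, then the heading is literal text you write yourself, and it need
not match the element name. A sketch under that assumption (again,
<Album> and <Title> are hypothetical names, not taken from your file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- header line: the second column is labeled "User" even though -->
    <!-- its values come from <LastChangedBy> -->
    <xsl:text>Title,User&#x0A;</xsl:text>
    <xsl:apply-templates select="//Album"/>
  </xsl:template>

  <!-- ASSUMPTION: one record per hypothetical <Album> element -->
  <xsl:template match="Album">
    <xsl:value-of select="normalize-space(Title)"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="normalize-space(LastChangedBy)"/>
    <xsl:text>&#x0A;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```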


The following tiny example reads in a TEI corpus file, and for each
text spits out a CSV record of author, title, publication date. Note
that (because of the corpus I am dealing with) I know in advance that
the straight double-quote character (U+0022) will never occur in the
input, but that comma (U+002C) may appear in the title, will almost
always appear in the author, and will never appear in the date.

Also note that <author> elements that do not have a child naming
element are treated differently: the content is surrounded in square
brackets (used for "unknown" and "anonymous" and such).

I don't claim this is the best XSLT one could come up with, but it
did work for my data. Also note that some TEI-ers may disagree with
my use of naming elements inside <author>.
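
To make the behavior concrete, here is a made-up two-record input.
The names, titles, and dates are invented for illustration, and a real
<teiCorpus> needs considerably more than this to be valid TEI. The
first <author> has a naming child; the second does not.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>A Sample Title, With Comma</title>
          <author><persName>Doe, Jane</persName></author>
        </titleStmt>
        <publicationStmt>
          <date when="1650"/>
        </publicationStmt>
      </fileDesc>
    </teiHeader>
  </TEI>
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>Another Title</title>
          <author>unknown</author>
        </titleStmt>
        <publicationStmt>
          <date when="1701"/>
        </publicationStmt>
      </fileDesc>
    </teiHeader>
  </TEI>
</teiCorpus>
```

Run through the stylesheet below, that input should come out as:

```
author,title,date
"Doe, Jane","A Sample Title, With Comma",1650
"[unknown]","Another Title",1701
```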

--------- begin XSLT ---------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Copyleft 2009 Syd Bauman and the Brown University Women Writers Project -->
  <!-- Read in a <teiCorpus> and output a CSV-like table of titles, authors, and -->
  <!-- publication dates -->
  <!-- Input: a TEI document -->
  <!-- Output: a CSV document -->
  <!-- Details: -->
  <!-- * For each <TEI> child of a root <teiCorpus> we generate a record -->
  <!-- * each record has three fields: author, title, and date -->
  <!-- * each field will end up being empty if the corresponding input element -->
  <!--   is missing or empty -->
  <!-- * all titles and authors are surrounded by quotes, as so many have -->
  <!--   commas in them, and none have quotation marks in them -->
  <!-- * author output is surrounded in square brackets if it is not explicitly -->
  <!--   encoded as some sort of name -->
  
  <xsl:output method="text"/>
  
  <xsl:template match="/">
    <!-- header line names fields in my CSV-like output -->
    <xsl:text>author,title,date&#x0A;</xsl:text>
    <!-- for each child text element ... -->
    <xsl:for-each select="tei:teiCorpus/tei:TEI">
      <!-- output the first <author> element, surrounded by quotes and -->
      <!-- followed by a comma -->
      <xsl:text>"</xsl:text>
      <xsl:apply-templates select="./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:author[1]"/>
      <xsl:text>",</xsl:text>
      <!-- next, the 1st <title> element, surrounded by quotes and followed by a comma -->
      <xsl:text>"</xsl:text>
      <xsl:value-of select="normalize-space(./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1])"/>
      <xsl:text>",</xsl:text>
      <!-- then the publication date, and then a newline -->
      <xsl:value-of select="./tei:teiHeader/tei:fileDesc/tei:publicationStmt/tei:date/@when"/>
      <xsl:text>&#x0A;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="tei:author">
    <xsl:choose>
      <!-- If we actually have an author name, spit it out -->
      <xsl:when test="tei:persName|tei:orgName|tei:name">
        <xsl:apply-templates/>
      </xsl:when>
      <!-- If what we have is not explicitly indicated as a name, -->
      <!-- stick it in square brackets. -->
      <xsl:otherwise>
        <xsl:text>[</xsl:text>
        <xsl:apply-templates/>
        <xsl:text>]</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
--------- end XSLT ---------
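
The stylesheet above is only safe because the input never contains a
straight double-quote. If yours might, the usual CSV convention is to
double each quote inside a quoted field. XSLT 1.0 has no string
replace function, so a common workaround is a recursive named
template. A sketch follows; the template name "csv-escape" and its
parameter "s" are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <!-- hypothetical helper: output $s with every " doubled to "" -->
  <xsl:template name="csv-escape">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="contains($s, '&quot;')">
        <!-- text before the first quote, then the doubled quote -->
        <xsl:value-of select="substring-before($s, '&quot;')"/>
        <xsl:text>""</xsl:text>
        <!-- recurse on whatever follows that quote -->
        <xsl:call-template name="csv-escape">
          <xsl:with-param name="s" select="substring-after($s, '&quot;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- usage: one quoted, escaped title per text in the corpus -->
  <xsl:template match="/">
    <xsl:for-each select="tei:teiCorpus/tei:TEI">
      <xsl:text>"</xsl:text>
      <xsl:call-template name="csv-escape">
        <xsl:with-param name="s"
          select="normalize-space(tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title[1])"/>
      </xsl:call-template>
      <xsl:text>"&#x0A;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 2.0 the same effect is available through the replace()
function, so the recursion is only needed with 1.0 processors.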
