[xsl] RE: for roger Glover..., Knowledge management XML

Subject: [xsl] RE: for roger Glover..., Knowledge management XML
From: Jinesh Varia <jineshresearch@xxxxxxxxx>
Date: Mon, 10 Feb 2003 17:04:52 -0800 (PST)
Hello,

I have included the final code below for others to
experiment with.


This is an interesting problem: matching two XML
data sheets to produce one correct one. The knowledge
management aspect concerns the person names handled by
the XSL sheet, who are the authors of publications.

Suppose I have the "knowledge" that <author>Michael
Kay</author> is the same person as <author>M. Kay</author>,
stored in one XML data sheet in the form:
<samePersons>
<author>Michael Kay</author> <!-- the actual correct
one that I want in the database -->
<author>Michael</author>
<author>Michael K.</author>
</samePersons>

This is a separate XML data sheet that simply holds all
the "knowledge" mentioned. How can I sort out/delete the
erroneous names in my current XML, which looks like
<person id="0003">
Michael Kay
</person>

I hope I am explaining this properly. I have one XML
data sheet which records which names are right
and which are wrong. I want to delete the
erroneous elements from my main XML sheet so that only
the correct names are shown.
Also, if I delete the erroneous elements, I have to put
the correct id in the pubper element as well.

Please suggest whether I should do this while I am generating
the ids (XSL sheet shown below) or after I generate the
ids, in a separate XSL.
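
One possible shape for this, as a minimal and untested sketch rather than a definitive answer: a separate XSLT 1.0 pass that runs before the ids are generated and rewrites every author/editor name to the first <author> of its <samePersons> group. The file name knowledge.xml and its <knowledge> root element are assumptions made for illustration only; the first <author> of each group is taken to be the correct name, as in the example above.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Assumption: knowledge.xml has a hypothetical <knowledge> root
     that contains the <samePersons> groups -->
<xsl:variable name="groups"
    select="document('knowledge.xml')/knowledge/samePersons"/>

<!-- Identity template: copy everything else unchanged -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- Replace a known variant with the first (correct) name of its group -->
<xsl:template match="author|editor">
    <xsl:variable name="name" select="normalize-space(.)"/>
    <xsl:variable name="canonical"
        select="$groups[author[normalize-space(.) = $name]][1]/author[1]"/>
    <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:choose>
            <!-- the name appears in some group: use that group's first name -->
            <xsl:when test="$canonical">
                <xsl:value-of select="normalize-space($canonical)"/>
            </xsl:when>
            <!-- unknown name: keep it as it is -->
            <xsl:otherwise>
                <xsl:value-of select="$name"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:copy>
</xsl:template>

</xsl:transform>
```

If the id-generating stylesheet is then run on the output of this pass, it should see only one spelling per person, so its duplicate check and the perid lookup for pubper would not need to know about the knowledge sheet at all.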

Jinesh




-----------------------------------------------
final code:
<xsl:transform version="1.1"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"
    xmlns:xalan="http://xml.apache.org/xalan"
    xalan:indent-amount="4" />
<xsl:variable name="persons">
    <xsl:apply-templates
        select="//publication/author[not(.=preceding::author or .=preceding::editor)]
              | //publication/editor[not(.=preceding::author or .=preceding::editor)]"
        mode="generate-person"/>
</xsl:variable>

<!-- Similar to the original "generate-author-id"
     template, but generates the entire person element -->
<xsl:template match="author|editor" mode="generate-person">
    <!-- skip empty author/editor elements so they do not get ids -->
    <xsl:if test="normalize-space(.)">
        <!-- left-pad position() with zeros to a 10-digit id -->
        <xsl:variable name="temp"
            select="concat('000000000',position())" />
        <xsl:variable name="perid"
            select="substring($temp,string-length($temp)-9)"/>
        <person perid="{$perid}">
            <personname>
                <xsl:value-of select="."/>
            </personname>
        </person>
    </xsl:if>
</xsl:template>

<xsl:template match="dblp">

    <dblp>
        <!-- copies the "person" elements result tree
fragment into the result tree -->
        <xsl:copy-of select="$persons"/>
        <xsl:apply-templates select="publication"/>
    </dblp>
</xsl:template>

<xsl:template match="publication">

    <!-- Same as in the original code -->
    <publication>
        <xsl:copy-of select="@*|*[not(self::author or
self::editor)]"/>
    </publication>

    <!-- calls template to create "pubper" elements,
one per publication per pub author -->
    <xsl:apply-templates select="author|editor"/>
</xsl:template>
    
<!-- creates "pubper" elements -->
<xsl:template match="author|editor">
<xsl:if test="normalize-space(.)">
    <pubper>

        <!-- gets "pubid" from parent  -->
        <pubid>
            <xsl:value-of select="../@pubid"/>
        </pubid>
    
        <!-- gets "perid" from "$persons" variable -->
        <perid>   
            
            <!-- Note that in XSLT 1.0 a result tree
                 fragment like "$persons" does not automatically
                 convert to a node set.  Therefore most processors
                 provide an extension function for that purpose
                 (like "xalan:nodeset()" below) -->
            <xsl:value-of
                xmlns:xalan="http://xml.apache.org/xalan"
                select="xalan:nodeset($persons)/person[current()=personname]/@perid"
            />
        </perid>
        <!-- persontype: 2 for an editor, 1 for an author -->
        <persontype>
            <xsl:choose>
                <xsl:when test="self::editor">2</xsl:when>
                <xsl:otherwise>1</xsl:otherwise>
            </xsl:choose>
        </persontype>
    </pubper>
</xsl:if>
</xsl:template>
        
</xsl:transform>

--- Roger Glover <glover_roger@xxxxxxxxx> wrote:
> Jinesh Varia wrote:
> 
> > Are you some kind of XML jini!
> 
> Far from it.  Just ask the *real* regulars. :-)
> 
> 
> > Thank you very much. I have been entangled in this XSL
> > programming for two weeks and you solved it in
> > a blink.
> 
> You were most of the way there, you just needed one
> key insight.  It would have taken me somewhat longer
> to write this starting with just an idea.
> 
> 
> > But there are some serious issues here:
> >
> > With your approach of generating perids before the
> > actual separation of publication, person, pubper
> > elements, I feel it would not work when I have 500,000
> > author elements. I have a 130MB XML sheet which
> > contains almost 350,000 publication elements.
> > I know you did not know about this. Can you please
> > comment on this?
> >
> > Do you think I am right on this? Please correct me.
> 
> I chose this solution not because it was the most
> efficient, but because it was the most direct route
> I could find from where you were to where you
> wanted to be.
> 
> Right now it would probably behoove you to spend
> some time with the FAQ, the spec and other reference
> resources (I like Michael Kay's "XSLT Programmer's
> Reference"), studying the syntax and usage of the
> "<xsl:key>" element and the "key()" function.  You
> should then also look up and study any FAQ
> reference to Muenchian grouping.
> 
> 
> > Now there are also editors along with authors. Authors
> > can also be editors for some publications. This means
> > <author>Steve Lawyer</author> for pub1 can be
> > <editor>Steve Lawyer</editor> for pub2, but we want
> > to have a single person element generated. In
> > <pubper> we have <persontype> (1 for author, 2 for
> > editors), hence in our example for pub1 it should be
> > <persontype>1</persontype> and for pub2 it should be
> > <persontype>2</persontype>.
> > How can we store that information with your code then?
> > We have to get unique person names.
> 
> Match "author | editor" instead of just "author",
> and use either "<xsl:if>" or "<xsl:choose>" +
> "<xsl:when>" to choose between persontype "1" (author)
> and persontype "2" (editor).  Likewise, the "select"
> expression on "<xsl:apply-templates>" in the "persons"
> variable *would* have to become much more complicated.
> However, if you change to keys and Muenchian
> grouping, the expression will be much simpler.
> 
> 
> > You don't have a clue how much your code has helped
> > me!!! I have been working on this for two weeks...
> > Thanks, Roger. Thank you.
> 
> You are very welcome.  Glad to help.  :^)
> 
> Let us know if you get stuck, or when you have a
> final version.
> 
> 
> -- Roger Glover
>    glover_roger@xxxxxxxxx
> 
> 
> 
>  XSL-List info and archive: 
> http://www.mulberrytech.com/xsl/xsl-list
> 
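
For reference, the keys and Muenchian grouping suggested in the quoted reply above might look roughly like the following for the person list alone. This is only a sketch under assumptions: the key name "person-by-name" is made up for illustration, the input is assumed to have the same dblp/publication structure as the final code, and it has not been measured against the 130MB file. A key generally lets the processor index the names once instead of rescanning preceding:: nodes for every author and editor.

```xml
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Index every author and editor element by its name
     ("person-by-name" is a hypothetical key name) -->
<xsl:key name="person-by-name"
    match="publication/author|publication/editor"
    use="."/>

<xsl:template match="dblp">
    <dblp>
        <!-- Muenchian grouping: keep a non-empty author/editor only if it
             is the first element in the document carrying that name -->
        <xsl:for-each
            select="publication/*[self::author or self::editor]
                    [normalize-space(.)]
                    [generate-id() = generate-id(key('person-by-name', .)[1])]">
            <!-- zero-pad the running position to a 10-digit perid -->
            <person perid="{format-number(position(), '0000000000')}">
                <personname>
                    <xsl:value-of select="."/>
                </personname>
            </person>
        </xsl:for-each>
    </dblp>
</xsl:template>

</xsl:transform>
```

Unlike the final code, empty elements are filtered out before position() is evaluated here, so the generated perids run consecutively. The pubper elements and their perid lookup are left out of this sketch.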


=====
-----------------------------------------------------------------
Jinesh Varia
Graduate Student, Information Systems
Pennsylvania State University
Email: jinesh@xxxxxxx
-----------------------------------------------------------------
'Self is the author of its actions.'


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

