Re: [xsl] sorting titles w stopwords but w/o value in every title node

Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title node
From: "Susan Campbell" <SCampbell@xxxxxxxxxxxxxx>
Date: Wed, 1 Sep 2004 16:44:33 -0400
Anton and Bruce,
Thanks for your help.  I'm sorry for the delay in responding.  A large tree
fell on my house about 1 AM Tuesday morning and I have been away from work
finding a tree service and contractors, etc.  It's  quite a challenge.

I cannot do a triple sort using doc-number as the first sort.  That just puts
things in doc-number order.  I don't think I can group on doc-number and then
sort by title within that group. I think xsl:sort needs a path name.

Anton says it succinctly: I need to treat records that don't have a title as
if they do have a title. The link is that they have the same document number.
I need the records with the same doc number to show up with the corresponding
title in arrival-date order.

The processor is Saxon but it's being called from within another application.
I do not believe I can do a two-step process.  That's why I'm calling the
stopwords with document() from this stylesheet.
sc
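
One way this might be done in a single pass under XSLT 1.0 is to let each
record borrow its sort title through an xsl:key, so that records with an empty
title sort next to the titled record that shares their doc-number. What follows
is only a sketch under assumptions: the key name "titled-by-doc" is made up for
illustration, the stop-word handling from the posted stylesheet is left out to
keep it short, and the dates are sorted oldest first to match the desired
output shown further down the thread.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key name: indexes only the records that carry a title -->
  <xsl:key name="titled-by-doc"
           match="section-02[normalize-space(title)]"
           use="doc-number"/>

  <xsl:template match="/">
    <table border="1">
      <xsl:for-each select="//section-02">
        <!-- 1: title borrowed from a titled record with the same doc-number -->
        <xsl:sort select="key('titled-by-doc', doc-number)[1]/title"/>
        <!-- 2: keeps one document's records together if two titles tie -->
        <xsl:sort select="doc-number" data-type="number"/>
        <!-- 3: MM/DD/YYYY rearranged to YYYYMMDD, oldest first -->
        <xsl:sort select="concat(substring(arrival-date, 7, 4),
                                 substring(arrival-date, 1, 2),
                                 substring(arrival-date, 4, 2))"
                  data-type="number"/>
        <tr>
          <td><xsl:value-of select="doc-number"/></td>
          <td><xsl:value-of select="title"/></td>
          <td><xsl:value-of select="description"/></td>
          <td><xsl:value-of select="arrival-date"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Because the loop runs over section-02 rather than over title, the paths no
longer need the "../" step. A record whose doc-number has no titled sibling at
all still gets an empty first sort key and so sorts first.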
------------------------------

Date: Mon, 30 Aug 2004 09:01:10 -0400
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
From: "Susan Campbell" <SCampbell@xxxxxxxxxxxxxx>
Subject:  Re: [xsl] sorting titles w stopwords but w/o value in every title
node
Message-ID: <D44554884CB7D74B87423B62952F369901BD1DF1@xxxxxxxxxxxxxxxxxxxxxx>

Thanks for the help. (I am still referring to the stop-words variable
with document('')/xsl:stylesheet/sw:stop/word because that does give me
the sort order. Because of our setup, that may be my only option.)

The problem I still have is that entries without a value in the title
sort first.
I need to group by title when the doc-number is the same. It may be both
a sorting and grouping problem, but I don't know how to go about it.

(The doc number is included only for testing. I left out imprint and
ISBN from this sample for clarity. It is possible to have the same issue
or different issue arrive on the same or different days as there are
multiple subscriptions.)

The output I need is:
doc#   Title                        Description             Arrived date
53690  American Artist              v.68:no.738(2004:Jan.)  02/26/2004
57769  The American city & country  v.119:no.1(2004:Jan.)   02/11/2004
57769                               v.119:no.3(2004:Mar.)   03/25/2004
58345  American demographics        v.26:no.1(2004:Feb.)    02/05/2004
58345                               v.26:no.1(2004:Feb.)    02/26/2004
58345                               v.26:no.2(2004:Mar.)    02/26/2004
58345                               v.26:no.2(2004:Mar.)    02/26/2004

Sample of the XML that causes the problem:
-------------
<section-02>
<title>Forbes.</title>
<isbn-issn>0015-6914</isbn-issn>
<doc-number>58615</doc-number>
<description>v.173:no.5(2004:Mar.15)</description>
<arrival-date>03/15/2004</arrival-date>
</section-02>

<section-02>
<title></title>
<isbn-issn-code></isbn-issn-code>
<doc-number>58615</doc-number>
<description>v.173:no.1(2004:Jan. 12)</description>
<arrival-date>01/12/2004</arrival-date>
</section-02>

<section-02>
<title></title>
<isbn-issn-code></isbn-issn-code>
<doc-number>58615</doc-number>
<description>v.173:no.2(2004:Feb. 02)</description>
<arrival-date>01/21/2004</arrival-date>
</section-02>

My stylesheet:
-------------
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
   xmlns:sw="mailto:bubba@xxxxxxx"
   exclude-result-prefixes="sw">
<xsl:include href="funcs.xsl"/>
<!-- Stop words, read back from the stylesheet itself via document('') -->
<sw:stop>
   <word>the</word>
   <word>a</word>
   <word>an</word>
</sw:stop>
<xsl:variable name="stop-words"
   select="document('')/xsl:stylesheet/sw:stop/word"/>
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template match="/">
<table border="1">
<tr><th colspan="6">Arrived Issues sorted without stop words</th></tr>
<tr>
<td align="center"><b>number</b></td>
<td align="center"><b>Title</b></td>
<td align="center"><b>ISBN-ISSN</b></td>
<td align="center"><b>Imprint</b></td>
<td align="center"><b>Description</b></td>
<td align="center"><b>Arrived</b></td>
</tr>
<xsl:for-each select="//section-02/title">
<!-- Sort key 1: the title minus a leading stop word. "0 div true" is 0, so
     substring(x, 0) returns all of x; "0 div false" is NaN, so it returns ''.
     Exactly one of the two concat() arguments is therefore non-empty. -->
<xsl:sort select="concat(
   substring(substring-after(., ' '), 0 div boolean(
      $stop-words[starts-with(translate(current(), $uppercase, $lowercase),
         concat(translate(., $uppercase, $lowercase), ' '))])),
   substring(., 0 div not(
      $stop-words[starts-with(translate(current(), $uppercase, $lowercase),
         concat(translate(., $uppercase, $lowercase), ' '))])))"/>

<!-- Sort key 2: arrival date, MM/DD/YYYY rearranged to YYYYMMDD, newest first -->
<xsl:sort select="number(concat(substring(../arrival-date, 7, 4),
   substring(../arrival-date, 1, 2),
   substring(../arrival-date, 4, 2)))" order="descending"/>

<tr>
<td width="10%"><xsl:value-of select="../doc-number"/></td>
<td width="30%"><xsl:value-of select="../title"/></td>
<td width="10%"><xsl:value-of select="../isbn-issn"/></td>
<td width="20%"><xsl:value-of select="../imprint"/></td>
<td width="20%"><xsl:value-of select="../description"/></td>
<td width="10%"><xsl:value-of select="../arrival-date"/></td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>

Thanks,
Susan Campbell
College Center for Library Automation
1753 W. Paul Dirac Drive
Tallahassee, FL 32310
850-922-6044

------------------------------

Date: Mon, 30 Aug 2004 09:17:01 -0400
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
From: Bruce D'Arcus <bdarcus@xxxxxxxxxxxxx>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title
node
Message-Id: <E0EAD541-FA86-11D8-B6E0-000A959F0E52@xxxxxxxxxxxxx>

On Aug 30, 2004, at 9:01 AM, Susan Campbell wrote:

> The problem I still have is that entries without a value in the title
> sort first.
> I need to group by title when the doc-number is the same. It may be
> both a sorting
> and grouping problem, but I don't know how to go about it.

So is it the case that if two records -- one with a title and one
without -- share the same doc-number, then they share the same title,
even if not explicitly coded?

If that were true, I guess logically you'd group by doc-number, and
then take a title from one among the group and sort on that for the
groups?

Bruce
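
The "group by doc-number, then sort the groups on a title taken from the group"
idea could be sketched in XSLT 1.0 with Muenchian grouping. This is only an
illustration under assumptions, not a tested answer for Susan's setup: the key
name "by-doc" is invented here, stop-word handling is omitted, and the title is
printed only on the first row of each group, as in her desired output.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key name: every record, indexed by its doc-number -->
  <xsl:key name="by-doc" match="section-02" use="doc-number"/>

  <xsl:template match="/">
    <table border="1">
      <!-- Outer loop: one representative record per doc-number -->
      <xsl:for-each select="//section-02[generate-id() =
                            generate-id(key('by-doc', doc-number)[1])]">
        <!-- Groups are ordered by the first non-empty title in the group -->
        <xsl:sort select="key('by-doc', doc-number)/title[normalize-space()]"/>
        <xsl:variable name="group" select="key('by-doc', doc-number)"/>
        <!-- Inner loop: the records of one group, oldest arrival first -->
        <xsl:for-each select="$group">
          <xsl:sort select="concat(substring(arrival-date, 7, 4),
                                   substring(arrival-date, 1, 2),
                                   substring(arrival-date, 4, 2))"
                    data-type="number"/>
          <tr>
            <td><xsl:value-of select="doc-number"/></td>
            <td>
              <!-- Show the group's title once, on its first row -->
              <xsl:if test="position() = 1">
                <xsl:value-of select="$group/title[normalize-space()]"/>
              </xsl:if>
            </td>
            <td><xsl:value-of select="description"/></td>
            <td><xsl:value-of select="arrival-date"/></td>
          </tr>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

The outer xsl:sort orders whole groups, and the inner one only orders records
inside a group, so an untitled record can never drift away from its document.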

------------------------------

Date: Mon, 30 Aug 2004 18:34:32 +0200
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
From: "cking" <cking@xxxxxxxxxx>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title
node
Message-ID: <002901c48eaf$3a7e6740$408876d5@xxxxxxxxxx>

Hi Susan,

> Thanks for the help. (I am still referring to the stop-words variable with
> document('')/xsl:stylesheet/sw:stop/word because that does give me the sort
> order. Because our setup, that may be my only option.)

I found out why it didn't work for me: it's a namespace issue. I had put your
template inside an XHTML-output stylesheet (with
xmlns="http://www.w3.org/1999/xhtml"), and then
"document('')/xsl:stylesheet/sw:stop/word" didn't return anything, because the
unprefixed <word> elements end up in the XHTML namespace. If I change the
<word> elements to <sw:word> (and the last path step to sw:word), it works.
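
For anyone hitting the same thing, here is a minimal sketch of the prefixed
variant. It assumes the stylesheet declares a default XHTML namespace as in my
test, and the template body is only a quick check that the node-set is no
longer empty; it is not part of Susan's stylesheet.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:sw="mailto:bubba@xxxxxxx"
    exclude-result-prefixes="sw">

  <!-- Prefixed children stay out of the default (XHTML) namespace -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <!-- The last step must carry the prefix too; XPath ignores xmlns="..." -->
  <xsl:variable name="stop-words"
      select="document('')/xsl:stylesheet/sw:stop/sw:word"/>

  <xsl:template match="/">
    <!-- Illustrative check only: reports how many stop words were found -->
    <p>Stop words found: <xsl:value-of select="count($stop-words)"/></p>
  </xsl:template>
</xsl:stylesheet>
```

Unprefixed names in an XPath expression always mean "no namespace", which is
why the original path stopped matching once a default namespace was in scope.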

> The problem I still have is that entries without a value in the title sort
> first.
> I need to group by title when the doc-number is the same. It may be both a
> sorting and grouping problem, but I don't know how to go about it.
>
> (The doc number is included only for testing. I left out imprint and ISBN
> from this sample for clarity. It is possible to have the same issue or
> different issue arrive on the same or different days as there are multiple
> subscriptions.)

Maybe I don't fully understand what you're trying to get (esp. that last
sentence), but can't you simply perform a triple-sort instead of double-sort?
First sort by doc-number, then by title and finally by date?

> <xsl:for-each select="//section-02/z13-title">

I guess you're only using "//" in your sample code, because you know this can
seriously slow down the transform process (esp. with large input files)? Unless
of course your input files are organized with <section-02> elements that can
appear anywhere in the document...

Best regards
Anton Triest

------------------------------

Date: Tue, 31 Aug 2004 03:37:28 +0200
To: <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
From: "cking" <cking@xxxxxxxxxx>
Subject: Re: [xsl] sorting titles w stopwords but w/o value in every title
node
Message-ID: <010401c48efb$13a60780$408876d5@xxxxxxxxxx>

Susan,

I wrote:
> but can't you simply perform a triple-sort instead of double-sort?
> First sort by doc-number, then by title and finally by date?

By rereading your message (desired output, and Bruce's reply), I think I
understand your point. You don't want to sort by doc-number. You want to treat
the records that don't have a title as if they do have a title, taken from
another record with the same doc-number. Is that correct?

What processor are you using? I mean, would it be OK to do a transformation
in two steps?

Greetings
Anton Triest
