Re: [xsl] Muenchian keys ... plus a bit?

Subject: Re: [xsl] Muenchian keys ... plus a bit?
From: "Thomas B. Passin" <tpassin@xxxxxxxxxxxx>
Date: Wed, 22 Aug 2001 11:05:46 -0400
Dave, here's what I would try.

1) Create a list of all items and assign it to a variable "all-items".

2) Create a list of all unique items (based on their PCDATA - that is, all
<item>content</item>
elements get represented by one element in this list).  Assign it to a
variable "unique-items".  This is the "Muenchian" part, of course.

3) Do a for-each on $unique-items.  At each iteration, output that item's
header (e.g., "content"), then find all the item nodes with that text.
Compare on the string value, not on name(), since name() is "item" for
every one of them:
<xsl:variable name='this-items-name' select='string(.)'/>
<xsl:variable name='these-items' select='$all-items[. = $this-items-name]'/>

4) Do a for-each over $these-items.  You could sort them, too.  This is
where you output the pages.
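
Steps 1 to 4 could be sketched roughly as the stylesheet below.  It is only
a sketch under a few assumptions: the input is the <idx> document quoted
further down, plain-text output is good enough to show the grouping, and the
unique list is built with a simple preceding-sibling test rather than with
keys.  The variable names follow the steps above ("this-items-name" holds the
item's text, not its element name).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Step 1: every item in the index -->
  <xsl:variable name="all-items" select="/idx/ent/item"/>

  <!-- Step 2: one item per distinct text value (the first occurrence).
       No keys here, just a comparison against earlier entries. -->
  <xsl:variable name="unique-items"
      select="$all-items[not(. = ../preceding-sibling::ent/item)]"/>

  <xsl:template match="/">
    <!-- Step 3: one output line per distinct item, sorted by text -->
    <xsl:for-each select="$unique-items">
      <xsl:sort select="."/>
      <xsl:variable name="this-items-name" select="string(.)"/>
      <xsl:variable name="these-items"
          select="$all-items[. = $this-items-name]"/>
      <xsl:value-of select="$this-items-name"/>
      <xsl:text>, page </xsl:text>
      <!-- Step 4: pages from every ent that shares this item text -->
      <xsl:for-each select="$these-items/../pge">
        <xsl:sort select="." data-type="number"/>
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample data this should give one line each for "another",
"content" (pages 98,108,110) and "zero".  The A to Z anchors and the bold
main entry (key="t") are left out to keep the sketch short.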

Once this is working, you could create some keys if your files are big and
you need some speed-up action.

I didn't try this so some details may need tuning up, but it should work
nicely.
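
For the key-based speed-up, one possible shape is the usual Muenchian
grouping shown below.  Again this is a sketch, not something run against
real data: the key name "ent-by-item" is made up for illustration, and it
assumes each <ent> has exactly one <item>, as the DTD quoted below says.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every ent by the text of its item child -->
  <xsl:key name="ent-by-item" match="ent" use="item"/>

  <xsl:template match="/idx">
    <!-- Keep only the first ent in each group of equal item text -->
    <xsl:for-each
        select="ent[generate-id() = generate-id(key('ent-by-item', item)[1])]">
      <xsl:sort select="item"/>
      <xsl:value-of select="item"/>
      <xsl:text>, page </xsl:text>
      <!-- Collect the pages of every ent in the group, duplicates included -->
      <xsl:for-each select="key('ent-by-item', item)/pge">
        <xsl:sort select="." data-type="number"/>
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The point of the key is that the duplicate <ent> elements are dropped from
the outer loop but their <pge> children are still reachable through
key('ent-by-item', item), so no page numbers get lost.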

Cheers,

Tom P

[<DPawson@xxxxxxxxxxx>]
> Given
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE idx [
> <!ELEMENT idx (ent+)>
> <!ELEMENT ent (item, pge+)>
> <!ELEMENT item (#PCDATA)>
> <!ELEMENT pge  (#PCDATA)>
> <!ATTLIST pge key (t|f) 'f'>
>
>
> ]>
>
> <idx>
>  <ent>
>   <item>content</item>
>   <pge key="f">98</pge>
>  </ent>
>  <ent>
>   <item>content</item>
>   <pge key="f">108</pge>
>   <pge>110</pge>
>  </ent>
>  <ent>
>   <item>another</item>
>   <pge key="f">100</pge>
>  </ent>
>  <ent>
>   <item>zero</item>
>   <pge key="t">210</pge>
>  </ent>
> </idx>
>
>
> An indexing DTD.
>
> I want to present it as
>
> A  B  C .... Z
> (each hotlinked to the start of that letter).
>
> Then
>
> A  (the anchor)
>
> aardvark, page 1,67,79
>   (say with page 67
> -------------------
> B
>
> bathtub, page 3,5,7
>
> ------------------
>
> Z
>
> zero, page 210
>    (210 in bold, it's the main entry)
> etc.
>
> Two pass solution, first sorting, to make data entry easy.
> Being lazy, I don't always remember that I've already made
> an entry for a particular element, so there are duplicates.
> the <item> is duplicated, but the page numbers are not,
> hence the 'remove duplicates' approach of keys only partially works.
> Hence the Muenchian plus (I think :-).
>
> Question, how to remove the duplicate entries without losing
> the page numbers associated with the duplicate?
>
> I found this quite an interesting stylesheet, till I couldn't
> figure out the key definitions/usage, then I was stopped.
>
> I have everything except the 'remove duplicates' bit.
>



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

