RE: [xsl] Yet Another Entity Ref question!

Subject: RE: [xsl] Yet Another Entity Ref question!
From: Marco Guazzone <sguazt@xxxxxxxxxxx>
Date: Fri, 20 Dec 2002 15:16:28 +0100 (CET)
Hi Michael,
maybe my brain has been drinking!!! :-P
...or maybe I didn't explain very well what I would like to produce!
So, let me explain again:
A user may write an XML document, possibly with references to entities.
So there are two solutions:
A) include all the possible symbolic entities in the DTD:
      <!ENTITY ent "&#xNN;">
   and then produce the output with these entities either encoded or not
   (using "us-ascii" encoding)
B) the user uses a special XML element, say "entity", to refer to the
   entity:
     <entity>ent</entity>
   XSLT then handles this element so that the resulting output contains
   the entity. There are two ways to do this:
   B.1) the solution previously proposed by David, that is, to insert an
        entity dictionary in the XML doc and refer to it with an XPath query.
   B.2) the solution I originally proposed, that is, to construct the
        entity reference by outputting "&", "ent", ";".

Below are my considerations; (+) marks an advantage and (-) marks a
disadvantage.
Solution A:
(+) is the most elegant solution
(+) generally faster than B (at least faster than B.1), although see the
    (-)s below
(-) consumes more memory to store the entities
(-) I have to take care of writing down all possible symbolic entities to
    construct the DTD
(-) even if the user does not insert entities, the document will contain
    the DTD, costing time and space (the DTD will in fact be included
    automatically in the doc by the content management engine).
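
A minimal sketch of solution A, assuming a hypothetical input document
whose internal DTD subset declares a single entity "copy" (the element
names are taken from the example in the quoted message below; the
template layout is only illustrative). With encoding="us-ascii" the
serializer cannot write the character directly, so it has to emit a
character reference; whether that is numeric (&#169;) or, with the html
method, a named one depends on the processor.

```xml
<?xml version="1.0"?>
<!--
  Hypothetical input, with the entity declared in the internal DTD subset:

    <!DOCTYPE doc [ <!ENTITY copy "&#169;"> ]>
    <doc><label>Foobar&copy;</label></doc>

  The XML parser expands &copy; before the stylesheet ever sees it, so
  the templates below only deal with the resulting character.
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- us-ascii forces non-ASCII characters out as character references -->
  <xsl:output method="html" encoding="us-ascii"/>

  <xsl:template match="/doc">
    <xsl:apply-templates select="label"/>
  </xsl:template>

  <xsl:template match="label">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```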

Solution B.1:
(-) conceptually equivalent to "A"; however, instead of keeping the
    entity representations in the DTD, it stores them in XML elements.
    So, in addition to the disadvantages inherited from "A", there is
    the cost of template processing in XSLT; furthermore, compared with
    "B.2", we have one more XPath query.

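A possible way to keep the extra lookup in B.1 cheap is to index the
dictionary with xsl:key instead of walking /doc/entity-dict for every
<entity> element. This is only a sketch: the key name "entity-by-name"
is made up for illustration, and the document layout (entity-dict,
entity-item, @name, @value) is assumed to be the one from the example
in the quoted message below.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="us-ascii"/>

  <!-- Index every dictionary item by its name attribute
       (the key name is hypothetical) -->
  <xsl:key name="entity-by-name" match="entity-item" use="@name"/>

  <!-- Only the label is processed, so the dictionary itself is not output -->
  <xsl:template match="/doc">
    <xsl:apply-templates select="label"/>
  </xsl:template>

  <!-- Replace <entity>copy</entity> with the character stored for "copy" -->
  <xsl:template match="entity">
    <xsl:value-of select="key('entity-by-name', .)/@value"/>
  </xsl:template>

</xsl:stylesheet>
```
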
Solution B.2:
(+) I don't have to take care of writing down all possible symbolic
    entities
(+) doesn't consume additional memory (except for storing the template)
(-) produces a document that is not well-formed, since it has to output
    the "&" symbol.
(-) generally slower than "A", since we have to apply the template;
    however, when the user does not insert an entity, the XML parser
    doesn't have to parse a DTD for the entities (as it does in "A").
(-) It seems that when the output of the entity template is stored in
    a variable, I have to use xsl:value-of with d-o-e (see my original
    email).
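
A minimal sketch of B.2, assuming a processor that honours
disable-output-escaping (it is an optional feature in XSLT 1.0). The
variable is written out with xsl:copy-of rather than xsl:value-of,
because XSLT 1.0 treats converting a result tree fragment that contains
d-o-e text to a string (which is what xsl:value-of does) as an error,
so the flag may be lost on that path. Whether xsl:copy-of actually
preserves it is processor-dependent, so this should be read as
something to try, not as a guaranteed fix.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="us-ascii"/>

  <xsl:template match="/doc">
    <!-- Capture the processed label as a result tree fragment -->
    <xsl:variable name="label">
      <xsl:apply-templates select="label"/>
    </xsl:variable>
    <!-- copy-of copies the nodes instead of converting them to a string -->
    <xsl:copy-of select="$label"/>
  </xsl:template>

  <!-- Build the entity reference from "&", the given name, and ";" -->
  <xsl:template match="entity">
    <xsl:text disable-output-escaping="yes">&amp;</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```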

Thanks very much!!!

--------------------------------
Marco Guazzone
Software Engineer
Kerbero S.r.L. - Gruppo TC
Viale Forlanini, 36
Garbagnate M.se (MI)
20024 - Italy
mail: marco.guazzone@xxxxxxxxxxx
www: http://www.kerbero.com
Tel. +39 02 99514.247
Fax. +39 02 99514.399
--------------------------------

On Fri, 20 Dec 2002, Michael Kay wrote:

> You are making things ridiculously complicated. If you are producing
> output that you want to view in an editor that can't understand UTF-8,
> just set <xsl:output encoding="us-ascii"/> as you were initially
> advised.
> 
> Michael Kay
> Software AG
> home: Michael.H.Kay@xxxxxxxxxxxx
> work: Michael.Kay@xxxxxxxxxxxxxx 
> 
> > -----Original Message-----
> > From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx 
> > [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx] On Behalf Of 
> > Marco Guazzone
> > Sent: 20 December 2002 11:55
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [xsl] Yet Another Entity Ref question!
> > 
> > 
> > Hi David,
> > your idea is good.
> > Currently I'm using the LibXSLT processor version 1.0.23
> > (with libxml2 version 2.4.30). However, with this I will
> > produce the encoded Unicode char in the output
> > (HTML) doc; i.e.
> > XML:
> > <doc>
> >   <label>Foobar<entity>copy</entity></label>
> >   <entity-dict>
> >      <entity-item name="copy" value="&#169;" />
> >   </entity-dict>
> > </doc>
> > 
> > XSL:
> > <!-- ... like the previous except for: -->
> > <xsl:template match="entity">
> >   <xsl:value-of select="/doc/entity-dict/entity-item[@name = current()]/@value" />
> > </xsl:template>
> > 
> > This produces as output
> > Foobar(C)Foobar(C)Foobar(C)
> > where (C) is the encoded value of &#169;
> > This may cause problems in non-Unicode editors or browsers,
> > especially if I include the result in a source document (e.g.
> > Perl, C) as the return value of a function (problems may arise
> > in the compiling/interpreting phase). Instead, what I would like to
> > generate is:
> >   Foobar&copy;
> > or more generally:
> >   Foobar&ent;
> > where "ent" is specified by an anonymous user in XML via:
> >   <entity>ent</entity>
> > What do you think about it?
> > 
> > On Fri, 20 Dec 2002, David Carlisle wrote:
> > 
> > > <xsl:template match="doc">
> > >    <xsl:apply-templates select="label" /> <!-- ok! -->
> > >    <xsl:variable name="label">
> > >       <xsl:apply-templates select="label" />
> > >    </xsl:variable>
> > >    <xsl:value-of select="$label" />  <!-- not ok -->
> > >    <xsl:value-of disable-output-escaping="yes" select="$label" />  <!-- ok -->
> > > </xsl:template>
> > > 
> > > which processor are you using?
> > > 
> > > d-o-e is optional so a processor can ignore it altogether, but if it
> > > supports it at all I think that in xslt1 the character should keep the
> > > d-o-e property even when it goes through the variable.
> > > 
> > > Is your input form fixed?
> > > 
> > > It would be easier if your
> > > <ent>xxx</ent>
> > > only took entity names, as then you could convert them easily to
> > > characters without using d-o-e just by looking them up in a document
> > > of the form
> > > 
> > > <entity name="copy" char="&#169;"/>
> > > ...
> > > 
> > > 
> > > There is no need to have an input form of
> > > <ent>#x0A</ent>
> > > 
> > > as the user can more simply write
> > > &#x0A;
> > > which then doesn't need any processing at all at the xslt level.
> > > 
> > > David
> > > 
> > > 
> > > 
> > >  XSL-List info and archive:  
> > http://www.mulberrytech.com/xsl/xsl-list


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

