Re: [xsl] xslt replace special characters

Subject: Re: [xsl] xslt replace special characters
From: "Alice Fan" <arisuu@xxxxxxxxxxx>
Date: Mon, 11 Nov 2002 12:53:19 -0800
Thanks Mike. This is very helpful. Actually, thanks to everyone who responded to my questions.


From: Mike Brown <mike@xxxxxxxx>
Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: [xsl] xslt replace special characters
Date: Mon, 11 Nov 2002 13:38:52 -0700 (MST)

Alice Fan wrote:
> Thanks Greg. Right in the UI, we want the user to enter their URL. Their
> URL will most likely have name/value pairs. Is there an easier way? There
> is no other way of filtering '&' before it gets processed in the XSL?


It doesn't matter if they're entering a URL/URI or not. Any text that you
intend to put into an XML document needs to be screened, to preserve
well-formedness / parseability.

1. Always note the following:

- non-XML characters need to be removed or replaced
  (U+0000..U+0008, U+000B, U+000C, U+000E..U+001F, U+D800..U+DFFF,
   U+FFFE..U+FFFF)

- a string is not a URI if it violates URI syntax, so if the text is
   destined for a URI-pseudotype attribute value (like href or src in
   HTML/XHTML), characters above U+007F should be escaped by writing
   their equivalent UTF-8 bytes as '%xx' for each byte, where xx is the
   hex notation for the byte (though this isn't strictly necessary; a
   conforming HTML user agent will do this automatically)

- additional translation of ASCII-range characters (U+0000..U+007F) in
   text destined for URI attributes is not required but is wise, to
   ensure conformance to URI syntax; %-escape everything except
   a-z, A-Z, 0-9, and these: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , [ ]
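
As a rough illustration of the three points above, here is a minimal Python sketch. The function names strip_non_xml and escape_for_uri_attribute are made up for this example, and Python itself is an assumption; the same screening can be done in whatever language collects the user's input. Note that '%' and '#' are not in the list of characters left alone, so this sketch escapes them too, which means input that is already %-escaped would be escaped a second time.

```python
import re
from urllib.parse import quote

# Characters that are not legal in XML 1.0, per the ranges listed above.
_NON_XML = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)

# ASCII characters left unescaped, besides a-z, A-Z, 0-9 (list from above).
_URI_SAFE = "-_.!~*'();/?:@&=+$,[]"


def strip_non_xml(text, replacement=""):
    """Remove (or replace) characters that XML 1.0 does not allow."""
    return _NON_XML.sub(replacement, text)


def escape_for_uri_attribute(text):
    """%-escape everything outside the safe set.

    quote() encodes non-ASCII characters as UTF-8 and writes each byte
    as %XX, which is the escaping described above.
    """
    return quote(text, safe=_URI_SAFE)


if __name__ == "__main__":
    raw = "http://example.com/?q=caf\u00e9&x=1 2\x00"
    print(escape_for_uri_attribute(strip_non_xml(raw)))
```

The result still contains a bare '&', so it is a valid URI but not yet safe to paste into serialized XML; that is a separate step.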


2. If and when the XML document exists in serialized form (i.e., as a string, not as a DOM object), note the following:

- if the text is not destined for a CDATA section, markup characters '&'
   and '<' need to be escaped

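- if the text is destined for a CDATA section, it must not contain
   ']]>'; nothing can be escaped inside a CDATA section, so split the
   section between ']]' and '>' (in ordinary text outside a CDATA
   section, escape the '>' in ']]>' as &gt;)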

- if the text is destined for a comment, it must not contain '--'
   (how you handle such an offense is up to you)

- if the text is destined for an attribute value delimited by apostrophes,
   then apostrophes in the value must be escaped (usually use &apos;, or
   &#39; in HTML 4, which does not define &apos;)

- if the text is destined for an attribute value delimited by quotes,
   then quotes in the value must be escaped (usually use &quot;)

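- if the text is destined for a non-URI attribute value, then tab, LF,
   and CR need to be escaped as character references (&#9;, &#10;,
   &#13;) to facilitate round-tripping; left literal, the parser
   normalizes each of them to a space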
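
The serialization rules above could be sketched roughly as follows, again in Python and again with made-up function names (text_content, double_quoted_attribute, cdata_section, comment). The only library call used is escape() from the standard xml.sax.saxutils module; raising an error for a bad comment is just one possible policy.

```python
from xml.sax.saxutils import escape


def text_content(s):
    """Escape & and < for ordinary text; escape() also turns > into &gt;,
    which takes care of any ']]>' in the text."""
    return escape(s)


def double_quoted_attribute(s):
    """Return s as a quote-delimited attribute value.

    Quotes become &quot;, and tab, LF, CR become character references
    so they survive attribute-value normalization.
    """
    extra = {'"': "&quot;", "\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}
    return '"' + escape(s, extra) + '"'


def cdata_section(s):
    """Wrap s in a CDATA section, splitting wherever ']]>' occurs."""
    return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def comment(s):
    """Wrap s in a comment; reject '--' and a trailing '-'."""
    if "--" in s or s.endswith("-"):
        raise ValueError("text cannot be placed in an XML comment")
    return "<!--" + s + "-->"


if __name__ == "__main__":
    url = "http://example.com/?a=1&b=2"
    print("<link>" + text_content(url) + "</link>")
    print("<a href=" + double_quoted_attribute(url) + "/>")
    print(cdata_section("x ]]> y"))
```

If the document is built through a DOM or an XSLT result tree rather than by string concatenation, the serializer does this escaping itself, which is usually the safer route.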

I probably missed one or two cases, but as you can see, you can't just slap
any old text into a document and call it XML...

   - Mike
____________________________________________________________________________
  mike j. brown                   |  xml/xslt: http://skew.org/xml/
  denver/boulder, colorado, usa   |  resume: http://skew.org/~mike/resume/

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list


