Subject: Re: multiple special characters in XML
From: Tony Graham <tgraham@xxxxxxxxxxxxxxxx>
Date: Fri, 3 Sep 1999 00:35:08 -0400 (EST)

At 31 Aug 1999 18:26 -0700, regan@xxxxxxxxxxx wrote:
 > We have an application that takes sections of user generated HTML
 > files, embeds these sections into a large XML file, then later, when
 > requested, generates an HTML file from the XML and a XSL file (using XT).
 > Our users have started introducing funny characters into the HTML (OK, what
 > happens is they use Microsoft Word to introduce the funny characters and
 > Word does the conversion to HTML, and we end up with "&eacute;" or some such
 > in our HTML - then our XML)

If Word thinks it's producing HTML, then it's probably using only the
entities defined in the various HTML recommendations.

HTML 3.2 borrowed the ISO Latin-1 entity set. HTML 4.0 got more
adventurous, and borrowed bits and pieces from ISO entity sets plus
declared some that ISO hasn't standardised.

In both cases, the entities are defined in the respective
recommendations and/or in the files that accompany them, all of which
are available from the W3C web site.

In both cases, you'll also have to do some massaging to make the
entity declarations into XML. The following example from
HTMLsymbol.ent from HTML 4.0:

   <!ENTITY fnof CDATA "&#402;" -- latin small f with hook = function
                                    = florin, U+0192 ISOtech -->

should become:

   <!ENTITY fnof "&#402;"><!-- latin small f with hook = function
                                    = florin, U+0192 ISOtech -->

since CDATA entities aren't in XML, and you can't put comments inside
other declarations in XML.

I haven't looked, but presumably the XHTML PR has the XML versions of
the HTML 4.0 entity declarations.

You can reference the entity set from your DTD, or from the internal
subset of your documents if you don't have a fully fledged DTD.

Regards,


Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
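
The "massaging" described in the message can be automated. What follows is only a rough sketch in Python, not anything from the original post: the function name sgml_entity_to_xml and the regular expression are illustrative assumptions. It handles only the shape used in the fnof example (one CDATA entity with at most one trailing comment) and would need more care for declarations that carry several SGML comments. The last lines check that the converted declaration works from an internal subset, as suggested above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed input shape: <!ENTITY name CDATA "value" -- comment -->
# (the comment is optional and may span several lines).
SGML_ENTITY = re.compile(
    r'<!ENTITY\s+(?P<name>\S+)\s+CDATA\s+"(?P<value>[^"]*)"\s*'
    r'(?:--(?P<comment>.*?)--\s*)?>',
    re.DOTALL,
)


def sgml_entity_to_xml(text):
    """Rewrite SGML CDATA entity declarations as XML declarations.

    The CDATA keyword is dropped, and an embedded SGML comment is moved
    out of the declaration into a separate XML comment that follows it.
    """
    def rewrite(match):
        decl = '<!ENTITY {} "{}">'.format(match.group("name"),
                                          match.group("value"))
        comment = match.group("comment")
        if comment is not None:
            decl += "<!--{}-->".format(comment)
        return decl

    return SGML_ENTITY.sub(rewrite, text)


# The example declaration from HTMLsymbol.ent quoted in the message.
sgml = ('<!ENTITY fnof CDATA "&#402;" -- latin small f with hook = '
        'function = florin, U+0192 ISOtech -->')
converted = sgml_entity_to_xml(sgml)
print(converted)

# Reference the converted declaration from a document's internal subset.
document = "<!DOCTYPE p [\n" + converted + "\n]>\n<p>&fnof;</p>"
root = ET.fromstring(document)
print(repr(root.text))
```

In this sketch the parser resolves &fnof; to the single character U+0192, so the document no longer depends on an HTML-only entity name.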