Re: Parsing errors on unknown entities (unicode characters)

Subject: Re: Parsing errors on unknown entities (unicode characters)
From: Sebastian Rahtz <sebastian.rahtz@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 25 Nov 1999 09:37:10 +0000 (GMT)
Tangi Vass writes:

 > I've got an XML file, built from the result of a request on a
 > search engine (via a private API), that may contain weird Unicode
 > entities (such as &laqno;). Of course the parser crashes because my
 > DTD only contains the most usual Unicode entities.

How can there be `usual' Unicode entities? The only fixed things are the
code point number and the long (multi-word) character name. Whatever
short-form names anyone adds cannot be standard. Oh, you can use the old
SGML ISO names, but they only cover some fairly basic stuff.
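
To make the distinction concrete, here is a minimal sketch in Python (my choice of language, not something from the original exchange). It assumes the standard library's xml.etree.ElementTree and unicodedata modules, and uses &laquo; purely as an illustrative short name. A numeric character reference parses without any DTD, the long name comes from the Unicode database itself, and the short name fails unless something declares it.

```python
import unicodedata
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# The number is fixed by Unicode, so no declaration is needed.
numeric = "<r>&#171;</r>"
# The short name is only a convention; without a DTD it is undefined.
named = "<r>&laquo;</r>"

char = ET.fromstring(numeric).text
long_name = unicodedata.name(char)  # the fixed multi-word name

numeric_ok = parses(numeric)
named_ok = parses(named)
```

If the producer of the file can be persuaded to emit numeric references, the problem of undeclared names never arises.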

 > Has anyone a smarter idea than building a DTD with all Unicodes?

As someone said, http://www.tug.org/applications/jadetex/unicode.xml
contains everything that I have ever discovered, from which you can
extract what you want. The real claim to fame of this monster is that
it contains all the MathML characters (all recent changes to this file
come from David Carlisle, who uses it for MathML).
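
Extracting what you want could look roughly like the sketch below. The layout of the sample data is an assumption on my part: it supposes each <character> element carries a decimal code point in a "dec" attribute and holds <entity> children whose "id" is the short name. Check the real file before relying on this. The function name entity_declarations and the SAMPLE string are hypothetical, made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; the real file's element and attribute names may differ.
SAMPLE = """<unicode>
  <character id="U000AB" dec="171">
    <entity id="laquo" set="isonum"/>
    <description>LEFT-POINTING DOUBLE ANGLE QUOTATION MARK</description>
  </character>
  <character id="U000BB" dec="187">
    <entity id="raquo" set="isonum"/>
    <description>RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK</description>
  </character>
</unicode>"""


def entity_declarations(xml_text, wanted=None):
    """Build <!ENTITY ...> lines for the short names in `wanted` (all if None)."""
    root = ET.fromstring(xml_text)
    lines = []
    for ch in root.iter("character"):
        dec = ch.get("dec")
        # Skip entries without a single decimal code point (assumed possible).
        if dec is None or not dec.isdigit():
            continue
        # '&' (38) and '<' (60) must be double-escaped in entity values.
        value = f"&#38;#{dec};" if int(dec) in (38, 60) else f"&#{dec};"
        for ent in ch.iter("entity"):
            name = ent.get("id")
            if name and (wanted is None or name in wanted):
                lines.append(f'<!ENTITY {name} "{value}">')
    return lines


decls = entity_declarations(SAMPLE, wanted={"laquo"})
# Declaring the name in an internal subset lets the parser resolve it.
doc = "<!DOCTYPE r [" + "".join(decls) + "]><r>&laquo;</r>"
text = ET.fromstring(doc).text
```

Passing the set of names actually seen in the search results as `wanted` keeps the generated declarations small instead of pulling in every character.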

Sebastian


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

