RE: Formatting_the_result

From: "G. Ken Holman" <gkholman@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 05 Oct 1999 12:20:02 -0700
At 99/10/05 16:49 +0200, Reyes Garcia Rosado wrote:
Oh... thank you, now I understand this problem a little better. When I
asked, I didn't know anything about the problem.

Consider the following.


Here is an SGML instance and a report of what the parser sees:

T:\ftemp>type test1.sgm
<!DOCTYPE doc [
<!ELEMENT doc - O ( para+ )>
<!ELEMENT para - O ( #PCDATA  | figref )*>
<!ELEMENT figref - O EMPTY>
]><doc>
<para>This is a <figref> test abcd<figref>efgh</para>
</doc>
T:\ftemp>nsgmls test1.sgm >test1.txt

T:\ftemp>type test1.txt
(DOC
(PARA
-This is a
(FIGREF
)FIGREF
- test abcd
(FIGREF
)FIGREF
-efgh
)PARA
)DOC
C

T:\ftemp>


Now, let's introduce a new line before every TAGC (in my personal work I only do this to the TAGC of start tags):


T:\ftemp>type test2.sgm
<!DOCTYPE doc [
<!ELEMENT doc - O ( para+ )>
<!ELEMENT para - O ( #PCDATA  | figref )*>
<!ELEMENT figref - O EMPTY>
]><doc
>
<para
>This is a <figref
> test abcd<figref
>efgh</para
>
</doc
>
T:\ftemp>nsgmls test2.sgm >test2.txt

T:\ftemp>type test2.txt
(DOC
(PARA
-This is a
(FIGREF
)FIGREF
- test abcd
(FIGREF
)FIGREF
-efgh
)PARA
)DOC
C

T:\ftemp>diff test1.txt test2.txt

T:\ftemp>


Note there is *no* difference to the parser.
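
For anyone following along in XML rather than SGML, the same point can be sketched with Python's standard xml.etree.ElementTree standing in for nsgmls. This is only an analogue under assumptions: the instance is recast as XML (so figref is written as an empty-element tag), and esis_like is a hypothetical helper that imitates the report format above, not part of any tool.

```python
import xml.etree.ElementTree as ET

def esis_like(source):
    """Return report lines: '(' start tag, ')' end tag, '-' data."""
    out = []

    def walk(elem):
        out.append("(" + elem.tag.upper())
        if elem.text:  # data before the first child
            out.append("-" + elem.text.replace("\n", "\\n"))
        for child in elem:
            walk(child)
            if child.tail:  # data following this child
                out.append("-" + child.tail.replace("\n", "\\n"))
        out.append(")" + elem.tag.upper())

    walk(ET.fromstring(source))
    return out

# Same instance twice: once compact, once with a line break before each '>'.
plain = "<doc><para>This is a <figref/> test abcd<figref/>efgh</para></doc>"
in_tags = ("<doc\n><para\n>This is a <figref\n/> test abcd<figref\n/>"
           "efgh</para\n></doc\n>")

print("\n".join(esis_like(in_tags)))
print(esis_like(plain) == esis_like(in_tags))
```

The line breaks sit inside the markup, so the two reports should compare equal, just as the diff above is empty.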


Now let's introduce new lines *after* every start tag:

T:\ftemp>type test3.sgm
<!DOCTYPE doc [
<!ELEMENT doc - O ( para+ )>
<!ELEMENT para - O ( #PCDATA  | figref )*>
<!ELEMENT figref - O EMPTY>
]><doc>
<para>This is a
<figref> test abcd
<figref>efgh</para>
</doc>
T:\ftemp>nsgmls test3.sgm >test3.txt

T:\ftemp>type test3.txt
(DOC
(PARA
-This is a \n
(FIGREF
)FIGREF
- test abcd\n
(FIGREF
)FIGREF
-efgh
)PARA
)DOC
C

T:\ftemp>


The parser sees *different* data than in the first two cases. Note the introduced new line characters. While this may not be significant to every processing engine (say, an HTML browser), how would a transformation engine know?
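
A rough XML analogue of this case, assuming Python's standard xml.etree.ElementTree in place of nsgmls, is sketched here. XML has no SGML record-end handling, so a line break in content always stays exactly where it was typed; data_chunks is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

# The same paragraph, without and with a line break before each figref.
compact = ET.fromstring(
    "<para>This is a <figref/> test abcd<figref/>efgh</para>")
broken = ET.fromstring(
    "<para>This is a \n<figref/> test abcd\n<figref/>efgh</para>")

def data_chunks(para):
    """Text before the first child, then the text following each child."""
    return [para.text] + [child.tail for child in para]

print(data_chunks(compact))  # ['This is a ', ' test abcd', 'efgh']
print(data_chunks(broken))   # ['This is a \n', ' test abcd\n', 'efgh']
```

The application receives the extra characters as data and has no way to tell whether the author meant them or a formatter added them.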


Lastly, let's take the same data as the third example, but change the DTD to use inclusions instead of directly using the component in the content model.

T:\ftemp>type test4.sgm
<!DOCTYPE doc [
<!ELEMENT doc - O ( para+ )>
<!ELEMENT para - O ( #PCDATA ) +(figref)>
<!ELEMENT figref - O EMPTY>
]><doc>
<para>This is a
<figref> test abcd
<figref>efgh</para>
</doc>
T:\ftemp>nsgmls test4.sgm >test4.txt

T:\ftemp>type test4.txt
(DOC
(PARA
-This is a
(FIGREF
)FIGREF
-\n test abcd
(FIGREF
)FIGREF
-\nefgh
)PARA
)DOC
C

T:\ftemp>

And we see that even arbitrarily adding a new line after every start tag can give different results depending on the DTD.

The transformation engine doesn't know the DTD of the output instance, so it doesn't know what is an inclusion and what isn't (even if it would help).
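
To make the risk concrete, here is a minimal sketch in which Python's ET.indent (available from Python 3.9) stands in for an output formatter that knows nothing about the DTD. That stand-in is my assumption, not a description of any particular transformation engine, and the element names b and i are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two inline elements with no data between them: the text reads "ab".
para = ET.fromstring("<para><b>a</b><i>b</i></para>")
before = "".join(para.itertext())

# Pretty-print without any knowledge of the content model.
ET.indent(para)
after = "".join(para.itertext())

print(repr(before))
print(repr(after))
print(before == after)
```

The indenter cannot tell that para has mixed content, so it adds line breaks and spaces between the two children and the character data changes.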

What has been done for a long time is there for a reason.

I hope this helps.

...................... Ken

p.s. I'll get off my soap box now :{)}

--
G. Ken Holman                    mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.             http://www.CraneSoftwrights.com/d/
Box 266, Kars, Ontario CANADA K0A-2E0   +1(613)489-0999   (Fax:-0995)
Website:  XSL/XML/DSSSL/SGML services, training, libraries, products.
Practical Transformation Using XSLT and XPath      ISBN 1-894049-01-2
Next instructor-led training:  1999-11-08, 1999-11-09, 1999-12-05/06,
                             1999-12-07, 2000-02-27/28, 2000-05-11/12


DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist


