Subject: [xsl] Parsing plain text - xml application specifying parser
From: Noah Scales <noahjscales@xxxxxxxxx>
Date: Sun, 11 Sep 2005 00:23:02 -0700 (PDT)
Hi.

Is it feasible to use an XML application to specify a parser that, when translated into XSLT 2.0, turns plain text into XML according to the specification? Is something like this expected for XSL 3.0, skipping the use of a separate XML application?

My searches on Google and through this list's archives didn't turn up any information on this approach. My next step is to hack at it myself, but my knowledge of how parsers work is minimal. If XSLT can mimic a parser, though, this might work as a two-step process:

parser_specification.xml + parser_application.xsl -> parser.xsl
parser.xsl + plain.txt -> fully_parsed.xml

Maybe plain.txt is accessed through the document() function applied to a parameter passed to parser.xsl when it's processed.

This idea was sparked for me by:
- reading an online article (that I've lost) that discusses how an XML file preserves the parse tree using its tags
- Michael Kay's writing (in his XSLT 2.0 Programmer's Reference) about analyzing plain text for hidden structure using XSLT 2.0 regex.

It seems like a natural fit to me that XSLT could do this directly, turning plain text into XML without difficulty. I wouldn't be surprised if this approach (or something much better) is slated for a later XSLT release.

In case it helps explain what I mean, below is an artificial example: a parser_specification.xml file, a plain.txt input file, and the fully_parsed.xml file that the input should be transformed into. I'm just a student, not a programming expert. If the example is raw or just plain awful, sorry. Anyway, I'll appreciate any information that anyone can provide.
-Noah

-------------------------------------------------------
-----------parser_specification.xml--------------------
<?xml version="1.0"?>
<specification ignore-white-space="yes">
  <first-rule name="entities">
    <either_or><rule name="identifier_listing" /><or/><rule name="descriptor_listing" /></either_or>
  </first-rule>
  <rule name="identifier_listing">Each <rule name="entity" /> is identified by <optional>the combination of</optional><rule name="descriptors" /><optional> and <rule name="descriptors" /></optional>
  </rule>
  <rule name="descriptor_listing">About each <rule name="entity" />, we can remember <rule name="descriptors" count="1+" /><optional> and <rule name="descriptors" /></optional>
  </rule>
  <rule name="descriptors" tag-output="no"> its <rule name="descriptor" count="1" tag-output="yes"/><either_or>,<or/>.</either_or>
  </rule>
  <rule name="descriptor">
    <either_or><rule name="entity" or-preference="1" /><or/><rule name="attribute" or-preference="2" /></either_or>
  </rule>
  <rule name="entity"><either_or>cow<or/>herd<or/>farm<or/>herd-owner<or/>farm-owner</either_or>
  </rule>
  <rule name="attribute"><regex value="\w[[:alnum:]]*\w" />
  </rule>
</specification>
-------------------------------------------------------
----------------------plain.txt------------------------
About each cow, we can remember its name, its breed, its weight, and its herd.
Each cow is identified by the combination of its name, and its herd.
About each herd, we can remember its name, its herd-owner, and its farm.
Each herd is identified by the combination of its name, and its farm.
About each farm we can remember its farm-owner, its name, and...
.
.
.
-------------------------------------------------------
------------------fully_parsed.xml---------------------
<?xml version="1.0"?>
<entities>
<descriptor_listing>About each <entity>cow</entity>, we can remember its <attribute>name</attribute>, its <attribute>breed</attribute>, its <attribute>weight</attribute>, and its <entity>herd</entity>.</descriptor_listing>
<identifier_listing>Each <entity>cow</entity> is identified by the combination of its <attribute>name</attribute>, and its <entity>herd</entity>.</identifier_listing>
<descriptor_listing>About each <entity>herd</entity>, we can remember its <attribute>name</attribute>, its <entity>herd-owner</entity>, and its <entity>farm</entity>.</descriptor_listing>
<identifier_listing>Each <entity>herd</entity> is identified by the combination of its <attribute>name</attribute>, and its <entity>farm</entity>.</identifier_listing>
<descriptor_listing>About each <entity>farm</entity> we can remember its <entity>farm-owner</entity>, its <attribute>name</attribute>, and
.
.
.
</entities>
-------------------------------------------------------
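For what it's worth, here is a minimal hand-written sketch of what the second step (parser.xsl + plain.txt -> fully_parsed.xml) might look like for this one example. It is not generated from the specification and does not implement the rule language above. It assumes an XSLT 2.0 processor, reads the text with unparsed-text() (document() expects well-formed XML, not plain text), and uses xsl:analyze-string for the regex matching. The parameter name input-uri, the template name main, and the variable entity-names are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical parser.xsl: a hand-written sketch, not derived from parser_specification.xml. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed parameter: URI of the plain-text input, relative to this stylesheet. -->
  <xsl:param name="input-uri" select="'plain.txt'"/>

  <!-- Words treated as entities; any other word after "its" is tagged as an attribute. -->
  <xsl:variable name="entity-names"
      select="('cow', 'herd', 'farm', 'herd-owner', 'farm-owner')"/>

  <xsl:output method="xml" indent="no"/>

  <!-- Entry point: the transformation is started at this named template. -->
  <xsl:template name="main">
    <entities>
      <!-- Outer pass: a sentence starts with "About each" or "Each" and runs to the next period. -->
      <xsl:analyze-string select="unparsed-text($input-uri)"
          regex="(About each|Each)[^.]*\.">
        <xsl:matching-substring>
          <xsl:element name="{if (regex-group(1) = 'Each')
                              then 'identifier_listing'
                              else 'descriptor_listing'}">
            <!-- Inner pass: tag the word that follows "each", "Each" or "its". -->
            <xsl:analyze-string select="." regex="([Ee]ach|its) ([\w\-]+)">
              <xsl:matching-substring>
                <xsl:value-of select="concat(regex-group(1), ' ')"/>
                <xsl:element name="{if (regex-group(2) = $entity-names)
                                    then 'entity'
                                    else 'attribute'}">
                  <xsl:value-of select="regex-group(2)"/>
                </xsl:element>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <!-- Everything else in the sentence is copied through unchanged. -->
                <xsl:value-of select="."/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </xsl:element>
        </xsl:matching-substring>
        <!-- Text outside a recognized sentence is dropped: there is no xsl:non-matching-substring here. -->
      </xsl:analyze-string>
    </entities>
  </xsl:template>

</xsl:stylesheet>
```

For the complete sentences in plain.txt this should produce markup along the lines of fully_parsed.xml. It is only pattern matching, though: it does not check that a sentence actually follows the identifier_listing or descriptor_listing rule, and the hard-coded regexes are exactly the part that the first step would have to generate from the specification.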