Subject: RE: [xsl] use XSLT or XQuery in Saxon?
From: alan m <highmarkz@xxxxxxxxx>
Date: Fri, 7 Jan 2005 11:32:47 -0800 (PST)
I forgot to mention that memory is limited. Even on a 1 GB system, a 300 MB file cannot be processed, because the DOM can take as much as 10 times the file size. It would crash the system.

The big XML file has no references to the small XML file names. The small XML files are generated using STX and XSLT transforms and contain content from the big file, but their order or structure may differ from that of the big XML file.

I used STX in the first place to break the big XML file down into much smaller files, and then did further XSLT processing on these small files. The STX documentation is at http://stx.sourceforge.net/documents/ and I used the Joost implementation.

One good reference was given by Raffaele Sena: http://dsd.lbl.gov/nux/ which seems promising for the problem of dealing with a big XML file.

To Michael Kay: performance is not an issue. I am very new to XQuery. I would like to get my hands dirty with XQuery to learn a new trick of the trade, but I would also like to follow a technically correct approach to this kind of problem.

Let's assume I have solved the big XML file problem. Now, given a text node, I need to search for its text in the tens of thousands of small XML or HTML files, generate stats such as where it was found and how many times, and generate meaningful logs if it was not found. I can write Java classes if necessary. I would like to avoid converting the small files into one large file. I was thinking about treating the collection of all small files as an XML database and using XQuery.

From Michael Kay:

>Another solution, again dependent on XSLT, is to use
>grouping. This doesn't
>require the small documents to be aggregated into a
>single document. If you
>take the union of the text nodes in the large
>document and the values in the
>small documents, and then do grouping, a group of
>size 1 indicates a value
>that is present in one file and not the other.

I would like more clarification about the above approach. Also, is this XQuery or XSLT?
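One possible reading of the grouping suggestion quoted above, sketched in Python purely to make the idea concrete (it is neither XSLT nor XQuery, and not necessarily what was meant): stream the text nodes of every file with SAX so that no DOM is built, group the values by their text, and treat any group that contains only the big file as "not found". The names `TextCollector`, `text_values`, `group_by_value`, `report` and the `"BIG"` label are hypothetical, and whitespace normalization before comparison is an assumption.

```python
import io
import xml.sax
from collections import defaultdict


class TextCollector(xml.sax.ContentHandler):
    """Collect whitespace-normalized text nodes without building a tree."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._buf = []

    def _flush(self):
        # SAX may deliver one text node in several chunks, so join them first.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            self.values.append(text)

    def characters(self, content):
        self._buf.append(content)

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()


def text_values(source):
    """Return the non-empty text values of one file (a path or a byte stream)."""
    handler = TextCollector()
    xml.sax.parse(source, handler)
    return handler.values


def group_by_value(big_source, small_sources):
    """Map each text value to the set of sources in which it occurs."""
    groups = defaultdict(set)
    for value in text_values(big_source):
        groups[value].add("BIG")  # hypothetical label for the big file
    for name, source in small_sources.items():
        for value in text_values(source):
            groups[value].add(name)
    return groups


def report(groups):
    """Split the big file's values into found (with file names) and missing."""
    found = {v: sorted(s - {"BIG"}) for v, s in groups.items()
             if "BIG" in s and len(s) > 1}
    missing = sorted(v for v, s in groups.items() if s == {"BIG"})
    return found, missing


if __name__ == "__main__":
    big = io.BytesIO(b"<doc><a>alpha</a><b>beta <i>gamma</i></b></doc>")
    small = {"one.xml": io.BytesIO(b"<x><p>alpha</p></x>"),
             "two.xml": io.BytesIO(b"<y>gamma</y>")}
    found, missing = report(group_by_value(big, small))
    print("found:", found)
    print("missing:", missing)
```

This keeps only the text values and the file names in memory, not a tree per document. Reporting the parent element of each missing value, or counting occurrences per file, would need extra bookkeeping in the handler that is not shown here.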
This is in reference to the original post:

""""""""""""""""""""""""""""""""""""""""""
I have an extremely large (over 300 MB) XML file and tens of thousands of small XML files generated by applying various XSLT transforms to the one big XML file. I am using Saxon for XSLT and will also be using it for XQuery.

Is XQuery or XSLT the better solution for this problem?

Query each text node in the big XML file and verify that its content is present in one of the result XML files. Based on this information, generate a report that shows which content is present and in which file and, in a separate section, which content was not found in the result XML files. For content that was not found, also show its parent element or other markup to indicate its position in the big XML file.

All the small XML files are stored as flat files in various directories on the Windows file system, although most files are in one directory. The big XML file is fairly complex, with multiple levels of nested elements.

Any comments or suggestions?

Thank you
"""""""""""""""""""""""""""""""""""""

-Alan

I would prefer to use XQuery to learn something new, but I am not sure if this would be the right approach. If necessary I can develop Java classes, but I would like to use XSLT and/or XQuery files to achieve the above.

Alan