Subject: [xsl] Xsplit, multilingual web site, and extracting text from HTML
From: skhurshid@xxxxxxxxxx
Date: Thu, 8 Mar 2001 12:09:20 -0500
Hi,

First of all, I'd like to thank everyone for their help with my multilingual web site question and my questions about XSplit. I got the most useful responses from this group.

I finally managed to download XSplit and have been playing around with it. I discovered, to my disappointment, that it doesn't automatically create the XML files for you. I took an HTML page and ran the "Split" command, but I simply got an XSL file with all the HTML in it; the XML file it "generated" was empty. I read the documentation, and it explained that I had to tag the content in the HTML page first. After I did this, XSplit correctly generated the XSL file and the XML file.

For example, in an HTML file containing

    <p>Hello World</p>

I had to extract the "Hello World" string from the HTML and replace it with a label prefixed by "psx-":

    <p>psx-mytext</p>

and then add the "Hello World" string to the generated XML file, which contains

    <mytext></mytext>

I was hoping XSplit would generate the XML for me by simply using the HTML tag names and numbering them wherever it found content. For example, I was hoping that

    <p>Hello World</p>

would convert to

    <p1>Hello World</p1>

in the XML file. That way I wouldn't have to tag the data unless I really wanted to.

Is what I'd like to do possible in any way with XSplit? Am I missing something? Are there tools out there that would extract all displayable text from HTML files, replace it with labels, and then put the extracted text in a separate file along with the labels? Basically, I'm looking for a way to automate this, since we have thousands of HTML files.

I think using an XML and XSL solution for a multilingual site is the way to go, but I'm having a hard time justifying the initial cost of converting all our HTML files. Since it's a process that can be automated, I'm hoping that there are tools out there that could help us.
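As a rough illustration of the automation asked about above, here is a minimal sketch, assuming Python and its standard html.parser module (neither is part of XSplit, and this is not how XSplit itself works). The names TextLabeler and split_html and the <content> root element are hypothetical; only the "psx-" prefix and the tag-name-plus-number labels come from the message. Treating every text node outside script and style as "displayable" is a simplifying assumption.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Assumption: text inside these elements is never shown to the reader.
NON_DISPLAYED = {"script", "style"}
# Void elements have no end tag, so they must not be tracked as open.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextLabeler(HTMLParser):
    """Hypothetical helper: swap text nodes for psx- labels, keep the text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.template = []  # pieces of the labeled HTML
        self.entries = []   # (label, original text) pairs
        self.counts = {}    # per-tag counters used for numbering
        self.stack = []     # names of the currently open elements

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)
        self.template.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: copy through, nothing to track.
        self.template.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close the element and any unclosed children above it.
            while self.stack.pop() != tag:
                pass
        self.template.append(f"</{tag}>")

    def handle_comment(self, data):
        self.template.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.template.append(f"<!{decl}>")

    def handle_data(self, data):
        parent = self.stack[-1] if self.stack else "text"
        if not data.strip() or parent in NON_DISPLAYED:
            self.template.append(data)  # whitespace or code: leave as is
            return
        n = self.counts.get(parent, 0) + 1
        self.counts[parent] = n
        label = f"{parent}{n}"          # e.g. first <p> text becomes p1
        self.entries.append((label, data.strip()))
        self.template.append(f"psx-{label}")


def split_html(html):
    """Return (labeled HTML, XML holding the extracted text)."""
    parser = TextLabeler()
    parser.feed(html)
    parser.close()
    lines = ["<content>"]  # hypothetical root element name
    lines += [f"  <{label}>{escape(text)}</{label}>"
              for label, text in parser.entries]
    lines.append("</content>")
    return "".join(parser.template), "\n".join(lines)


if __name__ == "__main__":
    labeled, xml = split_html("<p>Hello World</p>")
    print(labeled)
    print(xml)
```

The sketch replaces each whole text node, including any surrounding whitespace, and it ignores displayable attribute values such as alt and title, so a real conversion of thousands of pages would need more care than this.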
I'd write a tool myself, but I'd have to create an HTML parser that knew where to find all the "displayable text" in an HTML page, which seems tough. I searched the Web for HTML parsers that extract text but didn't find anything similar to what I described above (something that would replace the text with labels, etc.).

Any help would be greatly appreciated.

Thanks :-)
-Sher

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list