Subject: [xsl] Text based stage play scripts to XML
From: Jacobus Reyneke <jacobusreyneke@xxxxxxxxx>
Date: Mon, 24 Jan 2011 12:48:37 +0200
Good day,

Any help, samples or links to similar problems and solutions will be greatly appreciated. I need to get this done today, and I'm not winning.

I need to get stage play scripts into XML, and there are thousands of these pages. I am using Oxygen with Saxon, so XPath and XSLT 2.0 are at hand.

The XML elements I need to identify are:

<act>
<scene>
<actor>
<dialogue-line>
<stage-direction-scene-set>
<stage-direction-action>
<stage-direction-dialogue>
<stage-direction-entry>
<stage-direction-exit>
<line-share>

Input is plain TXT with tabs here and there that may be helpful. The documents are of consistent structure and format, so I guess it can be done.

I am currently experimenting with replace() and tokenize(), but I must admit I'm lost.

There are some structures one can depend on: stage directions are always preceded by tabs or hashes, or shown in brackets, depending on the type of stage direction. Actor names are always in all caps and followed by a colon.

What I need to do is identify patterns and surround matching text with tags. For example, if I find text that is preceded by tabs and other text and followed by a line break, then I know I'm dealing with a stage direction action.

Here is the sample input, and below it the output I am trying to achieve:

--- Input snippet ---

Act 1, Scene 1
On the deck of the ship. The captain and his first mate enters.
CAPTAIN: Was that the sound of cannon fire?
FIRST MATE: Yes captain, it came from the east. I saw a ship there about an hour ago, but I did not want to wake you.
CAPTAIN: Always wake me if a ship is in sight. These are dangerous times.
Another thunder is heard.
FIRST MATE: (First Mate points) There it is again captain, it sounded louder this time.
Ship's doctor appears from the cabin
DOCTOR: Captain, come quick! Our water has been poisoned.
CAPTAIN:###This cannot be!
They go below deck.

---- Output -----

<act>
  <scene>
    <stage-direction-scene-set>
      On the deck of the ship.
      The captain and his first mate enters.
    </stage-direction-scene-set>
    <actor name="CAPTAIN">
      <dialogue-line>Was that the sound of cannon fire?</dialogue-line>
    </actor>
    <actor name="FIRST MATE">
      <dialogue-line>Yes captain, it came from the east.</dialogue-line>
      <dialogue-line>I saw a ship there about an hour ago, but I did not want to wake you.</dialogue-line>
    </actor>
    <actor name="CAPTAIN">
      <dialogue-line>Always wake me if a ship is in sight.</dialogue-line>
      <dialogue-line>These are dangerous times.</dialogue-line>
      <stage-direction-action>Another thunder is heard.</stage-direction-action>
    </actor>
    <actor name="FIRST MATE">
      <stage-direction-dialogue>First Mate points</stage-direction-dialogue>
      <dialogue-line>There it is again captain, it sounded louder this time.</dialogue-line>
    </actor>
    <stage-direction-entry>Ship's doctor appears from the cabin</stage-direction-entry>
    <actor name="DOCTOR">
      Captain, come quick! Our water has been poisoned.
    </actor>
    <actor name="CAPTAIN">
      <line-share>This cannot be!</line-share>
      <stage-direction-exit>They go below deck.</stage-direction-exit>
    </actor>
  </scene>
</act>

Any help will be greatly appreciated.

Kind regards,
Jacobus
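
The pattern-matching step described above (all-caps name plus colon, leading tabs, brackets, hashes) can be sketched as a per-line classifier. The sketch below is in Python rather than XSLT so that it can be run and checked on its own; the regular expressions are simple enough that they should carry over to XPath 2.0 matches(), replace() and tokenize() with little change. Everything in it is an assumption drawn from the description in the post, not a tested solution: the function name classify is hypothetical, any tab-led line is treated as a generic stage direction (the post does not say how action, entry and exit directions differ), "###" after the colon is taken to mark a line-share, and dialogue is split into one dialogue-line per sentence. It only labels lines; nesting the results into <act>, <scene> and <actor> elements is left out.

```python
import re

# Assumed patterns, inferred from the description in the post.
HEADING = re.compile(r"^Act\s+(\d+),\s*Scene\s+(\d+)\s*$")
SPEECH = re.compile(r"^([A-Z][A-Z .'-]*):(.*)$")      # ALL-CAPS name, then a colon
BRACKETED = re.compile(r"^\(([^)]*)\)\s*(.*)$")       # (direction) at start of a speech


def classify(line):
    """Return a list of (kind, text) pairs for one line of script text."""
    if not line.strip():
        return []
    if line.startswith("\t"):
        # Assumption: a leading tab always means a stage direction.
        return [("stage-direction", line.strip())]
    heading = HEADING.match(line.strip())
    if heading:
        return [("act", heading.group(1)), ("scene", heading.group(2))]
    speech = SPEECH.match(line)
    if speech:
        parts = [("actor", speech.group(1).strip())]
        rest = speech.group(2)
        if rest.startswith("###"):
            # Assumption: hashes directly after the colon mark a shared line.
            parts.append(("line-share", rest[3:].strip()))
            return parts
        rest = rest.strip()
        bracketed = BRACKETED.match(rest)
        if bracketed:
            parts.append(("stage-direction-dialogue", bracketed.group(1).strip()))
            rest = bracketed.group(2)
        # One dialogue-line per sentence, splitting after ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", rest):
            if sentence:
                parts.append(("dialogue-line", sentence))
        return parts
    # Anything else is treated as scene-setting text.
    return [("stage-direction-scene-set", line.strip())]


if __name__ == "__main__":
    sample = (
        "Act 1, Scene 1\n"
        "On the deck of the ship.\n"
        "CAPTAIN: Always wake me if a ship is in sight. These are dangerous times.\n"
        "\tAnother thunder is heard.\n"
        "FIRST MATE: (First Mate points) There it is again captain.\n"
        "CAPTAIN:###This cannot be!\n"
    )
    for raw in sample.splitlines():
        for kind, text in classify(raw):
            print(kind, "|", text)
```

Classifying first and building the element tree in a second pass keeps each pattern small; in XSLT 2.0 the corresponding split would be reading the file with unparsed-text(), breaking it into lines with tokenize(), and then grouping the labelled lines.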