[xsl] Text based stage play scripts to XML

Subject: [xsl] Text based stage play scripts to XML
From: Jacobus Reyneke <jacobusreyneke@xxxxxxxxx>
Date: Mon, 24 Jan 2011 12:48:37 +0200
Good day,

Any help, samples or links to similar problems and solutions will be
dearly appreciated. I need to get this done today, and I'm not
winning.

I need to get stage play scripts into XML, and there are thousands of
these pages. I am using Oxygen with Saxon, so XPath and XSLT 2.0 are at
hand.

The XML elements I need to identify are:
<act>
<scene>
<actor>
<dialogue-line>
<stage-direction-scene-set>
<stage-direction-action>
<stage-direction-dialogue>
<stage-direction-entry>
<stage-direction-exit>
<line-share>

Input is simple TXT with tabs here and there that may be helpful. The
documents are of consistent structure and format, so I guess it can be
done. I am currently experimenting with replace() and tokenize(), but
I must admit I'm lost.
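
For reference, a minimal sketch of the first step: reading the text file
and splitting it into lines with unparsed-text() and tokenize(). The
parameter name input-uri, the file name play.txt and the <lines>/<line>
wrapper elements are assumptions for illustration only, not part of the
target vocabulary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the plain-text script,
       resolved relative to this stylesheet -->
  <xsl:param name="input-uri" select="'play.txt'"/>

  <!-- Named entry point, since there is no XML source document -->
  <xsl:template name="main">
    <lines>
      <!-- Split on LF or CRLF; each token is one line of the script -->
      <xsl:for-each select="tokenize(unparsed-text($input-uri, 'UTF-8'), '\r?\n')">
        <line n="{position()}">
          <xsl:value-of select="."/>
        </line>
      </xsl:for-each>
    </lines>
  </xsl:template>

</xsl:stylesheet>
```

With Saxon this can be started from the named template (the -it:main
option on the command line), because the input is text rather than XML.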

There are some structures one can depend on: stage directions are
always preceded by tabs or hashes, or shown in brackets, depending on
the type of stage direction. Actor names are always in all caps and
followed by a colon.

What I need to do is identify patterns and surround the matching text
with tags, e.g. if I find text that is preceded by tabs and other
text and is followed by a line break, then I know I'm dealing with a
stage direction action.
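
A rough sketch of that pattern-and-wrap idea using xsl:analyze-string,
under several assumptions: the file name play.txt is hypothetical; any
line starting with a tab is treated as a stage direction action (which
would also catch a tab-indented "Act 1, Scene 1" heading); hashes
directly after the actor name are read as a shared line, as in the
sample; and <unclassified> is a placeholder element, not part of the
target vocabulary. It emits one flat element per line and does not
attempt the grouping into <act>, <scene> or multi-line <actor> blocks.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: URI of the plain-text script -->
  <xsl:param name="input-uri" select="'play.txt'"/>

  <xsl:template name="main">
    <scene>
      <!-- One pass per non-blank line of the text file -->
      <xsl:for-each
          select="tokenize(unparsed-text($input-uri), '\r?\n')[normalize-space(.)]">
        <!-- Group 1: ALL-CAPS name; group 2: optional hashes; group 3: speech -->
        <xsl:analyze-string select="."
            regex="^([A-Z][A-Z ]*):\s*(#*)\s*(.*)$">
          <xsl:matching-substring>
            <actor name="{regex-group(1)}">
              <!-- Assumption: hashes after the name mark a shared line -->
              <xsl:element
                  name="{if (regex-group(2)) then 'line-share' else 'dialogue-line'}">
                <xsl:value-of select="regex-group(3)"/>
              </xsl:element>
            </actor>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:choose>
              <!-- Assumption: a leading tab means a stage direction action -->
              <xsl:when test="matches(., '^\t')">
                <stage-direction-action>
                  <xsl:value-of select="normalize-space(.)"/>
                </stage-direction-action>
              </xsl:when>
              <!-- Placeholder for lines no rule has claimed yet -->
              <xsl:otherwise>
                <unclassified>
                  <xsl:value-of select="normalize-space(.)"/>
                </unclassified>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </scene>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex is anchored with ^ and $, each line either matches as
a whole or falls through to the non-matching branch, so further rules
(bracketed directions, continuation lines) could be added there.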

Here is a sample input, followed by the output I am trying to achieve:

--- Input snippet ---
		Act 1, Scene 1

On the deck of the ship. The captain and his first mate enters.

CAPTAIN:  Was that the sound of cannon fire?
FIRST MATE:  Yes captain, it came from the east.
                       I saw a ship there about an hour ago, but I did
not want to wake you.
CAPTAIN:  Always wake me if a ship is in sight.
                 These are dangerous times.
 Another thunder is heard.
FIRST MATE:  (First Mate points) There it is again captain, it sounded
louder this time.

Ship's doctor appears from the cabin

DOCTOR: Captain, come quick! Our water has been poisoned.
CAPTAIN:###This cannot be!
They go below deck.

--- Output ---

<act>
   <scene>
      <stage-direction-scene-set>
         On the deck of the ship. The captain and his first mate enters.
      </stage-direction-scene-set>
      <actor name="CAPTAIN">
         <dialogue-line>
            Was that the sound of cannon fire?
         </dialogue-line>
      </actor>
      <actor name="FIRST MATE">
         <dialogue-line>
            Yes captain, it came from the east.
         </dialogue-line>
         <dialogue-line>
            I saw a ship there about an hour ago, but I did not want to wake you.
         </dialogue-line>
      </actor>
      <actor name="CAPTAIN">
         <dialogue-line>
            Always wake me if a ship is in sight.
         </dialogue-line>
         <dialogue-line>
            These are dangerous times.
         </dialogue-line>
         <stage-direction-action>
            Another thunder is heard.
         </stage-direction-action>
      </actor>
      <actor name="FIRST MATE">
         <stage-direction-dialogue>
            First Mate points
         </stage-direction-dialogue>
         <dialogue-line>
            There it is again captain, it sounded louder this time.
         </dialogue-line>
      </actor>
      <stage-direction-entry>
         Ship's doctor appears from the cabin
      </stage-direction-entry>
      <actor name="DOCTOR">
         <dialogue-line>
            Captain, come quick! Our water has been poisoned.
         </dialogue-line>
      </actor>
      <actor name="CAPTAIN">
         <line-share>
            This cannot be!
         </line-share>
         <stage-direction-exit>
            They go below deck.
         </stage-direction-exit>
      </actor>
   </scene>
</act>

Any help will be greatly appreciated.

Kind regards,
Jacobus
