Subject: Re: [xsl] Text based stage play scripts to XML
From: Liam R E Quin <liam@xxxxxx>
Date: Mon, 24 Jan 2011 13:05:34 -0500

On Mon, 2011-01-24 at 14:37 +0200, Jacobus Reyneke wrote:
> Take any input file and output a similar output file. While doing so
> however, look for text located between identifiable patterns. Surround
> this text with tags.
>
> If input file contains:
> a b c d e f g h i j
>
> Pattern description:
> any string that follow after the string "c d" and is followed by the
> string "g h"
>
> If pattern found:
> Surround with <found-you>
>
> Result:
> a b c d<found-you> e f </found-you>g h i j
Others have mentioned some XSLT approaches, and that's generally a good
way to go. Of course, if you don't mind learning a programming
language, Perl is the king (or at least a princess) of transformations
where you don't yet have XML, but want to add markup. Use XML-aware
tools as early in the process as possible, though!

    while (<>) { # for each line of input
        s{c d\K e f (?=g h)}{ # replace with the value of...:
            element(
                "found-you", # element name
                $&, # what was matched (" e f " here)
                # optional attributes:
                "rule" => "31",
                "before" => "c d"
            )
        }e; # "e" flag means the replacement is an expression, not text
        print; # print the line whether or not it was changed
    }

Given the input
    a b c d e f g h
this produces (the order of the attributes may vary, since they are
passed in a Perl hash)
    a b c d<found-you rule="31" before="c d"> e f </found-you>g h

To process a whole file at once, you can use the rather odd Perl idiom,

    my $text;
    {
        local $/; # slurp mode: no record separator, so <> reads everything
        $text = <>;
    };
    # and then do the substitution:
    $text =~ s{as before}{as before}gme;

At that point you might (or might not) want to use \s+ rather than a
space between the tokens in the input, to match one or more whitespace
characters. Start by normalizing the text though -- look for lines
ending with spaces, for example, and trim them.
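
A minimal sketch of that normalization step followed by a \s+ match. The
helper name normalize_text and the sample string are made up for
illustration, and the replacement is plain text rather than a call to
element(), to keep the example short:

```perl
use strict;
use warnings;

# Hypothetical helper: tidy whitespace before any pattern matching.
sub normalize_text {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # unify DOS and old Mac line endings
    $text =~ s/[ \t]+$//mg;   # trim spaces and tabs at the end of each line
    return $text;
}

# Sample input with a doubled space, trailing spaces and a line break.
my $text = normalize_text("a b c  d e f   \ng h i j  \n");

# \s+ between the tokens matches one or more spaces or newlines;
# the s flag lets . match a newline so the marked-up text can span lines.
$text =~ s{c\s+d\K(.*?)(?=g\s+h)}{<found-you>$1</found-you>}gs;

print $text;
```

Note that the captured text keeps whatever whitespace the input had, so
the element content here includes the line break.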
Adding an attribute showing which pattern put a tag in place can
considerably aid debugging the process. It also helps to be consistent
in your markup, e.g. *always* use double quotes for attribute values.
A simple definition of the "element" function follows - I have tried to
avoid "clever" Perl, and I have left a couple of items in place that
help debugging. For production it would probably also handle quoting
special characters (& < > in content) as well as (already done) " in
attribute values.
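
As a rough sketch of the quoting a production version might do (the
function names escape_content and escape_attribute are my own for this
example and are not part of the script):

```perl
use strict;
use warnings;

# Hypothetical helper: escape text so it is safe as XML element content.
sub escape_content {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must come first, or the & in &lt; gets escaped again
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: the same, plus " for use inside "..." attribute values.
sub escape_attribute {
    my ($value) = @_;
    $value = escape_content($value);
    $value =~ s/"/&quot;/g;
    return $value;
}

print escape_content('Tom & Jerry <exit>'), "\n";  # Tom &amp; Jerry &lt;exit&gt;
print escape_attribute('say "hi" & go'), "\n";     # say &quot;hi&quot; &amp; go
```

This assumes the input text contains no markup yet; text that already
holds entities or tags would be escaped a second time.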
It's relatively straightforward using this approach to get files that
can be processed further with XML tools, although even then I sometimes
use Perl, e.g. because of its more powerful regular expressions, or
because I can more easily check for filenames...
You could also keep the patterns in a separate file, load them, and
match the text against each one in turn. On Linux, run the command
"perldoc perlre" for documentation on Perl regular expressions.
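
One way the separate patterns file might look, as a sketch only. The
tab-separated layout (rule id, element name, regular expression) and the
names load_rules and apply_rules are assumptions made for this example;
the rules are read from an in-memory string here so that the example is
self-contained, where a real script would open a file:

```perl
use strict;
use warnings;

# Assumed rule format: rule id, element name, regex, separated by tabs.
sub load_rules {
    my ($fh) = @_;
    my @rules;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments, blank lines
        my ($id, $name, $regex) = split /\t/, $line, 3;
        push @rules, { id => $id, name => $name, regex => qr/$regex/ };
    }
    return @rules;
}

# Apply each rule in order, recording the rule id in a "rule" attribute.
sub apply_rules {
    my ($text, @rules) = @_;
    foreach my $rule (@rules) {
        my ($re, $name, $id) = @{$rule}{qw(regex name id)};
        $text =~ s{$re}{<$name rule="$id">$&</$name>}g;
    }
    return $text;
}

# Stand-in for the contents of a patterns file.
my $rules_text = join("\t", "31", "found-you", 'c d\K e f (?=g h)') . "\n";
open my $fh, '<', \$rules_text or die "cannot read rules: $!";
my @rules = load_rules($fh);
close $fh;

print apply_rules("a b c d e f g h i j", @rules), "\n";
# a b c d<found-you rule="31"> e f </found-you>g h i j
```

Later rules see the markup added by earlier ones, so the order of the
lines in the file matters.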
Liam

#! /usr/bin/perl -w
use warnings;
use strict;

sub element($$;%)
{
    my ($name, $content, %attributes) = @_;

    sub quotedattvalue($$)
    {
        my ($name, $value) = @_;
        # print STDERR "q $name, $value\n";
        $value =~ s/"/&quot;/g; # escape " so the value can sit inside quotes
        return '"' . $value . '"';
    }

    # make a list of att="value" pairs, each with a leading space:
    # (could use join and map to do this too more succinctly,
    # see perldoc -f map)
    my $atts = "";
    if (%attributes) {
        foreach (keys %attributes) {
            $atts .= " " .
                $_ . '=' . quotedattvalue($_, $attributes{$_})
            ;
        }
    }
    return "<${name}${atts}>${content}</${name}>";
}

my $text;
{
    local $/; # slurp mode
    $text = <>;
};

$text =~ s{c d\K e f (?=g h)}{
    element(
        "found-you",
        $&,
        "rule" => "31",
        "before" => "c d"
    )
}gme;
print $text;
# end
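
For comparison, a sketch of the join and map variant that the comment in
the script alludes to. The name attribute_string is invented for this
example; sorting the keys is my addition so that the attribute order is
the same on every run:

```perl
use strict;
use warnings;

# Hypothetical variant of the attribute loop, not the script's own code.
sub attribute_string {
    my (%attributes) = @_;
    return join "", map {
        my $value = $attributes{$_};   # work on a copy of the value
        $value =~ s/"/&quot;/g;        # escape " inside the value
        qq{ $_="$value"};              # leading space, then name="value"
    } sort keys %attributes;
}

print "<found-you", attribute_string(rule => "31", before => "c d"), ">\n";
# <found-you before="c d" rule="31">
```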
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org