Subject: Re: [xsl] Aligning/merging two sequences
From: Michael Kay <mike@xxxxxxxxxxxx>
Date: Thu, 30 Sep 2010 18:08:32 +0100
Michael Kay Saxonica
I'm banging my head against a sequence alignment problem. I have a feeling that this is straightforward, but I can't put my finger on what's missing from my attempts.
Suppose I have two inputs like so, where input1//w is always a subset of input2//w:
<input1>
  <w n="1">I</w>
  <w n="2">am</w>
  <w n="3">a</w>
  <w n="4">sequence</w>
</input1>

<input2>
  <w>I</w>
  <w>am</w>
  <w>a</w>
  <w>longer</w>
  <w>longer</w>
  <w>sequence</w>
</input2>
I'd like to get output like so:
<output>
  <w n="1">I</w>
  <w n="2">am</w>
  <w n="3">a</w>
  <w n="skipped">longer</w>
  <w n="skipped">longer</w>
  <w n="4">sequence</w>
</output>
I.e., for each input1//w, @n should be copied to the nearest following <w> in input2 that has the same text content; <w>s in input2 that have no counterpart in input1 should be flagged with n="skipped".
P.S.: The use case is aligning an imperfect but timestamped transcription of an audio file (input1, machine-generated) with a perfect but not-timestamped one (input2, human-generated).
Thanks much for any help,
Markus
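
One possible reading of the rule described in the question is a greedy two-pointer walk: step through input2's <w> elements in order, and whenever the current word equals the next unconsumed word of input1, copy that word's @n and advance in both sequences; otherwise mark the word as skipped. The XSLT 2.0 stylesheet below is only a sketch of that idea, not the answer given in this thread. It assumes both inputs are wrapped in a single hypothetical <inputs> element, that words are compared by exact string value, and that input1's words really do occur in input2 in the same order. The template name "align" and the parameter names "timed" and "full" are made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Assumption: the source document is <inputs><input1>...</input1><input2>...</input2></inputs> -->
  <xsl:template match="/inputs">
    <output>
      <xsl:call-template name="align">
        <xsl:with-param name="timed" select="input1/w"/>
        <xsl:with-param name="full" select="input2/w"/>
      </xsl:call-template>
    </output>
  </xsl:template>

  <!-- Emits one <w> per word of $full; consumes a word of $timed only on a match -->
  <xsl:template name="align">
    <xsl:param name="timed" as="element(w)*"/>
    <xsl:param name="full" as="element(w)*"/>
    <xsl:if test="exists($full)">
      <!-- True when the next unconsumed timed word equals the current full word -->
      <xsl:variable name="hit" as="xs:boolean"
          select="exists($timed) and string($timed[1]) eq string($full[1])"/>
      <w n="{if ($hit) then $timed[1]/@n else 'skipped'}">
        <xsl:value-of select="$full[1]"/>
      </w>
      <!-- Recurse on the remaining words; $timed advances only after a match -->
      <xsl:call-template name="align">
        <xsl:with-param name="timed"
            select="if ($hit) then subsequence($timed, 2) else $timed"/>
        <xsl:with-param name="full" select="subsequence($full, 2)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the two sample inputs wrapped in <inputs>, this should assign n="1" to n="3" to the first three words, mark both "longer" words as skipped, and assign n="4" to "sequence". If the machine transcription contains a word that never occurs in the human one, the walk stalls on that word and flags everything after it as skipped, so a real transcript alignment would probably need a more tolerant matching strategy.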