[xsl] Converting from <dt><dd> pairs to better XML

Subject: [xsl] Converting from <dt><dd> pairs to better XML
From: Evan Leibovitch <evan@xxxxxxxxx>
Date: Wed, 25 Aug 2010 17:56:34 -0400
Hello,

I've been teaching myself XSLT with the help of Michael's XSLT 2.0
book over the last few months. So far so good, but I think I've hit my
first roadblock.
I need to convert data from web pages into something usable in a
database. The data is in the following format:

<dl>
  <dt>AAA</dt>
  <dd>111</dd>
  <dt>BBB</dt>
  <dd>222</dd>
  <dt>BBB</dt>
  <dd>333</dd>
  <dt>BBB</dt>
  <dd>444</dd>
  <dt>CCC</dt>
  <dd>555</dd>
  <dt>CCC</dt>
  <dd>666</dd>
 [...]
</dl>

The number of dt/dd pairs for each <dt> value varies, but pairs that
share a <dt> value are generally kept together.
Ultimately I'd like to convert this into a pipe-separated-value file
(with implied headers):

111|222,333,444|555,666|....


But for now I can work with either

<record>
<AAA>111</AAA>
<BBB>222</BBB>
<BBB>333</BBB>
<BBB>444</BBB>
<CCC>555</CCC>
<CCC>666</CCC>
</record>

or

<record>
<AAA id="111" />
<BBB id="222" />
<BBB id="333" />
<BBB id="444" />
<CCC id="555" />
<CCC id="666" />
</record>

either of which I think I know how to process into what I want. Even better would be

<record>
<AAA>111</AAA>
<BBB>222, 333, 444</BBB>
<CCC>555, 666</CCC>
</record>

Which would be easier to do?

Any tips, suggestions, or pointers are appreciated.
I'm using xsltproc under Linux.
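
Since xsltproc implements XSLT 1.0 only, the 2.0 grouping instructions are not available to me, so the grouped form would presumably need something like Muenchian grouping. Below is a minimal sketch of that idea, not a tested solution. It assumes the input is well-formed XML, that every <dt> is immediately followed by its own <dd>, and that every <dt> value is a legal XML element name. The key name "dt-by-name" and the <records> wrapper element are names I made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index each dt by its parent dl plus its own text, so that
       grouping is done separately for every dl (hypothetical key name) -->
  <xsl:key name="dt-by-name" match="dt"
           use="concat(generate-id(..), '|', normalize-space(.))"/>

  <!-- <records> is an assumed wrapper so the output has a single root -->
  <xsl:template match="/">
    <records>
      <xsl:apply-templates select="//dl"/>
    </records>
  </xsl:template>

  <xsl:template match="dl">
    <record>
      <!-- Muenchian step: keep only the first dt of each distinct value -->
      <xsl:for-each select="dt[generate-id() = generate-id(
            key('dt-by-name',
                concat(generate-id(..), '|', normalize-space(.)))[1])]">
        <!-- Assumes the dt text is usable as an element name -->
        <xsl:element name="{normalize-space(.)}">
          <!-- Visit every dt in this group and emit the dd right after it -->
          <xsl:for-each select="key('dt-by-name',
                concat(generate-id(..), '|', normalize-space(.)))">
            <xsl:if test="position() &gt; 1">
              <xsl:text>, </xsl:text>
            </xsl:if>
            <xsl:value-of select="normalize-space(following-sibling::dd[1])"/>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each>
    </record>
  </xsl:template>

</xsl:stylesheet>
```

If this is on the right track, the pipe-separated output could probably reuse the same key with method="text", writing a "|" between groups instead of wrapping each group in an element.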

Evan Leibovitch
York University
Toronto
