the source is controlled XML
is not part of the source-set
markup happens between words only

I don't think anyone has mentioned first rearranging the HTML with
HTMLCleaner, and I was wondering why. For example, it would certainly
clean up


and other nonsense, which should make things easier.

