Re: [xsl] Converting embedded URLs int hot links via XSL

Subject: Re: [xsl] Converting embedded URLs int hot links via XSL
From: David Carlisle <davidc@xxxxxxxxx>
Date: Mon, 19 Apr 2004 10:28:17 +0100
  <xsl:template match="/vs/url">
    <xsl:message>
    <xsl:value-of select="matches (.,
  '(http|https|ftp)://((([a-z_0-9\-]+)+(([:]?)+([a-z_0-9\-]+))?)(@+)?)?(((((([
  0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5]))).(((([0-1])?([0-9])?[0-9])
  |(2[0-4][0-9])|(2[0-5][0-5]))).(((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-
  5][0-5])))\.(((([0-1])?([0-9])?[0-9])|(2[0-4][0-9])|(2[0-5][0-5]))))|((([a-z
  0-9\-])+.)+([a-z]{2}.[a-z]{2}|[a-z]{2,4})))(([:])(([1-9]{1}[0-9]{1,3})|([1-5
  ]{1}[0-9]{2,4})|(6[0-5]{2}[0-3][0-6])))?(/)?$')"/>
  </xsl:message>
  </xsl:template>


I don't think that's really the regexp you want to use. Apart from the
fact that it only allows 0-9 and a-z (not even A-Z), it only matches
up to the first optional /, so basically just the host name part of the
URL plus an optional port specifier.

The usual convention in XML files (as used, for example, in SYSTEM
identifiers specified in the XML REC) is to allow arbitrary Unicode
characters in the document (so-called IRIs) but to assume (hope) that
the system %-encodes the UTF-8 representation of those characters before
passing the URI to a URI handler.
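
If you'd rather do that %-encoding yourself than hope the system does
it, something along these lines might work. This is only a sketch: it
assumes an XSLT 2.0 processor that provides the iri-to-uri() function
from the final XPath 2.0 function library, and the example.org address
is made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: an IRI containing two e-acute characters (U+00E9). -->
  <xsl:template match="/">
    <!-- iri-to-uri() %-encodes the UTF-8 bytes of characters not allowed in a URI -->
    <xsl:value-of select="iri-to-uri('http://example.org/r&#233;sum&#233;')"/>
  </xsl:template>
</xsl:stylesheet>
```

U+00E9 is the two bytes C3 A9 in UTF-8, so this should produce
http://example.org/r%C3%A9sum%C3%A9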

That said, you don't want to use tokenize() for this; you want to use
xsl:analyze-string, which should give you a handle on the bits of the
data that match and the bits that don't match your regexp.

I'd use a fairly permissive regexp, something like
[a-z]+://[^ &#10;()"']+
i.e. everything from foo:// to the next space, newline, bracket or quote
character.
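
Putting the two together, a rough and untested sketch might look like
the following. It reuses the /vs/url match from the quoted template;
the <p> wrapper and the choice of <a href="..."> as the output markup
are assumptions about what you want to generate.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the text to scan is the string value of /vs/url -->
  <xsl:template match="/vs/url">
    <p>
      <!-- The regex attribute is an attribute value template, so any
           literal { or } in a regexp would need doubling; this one has
           none. The double quote is written as &quot; because the
           attribute itself is delimited by double quotes. -->
      <xsl:analyze-string select="."
                          regex="[a-z]+://[^ &#10;()&quot;']+">
        <xsl:matching-substring>
          <!-- Inside here "." is the matched URL text -->
          <a href="{.}"><xsl:value-of select="."/></a>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <!-- Everything between matches is copied through as plain text -->
          <xsl:value-of select="."/>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

Note that a regexp this permissive will also swallow trailing
punctuation such as a full stop directly after the URL, and as written
it only accepts lower case scheme names, so you may want to tighten or
loosen it for your data.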

David
