[xsl] Re: Regular expression to exclude files

Subject: [xsl] Re: Regular expression to exclude files
From: "Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Feb 2023 21:31:34 -0000
Negative lookahead is the term I was looking for and hadn't found in the XSD
and XPath regex discussions.

As far as I can tell, j is not a flag for XPath matches(), but in any case
the Saxon collection URI syntax doesn't appear to provide a way to specify
flags as for matches().

Cheers,

E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com<https://www.servicenow.com>
LinkedIn<https://www.linkedin.com/company/servicenow> |
Twitter<https://twitter.com/servicenow> |
YouTube<https://www.youtube.com/user/servicenowinc> |
Facebook<https://www.facebook.com/servicenow>

From: Chris Papademetrious christopher.papademetrious@xxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thursday, February 16, 2023 at 2:50 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx <xsl-list@xxxxxxxxxxxxxxxxxxxxxx>
Subject: [xsl] Re: Regular expression to exclude files

Hi Eliot,

Normally I would use a negative lookahead for this, which requires the ;j
flag for matches():

matches(., '^(?!foo|bar).*\.dita', ';j')

The documentation at

https://www.saxonica.com/documentation12/#!sourcedocs/collections/collection-directories

suggests that collection() uses the Java regex engine, so maybe it will work
there too.
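
As a rough check of the lookahead idea against the Java regex engine itself,
here is a minimal sketch using java.util.regex directly. The class name
LookaheadFilter and the helper keep() are hypothetical, and the file names are
the samples from the original question. It only shows how Java treats the
pattern; whether the collection() select pattern accepts the same syntax is
not confirmed here.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LookaheadFilter {
    // Negative lookahead at the start of the name: reject names beginning
    // with "bundle-" or "publication_", then require a .ditamap suffix.
    static final Pattern KEEP =
        Pattern.compile("^(?!bundle-|publication_).+\\.ditamap$");

    // Return only the names the pattern accepts.
    static List<String> keep(List<String> names) {
        return names.stream()
                    .filter(n -> KEEP.matcher(n).find())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "bundle-aaaa.ditamap",
            "publication_pub-one.ditamap",
            "not-pub-or-bundle.ditamap",
            "atopic.dita");
        // prints [not-pub-or-bundle.ditamap]
        System.out.println(keep(names));
    }
}
```

The leading ^ matters: without it, an unanchored search could start the match
one character into the name, where the lookahead no longer sees the prefix.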


- Chris


From: Eliot Kimber eliot.kimber@xxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Thursday, February 16, 2023 3:31 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [xsl] Regular expression to exclude files

I'm using Saxon's collection() extension that lets you specify a regular
expression to select files within a directory. These are XPath regular
expressions, so my question is, I think, a general XPath question.

I want to match all files with a given extension except those that start with
foo or bar.

I think the Perl expression would be something like:

.*?!(foo|bar).+.ditamap


Using this little XQuery:

let $strings as xs:string* := ('bundle-aaaa.ditamap',
'publication_pub-one.ditamap', 'not-pub-or-bundle.ditamap', 'atopic.dita')
return
count($strings[matches(., '.?!(bundle-|publication_).+\.ditamap')])


I get zero results, while this:

let $strings as xs:string* := ('bundle-aaaa.ditamap',
'publication_pub-one.ditamap', 'not-pub-or-bundle.ditamap', 'atopic.dita')
return
count($strings[matches(., '.+\.ditamap')])

Returns the expected 3
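
The zero count is consistent with how the first pattern parses: without a
leading "(?", the "?" is just a quantifier on the preceding "." and the "!"
is a literal exclamation mark, so the pattern only matches names that contain
"!" followed by one of the prefixes. A minimal Java sketch of that reading
follows. The class name PatternCount and the helper count() are hypothetical,
and Matcher.find() is used on the assumption that it mirrors the unanchored
behaviour of matches() for patterns this simple.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PatternCount {
    // Count the names that contain a match anywhere in the string.
    static long count(String regex, List<String> names) {
        Pattern p = Pattern.compile(regex);
        return names.stream().filter(n -> p.matcher(n).find()).count();
    }

    public static void main(String[] args) {
        List<String> names = List.of(
            "bundle-aaaa.ditamap",
            "publication_pub-one.ditamap",
            "not-pub-or-bundle.ditamap",
            "atopic.dita");
        // "!" is literal here, and no sample name contains one: prints 0
        System.out.println(count(".?!(bundle-|publication_).+\\.ditamap", names));
        // Plain suffix match: prints 3
        System.out.println(count(".+\\.ditamap", names));
        // A made-up name with a literal "!" does match: prints 1
        System.out.println(count(".?!(bundle-|publication_).+\\.ditamap",
                                 List.of("x!bundle-a.ditamap")));
    }
}
```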

Reading the XSD regular expression spec I did not see an obvious way to
specify this kind of negative match but I also find the XSD specification to
be almost impenetrably difficult to decode.

Is there a way to do this with regular expressions alone?

I want a pure regex solution because I'm using it in the context of an Oxygen
xpath_eval() call, so it's not easy (but not impossible) to filter the files
returned by the collection() call (I'm using the metadata=yes form since I
want the file names, not the parsed docs, in this context).

Thanks,

E.
XSL-List info and archive<http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe<http://lists.mulberrytech.com/unsub/xsl-list/3453418> (by
email)
