Re: [xsl] fixing XSL search using values from a variable against multiple XML files

Subject: Re: [xsl] fixing XSL search using values from a variable against multiple XML files
From: "Syd Bauman s.bauman@xxxxxxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 4 Oct 2018 16:50:35 -0000
I realize this is the XSL list, and don't get me wrong, I *love*
XSLT. And while I'm singing XSLT's (and thus XPath's) praises, this
particular task looks like a fun one to attack with Hans-Jürgen's
FOXpath (which is an extension of XPath to handle the file
system).[1]

But that said, this strikes me as a task better handled by your shell
than your XSLT engine, no? In bash, e.g.,
  $ fgrep -f filenames_from_directory_listing.txt dir1/*.xml dir2/*.xml
gives you the answer, as it were, but not in the format you want.

I think to get the results you want (the phrase "[filename] was found
in [filepath]") you have to issue the fgrep command once for each
search term, instead of all-at-once. E.g., I think the following will
do the trick.
   $ for fn in `cat filenames_from_directory_listing.txt` ; do \
       fgrep -l -e "$fn" dir1/*.xml dir2/*.xml \
       | perl -pe "s,^.*\$,$fn was found in \$&,;" ; done
These methods presume that none of the names in
filenames_from_directory_listing.txt contain any whitespace.
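
If names with embedded spaces do turn up, the whitespace caveat can
probably be sidestepped by reading the list one line at a time instead
of word-splitting it. What follows is only a sketch under assumptions:
it presumes the list file holds exactly one name per line, it uses
`grep -F` (the modern spelling of `fgrep`), and `report_matches` is a
made-up helper name, not anything from the original post.

```shell
#!/usr/bin/env bash
# Sketch only: report_matches is a hypothetical helper name.
# Assumes the list file holds one search term per line.
# Usage: report_matches LISTFILE XMLFILE...
report_matches() {
  local list=$1; shift
  local fn file
  while IFS= read -r fn; do
    [ -n "$fn" ] || continue      # skip blank lines in the list
    # -F: fixed string, -l: print only the names of matching files.
    # stdin is redirected so grep can never consume the list itself.
    grep -F -l -e "$fn" -- "$@" </dev/null | while IFS= read -r file; do
      printf '%s was found in %s\n' "$fn" "$file"
    done
  done < "$list"
}
```

It would be called as
`report_matches filenames_from_directory_listing.txt dir1/*.xml dir2/*.xml`.
Dropping the perl step also means a name containing a comma or a `$`
cannot upset the substitution.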

And, of course, one thing that makes this nice is that just by using
`egrep` instead of `fgrep`, you can search for regular expressions,
e.g., "meeting_schema\.(rn[cg]|xsd?|wxs|odd|dtd|(iso)?sch)". :-)
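
As a rough illustration of the regular-expression variant, here is a
small sketch using `grep -E`, which is the current spelling of `egrep`.
The helper name `files_matching` is hypothetical; the pattern is the
one quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: files_matching is a hypothetical helper name.
# Prints the name of each file whose text matches an extended regex.
# Usage: files_matching PATTERN FILE...
files_matching() {
  local pattern=$1; shift
  # -E: extended regular expressions, -l: print only matching file names
  grep -E -l -e "$pattern" -- "$@" </dev/null
}
```

For example,
`files_matching 'meeting_schema\.(rn[cg]|xsd?|wxs|odd|dtd|(iso)?sch)' dir1/*.xml dir2/*.xml`
should list every file that mentions meeting_schema.rng,
meeting_schema.xsd, meeting_schema.isosch, and so on. Single quotes
keep the shell from touching the backslash and parentheses.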

Notes
-----
[1] See
https://www.balisage.net/Proceedings/vol17/html/Rennau01/BalisageVol17-Rennau01.html

> Hi this is my first post here - looking for help - apologies if
> there's something I've overlooked!
>
> I have a tokenized variable that contains list of filenames from a
> .txt of a directory listing. I want to look for those filenames in
> a number of xml files in a number of subdirectories. If the
> filename is found, I want to output that "filename" was found in
> "xmlfile".
>
> There are a lot of xml directories and they are not static. Same
> with xml files. The filenames are not tagged in the xml, so I'm
> just looking for their plain text occurrence in the file.
>
> Any help would be appreciated.
>
> to make the examples easier - I want to use
>
> $filenames_to_find (tokenized list of filenames from a .txt
> directory listing)
>
> to search against
>
> dir1/*.xml
> dir2/*.xml
> with the output being
>
> filename was found in xmlfilename
>
> I'm using an academic version of Oxygen XML so I think I have Saxon
> through that and I have the standalone Saxon file for running this
> from the command line.
>
> I've gotten this far, but it doesn't work. I know it's broken, but
> I don't know how to fix it!
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>      xmlns:xs="http://www.w3.org/2001/XMLSchema"
>      xmlns:h="http://www.w3.org/1999/xhtml"
>      exclude-result-prefixes="xs"
>      version="3.0"
>      expand-text="yes"
>      >
>
>      <xsl:variable name="filenames_from_directory_listing"
> as="xs:string"
> select="unparsed-text('filenames_from_directory_listing.txt')"/>
>      <xsl:variable name="filenames_to_find"
> select="tokenize($filenames_from_directory_listing, '\s+')"/>
>
>      <xsl:template match="/">
>          <xsl:for-each select="collection('.?select=*.xml;recurse=yes')"/>
>              <xsl:variable name="xml_filenames" select="."/>
>                  <xsl:for-each select="$filenames_to_find">
>                      <xsl:if test="(contains($t, .))">
> <xsl:message>{document-uri($xml_filenames)} contains {.}</xsl:message>
>                      </xsl:if>
>                  </xsl:for-each>
>      </xsl:template>
> </xsl:stylesheet>
>
> Any suggestions? Clearly I am an XSL novice. Thanks for your patience.

--
 Syd Bauman, NRP
 Senior XML Programmer/Analyst
 Northeastern University Women Writers Project
 s.bauman@xxxxxxxxxxxxxxxx or
 Syd_Bauman@xxxxxxxxxxxxxxxx
