[xsl] Using XSLT to process a directory of mixed files

Subject: [xsl] Using XSLT to process a directory of mixed files
From: "dvint@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 8 May 2019 02:39:52 -0000

I'm trying to use collection() to process all files in a directory.
The directory may contain text, PDF, and image files in addition to my
DITA files. I've created this little test:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:variable name="fileSet"
        select="collection('/Users/danvint/pubsrc-other/formatting-sample?select=*.*;recurse=yes')"/>

    <xsl:template match="/">
        <xsl:apply-templates select="$fileSet" mode="collectionprocessing"/>
    </xsl:template>

    <xsl:template match="/" mode="collectionprocessing">
        '<xsl:value-of select="document-uri(.)"/>'
        <xsl:value-of select="doc-available(document-uri(.))"/>
    </xsl:template>

</xsl:stylesheet>

It seems to do what I expect for XML files, with results like this:

        'file:/Users/danvint/pubsrc-other/formatting-sample/glossentry-adapter.dita'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/conaction/reuse-push-ds-config-tool.dita'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/conaction/reuse-update-server.dita'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/submap-ping_id_examples.ditamap'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/concept_PDabouttheexplodedindexformat.dita'
true

But then I get some odd things. Based on the output, it looks like I hit
a binary file of some sort, but I was only trying to get the file names
in this script:

        'file:/Users/danvint/pubsrc-other/formatting-sample/concept_PAWeb_Access_Management_Agent_Deployment.dita'
trueAAAAAUJ1ZDEAABAAAAAIAAAAEAAAAAIJAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgAAAAIAAAAA
AAAAAAAAAAAAAAAAAAAAAACAAAAAAAAAAIAAAABAAAQAHNwYmxvYgAAAPZicAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA....
lots of lines here similar to above

mvQrxFWXHxD6hgAEIAABCGwnAXuvGsvOvVhNBYKutU2nnqv2YZ2rz04qQ7Rm8AoBCEAAAhCAAATe
mQBq5p0R0gAEIAABCEAAAhA4J4C4OmfBEQQgAAEIQAACEHhnAoird0ZIAxCAAAQgAAEIQOCcAOLqn
AVHEIAABCAAAQhA4J0JIK7eGSENQAACEIAABCAAgXMCiKtzFhxBAAIQgAAEIACBdyaAuHpnhDQAAQ
hAAAIQgAAEzgkgrs5ZcAQBCEAAAhA4J4C4OmfBEQQgAAEIQAACEHhnAoird0ZIAxCAAAQgAAEIQOCcAOLqn
        'file:/Users/danvint/pubsrc-other/formatting-sample/gloss_PFadminGlossary.dita'
true

I don't know what this chunk of content is. Then there is this odd bit:

        'file:/Users/danvint/pubsrc-other/formatting-sample/submap_2-notoc.ditamap'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/glossentry-openid.dita'
truesub addTaxonomy {
my $inname = $_[0];
my $tempname = $_[0] . ".new";
my $taxonomy = $_[1];

open my $in, '<:encoding(UTF-8)', $inname or die "Can't read old file: $inname!";
open my $temp, '>:encoding(UTF-8)', $tempname or die "Can't write new file: $tempname!";

while( <$in> )
    {
    s/(<head>)/<head>\n$taxonomy\n/g;

    print $temp $_;
    }

close $temp;
close $in;

# Replace inout file with temp, remove temp
rename "./" . $tempname, "./" . $inname or die "Can't move file $tempname to $inname";
}
        'file:/Users/danvint/pubsrc-other/formatting-sample/submap-knownissues.ditamap'
true
        'file:/Users/danvint/pubsrc-other/formatting-sample/concept_PAPort_Requirements.dita'
true

These blobs of odd stuff don't follow the pattern of a quoted file name
followed by the result of the doc-available() test, which I thought would
tell me whether or not the file was XML. There is no file name and no
true/false value for them.
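A likely explanation, assuming the processor is Saxon 9.7 or later (or another XSLT 3.0 processor that behaves similarly): collection() does not skip non-XML files but returns them as atomic values, with text files as xs:string and binary files as xs:base64Binary. Those items are not document nodes, so match="/" never fires for them. The built-in template rule then writes their content out as text, which is why each blob runs straight on from the previous "true" with no quoted URI and no true/false of its own. The blobs fit this reading: the first one begins AAAAAUJ1ZDE, which decodes to the "Bud1" signature of a macOS .DS_Store file; the run ending AAAAAElFTkSuQmCC is the IEND trailer of a PNG image; and the second blob is a Perl script read as text. The diagnostic sketch below keeps the original collection() call but labels each kind of item instead of dumping it. The predicate patterns (match=".[...]") require version="3.0".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Diagnostic sketch, not a fix: assumes an XSLT 3.0 processor (for
     example Saxon 9.7 or later) whose collection() returns non-XML files
     as atomic values instead of document nodes. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <xsl:output method="text"/>

    <xsl:variable name="fileSet"
        select="collection('/Users/danvint/pubsrc-other/formatting-sample?select=*.*;recurse=yes')"/>

    <xsl:template match="/">
        <xsl:apply-templates select="$fileSet" mode="collectionprocessing"/>
    </xsl:template>

    <!-- Parsed XML files arrive as document nodes, so they carry a URI -->
    <xsl:template match="/" mode="collectionprocessing">
        <xsl:value-of select="document-uri(.)"/>
        <xsl:text>  [XML]&#10;</xsl:text>
    </xsl:template>

    <!-- Binary files arrive as xs:base64Binary: content only, no URI -->
    <xsl:template match=".[. instance of xs:base64Binary]" mode="collectionprocessing">
        <xsl:text>[binary item, </xsl:text>
        <xsl:value-of select="string-length(string(.))"/>
        <xsl:text> base64 characters]&#10;</xsl:text>
    </xsl:template>

    <!-- Text files arrive as xs:string: content only, no URI -->
    <xsl:template match=".[. instance of xs:string]" mode="collectionprocessing">
        <xsl:text>[text item starting: </xsl:text>
        <xsl:value-of select="substring(normalize-space(.), 1, 40)"/>
        <xsl:text>]&#10;</xsl:text>
    </xsl:template>

</xsl:stylesheet>
```

If the output shows [binary item ...] and [text item ...] lines where the blobs used to be, that confirms the cause. It also shows why collection() alone cannot give the names of those files: the atomic values have no document-uri() to ask for.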

What I want to build is a list of files (a shell script) that would copy
these other, non-XML files into my processed folder, where I will be
writing the results of other work on the DITA files.
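For the copy script, one approach is to ask for URIs rather than content. This is a sketch that has not been tested against this directory. uri-collection(), new in XPath 3.0, returns the URIs of a collection's members without parsing them, and doc-available() can then be called on each URI to separate the files that parse as XML from everything else. The sketch assumes that Saxon accepts the same ?select=...;recurse=yes parameters for uri-collection() as for collection(). The output folder /Users/danvint/processed is a hypothetical placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes an XSLT 3.0 processor such as Saxon, and that
     uri-collection() honours the same ?select=...;recurse=yes parameters
     as collection(). $outDir is a hypothetical placeholder. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

    <xsl:output method="text"/>

    <!-- Source folder from the post; no trailing slash -->
    <xsl:param name="srcPath" select="'/Users/danvint/pubsrc-other/formatting-sample'"/>
    <!-- Hypothetical target folder; no trailing slash -->
    <xsl:param name="outDir" select="'/Users/danvint/processed'"/>

    <!-- Runs as the initial named template, or from any source document -->
    <xsl:template name="xsl:initial-template" match="/">
        <xsl:text>#!/bin/sh&#10;</xsl:text>
        <!-- uri-collection() yields URIs only; nothing is parsed here -->
        <xsl:for-each select="uri-collection('file:' || $srcPath || '?select=*.*;recurse=yes')">
            <xsl:variable name="uri" select="string(.)"/>
            <!-- doc-available() is false for anything that does not parse as XML -->
            <xsl:if test="not(doc-available($uri))">
                <!-- file:/... or file:///... becomes a plain absolute path;
                     percent-escapes such as %20 are NOT decoded -->
                <xsl:variable name="path" select="replace($uri, '^file:/+', '/')"/>
                <!-- Path relative to the source folder, e.g. /conaction/x.png -->
                <xsl:variable name="rel" select="substring-after($path, $srcPath)"/>
                <!-- Directory part of $rel, e.g. /conaction/ -->
                <xsl:variable name="relDir" select="replace($rel, '[^/]+$', '')"/>
                <!-- Single quotes protect spaces, but not names containing ' -->
                <xsl:text>mkdir -p '</xsl:text>
                <xsl:value-of select="$outDir || $relDir"/>
                <xsl:text>'&#10;cp '</xsl:text>
                <xsl:value-of select="$path"/>
                <xsl:text>' '</xsl:text>
                <xsl:value-of select="$outDir || $rel"/>
                <xsl:text>'&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

Run it with Saxon's -it option (or against any source document, because the template also matches "/"), write the text output to a file, and then run that file with sh. Two design caveats follow from using doc-available() as the test. First, a DITA file whose DOCTYPE cannot be resolved would count as non-XML and be copied. Second, hidden files such as .DS_Store also match *.* and would be copied along with the PDFs and images.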
