Re: [xsl] Sequential numbers in pure xslt, breaking the no-side-effect rule

Subject: Re: [xsl] Sequential numbers in pure xslt, breaking the no-side-effect rule
From: Abel Braaksma <abel.online@xxxxxxxxx>
Date: Fri, 16 Mar 2007 17:36:56 +0100
Wendell Piez wrote:

> I can see Andrew has made much the same point, in a different way, in his hack to counter your hack. The "hidden feature" of generate-id() you are using is, in fact, a feature of generate-id() only in Saxon, and not even quite the feature you are taking it to be, so it's really a variant of methods 1 and 2.

Note that generate-id() does have the not-so-hidden feature of returning a unique string for each distinct node, and that is not Saxon-specific. What is Saxon-specific is the way I treat the resulting string (which is 'd2', 'd3', 'd4', etc. for document nodes in Saxon).



> In other words, the impossibility of it is in its assumptions. "Create a function that returns '1' on first call, and returns one higher each next call" etc. requires a particular definition of "first call" that has no meaning in XSLT terms, making sense only within an execution context that has nothing to do with the XSLT transform as such.



You are right. That is an important point, which I overlooked when writing my original post. However, if I were using these numbers to track the execution context with xsl:message, then 'first call' would simply be whenever the processor decides to make its first call, and the actual order would not matter (though why I would want to follow the execution context in the first place, I cannot say).


As for the UUID: if I read RFC 4122 less strictly, the whole idea is to have UUIDs that are guaranteed unique and follow some semantics. I cannot fully guarantee uniqueness, because I have to truncate a possibly long number from generate-id() (Altova, unfortunately, generates a long random number). But by taking the least-significant part, I have at least some guarantee, say, 99%...
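To make the uniqueness caveat concrete, here is a small Python sketch (Python only because it is easy to run; it is not part of the stylesheet). It mimics what uuid:next-nr and the "mod 10000" step do to a generate-id() string. The ID strings below are hypothetical examples, not the output of any particular processor, and the function names are mine.

```python
import re

def next_nr(generated_id: str) -> int:
    # Same idea as uuid:next-nr: drop every non-digit, read what is left as an integer.
    return int(re.sub(r"\D", "", generated_id))

def random_offset(generated_id: str) -> int:
    # Same idea as the "mod 10000" step: keep only the four least-significant digits.
    return next_nr(generated_id) % 10000

if __name__ == "__main__":
    # Hypothetical ID pairs that are distinct as strings but collide as numbers.
    for a, b in [("d12", "d1e2"), ("d5", "d10005")]:
        print(a, b, random_offset(a), random_offset(b))
```

Both pairs end up with the same offset, which is why the result can only be called "almost" unique: distinct IDs are guaranteed, distinct digits after stripping and truncating are not.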

Based on the input from you, David C, Andrew and others, this is probably the best it can get without jeopardizing the "nature" of both order of execution and 'no side effects'. It generates UUIDs quite well: they do not precisely follow the timestamp method of the RFC (instead of adding 1 on each consecutive call, it adds a random number), but they are nevertheless compliant and (almost) guaranteed unique. If you only need one, it will always be globally unique.

The output of the stylesheet below, regardless of its context, is (approximately) as follows (note the differences in the UUIDs):

First random ID:Q4315022
Base timestamp: 133933553207180002
Clock id:       1146
Network node:   09173F13E4C5
UUID Version:   1
Generated UUID: 666B7AE3-D3DB-11DB-1146-09173F13E4C5
Generated UUID: 666B7AE4-D3DB-11DB-1146-09173F13E4C5
Generated UUID: 666B7AE5-D3DB-11DB-1146-09173F13E4C5
Generated UUID: 666B7AE6-D3DB-11DB-1146-09173F13E4C5
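
As a sanity check on the field layout, the sample UUIDs above can be taken apart with Python's standard uuid module. This is only a sketch for verification, not part of the stylesheet; the helper name describe is my own.

```python
import uuid

SAMPLE = "666B7AE3-D3DB-11DB-1146-09173F13E4C5"

def describe(text: str) -> dict:
    u = uuid.UUID(text)
    return {
        # 60-bit count of 100-nanosecond intervals since 15 October 1582,
        # reassembled from the time_low, time_mid and time_hi fields.
        "timestamp": u.time,
        # The 48-bit node field as 12 hex digits.
        "node": format(u.node, "012X"),
        # Which UUID variant the clock field announces.
        "variant": u.variant,
    }

if __name__ == "__main__":
    print(describe(SAMPLE))
```

For the first sample UUID the timestamp comes back as 133933553207180003, one more than the base timestamp printed above, so the three time fields are in the right places. The module does report the variant as "reserved for NCS compatibility", because the clock field starts with 1; RFC 4122 UUIDs have 8, 9, A or B as the first hex digit there, so the clock id may need adjusting if strict compliance matters.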


Thank you all for getting my mind straight on this!


Cheers,
-- Abel Braaksma
  http://www.nuntia.nl


<xsl:stylesheet xmlns:uuid="http://www.uuid.org" xmlns:math="http://exslt.org/math" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="/">
<xsl:value-of select="
concat('First random ID:', uuid:get-id()),
concat('Base timestamp: ', uuid:generate-timestamp()),
concat('Clock id: ' ,uuid:generate-clock-id()),
concat('Network node: ' ,uuid:get-network-node()),
concat('UUID Version: ' ,uuid:get-uuid-version()),
concat('Generated UUID: ' ,uuid:get-uuid()),
concat('Generated UUID: ' ,uuid:get-uuid()),
concat('Generated UUID: ' ,uuid:get-uuid()),
concat('Generated UUID: ' ,uuid:get-uuid())
" separator="&#10;" />
</xsl:template>
<!--
Functions in the uuid: namespace are used to calculate a UUID
The method used is a derived timestamp method, which is explained
here: http://www.famkruithof.net/guid-uuid-timebased.html
and here: http://www.ietf.org/rfc/rfc4122.txt
-->
<!--
Returns the UUID
-->
<xsl:function name="uuid:get-uuid" as="xs:string*">
<xsl:variable name="ts"
select="uuid:ts-to-hex(uuid:generate-timestamp())" />
<xsl:value-of separator="-" select="
substring($ts, 8, 8),
substring($ts, 4, 4),
string-join((uuid:get-uuid-version(), substring($ts, 1, 3)), ''),
uuid:generate-clock-id(),
uuid:get-network-node()" />
</xsl:function>
<!--
Internal auxiliary function.
With Saxon, calling generate-id() on the node returned here gives
a more unique result than calling it on a variable containing a node.
-->
<xsl:function name="uuid:_get-node"><xsl:comment /></xsl:function>


<!-- generates some kind of unique id -->
<xsl:function name="uuid:get-id" as="xs:string">
<xsl:sequence select="generate-id(uuid:_get-node())" />
</xsl:function>
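<!--
Should return the next number in sequence, but this can't be done
in XSLT. Instead, it returns a number taken from the digits of
generate-id() on a freshly created node, which is unique per call
as far as the processor's IDs allow.
-->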
<xsl:function name="uuid:next-nr" as="xs:integer">
<xsl:variable name="node"><xsl:comment /></xsl:variable>
<xsl:sequence select="
xs:integer(replace(
generate-id($node), '\D', ''))" />
</xsl:function>
<!-- internal function: keeps hex digits only, truncated to $count characters -->
<xsl:function name="uuid:_hex-only" as="xs:string">
<xsl:param name="string" />
<xsl:param name="count" />
<xsl:sequence select="
substring(replace(
$string, '[^0-9a-fA-F]', '')
, 1, $count)" />
</xsl:function>
<!-- may as well be defined as returning the same seq each time -->
<xsl:variable name="_clock" select="uuid:get-id()" />
<xsl:function name="uuid:generate-clock-id" as="xs:string">
<xsl:sequence select="uuid:_hex-only($_clock, 4)" />
</xsl:function>
<!--
Returns the network node. This one is 'random', but must be the
same across calls. When it is not a real MAC address, the
least-significant bit of the first octet must be '1' (here the
first octet is 09, so that bit is set).
-->
<xsl:function name="uuid:get-network-node" as="xs:string">
<xsl:sequence select="uuid:_hex-only('09-17-3F-13-E4-C5', 12)" />
</xsl:function>
<!-- returns version, for timestamp uuids, this is "1" -->
<xsl:function name="uuid:get-uuid-version" as="xs:string">
<xsl:sequence select="'1'" />
</xsl:function>
<!--
Generates a timestamp: the number of 100-nanosecond
intervals since 15 October 1582, in UTC.
-->
<xsl:function name="uuid:generate-timestamp">
<!--
The date calculation comes out right automatically when the
timezone is given explicitly; here that is UTC (the
trailing Z).
-->
<xsl:variable name="duration-from-1582" as="xs:dayTimeDuration" >
<xsl:sequence select="
current-dateTime() -
xs:dateTime('1582-10-15T00:00:00.000Z')" />
</xsl:variable>
<xsl:variable name="random-offset" as="xs:integer">
<xsl:sequence select="uuid:next-nr() mod 10000"></xsl:sequence>
</xsl:variable>


<!-- do the math to get the 100 nano second intervals -->
<xsl:sequence select="
(days-from-duration($duration-from-1582) * 24 * 60 * 60 +
hours-from-duration($duration-from-1582) * 60 * 60 +
minutes-from-duration($duration-from-1582) * 60 +
seconds-from-duration($duration-from-1582)) * 1000
* 10000 + $random-offset" />
</xsl:function>
<!-- simple non-generalized function to convert from timestamp to hex -->
<xsl:function name="uuid:ts-to-hex">
<xsl:param name="dec-val" />
<xsl:value-of separator="" select="
for $i in 1 to 15
return (0 to 9, tokenize('A B C D E F', ' '))
[
$dec-val idiv
xs:integer(math:power(16, 15 - $i))
mod 16 + 1
]" />
</xsl:function>
</xsl:stylesheet>
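
For anyone who wants to cross-check the arithmetic in uuid:generate-timestamp and uuid:ts-to-hex outside an XSLT processor, here is a rough Python equivalent. It is a sketch under my own naming (gregorian_ticks, to_hex15), and it assumes a timezone-aware datetime is passed in. The sample instant is the one implied by the base timestamp in the output above.

```python
from datetime import datetime, timezone

# Start of the Gregorian calendar, the epoch used for time-based UUIDs.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def gregorian_ticks(moment: datetime, offset: int = 0) -> int:
    # Number of 100-nanosecond intervals since the epoch, using integer
    # arithmetic only, plus an optional offset (the stylesheet's random offset).
    delta = moment.astimezone(timezone.utc) - GREGORIAN_EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10 + offset

def to_hex15(ticks: int) -> str:
    # 15 upper-case hex digits, like uuid:ts-to-hex produces.
    return format(ticks, "015X")

if __name__ == "__main__":
    sample = datetime(2007, 3, 16, 16, 28, 40, 718000, tzinfo=timezone.utc)
    ticks = gregorian_ticks(sample, offset=2)
    print(ticks)            # 133933553207180002
    print(to_hex15(ticks))  # 1DBD3DB666B7AE2
```

Reading the hex string the way uuid:get-uuid does (characters 8 to 15, then 4 to 7, then the version digit followed by characters 1 to 3) gives 666B7AE2-D3DB-11DB, which lines up with the sample UUIDs.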

