Aw: [xsl] Which is less expensive group by or select distinct-values

Subject: Aw: [xsl] Which is less expensive group by or select distinct-values
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 15 Jul 2016 19:28:15 -0000
Well, using distinct-values is fine, but converting its result to a
string only to tokenize it back to a sequence is nonsense.
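
A minimal sketch of iterating over the distinct-values() sequence
directly, assuming an XSLT 2.0 processor and the same
.//term[not(@keyref)] selection as in the quoted stylesheet. The
wrapping stylesheet and the match="/" template are assumptions added
only to make the example self-contained:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point; the original template context is not shown. -->
  <xsl:template match="/">
    <data type="topicreport" name="WDTermList">
      <!-- Iterate over the sequence of distinct values itself:
           no intermediate string, no tokenize(). -->
      <xsl:for-each select="distinct-values(.//term[not(@keyref)])">
        <xsl:sort select="."/>
        <xsl:value-of select="."/>
        <!-- position() and last() refer to the sorted sequence. -->
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
    </data>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the quoted code, this does not call normalize-space(). If the
term values carry stray whitespace, it could be applied per value, for
example distinct-values(.//term[not(@keyref)]/normalize-space()).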
On 15.07.2016, 21:19, "dvint@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

  So I have a large document that I need to pull a list of unique values
  from a given element. These are taxonomy and term tag values from a
  4,000 topic collection of DITA content.

  Without knowing how these are implemented, is there something I should
  be able to intuit just from the spec? This is some code that I
  inherited and it wouldn't have been how I would have attacked the
  problem:

  <xsl:variable name="TermList">
    <xsl:value-of select="distinct-values(.//term[not(@keyref)])"
                  separator=", " />
  </xsl:variable>
  <data type="topicreport" name="WDTermList">
    <xsl:for-each select="tokenize(normalize-space($TermList), ', ')">
      <xsl:sort select="." />
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </data>

  If this hadn't existed in the stylesheet already, I would have probably
  done something like:

  <xsl:for-each-group select=".//term[not(@keyref)]" group-by=".">
    <xsl:sort select="current-grouping-key()" />
    <xsl:value-of select="current-grouping-key()"/>
    <xsl:if test="position() != last()">, </xsl:if>
  </xsl:for-each-group>

  Currently the process (with a bunch of other checks) runs for a very
  long time due to the size of the file I'm processing and the number of
  checks. Recently, after adding a couple more checks, it keeps requiring
  the Java heap to be increased as it runs out of memory.

  I don't think the above is my major time sink in this process, but it
  is one class of things that I'm reporting. I think the real processing
  time issue is coming from a lot of string analysis/parsing that is
  occurring.

  I'll probably run a physical test in a simple stylesheet with this
  content to try and time any significant difference, but I was wondering
  what your thoughts would be.
