Subject: Aw: [xsl] Which is less expensive group by or select distinct-values
From: "Martin Honnen martin.honnen@xxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 15 Jul 2016 19:28:15 -0000
Well, using distinct-values is fine, but converting its result to a string only to tokenize it back to a sequence is nonsense.

--
This message was sent from my Android mobile phone with GMX Mail.

On 15.07.2016, 21:19, "dvint@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

So I have a large document from which I need to pull a list of the unique values of a given element. These are taxonomy and term tag values from a 4,000 topic collection of DITA content. Without knowing how these are implemented, is there something I should be able to intuit just from the spec?

This is some code that I inherited, and it isn't how I would have attacked the problem:

    <xsl:variable name="TermList">
        <xsl:value-of select="distinct-values(.//term[not(@keyref)])" separator=", " />
    </xsl:variable>
    <data type="topicreport" name="WDTermList">
        <xsl:for-each select="tokenize(normalize-space($TermList), ', ')">
            <xsl:sort select="." />
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()">, </xsl:if>
        </xsl:for-each>
    </data>

If this hadn't existed in the stylesheet already, I would probably have done something like:

    <xsl:for-each-group select=".//term[not(@keyref)]" group-by=".">
        <xsl:sort select="current-grouping-key()" />
        <xsl:value-of select="current-grouping-key()"/>
        <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each-group>

Currently the process (with a bunch of other checks) runs for a very long time because of the size of the file I'm processing and the number of checks. Recently, after I added a couple more checks, it keeps running out of memory and requiring the Java heap to be increased. I don't think the above is my major time sink in this process, but it is one class of things that I'm reporting. I think the real processing-time issue comes from the large amount of string analysis and parsing that is occurring.

I'll probably run a physical test in a simple stylesheet with this content to time any significant difference, but I was wondering what your thoughts would be.
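The point in the reply, that the result of distinct-values is already a sequence and can be sorted and iterated directly, might be sketched as follows. This is a minimal illustration, not code from the thread: the surrounding stylesheet and the match="/" template are assumptions added to make it self-contained, while the term selection and the data wrapper are taken from the inherited code.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed entry point: the original runs in some other context node. -->
  <xsl:template match="/">
    <data type="topicreport" name="WDTermList">
      <!-- Iterate the distinct values directly; no string round trip. -->
      <xsl:for-each select="distinct-values(.//term[not(@keyref)])">
        <xsl:sort select="."/>
        <xsl:value-of select="."/>
        <!-- Comma separator between items, none after the last one. -->
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
    </data>
  </xsl:template>

</xsl:stylesheet>
```

One behavioural difference to keep in mind: the inherited code applies normalize-space to the joined string and splits on ", ", so a term that itself contains a comma followed by a space would be split into two entries there, but stays a single entry here. If whitespace variants of the same term should count as one value, selecting .//term[not(@keyref)]/normalize-space(.) inside distinct-values is one option. Whether this or the for-each-group version is cheaper depends on the processor, so it would still need to be timed.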