Subject: [xsl] Grouping and sorting using custom collation class with Saxon
From: Larry Hayashi <lhtrees@xxxxxxxxx>
Date: Tue, 23 Mar 2010 14:09:35 -0700
I have built a custom collation, and the language I am working in has a number of multigraphs. Here is a sample of the sort sequence (minus non-ASCII characters) from the Java collation class:

    ("='-';'=';'*' " +
     /** -,=,* are used to indicate various types of affixes and clitics.
         These should be ignored. */
     "< a,A " +
     "< '''a,'''A " +      /** 'a,'A */
     "< aa,Aa " +
     "< b,B " +
     "< c,C " +
     "< d,D " +
     "< dz,Dz " +
     "< e,E " +
     "< '''e,'''E " +      /** 'e,'E */
     "< ee,Ee " +
     "< f,F " +
     "< g,G " +
     "< gw,Gw " +
     "< gy,Gy " +
     "< h,H " +
     "< i,I " +
     "< '''i,'''I " +      /** 'i,'I */
     "< ii,Ii " +
     "< k,K " +
     "< k''',K''' " +      /** k',K' */
     "< kw,Kw " +
     "< ky,Ky " +
     "< k'''w,K'''w " +    /** k'w,K'w */
     "< k'''y,K'''y " +    /** k'y,K'y */
     "< l,L " +
     etc.
     "< '''y,'''Y ")

Desired output is something like this:

    a,A
    **********
    -ana
    atata

    'a,'A
    **********
    'ap
    'atata

    etc.

    k,K
    **********
    kaba
    kopii
    ks=
    -ks
    ksa

    k',K'
    *********
    k'aba
    k'ol

    kw,Kw
    *********
    kwduun
    kwtaxs

    k'w,K'w
    *********
    k'was
    k'wiss
    kwiloolag

The source XML structure for each entry looks like this:

    <dictionary>
      <entry>
        <lexical-unit>
          <form lang="tsi"><text>kaba=</text></form>
        </lexical-unit>
        <trait name="morph-type" value="proclitic"/>
        <sense>
          <grammatical-info value="prenominal"/>
          <gloss lang="en"><text>small</text></gloss>
        </sense>
      </entry>
      <!-- more entries ... -->
    </dictionary>

Any suggestions as to how to group the data most efficiently according to the parameters of the custom collation?

Currently, I build a regular expression by hand, putting the longest multigraphs first so that the regex matcher picks the longest matching string. I use this with xsl:analyze-string to add @alphaGroupKey to each entry, as shown below. The outer regex matches any leading affix markers plus the first grapheme; the inner one strips the markers again.

    <xsl:attribute name="alphaGroupKey">
      <xsl:analyze-string
          select="lexical-unit/form[@lang='tsi']/text"
          regex="^[-=]*((aa|Aa)|(a|A)|(kw|Kw)|(ky|Ky)|(k|K)|(a85|a84))"
          default-collation="http://saxon.sf.net/collation?class=com.lhtrees.xslt.LangXCollation;">
        <xsl:matching-substring>
          <xsl:analyze-string select="." regex="[^-=\*]+$">
            <xsl:matching-substring>
              <xsl:value-of select="."/>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:attribute>

I can then group the entries using for-each-group on the alphaGroupKey attribute. But I am wondering whether there is a pre-existing way to use the custom collation class to do the grouping, so that I don't need to build the regex string. It seems that all of the information needed is already in that class.

Larry
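
The longest-first ordering of the regex alternatives described above does not have to be done by hand; it can be generated from a list of graphemes. What follows is only a minimal sketch in Java under stated assumptions, not the collation class itself and not a Saxon facility. The class name AlphaGroupKey, the methods buildPattern and groupKey, and the grapheme list (a small subset of the sort sequence above) are all hypothetical. Case-insensitive matching stands in for the explicit (aa|Aa) pairs, and '*' is added to the skipped prefix characters because the collation ignores it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlphaGroupKey {
    // Hypothetical subset of the graphemes listed in the collation rules.
    static final List<String> GRAPHEMES =
        Arrays.asList("a", "'a", "aa", "k", "k'", "kw", "ky", "k'w", "k'y");

    // Build "^[-=*]*(alt1|alt2|...)" with the longest graphemes first.
    // Java tries alternatives left to right, so this ordering makes the
    // longest multigraph win.
    static Pattern buildPattern(List<String> graphemes) {
        List<String> sorted = new ArrayList<>(graphemes);
        sorted.sort((x, y) -> Integer.compare(y.length(), x.length()));
        StringBuilder alternatives = new StringBuilder();
        for (String g : sorted) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(g)); // treat the grapheme literally
        }
        // Assumption: case-insensitive matching replaces the (aa|Aa) pairs.
        return Pattern.compile("^[-=*]*(" + alternatives + ")",
                               Pattern.CASE_INSENSITIVE);
    }

    // Return the lower-cased leading grapheme, or "" if none matches.
    static String groupKey(Pattern pattern, String word) {
        Matcher m = pattern.matcher(word);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : "";
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(GRAPHEMES);
        for (String w : Arrays.asList("kaba=", "-ks", "kwduun", "k'was", "'ap")) {
            System.out.println(w + " -> " + groupKey(p, w));
        }
    }
}
```

The grapheme list still has to be kept in step with the collation rules, so this only removes the manual ordering step; it does not answer whether the collation class can drive the grouping directly.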