Subject: Re: [xsl] How to output the characters denoted by the regex \p{P}
From: "David Carlisle d.p.carlisle@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 28 Feb 2025 14:46:44 -0000
First reply got blocked for being too long.

Unicode 16 appears to have 850 characters with category P...

<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:variable name="u" select="doc('https://raw.githubusercontent.com/w3c/xml-entities/refs/heads/gh-pages/unicode.xml')"/>

  <xsl:template name="P">
    <xsl:value-of select="'Unicode', $u/*/@unicode, '&#10;'"/>
    <xsl:for-each select="$u//character[unicodedata/@category[starts-with(.,'P')]]">
      <xsl:value-of select="
        '&#10;',
        replace(@id,'U0?','U+'),
        codepoints-to-string(@dec),
        description
        "/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

then

saxon10 -it:P rc5.xsl

produces

Unicode 16

U+0021 ! EXCLAMATION MARK
U+0022 " QUOTATION MARK
U+0023 # NUMBER SIGN
U+0025 % PERCENT SIGN
U+0026 & AMPERSAND
U+0027 ' APOSTROPHE
...

On Fri, 28 Feb 2025 at 12:53, Roger L Costello costello@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> ChatGPT:
>
> > Please list all the characters denoted by the regular expression \p{P}
>
> The regular expression \p{P} is a Unicode property that matches any
> punctuation character. This includes a wide range of symbols used in
> writing to separate sentences and words, indicate pauses, or convey other
> structural or expressive functions. Here is a list of some common
> punctuation characters that \p{P} would match:
>
> Period: .
> Comma: ,
> Question mark: ?
> Exclamation mark: !
> Colon: :
> Semicolon: ;
> Quotation marks: " and '
> Apostrophe: '
> Hyphen: -
> Dash: — (em dash) and – (en dash)
> Parentheses: ( and )
> Brackets: [ and ]
> Braces: { and }
> Slash: / and \
> Vertical bar: |
> Ellipsis: …
> Angle brackets: < and >
>
> This list is not exhaustive, as Unicode includes many other punctuation
> marks used in various languages and scripts. The \p{P} property covers all
> such characters, including those specific to certain languages or
> typographic conventions.
>
> > How many characters are in \p{P}
>
> The exact number of characters matched by \p{P} can vary depending on the
> version of the Unicode standard being used, as new punctuation characters
> may be added in future updates. As of Unicode 15.0, which was released in
> September 2022, there are over 100 punctuation characters. This includes
> common punctuation marks like periods, commas, and quotation marks, as well
> as less common ones used in various languages and scripts.
>
> For the most precise and up-to-date count, you would need to refer to the
> Unicode Character Database for the specific version you are interested in.
> The database provides detailed information about each character, including
> its properties and classifications.
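
To get the count directly rather than reading it off the listing, something along these lines might work. It is only a sketch: it assumes the same unicode.xml structure used in the stylesheet above (character/unicodedata/@category), and the template name count-P is made up for illustration. It prints the total for category P and then a per-subcategory breakdown (Pc, Pd, Pe, ...).

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Same source document as the stylesheet above -->
  <xsl:variable name="u" select="doc('https://raw.githubusercontent.com/w3c/xml-entities/refs/heads/gh-pages/unicode.xml')"/>

  <!-- Hypothetical template name, chosen for this sketch only -->
  <xsl:template name="count-P">
    <!-- All characters whose general category starts with P -->
    <xsl:variable name="p"
                  select="$u//character[unicodedata/@category[starts-with(., 'P')]]"/>

    <!-- Header line: Unicode version and total count -->
    <xsl:value-of select="'Unicode', $u/*/@unicode, 'category P total:', count($p), '&#10;'"/>

    <!-- One line per subcategory, sorted by category code -->
    <xsl:for-each-group select="$p" group-by="unicodedata/@category">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key(), count(current-group()), '&#10;'"/>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

It would be run the same way as the listing, with -it:count-P as the initial template. The numbers depend on which Unicode version the unicode.xml file reflects at the time it is fetched.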