Re: [xsl] Function converting RFC 2822 date to xsd:dateTime

Subject: Re: [xsl] Function converting RFC 2822 date to xsd:dateTime
From: "Michael Kay mike@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Apr 2019 23:03:57 -0000

The main question about error handling here is how liberal you want to be,
e.g. do you want to fail on "Wed 9 Apr 2019" on the grounds that 9 April isn't
a Wednesday? Similarly, do you want to validate leap years etc? As far as I
can see, you're allowing 31 Apr but not 32 Apr, and that seems a bit pointless
to me.
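If the stricter option is wanted, the calendar checks do not need to live in the regex at all. Below is a minimal XSLT 2.0 sketch, not taken from the thread. Casting the assembled string to xs:date already rejects 31 Apr, and 29 Feb in non-leap years. The day name can be derived arithmetically and compared with the one in the header. The function names, the aex namespace URI and the choice of 2006-01-01 (a Sunday) as the reference date are assumptions for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs aex">

  <xsl:output method="text"/>

  <!-- True if year/month/day form a real calendar date. The xs:date cast
       rejects 2019-04-31 and 2019-02-29, so month lengths and leap years
       need no special handling. -->
  <xsl:function name="aex:is-calendar-date" as="xs:boolean">
    <xsl:param name="y" as="xs:integer"/>
    <xsl:param name="m" as="xs:integer"/>
    <xsl:param name="d" as="xs:integer"/>
    <xsl:sequence select="concat(format-number($y, '0000'), '-',
                                 format-number($m, '00'), '-',
                                 format-number($d, '00')) castable as xs:date"/>
  </xsl:function>

  <!-- Three-letter English day name, counted from 2006-01-01 (a Sunday). -->
  <xsl:function name="aex:day-name" as="xs:string">
    <xsl:param name="date" as="xs:date"/>
    <xsl:variable name="days" as="xs:integer"
        select="days-from-duration($date - xs:date('2006-01-01'))"/>
    <!-- XPath mod keeps the sign of the dividend, so fold negatives back into 0..6 -->
    <xsl:sequence select="('Sun','Mon','Tue','Wed','Thu','Fri','Sat')
                          [(($days mod 7) + 7) mod 7 + 1]"/>
  </xsl:function>

  <xsl:template name="demo">
    <xsl:value-of select="aex:is-calendar-date(2019, 4, 31)"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="aex:day-name(xs:date('2019-04-09'))"/>
  </xsl:template>
</xsl:stylesheet>
```

With demo as the initial template this should print false and then Tue. A header reading "Wed, 9 Apr 2019" could therefore be caught by comparing its day name with aex:day-name().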

The secondary point is what you do when invalid input is detected. If you want
to do a hard fail then xsl:message terminate="yes" is as good as anything.
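For the hard-fail route, here is a sketch in XSLT 2.0 showing both forms. The function name, error code and namespace URI are invented for illustration. Both forms stop the run in 2.0. The difference is that error() attaches a QName, so in XSLT 3.0 the same error could be caught selectively.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs aex">

  <!-- Hard failure: $iso is the lexical xs:dateTime built from the regex groups,
       $original the header it came from (used only in the error text). -->
  <xsl:function name="aex:require-dateTime" as="xs:dateTime">
    <xsl:param name="iso" as="xs:string"/>
    <xsl:param name="original" as="xs:string"/>
    <xsl:choose>
      <xsl:when test="$iso castable as xs:dateTime">
        <xsl:sequence select="xs:dateTime($iso)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Alternative: <xsl:message terminate="yes">...</xsl:message> -->
        <!-- error() also ends the transformation in 2.0, but carries an
             error code that an XSLT 3.0 xsl:catch could match on -->
        <xsl:sequence select="error(QName('urn:example:aex', 'aex:INVALID-DATE'),
                                    concat('Invalid RFC 2822 date-time: ', $original))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
</xsl:stylesheet>
```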

Also, I can't work out what you're trying to do with timezones.
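On the timezone question: as written, substring(regex-group(8), ...) only produces a usable timezone for numeric offsets like +1200. For UT, GMT, EST or a single military letter, the concatenated string is not a valid xs:dateTime and the cast fails. Below is a possible mapping, sketched in XSLT 2.0 and based on the obsolete-zone table in RFC 2822 section 4.3. The function name and namespace are hypothetical. Mapping -0000 and the military letters to Z is a judgement call; RFC 2822 says the military zones SHOULD be treated as -0000.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs aex">

  <xsl:output method="text"/>

  <!-- Map an RFC 2822 zone token to an xs:dateTime timezone suffix. -->
  <xsl:function name="aex:zone-to-offset" as="xs:string">
    <xsl:param name="zone" as="xs:string"/>
    <xsl:variable name="names" as="xs:string*"
        select="'UT','GMT','EDT','EST','CDT','CST','MDT','MST','PDT','PST'"/>
    <xsl:variable name="offsets" as="xs:string*"
        select="'Z','Z','-04:00','-05:00','-05:00','-06:00','-06:00','-07:00','-07:00','-08:00'"/>
    <xsl:choose>
      <!-- numeric form: +1200 becomes +12:00; -0000 ("UTC, local zone unknown") becomes Z -->
      <xsl:when test="matches($zone, '^[+\-][0-9]{4}$')">
        <xsl:sequence select="if ($zone = ('+0000', '-0000')) then 'Z'
                              else concat(substring($zone, 1, 3), ':', substring($zone, 4, 2))"/>
      </xsl:when>
      <!-- obsolete North American and UT/GMT names -->
      <xsl:when test="$zone = $names">
        <xsl:sequence select="$offsets[index-of($names, $zone)]"/>
      </xsl:when>
      <!-- single-letter military zones: treated as -0000, i.e. Z -->
      <xsl:otherwise>
        <xsl:sequence select="'Z'"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template name="demo">
    <xsl:value-of select="aex:zone-to-offset('+1200'), aex:zone-to-offset('EST'),
                          aex:zone-to-offset('Q')" separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

With demo as the initial template this should print +12:00 -05:00 Z.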

Michael Kay
Saxonica

> On 8 Apr 2019, at 23:32, Martynas Jusevičius martynas@xxxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Thanks Michael, I will keep parse-ietf-date() in mind.
>
> As for invalid values, the template invoking the function should
> handle the error with some kind of conditional and skip creating
> content with such values.
>
> What would such error handling look like? Preferably in XSLT 2.0, but
> I'm also interested in how it would compare to XSLT 3.0.
>
> I was reading about the error() function in relation to this, but
> couldn't figure out how exactly it could be used here.
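Below is a sketch of the conditional approach in XSLT 2.0; the names are placeholders, not from the thread. It guards the final cast with castable as, returns the empty sequence on failure, and lets the caller test exists(). The function must then be declared as="xs:dateTime?". With as="xs:dateTime", returning nothing is itself a type error.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs aex">

  <!-- Soft failure: $iso is the string the concat(...) call builds just before
       the xs:dateTime() cast; on failure the function returns the empty sequence. -->
  <xsl:function name="aex:to-dateTime-or-empty" as="xs:dateTime?">
    <xsl:param name="iso" as="xs:string"/>
    <xsl:if test="$iso castable as xs:dateTime">
      <xsl:sequence select="xs:dateTime($iso)"/>
    </xsl:if>
  </xsl:function>

  <!-- Caller: create content only when a value came back.
       The Date input element and the date output element are placeholders. -->
  <xsl:template match="Date">
    <xsl:variable name="dt" as="xs:dateTime?"
        select="aex:to-dateTime-or-empty(normalize-space(.))"/>
    <xsl:if test="exists($dt)">
      <date><xsl:value-of select="$dt"/></date>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

XSLT 2.0 has no standard way to catch a dynamic error, so error() is only useful there if you want the whole transformation to stop. XSLT 3.0 adds xsl:try/xsl:catch, which can recover from it.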
>
> On Tue, Apr 9, 2019 at 12:16 AM Michael Kay mike@xxxxxxxxxxxx
> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Are you aware that XPath 3.1 has the function parse-ietf-date() for this?
>>
>> https://www.w3.org/TR/xpath-functions-31/#func-parse-ietf-date
>>
>> The Saxon implementation is in Java; I haven't attempted an XPath
>> implementation. But you might find the spec (and the associated notes)
>> useful in itself; and of course the QT3 test suite has test cases.
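For the XSLT 3.0 comparison, here is a sketch that delegates parsing to parse-ietf-date() and recovers from bad input with xsl:try/xsl:catch. It assumes a processor with XPath 3.1 support, such as a recent Saxon. One caveat: the spec's grammar appears to allow only a fixed list of names (UT, UTC, GMT, EST, ...) inside the parentheses after a numeric offset. A trailing comment such as (NZST) is therefore stripped first. The function name is hypothetical. FORG0010 is, as far as I know, the error code the spec assigns to input that parse-ietf-date() cannot accept.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs err aex">

  <xsl:output method="text"/>

  <xsl:function name="aex:parse-date-header" as="xs:dateTime?">
    <xsl:param name="header" as="xs:string"/>
    <xsl:try>
      <!-- strip a trailing comment such as "(NZST)" before parsing -->
      <xsl:sequence select="parse-ietf-date(replace($header, '\s*\([^\(\)]*\)\s*$', ''))"/>
      <!-- FORG0010: input not accepted by parse-ietf-date -->
      <xsl:catch errors="err:FORG0010">
        <xsl:message>Invalid date header: <xsl:value-of select="$header"/></xsl:message>
        <xsl:sequence select="()"/>
      </xsl:catch>
    </xsl:try>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <xsl:value-of select="aex:parse-date-header('Tue, 9 Apr 2019 00:07:24 +1200 (NZST)')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run with the default initial template, this should print 2019-04-09T00:07:24+12:00.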
>>
>> I don't know how date/times in RFC 2822 relate to all the other
>> miscellaneous RFCs referenced in the spec. Liam Quin did most of the
>> research for this.
>>
>> What are your requirements for handling invalid values?
>>
>> Michael Kay
>> Saxonica
>>
>>> On 8 Apr 2019, at 22:58, Martynas Jusevičius martynas@xxxxxxxxxxxxx
>>> <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> I have an XSLT 2.0 task where I'm parsing email Date headers defined
>>> in RFC 2822 and converting them to xsd:dateTime.
>>>
>>> Below is a function that converts between the two. I'd like to hear
>>> whether any improvements could be made.
>>>
>>>   <xsl:function name="aex:rfc2822dateTime-to-dateTime" as="xs:dateTime">
>>>       <xsl:param name="date-time" as="xs:string"/> <!-- Tue, 9 Apr 2019 00:07:24 +1200 (NZST) -->
>>>       <xsl:variable name="months" select="'Jan', 'Feb', 'Mar',
>>> 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'"
>>> as="xs:string*"/>
>>>       <xsl:analyze-string select="$date-time"
>>>           regex="^(?:(Sun|Mon|Tue|Wed|Thu|Fri|Sat),\s+)?(0[1-9]|[1-2]?[0-9]|3[01])\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(19[0-9]{{2}}|[2-9][0-9]{{3}})\s+(2[0-3]|[0-1][0-9]):([0-5][0-9])(?::(60|[0-5][0-9]))?\s+([-\+][0-9]{{2}}[0-5][0-9]|(?:UT|GMT|(?:E|C|M|P)(?:ST|DT)|[A-IK-Z]))(\s+|\(([^\(\)]+|\\\(|\\\))*\))*$">
>>>           <xsl:matching-substring>
>>>               <xsl:sequence
>>> select="xs:dateTime(concat(format-number(xs:integer(regex-group(4)),
>>> '0001'), '-', format-number(index-of($months, regex-group(3)), '01'),
>>> '-', format-number(xs:integer(regex-group(2)), '01'), 'T',
>>> format-number(xs:integer(regex-group(5)), '01'), ':',
>>> format-number(xs:integer(regex-group(6)), '01'), ':',
>>> format-number(xs:integer(regex-group(7)), '01'),
>>> substring(regex-group(8), 1, 3), ':', substring(regex-group(8), 4,
>>> 2)))"/>
>>>           </xsl:matching-substring>
>>>           <xsl:non-matching-substring>
>>>               <xsl:message>Invalid RFC 2822 datetime: <xsl:value-of
>>> select="$date-time"/></xsl:message>
>>>           </xsl:non-matching-substring>
>>>       </xsl:analyze-string>
>>>   </xsl:function>
>>>
>>> The regex pattern is taken from
>>> https://stackoverflow.com/questions/9352003/rfc-2822-date-regex
>>>
>>> Martynas
>>> atomgraph.com
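Putting these points together, here is a sketch of how the original XSLT 2.0 function might be revised. It is not a drop-in replacement. The aex namespace URI is assumed because the post does not show it.

The changes:

- It is declared as="xs:dateTime?", because the original's as="xs:dateTime" turns the non-matching branch (which returns nothing) into a type error.
- It fixes a latent bug: when the optional seconds are absent, xs:integer(regex-group(7)) in the original casts the empty string and raises an error.
- It maps zone names to offsets.
- It leaves day-of-month checking to a final castable test.
- It avoids (?:...) groups, which, as far as I know, only became standard in XPath 3.0 regular expressions.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:aex="urn:example:aex"
    exclude-result-prefixes="xs aex">

  <xsl:output method="text"/>

  <!-- Returns the empty sequence (plus a message) for anything it cannot convert. -->
  <xsl:function name="aex:rfc2822dateTime-to-dateTime" as="xs:dateTime?">
    <xsl:param name="date-time" as="xs:string"/>
    <xsl:variable name="months" as="xs:string*"
        select="'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'"/>
    <!-- obsolete zone names from RFC 2822 section 4.3 and their offsets -->
    <xsl:variable name="zones" as="xs:string*"
        select="'UT','GMT','EDT','EST','CDT','CST','MDT','MST','PDT','PST'"/>
    <xsl:variable name="offsets" as="xs:string*"
        select="'Z','Z','-04:00','-05:00','-05:00','-06:00','-06:00','-07:00','-07:00','-08:00'"/>
    <!-- Groups: 3 day, 4 month, 5 year, 6 hour, 7 minute, 9 second (optional), 10 zone.
         The day-of-month range is left to the final castable test. -->
    <xsl:analyze-string select="normalize-space($date-time)"
        regex="^((Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s*)?([0-9]{{1,2}})\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+([0-9]{{4}})\s+([0-9]{{2}}):([0-9]{{2}})(:([0-9]{{2}}))?\s+([+\-][0-9]{{4}}|UT|GMT|[ECMP][SD]T|[A-IK-Z])(\s*\(.*\))?$">
      <xsl:matching-substring>
        <xsl:variable name="zone" select="regex-group(10)"/>
        <!-- numeric offsets get a colon; names are looked up; military letters fall through to Z -->
        <xsl:variable name="offset" select="
            if (matches($zone, '^[+\-]'))
            then concat(substring($zone, 1, 3), ':', substring($zone, 4, 2))
            else if ($zone = $zones) then $offsets[index-of($zones, $zone)]
            else 'Z'"/>
        <!-- missing seconds default to 00 instead of casting an empty string -->
        <xsl:variable name="iso" select="concat(regex-group(5), '-',
            format-number(index-of($months, regex-group(4)), '00'), '-',
            format-number(xs:integer(regex-group(3)), '00'), 'T',
            regex-group(6), ':', regex-group(7), ':',
            if (regex-group(9)) then regex-group(9) else '00',
            $offset)"/>
        <xsl:choose>
          <xsl:when test="$iso castable as xs:dateTime">
            <xsl:sequence select="xs:dateTime($iso)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:message>Invalid date in RFC 2822 datetime: <xsl:value-of select="$date-time"/></xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:message>Invalid RFC 2822 datetime: <xsl:value-of select="$date-time"/></xsl:message>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template name="demo">
    <xsl:value-of select="aex:rfc2822dateTime-to-dateTime('Tue, 9 Apr 2019 00:07:24 +1200 (NZST)')"/>
  </xsl:template>
</xsl:stylesheet>
```

With demo as the initial template this should print 2019-04-09T00:07:24+12:00. An input such as "Wed, 31 Apr 2019 10:00 EST" matches the regex but fails the castable test, so it yields the empty sequence and a message.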
