Re: [xsl] Representing EBCDIC code 37 in xslt

Subject: Re: [xsl] Representing EBCDIC code 37 in xslt
From: Greg Hunt <greg@xxxxxxxxxxxxxx>
Date: Wed, 1 Jan 2014 08:38:56 +1100
The FTP transfer may be treating the file as some kind of ASCII. UTF-8
is a superset of 7-bit ASCII, so most of the time the conversion
appears to work (when it really isn't working).
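
To make that concrete, here is a small Python sketch (not part of the
original setup; the sample strings and the helper name is_ascii_safe are
made up for illustration). It shows that plain 7-bit text has the same
bytes in ASCII and UTF-8, while PLUS-MINUS SIGN (U+00B1) has no ASCII
form at all and becomes two bytes in UTF-8:

```python
# Illustration only: the strings and the helper name are hypothetical.
plain = "tolerance 5"
special = "tolerance \u00b15"  # contains PLUS-MINUS SIGN (U+00B1)

# Pure 7-bit text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")

# U+00B1 is one byte (B1) in Latin-1 but two bytes (C2 B1) in UTF-8.
utf8_bytes = special.encode("utf-8")
latin1_bytes = special.encode("latin-1")


def is_ascii_safe(text: str) -> bool:
    """Return True if every character of text is representable in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

A transfer that assumes ASCII will look fine for strings like the first
one and only go wrong on strings like the second.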

It may be better to generate the file as EBCDIC on the unix box
(which, yes, will look like gibberish) and then transfer it as binary.
 That way you can confirm that the bit patterns that you want are
actually there before doing the transfer.  Fixing the FTP program's
text conversion may not be easy.
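
A rough sketch of that approach in Python, assuming the target code page
is IBM 037 (which Python's standard library exposes as the "cp037"
codec). The function names, host, credentials and file names are all
hypothetical placeholders, and the upload function is untested against
any real mainframe; the poster's actual transfer is done by a Java
program:

```python
from ftplib import FTP

EBCDIC = "cp037"  # Python stdlib codec for IBM code page 37 (assumed target)


def to_ebcdic(text: str) -> bytes:
    """Encode text as code page 37; raises UnicodeEncodeError if a
    character has no representation in that code page."""
    return text.encode(EBCDIC)


def verify_roundtrip(text: str, data: bytes) -> bool:
    """Check that the EBCDIC bytes decode back to the original text."""
    return data.decode(EBCDIC) == text


def write_ebcdic_file(text: str, path: str) -> bytes:
    """Write text to path as raw EBCDIC bytes, after verifying them."""
    data = to_ebcdic(text)
    if not verify_roundtrip(text, data):
        raise ValueError("EBCDIC round trip did not reproduce the input")
    with open(path, "wb") as fh:  # binary mode: no newline or text translation
        fh.write(data)
    return data


def upload_binary(host: str, user: str, password: str,
                  local_path: str, remote_name: str) -> None:
    """Send the file with no text conversion (FTP binary/image mode)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "rb") as fh:
            ftp.storbinary(f"STOR {remote_name}", fh)
```

Inspecting the bytes returned by write_ebcdic_file (for example with
data.hex()) before calling upload_binary is the "confirm the bit
patterns" step; any record-format settings the mainframe side needs are
outside what this sketch covers.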

On Wed, Jan 1, 2014 at 12:50 AM, a kusa <akusa8@xxxxxxxxx> wrote:
> Thank you Greg.
>
> The source file has utf-8 characters. But the problem seems to be
> happening when FTPing the converted text file to mainframe. Mainframe
> is not retaining it. So, I will have to check on the mainframe side to
> see if there is any setting that can be manipulated.
>
> Thank you all for your time and input.
>
>
>
> On Mon, Dec 30, 2013 at 3:59 PM, Greg Hunt <greg@xxxxxxxxxxxxxx> wrote:
>> The characters do not exist independently of the encoding of the
>> characters that are around them.  What you are trying to do, it
>> appears, is to construct a file containing a mix of ASCII/UTF-8
>> characters and EBCDIC characters, and then pass that file through a
>> character set conversion that has no idea that the "EBCDIC" characters
>> are in there.  What it will do is either corrupt the characters in
>> some interesting way or replace them with some kind of substitution
>> character (control-Z, a question mark, a full stop, or Unicode code
>> point U+FFFD, depending on the source and target encodings).  In
>> reality, in a file, there are only bit patterns, not characters; there
>> is nothing to mark one sequence of bits as one character set encoding
>> or another.
>>
>> The file has to be all the same character set if it is to pass through
>> an ASCII/EBCDIC conversion undamaged.  If you make it EBCDIC on your
>> unix platform it needs to look like gibberish because the bit patterns
>> for EBCDIC are not the same as the bit patterns for either UTF-8, 8859
>> or 1252 and the unix box will not understand them.  If the characters
>> can be represented as UTF-8, 8859-1 or 1252 (the R symbol is present
>> in all of them so it ought to be ok) and you already have transcoding
>> happening to EBCDIC then you either have to use the same transcoding
>> to convert the characters (provided that your transcoding is actually
>> working on 8-bit 8859 or 1252 and not some ancient 7-bit idea of
>> ASCII) or you need to make a file with the right EBCDIC bit patterns
>> in it and pass it around as binary.
>>
>> On Tue, Dec 31, 2013 at 7:59 AM, a kusa <akusa8@xxxxxxxxx> wrote:
>>>
>>> Thanks Ivan. That is where this question started, what output encoding
>>> can I use to preserve these EBCDIC characters?
>>>
>>> On Mon, Dec 30, 2013 at 2:12 PM, Ivan Shmakov <oneingray@xxxxxxxxx> wrote:
>>> >>>>>> a kusa <akusa8@xxxxxxxxx> writes:
>>> >
>>> > []
>>> >
>>> >  > Well, I have <xsl:output encoding> set to utf-8 right now.  If I set
>>> >  > it to EBCDIC, then the rest of the content in the XML converts to
>>> >  > gibberish.
>>> >
>>> >         Which is expected, if you view an EBCDIC-encoded XML file with
>>> >         an application that assumes ASCII-based encoding.  Try to upload
>>> >         the resulting file using FTP /binary/ mode to the mainframe and
>>> >         check if the file is still unreadable /there./
>>> >
>>> >         (Alternatively, or perhaps complementarily, use an
>>> >         EBCDIC-capable application to view the resulting file locally.)
>>> >
>>> >  > That's what I meant.
>>> >
>>> >  > I only need the special characters, esp. Latin-1 characters like the
>>> >  > plus-minus sign, to convert to the right EBCDIC code.
>>> >
>>> >  > I have a java program that FTPs the file; I believe the default is
>>> >  > ASCII.
>>> >
>>> >         There /may/ be a problem if /either/ this program or the FTP
>>> >         server assume that the input is ASCII, because the characters
>>> >         such as PLUS-MINUS SIGN are /not/ representable in ASCII.
>>> >
>>> >         One solution is to configure either the FTP client or FTP server
>>> >         to /correctly/ convert UTF-8 to EBCDIC.  The other is to
>>> >         configure the XSLT implementation (with <xsl:output />) to
>>> >         output EBCDIC, and send the result to the target host /without/
>>> >         any encoding conversion (i.e., using FTP binary mode).
>>> >
>>> > --
>>> > FSF associate member #7257