## Fwd: Re: [niso-sts] Re: Using fractions in text

 Subject: Fwd: Re: [niso-sts] Re: Using fractions in text From: "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx" Date: Tue, 24 Apr 2018 23:03:48 -0000
posting again without attachment...

-------- Forwarded Message --------
Subject: Re: [niso-sts] Re: Using fractions in text
Date: Wed, 25 Apr 2018 00:57:36 +0200
From: Imsieke, Gerrit, le-tex <gerrit.imsieke@xxxxxxxxx>
Organization: le-tex publishing services GmbH
To: niso-sts-list@xxxxxxxxxxxxxxxxxxxxxx

Hi Dan,

The practice at DIN, in their legacy XML format, has been to _always_ mark up these fractions as ISO 12083 formulas, like this one:

```xml
<formula alphabet="latin">7,85 <fraction align="center" shape="case" style="single">
<num><roman>kg</roman></num>
<den><roman>dm</roman><sup arrange="compact" location="post">3</sup></den>
</fraction></formula>
```

When rendered, it looks approximately like 7,85 kg/dm³, but with the kg slightly above the slash and the dm slightly below, as you described it.

When converting to NISO STS / MathML, it becomes

```xml
<inline-formula><mml:math>
  <mml:mn>7,85</mml:mn>
  <mml:mfrac bevelled="true" linethickness="1">
    <mml:mtext>kg</mml:mtext>
    <mml:mrow>
      <mml:msup>
        <mml:mtext>dm</mml:mtext>
        <mml:mn>3</mml:mn>
      </mml:msup>
    </mml:mrow>
  </mml:mfrac>
</mml:math></inline-formula>
```

So ISO 12083's shape="case" corresponds to MathML mfrac's bevelled="true".
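For illustration, here is a minimal Python sketch of that mapping using ElementTree. It is not the converter that DIN or le-tex actually uses; it assumes the ISO 12083 fraction contains only <roman> text and post-positioned <sup> inside <num> and <den>, as in the example above. It always wraps numerator and denominator in <mml:mrow> and does not emit linethickness, so its output differs slightly from the markup shown above. The helper names (mml, convert_part, convert_formula) are hypothetical.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("mml", MML_NS)  # serialize MathML elements with the mml: prefix


def mml(tag, text=None, **attrs):
    """Create an element in the MathML namespace."""
    el = ET.Element(f"{{{MML_NS}}}{tag}", attrs)
    el.text = text
    return el


def convert_part(part):
    """Convert an ISO 12083 <num> or <den> holding <roman> and <sup> children to an mrow."""
    row = mml("mrow")
    for child in part:
        if child.tag == "roman":
            row.append(mml("mtext", child.text))
        elif child.tag == "sup":
            # Assumption: a post-positioned superscript attaches to the item just before it.
            base = row[-1]
            row.remove(base)
            script = child.text or ""
            msup = mml("msup")
            msup.append(base)
            msup.append(mml("mn" if script.isdigit() else "mi", script))
            row.append(msup)
        else:
            raise ValueError(f"no rule for <{child.tag}> in this sketch")
    return row


def convert_formula(formula):
    """Convert an ISO 12083 <formula> of the form 'number <fraction>' to MathML."""
    math = mml("math")
    if formula.text and formula.text.strip():
        math.append(mml("mn", formula.text.strip()))
    for frac in formula.findall("fraction"):
        mfrac = mml("mfrac")
        if frac.get("shape") == "case":  # case fraction -> bevelled="true"
            mfrac.set("bevelled", "true")
        mfrac.append(convert_part(frac.find("num")))
        mfrac.append(convert_part(frac.find("den")))
        math.append(mfrac)
    return math


if __name__ == "__main__":
    src = ('<formula alphabet="latin">7,85 <fraction align="center" shape="case" style="single">'
           '<num><roman>kg</roman></num>'
           '<den><roman>dm</roman><sup arrange="compact" location="post">3</sup></den>'
           '</fraction></formula>')
    print(ET.tostring(convert_formula(ET.fromstring(src)), encoding="unicode"))
```

Anything outside this small subset (roots, stacked fractions, Greek letters) makes the sketch raise an error rather than guess.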

I'm attaching a PNG screenshot of Firefox, which has built-in MathML rendering, in the hope that the attachment will pass through the mailing list server.

In the screenshot, you can see that the MathML is essentially the same, only without the mml prefix and with xmlns="http://www.w3.org/1998/Math/MathML" instead. It looks like your sup/sub suggestion; whether you use MathML in an HTML rendering or built-in HTML sup/sub markup depends on the capabilities of the typical (or the weakest) rendering device.

What I described above is the historic practice at DIN. An encoding shift like the one Bruce is suggesting is currently under way: they are no longer as strict about encoding their text fractions as formulas. However, it still happens, in particular if you're in a math context anyway.

So, for approximately two years now, if a text fraction was included in the Word source as an OLE object or OOMML equation, we have converted it to ISO 12083 <fraction shape="case">. More recently, we have been flattening such formulas to text if they do not contain roots, stacked fractions, and the like. (If I remember correctly, this textification may extend to superscripts, subscripts, Greek letters, and italics, but not to bevelled fractions.)

Personally, I would prefer a non-flattened MathML representation in the NISO STS XML content, combined with an HTML rendering that uses built-in HTML tags to the largest possible extent. This is mostly because, as Bruce mentioned, in many contexts no MathML rendering is available, and even where MathJax is available, rendering myriads of small formulas with JavaScript would be incredibly slow.

On the other hand, it is sometimes hard to obtain this kind of homogeneous MathML markup in the first place. If you cannot make sure that authors will use proper formulas (MathType, OOMML, AMSTeX, MathML, …) consistently, even for small, text-mode-compatible expressions, you will have a hard time enforcing this or heuristically converting these expressions to proper math markup. This is another reason why people have recently been switching to flat, built-in vocabulary representations even in their source XML formats.

TeX users in particular instinctively put all symbols, and often also terms such as 3/32, into math mode. A naïve math renderer will then include even these simple expressions in the HTML as MathML or, often and worse, as images with poorly adapted DPIs and baseline shifts.

But as I said, mark it up in the source data in as much detail as you like (MathML is OK, but it will only get you so far if there are units in the fractions; see below), and then flatten it out when rendering. This rendering may well be like <sup>3</sup>/<sub>32</sub> rather than the completely flat 3/32. But we wouldn't encode something like <sup>3</sup>/<sub>32</sub> in STS, only in HTML, and only if that HTML did not have to satisfy accessibility requirements. If the sup/sub nuance matters and it was already present in the source, we'd use bevelled MathML in STS. In addition, we'd extend the MathML markup to all other terms that may constitute a single inline formula (we would keep the 2 and the 3/8 in '2 3/8' in a single MathML expression).
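A minimal sketch of the encoding side, assuming the source (for example, HTML exported from Word) already carries case fractions as <sup>n</sup>/<sub>d</sub> with digits only. It turns each such run, plus an optional preceding whole number, into one STS <inline-formula> holding bevelled MathML, so the 2 and the 3/8 of '2 3/8' end up in the same expression. The regular expression and the function names are hypothetical and would need hardening for real content.

```python
import re
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("mml", MML_NS)

# Hypothetical pattern: optional whole number, then <sup>digits</sup>/<sub>digits</sub>.
CASE_FRACTION = re.compile(r"(?:(\d+)\s+)?<sup>(\d+)</sup>/<sub>(\d+)</sub>")


def mml(tag, text=None, **attrs):
    """Create an element in the MathML namespace."""
    el = ET.Element(f"{{{MML_NS}}}{tag}", attrs)
    el.text = text
    return el


def case_fraction_to_formula(match):
    """Build one <inline-formula> with a bevelled mfrac, keeping any whole number in the same math."""
    whole, num, den = match.groups()
    formula = ET.Element("inline-formula")
    math = ET.SubElement(formula, f"{{{MML_NS}}}math")
    if whole:
        math.append(mml("mn", whole))
    frac = mml("mfrac", bevelled="true")
    frac.append(mml("mn", num))
    frac.append(mml("mn", den))
    math.append(frac)
    return ET.tostring(formula, encoding="unicode")


def encode_case_fractions(text):
    """Replace every sup/sub case fraction in text with STS inline-formula markup."""
    return CASE_FRACTION.sub(case_fraction_to_formula, text)


if __name__ == "__main__":
    print(encode_case_fractions("Use a 2 <sup>3</sup>/<sub>8</sub> inch pipe."))
```

ElementTree declares the mml namespace on each generated <inline-formula>; in a real STS document it would normally be declared once on the root element.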

A major recent development among standardization bodies is the semantic annotation of physical quantities and units, as in '7,85 kg/dm³' above. This will allow advanced search applications where you can search for an alloy with a certain density, expansion coefficient, etc. It should be noted that presentational MathML by itself is not suited to mark up this information unambiguously. We have to think about ways to enhance all of them: MathML-encoded formulas, STS-vocabulary-encoded formulas, and maybe also HTML-encoded formulas, with information about the quantities, units, chemical elements, etc. that they contain. A candidate for this semantic enhancement is RDFa, referring to established vocabularies that describe these things, or some JATS-family-specific vocabulary. These annotating attributes need to be stripped away when exporting the MathML, and of course the MathML part of the DTD (as well as the <sub>, <sup>, <italic> part of the DTD) has to be augmented so that it allows these semantic attributes.

Although we have neither defined nor agreed on a scope for the next iteration of STS, unit/quantity markup is something that we will probably discuss in the STS standing committee. There is clear market demand for this kind of semantic markup, and I don't think it is an added value that each standardization body should encode in its own private STS extension vocabulary. Ideally, ISO standards will be authored in a semantically annotated way in the first place.

A final remark: I consciously use the term 'formula' instead of 'equation'. You seem to assume that only equations may enjoy the luxury of full MathML markup (or a fallback EPS), while smaller expressions should not be marked up with MathML or made available as images. To which I retort: they aren't called <inline-formula> or <disp-formula> (instead of <inline-equation> or <disp-equation>) for no reason. Although they may hold tiny expressions, they should be treated like "real" equations.

You certainly may indicate that an inline-formula holds only a fraction, as you suggested. I don't have a preference between @content-type and @specific-use here. You can also retroactively assign this attribute to fractions in existing content by analyzing the MathML. Or you can skip this attribute assignment and run the analysis just in time when rendering a formula.
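A sketch of such a retroactive analysis in Python with ElementTree. What counts as "a fraction only" is an assumption here: the <mml:math> must hold exactly one mfrac (possibly inside a single mrow) whose parts use only token elements, mrow, msup, and msub. A stacked fraction, a root, or a mixed number like 2 3/8 is therefore not tagged. The function names and the whitelist are hypothetical; @specific-use would work the same way.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
# Hypothetical whitelist of elements allowed inside a "simple" fraction.
SIMPLE_TAGS = {"mn", "mi", "mo", "mtext", "mrow", "msup", "msub"}


def local(tag):
    """Local name of a namespaced ElementTree tag, e.g. '{...}mfrac' -> 'mfrac'."""
    return tag.rsplit("}", 1)[-1]


def is_fraction_only(math):
    """True if <math> holds exactly one mfrac whose parts use only simple tags."""
    children = list(math)
    # Unwrap single top-level mrow wrappers.
    while len(children) == 1 and local(children[0].tag) == "mrow":
        children = list(children[0])
    if len(children) != 1 or local(children[0].tag) != "mfrac":
        return False
    frac = children[0]
    # A nested mfrac or an mroot is not in SIMPLE_TAGS, so it disqualifies the formula.
    return all(local(el.tag) in SIMPLE_TAGS for el in frac.iter() if el is not frac)


def tag_fractions(root):
    """Set content-type="fraction" on fraction-only inline-formulas; return how many were tagged."""
    count = 0
    for formula in root.iter("inline-formula"):
        math = formula.find(f"{{{MML_NS}}}math")
        if math is not None and "content-type" not in formula.attrib and is_fraction_only(math):
            formula.set("content-type", "fraction")
            count += 1
    return count
```

Whether '7,85 kg/dm³' or '2 3/8' should also count as fraction-only is a policy decision; the same is_fraction_only check can be called just in time at render time instead of writing the attribute.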

We perform such an analysis like this:

- Try transforming the equation to plain STS or HTML markup, using XSLT
- If MathML elements such as <mml:mroot> remain in the result (to be determined by XPath) since there is no transformation rule for them, don't use the transformation result, but rather keep the original formula instead.
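These two steps can be sketched in Python with ElementTree standing in for XSLT; this is an illustrative substitute, not the XSLT pipeline described here. The conversion rules (token elements to text, msup/msub to <sup>/<sub>, a bevelled mfrac to <sup>num</sup>/<sub>den</sub>) and the function names are assumptions. Any element without a rule is copied through unchanged, mimicking an XSLT identity template, so a leftover MathML element in the result signals that the original formula should be kept.

```python
import copy
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
TOKENS = {"mn", "mi", "mo", "mtext"}


def local(tag):
    """Local name of a namespaced ElementTree tag, e.g. '{...}mfrac' -> 'mfrac'."""
    return tag.rsplit("}", 1)[-1]


def append_text(parent, text):
    """Append text after the last child (or inside an empty parent), per ElementTree's text/tail model."""
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def convert(el, out):
    """Append an HTML rendering of MathML element el to out; copy el unchanged if no rule applies."""
    name = local(el.tag)
    if name in TOKENS:
        append_text(out, (el.text or "").strip())
    elif name in ("math", "mrow"):
        for child in el:
            convert(child, out)
    elif name in ("msup", "msub"):
        convert(el[0], out)
        convert(el[1], ET.SubElement(out, "sup" if name == "msup" else "sub"))
    elif name == "mfrac" and el.get("bevelled") == "true":
        convert(el[0], ET.SubElement(out, "sup"))
        append_text(out, "/")
        convert(el[1], ET.SubElement(out, "sub"))
    else:
        # No rule (e.g. mroot, stacked mfrac): copy through, like an XSLT identity template.
        clone = copy.deepcopy(el)
        clone.tail = None
        out.append(clone)


def render_inline_formula(math):
    """Return an HTML <span> for the formula, or None if MathML remains (keep the original)."""
    span = ET.Element("span")
    convert(math, span)
    # ElementTree counterpart of an XPath test like //mml:* on the transformation result.
    if any(e.tag.startswith("{" + MML_NS + "}") for e in span.iter()):
        return None
    return span
```

The sketch drops inter-token spacing (7,85 and kg run together) and ignores the italics of mi; a production rule set would handle both.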

I hope this helps. For those who kept reading until this point: sorry for the lengthy post, again.

Gerrit

On 24/04/2018 22:39, Bruce Rosenblum bruce@xxxxxxxxx wrote:
Hi Dan,

I think most publishers a) avoid the small number of special fraction characters like 1/2 or 1/4, and b) either set fractions up as plain text like "3/32" or set them up as MathML, depending on their specific business requirements.

I think the key question to ask is why you want to treat them as anything other than plain text like "3/32". Is it so they look great in PDF? That may be fine, but in today's multi-channel world, what you do to make great-looking fractions in PDF may not work well in HTML or ePub format (e.g. many eReaders don't support MathML). So before deciding on a technology solution, make sure your business requirement is clear, and then make sure the solution you choose will work in multiple media, not just PDF.

Best,

Bruce

On Tue, Apr 24, 2018 at 4:33 PM, Dan Berger dberger@xxxxxxxx <niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Hi group,

I'm making a second attempt to get some feedback here. Does anyone have thoughts on this thread?

Thank you!

Dan

*From:* Dan Berger dberger@xxxxxxxx [mailto:niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx]
*Sent:* Thursday, April 19, 2018 10:00 AM
*To:* niso-sts-list@xxxxxxxxxxxxxxxxxxxxxx
*Subject:* [niso-sts] Using fractions in text

    Hi list b I am trying to figure out how best to deal with numerical
fractions in our standards. ____

__ __

    In many cases, there is a character for a fraction, such as B=. But
in other cases, such as 7/23 there is no single character. I donbt
want to do this <sup>7</sup>/<sub>23</sub> as this doesnbt seem
semantic. So we were doing the non-normal fractions as mathML. But
then we had inconsistencies in the markup. So then we went to all
math ML for fractions. But since our standards also have several
equations, we ran into the issue of a transform not knowing what is
a fraction and what is an equation. (We are also capturing an eps of
all equations as a backup.) So we considered the idea of adding in
an attribute like this: <inline-formula content-type="fraction">.____

__ __

My question is: what are others doing in this situation? It seems pretty common, but I am not sure what the best approach is, or what else to consider.

Dan

*Daniel Berger*

Senior Manager of Production

American Water Works Association


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek, Thomas Schmidt