posting again without attachment...
-------- Forwarded Message --------
Subject: Re: [niso-sts] Re: Using fractions in text
Date: Wed, 25 Apr 2018 00:57:36 +0200
From: Imsieke, Gerrit, le-tex <gerrit.imsieke@xxxxxxxxx>
Organization: le-tex publishing services GmbH
To: niso-sts-list@xxxxxxxxxxxxxxxxxxxxxx
Hi Dan,
The practice at DIN, in their legacy XML format, has been to _always_
mark up these fractions as ISO 12083 formulas, like this one:
<formula alphabet="latin">7,85 <fraction align="center" shape="case"
style="single">
<num><roman>kg</roman></num>
<den><roman>dm</roman><sup arrange="compact"
location="post">3</sup></den>
</fraction></formula>
When rendered, it approximately looks like 7,85 kg/dm³, but with the kg
slightly above the slash and the dm slightly below, as you described it.
When converting to NISO STS / MathML, it becomes
<inline-formula><mml:math>
<mml:mn>7,85</mml:mn>
<mml:mfrac bevelled="true" linethickness="1">
<mml:mtext>kg</mml:mtext>
<mml:mrow>
<mml:msup>
<mml:mtext>dm</mml:mtext>
<mml:mn>3</mml:mn>
</mml:msup>
</mml:mrow>
</mml:mfrac>
</mml:math></inline-formula>
So ISO 12083's shape="case" corresponds to MathML mfrac's bevelled="true".
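As a rough illustration of this mapping, here is a minimal Python sketch. It is not the converter used in production: the helper names (mml, convert_part, convert_fraction) are made up for this example, only <roman> and <sup> are handled, and a superscript is always assumed to be numeric.

```python
import xml.etree.ElementTree as ET

MML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("mml", MML)


def mml(tag, text=None, **attrs):
    """Create an element in the MathML namespace (hypothetical helper)."""
    el = ET.Element(f"{{{MML}}}{tag}", attrs)
    el.text = text
    return el


def convert_part(part):
    """Convert a <num> or <den>: <roman> becomes mtext, and a <sup>
    that follows another item wraps that item in msup."""
    out = []
    for child in part:
        if child.tag == "roman":
            out.append(mml("mtext", child.text))
        elif child.tag == "sup" and out:
            msup = mml("msup")
            msup.append(out.pop())              # base: the preceding item
            msup.append(mml("mn", child.text))  # assumption: numeric exponent
            out.append(msup)
        else:
            raise ValueError(f"no rule for <{child.tag}>")
    if len(out) == 1:
        return out[0]
    row = mml("mrow")
    row.extend(out)
    return row


def convert_fraction(fraction):
    """Map an ISO 12083 <fraction> to <mml:mfrac>; shape="case" becomes
    bevelled="true"."""
    attrs = {"bevelled": "true"} if fraction.get("shape") == "case" else {}
    mfrac = mml("mfrac", **attrs)
    mfrac.append(convert_part(fraction.find("num")))
    mfrac.append(convert_part(fraction.find("den")))
    return mfrac
```

Unlike the output shown above, this sketch does not wrap a single msup in an extra mrow; the two forms are equivalent for rendering purposes.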
I'm attaching a PNG, a screenshot taken in Firefox, which has built-in
MathML rendering, hoping that this attachment will pass
through the mailing list server.
[The PNG is here:
https://drive.google.com/file/d/1j1y_XUAY4JPnlOI72TupnVT9lv2naiCR/view?usp=sharing]
In the screenshot you see that the MathML is essentially the same, but
without the mml prefix, only with the default namespace declaration
xmlns="http://www.w3.org/1998/Math/MathML". The rendering looks like your
sup/sub suggestion. Whether you use MathML in an HTML rendering or the
built-in HTML sup/sub markup depends on the capabilities of the typical
(or the weakest) rendering device.
What I described above is the historic practice at DIN. There is
currently an encoding shift under way like the one that Bruce is
suggesting: They are not as strict any more in encoding their text
fractions as formulas. However, it still happens, in particular if
you're in math context anyway.
So since approx. two years ago, if a text fraction was included in the
Word source as an OLE object or OOMML equation, we'd convert it to ISO
12083 <fraction shape="case">. More recently, we are flattening formulas
to text if the formula in question does not contain roots, stacked
fractions and the like. (I may not remember this correctly: it might be
the case that textifying formulas extends to superscripts, subscripts,
Greek letters, and italics, but not to bevelled fractions.)
I personally would prefer a non-flattened MathML representation in the
NISO STS XML content and an HTML rendering that uses built-in tags to the
largest extent. This is mostly because, as Bruce mentioned, in many
contexts there is no MathML rendering available, and even if MathJax
were available, it would be incredibly slow to render myriads of small
formulas with JavaScript.
On the other hand, it is sometimes demanding to obtain this kind of
homogeneous MathML markup in the first place. If you cannot make sure
that authors will use proper formulas (MathType, OOMML, AMSTeX, MathML,
…) consistently even for small, text-mode-compatible expressions, you
will have a hard time enforcing this or heuristically converting these
expressions to math markup proper. This is another reason why people are
recently switching to flat / built-in vocabulary representations even in
their source XML formats.
TeX users in particular instinctively put all symbols and often also
terms such as 3/32 into math mode. A naïve math renderer will then
include even these simple expressions in the HTML as MathML or often,
and worse, as images with poorly adapted DPIs and baseline shifts.
But as I said, mark it up in the source data in as much detail as you like
(MathML is ok, but it will only get you so far if there are units in
the fractions, see below), and then flatten it out when rendering. This
rendering may well be like <sup>3</sup>/<sub>32</sub> rather than the
completely flat 3/32. But we wouldn't encode something like
<sup>3</sup>/<sub>32</sub> in STS, only in HTML, and only if that HTML
were not to satisfy accessibility demands. If the sup/sub nuance matters
and if it was already present in the source, we'd use a bevelled MathML
in STS. In addition, we'd extend the MathML markup to all other terms
that may constitute a single inline formula (we would keep the 2 and the
3/8 in '2 3/8' in a single MathML expression).
A major recent development among standardization bodies is semantic
annotation of physical quantities and units, like for '7,85 kg/dm³'
above. This will allow advanced search applications where you can search
for an alloy with a certain density, expansion coefficient, etc. It
should be noted that presentational MathML by itself is not suited to
mark up this information unambiguously. We have to think about ways to
enhance all of them: MathML-encoded formulas, STS-vocabulary-encoded
formulas, and maybe also HTML-encoded formulas, with information about
the quantities, units, chemical elements, etc. that they contain. A
candidate for this semantic enhancement is RDFa, referring to
established vocabularies that describe these things, or some
JATS-family-specific vocabulary. These annotating attributes need to be
stripped away when exporting the MathML, and of course the MathML part
of the DTD (as well as the <sub>, <sup>, <italic> part of the DTD) has
to be augmented so that it allows these semantic attributes.
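To make the stripping step concrete, here is a minimal Python sketch. It assumes that the annotations are plain RDFa attributes on the MathML elements; since no annotation vocabulary has been agreed on yet, both that choice and the function name strip_annotations are assumptions for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: annotations use these (un-namespaced) RDFa attributes.
RDFA_ATTRS = {
    "about", "content", "datatype", "prefix",
    "property", "resource", "typeof", "vocab",
}


def strip_annotations(math):
    """Return a copy of a MathML tree with all RDFa attributes removed.
    The annotated original is left untouched."""
    clean = copy.deepcopy(math)
    for el in clean.iter():
        for name in RDFA_ATTRS & set(el.attrib):
            del el.attrib[name]
    return clean
```

Presentational attributes such as bevelled are not in the set and therefore survive the export.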
Although we have neither constituted ourselves nor agreed on a scope for
the next iteration of STS, unit/quantity markup is something that we will
probably discuss in the STS standing committee. There is clear demand for
this kind of semantic markup in the market, and I don't think that semantic
markup is an added value that each standardization body should encode in
their private STS extension vocabularies. Ideally, ISO standards will be
authored in a semantically annotated way in the first place.
A final remark: I consciously use the term 'formula' instead of
'equation'. You seem to assume that only equations may enjoy the luxury
of a full MathML markup (or a fallback EPS), while smaller expressions
should not be marked up with MathML or be available as images. To which I
retort: They aren't called <inline-formula> or <disp-formula> (instead
of <inline-equation> or <disp-equation>) for no reason. Although they
may hold tiny expressions, they should be treated like "real" equations.
You certainly may indicate that an inline-formula is holding a fraction
only, as you suggested. I don't have a preference for using either
@content-type or @specific-use here. You can also retroactively assign
this attribute to fractions in existing content, by analyzing the
MathML. Or you can skip this attribute assignment and use the analysis
just-in-time when rendering a formula.
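Such a retroactive assignment could look roughly like this in Python. It is only a sketch: the criterion for "fraction only" (a single mfrac with numeric numerator and denominator, optionally preceded by a whole number as in '2 3/8') and the function names are assumptions that you would adapt to your own content.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"


def is_fraction_only(math):
    """True if the math holds just one numeric mfrac, optionally
    preceded by a whole number (mixed fraction)."""
    kids = list(math)
    if len(kids) == 1 and kids[0].tag == MML + "mrow":
        kids = list(kids[0])                 # look inside a wrapping mrow
    if len(kids) == 2 and kids[0].tag == MML + "mn":
        kids = kids[1:]                      # skip the whole-number part
    if len(kids) != 1 or kids[0].tag != MML + "mfrac":
        return False
    return all(part.tag == MML + "mn" for part in kids[0])


def tag_fractions(root):
    """Set content-type="fraction" on every fraction-only inline-formula
    below root and return how many formulas were tagged."""
    count = 0
    for formula in root.iter("inline-formula"):
        math = formula.find(MML + "math")
        if math is not None and is_fraction_only(math):
            formula.set("content-type", "fraction")
            count += 1
    return count
```

The same is_fraction_only test can be called at rendering time instead, in which case no attribute needs to be stored.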
We perform such an analysis like this:
- Try transforming the equation to plain STS or HTML markup, using XSLT
- If MathML elements such as <mml:mroot> remain in the result (to be
determined by XPath) since there is no transformation rule for them,
don't use the transformation result, but rather keep the original
formula instead.
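The two steps above can be sketched in Python as follows. This is a minimal illustration, not the actual stylesheet: the real analysis uses XSLT and XPath, whereas here a handful of rules (token elements, math/mrow, msup/msub, bevelled mfrac) stand in for the transformation, and the names flatten and render are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

MML = "{http://www.w3.org/1998/Math/MathML}"
TOKENS = {"mn", "mi", "mo", "mtext"}


def flatten(el):
    """Apply the flattening rules and return markup as a string.
    Elements without a rule are copied through unchanged."""
    name = el.tag[len(MML):] if el.tag.startswith(MML) else ""
    kids = list(el)
    if name in TOKENS:
        return escape(el.text or "")
    if name in ("math", "mrow"):
        return "".join(flatten(k) for k in kids)
    if name in ("msup", "msub") and len(kids) == 2:
        tag = name[1:]                       # "sup" or "sub"
        return f"{flatten(kids[0])}<{tag}>{flatten(kids[1])}</{tag}>"
    if name == "mfrac" and el.get("bevelled") == "true" and len(kids) == 2:
        return f"{flatten(kids[0])}/{flatten(kids[1])}"
    # No rule (mroot, stacked mfrac, ...): the element stays in the result.
    return ET.tostring(el, encoding="unicode")


def render(math):
    """Return the flattened <span>, or the original formula if any
    MathML element survived the transformation."""
    result = ET.fromstring(f"<span>{flatten(math)}</span>")
    if any(e.tag.startswith(MML) for e in result.iter()):
        return math
    return result
```

A stacked (non-bevelled) mfrac deliberately has no rule here, so formulas containing one keep their MathML, just like formulas with roots.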
I hope this helps. For those who kept reading until this point: Sorry
for the lengthy post, again.
Gerrit
On 24/04/2018 22:39, Bruce Rosenblum bruce@xxxxxxxxx wrote:
Hi Dan,
I think most publishers a) avoid the small number of special fractions
like 1/2 or 1/4, and b) either set them up as plain text like "3/32" or set them up
as MathML depending on their specific business requirements.
I think the key question to ask is why do you want to treat them as
other than plain text like "3/32"? Is it so they look great in PDF? That
may be fine, but in today's multi-channel world, what you do to make
great-looking fractions in PDF may not work well in HTML or ePub format
(e.g. many eReaders don't support MathML). So before deciding on a
technology solution, make sure your business requirement is clear, and
then make sure the solution you choose will work in multiple media, not just
PDF.
Best,
Bruce
On Tue, Apr 24, 2018 at 4:33 PM, Dan Berger dberger@xxxxxxxx
<mailto:dberger@xxxxxxxx> <niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx
<mailto:niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx>> wrote:
Hi group,
I'm making a second attempt for some feedback here. Anyone have
thoughts on this thread?
Thank you!
Dan
*From:* Dan Berger dberger@xxxxxxxx <mailto:dberger@xxxxxxxx>
[mailto:niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx
<mailto:niso-sts-list-service@xxxxxxxxxxxxxxxxxxxxxx>]
*Sent:* Thursday, April 19, 2018 10:00 AM
*To:* niso-sts-list@xxxxxxxxxxxxxxxxxxxxxx
<mailto:niso-sts-list@xxxxxxxxxxxxxxxxxxxxxx>
*Subject:* [niso-sts] Using fractions in text
Hi list, I am trying to figure out how best to deal with numerical
fractions in our standards.
In many cases, there is a character for a fraction, such as ½. But
in other cases, such as 7/23, there is no single character. I don't
want to do this <sup>7</sup>/<sub>23</sub> as this doesn't seem
semantic. So we were doing the non-normal fractions as MathML. But
then we had inconsistencies in the markup. So then we went to all
MathML for fractions. But since our standards also have several
equations, we ran into the issue of a transform not knowing what is
a fraction and what is an equation. (We are also capturing an EPS of
all equations as a backup.) So we considered the idea of adding in
an attribute like this: <inline-formula content-type="fraction">.
My question is what are others doing in this situation? It seems
pretty common, but I am not sure what the best approach is, or what
else to consider.
Thank you for your help!
Dan
*Daniel Berger*
Senior Manager of Production
American Water Works Association
American Water Works Association
Dedicated to the World's Most Important Resource®
--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de
Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930
Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt