Re: [xsl] CSSXX to XML

Subject: Re: [xsl] CSSXX to XML
From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@xxxxxxxxx>
Date: Sat, 30 Mar 2013 20:59:49 +0100
Hi Dorothy,

You might want to try our css expander.

It's a three-step process:

- Transform the XHTML and its CSS, be it linked, included as a style element, or given in style attributes, into an XML representation of the CSS.

- Transform this representation into an XSLT stylesheet, where XSLT matching patterns correspond to CSS selectors and XSLT priority attributes correspond to CSS precedence rules.

- Apply this stylesheet to the original XHTML document.
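
To make the correspondence in the second step concrete, here is a minimal Python sketch of how a simple CSS selector could be turned into an XSLT match pattern and a priority. It is only an illustration, not how css2xsl.xsl actually does it: the function name, the supported selector subset (element, #id, .class), and the way specificity is folded into a single number are all assumptions.

```python
import re

# Hypothetical helper: handles only "elem", "#id", ".cls" and combinations
# such as "img.frame-3". Real CSS selectors are far richer than this.
SIMPLE = re.compile(
    r"^(?P<elem>[A-Za-z][\w-]*)?(?:#(?P<id>[\w-]+))?(?:\.(?P<cls>[\w-]+))?$"
)

def selector_to_template_head(selector):
    """Return (match_pattern, priority) for a simple CSS selector."""
    m = SIMPLE.match(selector.strip())
    if m is None or not any(m.groupdict().values()):
        raise ValueError(f"unsupported selector: {selector!r}")
    elem, id_, cls = m.group("elem"), m.group("id"), m.group("cls")
    # *:name matches the element in any namespace (XPath 2.0 name test).
    pattern = f"*:{elem}" if elem else "*"
    if id_:
        pattern += f"[@id = '{id_}']"
    if cls:
        # class is a space-separated token list, so compare token-wise.
        pattern += f"[tokenize(@class, '\\s+') = '{cls}']"
    # Assumed folding of CSS specificity (ids, classes, elements) into one number.
    priority = 100 * bool(id_) + 10 * bool(cls) + 1 * bool(elem)
    return pattern, priority

pattern, priority = selector_to_template_head("img.frame-3")
print(f'<xsl:template match="{pattern}" priority="{priority}">')
```

A rule for img.frame-3 thus gets a higher priority than a rule for plain img, which mirrors how the more specific CSS rule wins. Source order among rules of equal specificity is not handled in this sketch.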

You can use it as an XProc step. If you check out this sample project with svn, https://subversion.le-tex.de/common/sandbox/css_expand_standalone/trunk/, you can invoke it as described here (replacing calabash.sh with calabash.bat if you're on Windows): https://subversion.le-tex.de/common/sandbox/css_expand_standalone/trunk/README.txt

Or just check out this directory: https://subversion.le-tex.de/common/css-expand/xslt/ and use it like this (assuming that you have a front-end script for Saxon, called saxon):

saxon -xsl:css-parser.xsl -s:/path/to/file.xhtml -o:css.xml
saxon -xsl:css2xsl.xsl -s:css.xml -o:expand.xsl path-constraint='[self::*:img]'
saxon -xsl:expand.xsl -s:/path/to/file.xhtml -o:expanded.xhtml


You need the path-constraint parameter on the second step only if you want to restrict expansion to img elements.

On the same step, you may specify another parameter, prop-constraint. Example: prop-constraint="width max-width".

You need to further transform the css:* attributes in the expanded output to match your needs.
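
As an illustration of that last step, here is a minimal Python sketch (not part of the css-expand tools) that turns an expanded img element into the image element asked for in the original question. It assumes that the expanded output carries the properties as css:height and css:width attributes in the namespace http://www.w3.org/1996/css and that the values are plain pixel lengths; please check both against your actual output. The function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against the real expanded XHTML.
CSS_NS = "http://www.w3.org/1996/css"
XHTML_NS = "http://www.w3.org/1999/xhtml"

def px_to_number(value):
    """Return the numeric part of a pixel length such as '448px', else None."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)px\s*", value)
    return m.group(1) if m else None

def img_to_image(img):
    """Build an <image> element from an expanded XHTML <img> element."""
    image = ET.Element("image")
    for prop in ("height", "width"):
        raw = img.get(f"{{{CSS_NS}}}{prop}")
        number = px_to_number(raw) if raw is not None else None
        if number is not None:  # silently skip non-pixel values in this sketch
            image.set(prop, number)
    for name in ("src", "alt"):
        if img.get(name) is not None:
            image.set(name, img.get(name))
    return image

# Hand-written sample standing in for expanded.xhtml.
sample = (
    f'<html xmlns="{XHTML_NS}" xmlns:css="{CSS_NS}"><body>'
    '<img class="frame-3" src="image/file.jpeg" alt="image" '
    'css:height="448px" css:width="339px"/>'
    '</body></html>'
)
root = ET.fromstring(sample)
images = [img_to_image(img) for img in root.iter(f"{{{XHTML_NS}}}img")]
print(ET.tostring(images[0], encoding="unicode"))
```

In an XSLT pipeline the same thing would be one more template matching the img elements; the Python version is only meant to show the mapping.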

There is also a Relax NG schema for CSS as XML attributes: https://github.com/gimsieke/CSSa

There is currently no combined XHTML+CSSa schema, though. I should create one, because it's just cool to be able to validate the CSS property values, as we experience daily when validating DocBook+CSSa with our Hub schema, https://github.com/gimsieke/Hub, deployed here: http://www.le-tex.de/resource/schema/hub/1.1/hub.rng

Since you are also using InDesign, you might want to try our IDML-to-Hub XML converter, https://subversion.le-tex.de/idmltools/trunk/idml2xml/
Just yesterday, I implemented nested styles (i.e., their resolution to spans with character styles).


Gerrit



On 30.03.2013 20:03, Dorothy Hoskins wrote:
Hi, I have an interesting problem in that I am trying to figure out
how to load and process a CSS file to grab content from CSS class
definitions and poke them into XML files.
In the source XML, which is scraped from XHTML pages, I find images
with CSS classes:
<img class="frame-3" src="image/file.jpeg" alt="image" />

In the CSS of the ePub, I find the dimension information that I want
for the image:
img.frame-3 {
     height:448px;
     width:339px;
}

My desired XML output is <image height="448" width="339"
src="image/file.jpeg" alt="image" />

I have the idea of grabbing the CSS and processing the CSS text to
achieve something like this:
<css>
<class element="img" name="frame-3">
<attribute name="height" value="448"/><!-- px assumed in XHTML -->
<attribute name="width" value="339"/>
</class>
</css>

I know I can handle everything else I want to do once I get the CSS
into an XML structure. The commonalities of the CSS text are that a
line which contains "{" has the information I want for the
class/@element and class/@name. The subsequent lines until the "}"
occurs have the content that I want to process into the attribute/name
and attribute/value. It seems like regex is the way to go but I don't
know how to start - do I load the CSS file into a variable as
xs:string? process it as unparsed-text? if anyone knows a good example
of creating such structure from a text input in the archives or
online, please point me in the right direction.
Thanks, Dorothy
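
For comparison, the regex approach described in the question can be sketched in a few lines of Python. This is a rough illustration under strong assumptions: it handles only flat rules of the form element.class { prop: value; }, ignores everything else (including @media blocks, selector lists, and the cascade), and strips a trailing px. The function name css_to_xml is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# One flat rule: selector text, then declarations between braces.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# Only selectors of the form element.class are recognized in this sketch.
SIMPLE_SELECTOR = re.compile(r"([A-Za-z][\w-]*)\.([A-Za-z_][\w-]*)")

def css_to_xml(css_text):
    """Build the <css>/<class>/<attribute> structure from flat CSS rules."""
    css_text = re.sub(r"/\*.*?\*/", "", css_text, flags=re.S)  # drop comments
    root = ET.Element("css")
    for selector, body in RULE.findall(css_text):
        m = SIMPLE_SELECTOR.fullmatch(selector.strip())
        if m is None:
            continue  # skip anything that is not element.class
        cls = ET.SubElement(
            root, "class", {"element": m.group(1), "name": m.group(2)}
        )
        for decl in body.split(";"):
            if ":" not in decl:
                continue  # empty chunk after the last semicolon
            prop, value = decl.split(":", 1)
            value = value.strip()
            if value.endswith("px"):
                value = value[:-2]  # px assumed, as in the question
            ET.SubElement(
                cls, "attribute", {"name": prop.strip(), "value": value}
            )
    return root

css = """
img.frame-3 {
     height:448px;
     width:339px;
}
"""
print(ET.tostring(css_to_xml(css), encoding="unicode"))
```

Because it does not resolve which of several competing rules applies to a given element, a sketch like this is only adequate when each class is defined once and nothing else overrides it.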


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
