Subject: Re: [xsl] identify sections in an xhtml document
From: Dean Maslic <dean.maslic@xxxxxxxxx>
Date: Fri, 11 Feb 2005 11:28:30 +1100
On Thu, 10 Feb 2005 15:36:05 -0800, Robert Koberg <rob@xxxxxxxxxx> wrote:
> Dean Maslic wrote:
> > Not sure if this is the right place to ask (as it could easily be a PhD research
> > topic), but maybe someone can suggest a good approach or reading, or even better
> > some XSLT code to do this...
> > I'm trying to identify a maximum of 10 logical sections of an arbitrary web/XHTML
> > document and add a name anchor at the beginning of each section.
> > What I mean by section is things like navigation-menus, blocks of text/image
> > content, groups of links and similar.
> > For example, http://xmlsoft.org/ has four distinct sections:
> > 1. Heading + Images,
> > 2. Main Menu
> > 3. Related Links
> > 4. Main content (could be further subdivided into 3 text and 3 link/list
> > sections)
> > I would like a stylesheet to identify those sections and add <a name="$id"/> at
> > the beginning of each, leaving everything else intact.
> 
> Hi,
> 
> Do you want to do this with any xhtml or do you have a site with
> consistent markup? You really can't do this in a generic way.

I'm thinking of a generic approach that works with any site.
One idea I had: calculate the total number of nodes, then go
through the block-level nodes (div, table, tr, ol, etc.) and compute
the ratio of each one's node count to the total node count. If the
ratio is close to 1 (say > 0.9), don't label that node; go to its
child nodes and apply the same test. If it is lower, look for
collections of links (e.g. count(descendant::html:a) > 5), the size
of its text nodes, etc.
I'm sure there would be a way to do this for a generic 'standard' site
(i.e. a page that contains a top link bar, a left/right sidebar, and
some text/image content).
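The ratio heuristic described above can be sketched roughly as follows. This is not XSLT and not anyone's tested method; it is a minimal Python illustration using only the standard library's xml.etree.ElementTree. The 0.9 ratio, the "more than 5 links" test and the limit of 10 sections come from the description; the BLOCKS set, the MIN_TEXT threshold and the "sectionN" anchor names are hypothetical choices.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
ET.register_namespace("", XHTML[1:-1])  # serialize without ns0: prefixes

# Hypothetical: which elements count as block level, and how much text
# makes a block worth labelling.
BLOCKS = {"div", "table", "tr", "td", "ol", "ul", "dl", "p"}
MIN_TEXT = 100


def local_name(el):
    tag = el.tag
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def node_count(el):
    """Number of elements in the subtree rooted at el (el included)."""
    return sum(1 for _ in el.iter())


def link_count(el):
    return sum(1 for _ in el.iter(XHTML + "a"))


def text_length(el):
    return len("".join(el.itertext()).strip())


def find_sections(root, max_sections=10, ratio=0.9, min_links=5):
    total = node_count(root)
    sections = []

    def visit(el):
        for child in el:
            if len(sections) >= max_sections:
                return
            if local_name(child) not in BLOCKS:
                visit(child)  # not a block (body, li, span...): look inside
            elif node_count(child) / total > ratio:
                visit(child)  # wraps almost the whole page: descend
            elif link_count(child) > min_links or text_length(child) >= MIN_TEXT:
                sections.append(child)  # link group or text block: label it
            else:
                visit(child)  # too small to judge: try its children

    visit(root)
    return sections


def add_anchors(sections, prefix="section"):
    """Put <a name="sectionN"/> at the very start of each section."""
    for i, el in enumerate(sections, start=1):
        anchor = ET.Element(XHTML + "a", {"name": f"{prefix}{i}"})
        anchor.tail = el.text  # keep any leading text after the anchor
        el.text = None
        el.insert(0, anchor)


LINKS = "".join(f'<li><a href="p{i}.html">Page {i}</a></li>' for i in range(8))
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body><div id="wrap">'
    '<div id="head"><p>Title</p></div>'
    f'<ul id="menu">{LINKS}</ul>'
    f'<div id="main"><p>{"lorem " * 30}</p></div>'
    "</div></body></html>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    sections = find_sections(root)
    add_anchors(sections)
    print([s.get("id") for s in sections])  # ['menu', 'main']
    print(ET.tostring(root, encoding="unicode"))
```

On the sample page, the outer "wrap" div holds 22 of 24 elements, so it is skipped and its children are examined; the menu is picked up by the link count and the main div by its text length, while the short heading block is left unlabelled (so this sketch would miss the "Heading + Images" kind of section). Note that inserting the anchor inside a ul or table produces invalid XHTML; a real stylesheet would probably place the anchor just before such elements instead.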

Dean
