Re: DAVENPORT: explanation of indexing

From: Chris Maden <crism@xxxxxxxxxxx>
Date: Fri, 24 Jul 1998 11:47:38 -0400 (EDT)
My main problem with HTML indexing is that you can't append to a file;
you make an entity, write it once, and then you stop.  Processing an
entire book while keeping all of its index terms in memory for sorting
and collating is unrealistic for me: changing one chapter would mean
reprocessing the whole book, which could be quite slow for some of my
volumes (_UNIX Power Tools_, anyone?).

I also need to merge the indices for multiple books, and to present
the information in a form our indexer can edit before merging (the
word choice for the merged index might not be the same as for the
individual books).

So what I do is generate a file of indexterms for every chapter when
the HTML for the chapter is created.  A term like this:

<indexterm>
  <primary>writing</primary>
  <secondary>scripts</secondary>
  <tertiary>sed</tertiary>
</indexterm>

becomes

writing
  scripts
    sed
			!SEDAWK:02592 ch04_01.htm 4. Writing sed Sc...
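
The mapping above can be sketched in a few lines of Python.  This is
only an illustration of the text format, not the stylesheet itself;
the function name render_term and its parameters are hypothetical, and
the locator string is passed in ready-made.

```python
def render_term(levels, locator, sortas=None):
    """Render one index term as indented lines plus a locator line.

    levels  -- term text from primary down to tertiary
    locator -- text that follows the "!" marker
    sortas  -- optional {depth: sort key} map (hypothetical parameter)
    """
    sortas = sortas or {}
    lines = []
    for depth, text in enumerate(levels):
        line = "  " * depth + text          # two spaces per level
        if depth in sortas:
            # The posted format leaves this parenthesis unclosed.
            line += " (SORTAS: " + sortas[depth]
        lines.append(line)
    lines.append("\t\t\t!" + locator)       # three tabs, then "!"
    return "\n".join(lines) + "\n"


entry = render_term(["writing", "scripts", "sed"],
                    "SEDAWK:02592 ch04_01.htm 4. Writing sed Sc...")
```

Because each term is self-contained text, a chapter's file can be
regenerated on its own without touching the rest of the book.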

A Perl script sorts and collates these chapter files (and can also
merge the files for several books into one collated index):

writing
  from files (SORTAS: files from
			!SEDAWK:03891 ch05_11.htm 5.11. Reading and...
  to files (SORTAS: files to
			!SEDAWK:01038 ch02_03.htm#SEDAWK-CH-2-SECT-...
			!SEDAWK:03854 ch05_11.htm 5.11. Reading and...
			!SEDAWK:08569 ch10_05.htm 10.5. Directing O...
  regular expressions
			!SEDAWK:01716 ch03_02.htm#SEDAWK-CH-3-SECT-...
  scripts
			!SEDAWK:00525 ch01_04.htm 1.4. Four Hurdles...
    awk
			!SEDAWK:04708 ch07_01.htm 7. Writing Script...
    sed
			!SEDAWK:02592 ch04_01.htm 4. Writing sed Sc...
  user-defined functions
			!SEDAWK:07940 ch09_03.htm 9.3. Writing Your...
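
The Perl script is not shown here, so the following Python is only a
rough sketch of one way to do the collation, assuming the file format
shown above.  The names collate and dump and the node layout are
hypothetical, and it assumes well-formed input (no level is skipped).

```python
def new_node():
    return {"sortas": None, "locs": [], "kids": {}}


def collate(chunks):
    """Merge chapter-file texts into one nested tree of terms."""
    root = new_node()
    for chunk in chunks:
        path = [root]                  # path[d] is the parent at depth d
        for raw in chunk.splitlines():
            if not raw.strip():
                continue
            if raw.startswith("\t"):
                # Locator or see/see-also line: belongs to the latest term.
                path[-1]["locs"].append(raw.strip())
                continue
            text = raw.lstrip(" ")
            depth = (len(raw) - len(text)) // 2
            term, _, key = text.partition(" (SORTAS: ")
            del path[depth + 1:]
            node = path[depth]["kids"].setdefault(term, new_node())
            if key:
                node["sortas"] = key
            path.append(node)
    return root


def dump(node, depth=0):
    """Write the tree back out, sorted by SORTAS key where one exists."""
    lines = []
    kids = node["kids"]
    for term in sorted(kids, key=lambda t: (kids[t]["sortas"] or t).lower()):
        kid = kids[term]
        suffix = " (SORTAS: " + kid["sortas"] if kid["sortas"] else ""
        lines.append("  " * depth + term + suffix)
        lines.extend("\t\t\t" + loc for loc in sorted(set(kid["locs"])))
        lines.extend(dump(kid, depth + 1))
    return lines
```

Since the output uses the same format as the input, the collated file
stays editable by hand and can itself be fed back in when merging
several books.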

A final Perl script turns this stuff into HTML, split by letter of the
alphabet:

<DT><A NAME="writing">writing</A>
<DD><DL>
  <DT>from files
: <A HREF="../ch05_11.htm">5.11. Reading and Writing Files</A>
  <DT>to files
  <DD><DL>
    <DT><A HREF="../ch02_03.htm#SEDAWK-CH-2-SECT-3.2.1">2.3.2.1. Saving output</A>
    <DT><A HREF="../ch05_11.htm">5.11. Reading and Writing Files</A>
    <DT><A HREF="../ch10_05.htm">10.5. Directing Output to Files and Pipes</A>
  </DL>
  <DT>regular expressions
: <A HREF="../ch03_02.htm#SEDAWK-CH-3-SECT-2.3">3.2.3. Writing Regular Expressions</A>
  <DT>scripts
: <A HREF="../ch01_04.htm">1.4. Four Hurdles to Mastering sed and awk</A>
  <DD><DL>
    <DT>awk
: <A HREF="../ch07_01.htm">7. Writing Scripts for awk</A>
    <DT>sed
: <A HREF="../ch04_01.htm">4. Writing sed Scripts</A>
  </DL>
  <DT>user-defined functions
: <A HREF="../ch09_03.htm">9.3. Writing Your Own Functions</A>
</DL>
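
As a sketch of that last step (again in Python rather than the actual
Perl, which is not shown), the function below renders one entry in the
style of the sample above.  The names link and render and the tuple
layout are hypothetical; it assumes each locator carries the full
title rather than the truncated form shown in this message.

```python
from html import escape


def link(locator):
    """Turn '!BOOK:NNNNN file.htm[#frag] Title' into an anchor."""
    _, target, title = locator.lstrip("!").split(" ", 2)
    return '<A HREF="../%s">%s</A>' % (escape(target, quote=True),
                                       escape(title))


def render(term, locators, children, indent=0):
    """Render one entry; children are (term, locators, children) tuples."""
    pad = "  " * indent
    if indent == 0:
        # Top-level terms get a named anchor so they can be linked to.
        label = '<A NAME="%s">%s</A>' % (escape(term, quote=True),
                                         escape(term))
    else:
        label = escape(term)
    out = [pad + "<DT>" + label]
    if len(locators) == 1:
        out.append(": " + link(locators[0]))       # single link, inline
    elif locators:
        out.append(pad + "<DD><DL>")               # several links, listed
        out.extend(pad + "  <DT>" + link(loc) for loc in locators)
        out.append(pad + "</DL>")
    if children:
        out.append(pad + "<DD><DL>")
        for child in children:
            out.extend(render(*child, indent=indent + 1))
        out.append(pad + "</DL>")
    return out
```

Splitting the result by letter of the alphabet is then a matter of
grouping the top-level terms by their first character before writing
each group to its own file.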

To make this relevant to DSSSList, here's the DSSSL that writes the
per-chapter files.  It's uncommented, which is why I haven't released
this stylesheet yet...

-Chris

(element chapter
	 (sosofo-append (make-html-file (current-node))
			(make-index)))

(define (make-index)
  (make entity
	system-id: (string-append "idxtmp/"
				  (gen-file-name (current-node)))
	(with-mode index
		   (process-node-list (current-node)))))

(mode index
      (default (process-node-list (node-list-filter (lambda (snl)
						      (equal? (node-property 'class-name
									     snl)
							      'element))
						    (children (current-node)))))

      (element indexterm
	       (sosofo-append (process-children)
			      (if (and (node-list-empty? (get-children-by-type (list (norm "see"))
									       (current-node)))
				       (not (attribute-string (norm "spanend"))))
				  (make formatting-instruction
					data: (string-append "			!"
							     (attribute-string (norm "id")
									       (ancestor (norm "book")))
							     ":"
							     (format-number (all-element-number)
									    "00001")
							     " "
							     (gen-file-name)
							     (let* ((app (ancestor (norm "appendix")))
								    (chap (ancestor (norm "chapter")))
								    (gloss (ancestor (norm "glossentry")))
								    (nutentry (ancestor (norm "nutentry")))
								    (pref (ancestor (norm "preface")))
								    (sect1 (ancestor (norm "sect1")))
								    (sect2 (ancestor (norm "sect2")))
								    (sect3 (ancestor (norm "sect3")))
								    (container (if (node-list-empty? nutentry)
										   (if (node-list-empty? sect3)
										       (if (node-list-empty? sect2)
											   (if (node-list-empty? sect1)
											       (if (node-list-empty? chap)
												   (if (node-list-empty? app)
												       (if (node-list-empty? pref)
													   (if (node-list-empty? gloss)
													       (error (string-append "Unhandled <indexterm> location: "
																     (gen-file-name)))
													       gloss)
													   pref)
												       app)
												   chap)
											       sect1)
											   sect2)
										       sect3)
										   nutentry)))
							       (string-append (if (or (and (node-list-empty? sect1)
											   (node-list-empty? gloss))
										      (node-list=? container
												   nutentry)
										      (and (node-list=? container
													sect1)
											   (not (first-sibling? sect1))))
										  ""
										  (string-append "#"
												 (gen-id container)))
									      " "
									      (if (or (and (node-list-empty? chap)
											   (node-list-empty? app))
										      (node-list=? container
												   nutentry))
										  ""
										  (string-append (if (node-list-empty? chap)
												     (format-number (element-number app)
														    "A")
												     (number->string (element-number chap)))
												 "."
												 (if (not (node-list-empty? sect1))
												     (string-append (number->string (child-number sect1))
														    "."
														    (if (not (node-list-empty? sect2))
															(string-append (number->string (child-number sect2))
																       "."
																       (if (not (node-list-empty? sect3))
																	   (string-append (number->string (child-number sect3))
																			  ".")
																	   ""))
														        ""))
												     "")
												 " "))
									      (if (node-list-empty? gloss)
										  (if (node-list=? container
												   nutentry)
										      (string-append "Chapter "
												     (number->string (element-number (ancestor (norm "chapter"))))
												     ", Reference: "
												     (process-string (get-children-by-type (list (norm "term"))
																	   container)))
										      (process-string (get-children-by-type (list (norm "title"))
															    container)))
										  (string-append (process-string (get-children-by-type (list (norm "title"))
																       (ancestor (norm "glossary"))))
												 ": "
												 (process-string (get-children-by-type (list (norm "glossterm"))
																       gloss))))))
							     "
"))
				  (empty-sosofo))))
      (element part
	       (process-node-list (get-children-by-type (list (norm "docinfo")
							      (norm "partintro")
							      (norm "title")))))
      (element primary
	       (make formatting-instruction
		     data: (string-append (process-string (current-node))
					  (let ((sortas (attribute-string (norm "sortas"))))
					    (if sortas
						(string-append " (SORTAS: "
							       sortas)
					        ""))
					  "
")))
      (element secondary
	       (make formatting-instruction
		     data: (string-append "  "
					  (process-string (current-node))
					  (let ((sortas (attribute-string (norm "sortas"))))
					    (if sortas
						(string-append " (SORTAS: "
							       sortas)
					        ""))
					  "
")))
      (element see
	       (make formatting-instruction
		     data: (string-append "			"
					  "(see "
					  (process-string (current-node))
					  ")
")))
      (element seealso
	       (make formatting-instruction
		     data: (string-append "			"
					  "(see also "
					  (process-string (current-node))
					  ")
")))
      (element tertiary
	       (make formatting-instruction
		     data: (string-append "    "
					  (process-string (current-node))
					  (let ((sortas (attribute-string (norm "sortas"))))
					    (if sortas
						(string-append " (SORTAS: "
							       sortas)
					        ""))
					  "
"))))

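Two pieces of the indexterm rule above are dense enough that a plain
restatement may help: choosing the innermost enclosing container, and
building the section-number prefix of the locator.  The Python below is
only a reading of that logic, with hypothetical names (pick_container,
section_number) and a plain dict standing in for the ancestor lookups.

```python
# Innermost candidates first, matching the order of the nested ifs.
CONTAINER_ORDER = ("nutentry", "sect3", "sect2", "sect1",
                   "chapter", "appendix", "preface", "glossentry")


def pick_container(ancestors):
    """Return the name of the first ancestor that is present."""
    for name in CONTAINER_ORDER:
        if ancestors.get(name) is not None:
            return name
    raise ValueError("Unhandled <indexterm> location")


def section_number(top, sections=()):
    """Join a chapter number (or appendix letter) and section numbers.

    Every component is followed by a dot, and the whole prefix by a
    space, as in "5.11. " or "4. ".
    """
    return "".join("%s." % part for part in (top, *sections)) + " "
```

For instance, a term inside the first sect3 of the second sect2 of the
third sect1 of chapter 2 would get the prefix produced by
section_number(2, (3, 2, 1)).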

 DSSSList info and archive:  http://www.mulberrytech.com/dsssl/dssslist

