Subject: Re: (dsssl) Practical Bibliography question
From: "Markus Hoenicka" <hoenicka_markus@xxxxxxxxxxxxxx>
Date: Sat, 13 Oct 2001 23:22:49 -0500

Trent Shipley writes:
> Furthermore, it should be cognizant of existing practices and standards in
> library science and records management. You would want to look at several
> XML (and SGML) projects including DocBook and TEI, but also Dublin Core (a
> project by and for Librarians) and the activities of the Semantic Web working
> group (that in part build on the Dublin Core). In addition, you would want to
> familiarize yourself with older document representation and storage formats
> like MARC.

RefDB currently lacks these capabilities. It is not meant to be a
system used by librarians. Rather, it is limited to the scope of what
Reference Manager and EndNote do: let the scientist manage his
references and create bibliographies.

> In the end, you expect to wind up with some XML document type for
> document and media management. It might be sufficient to just borrow some
> existing bibliography standard. At worst the project's XML DTD will be an
> extension of some existing bibliography base.
>

RefDB is based on RIS, which is a tagged (non-SGML-like) format used by
essentially all end-user reference managers. Compatibility with
existing commercial reference managers was one of the design
constraints. SGML/XML-based input could easily be added if it is
designed as a superset of what RIS offers. This might make the
librarian happy eventually.

> Phase two is to design a storage, search, retrieval and maintenance schema for
> the data entered in phase zero and put into a canonical representation in
> phase one.
> Here is where the OO database comes into play. Even more than an OO
> database, what I would love would be what I call a "document-base." This is
> a type of automated knowledge base with OO functions that uses the structure
> of a markup language to store, search, retrieve and manage marked-up
> documents.
>

I'm not aware of such a tool yet.
Existing XML databases are not OO afaik, and the search/retrieve
capabilities are far less advanced than even the lamest SQL
implementation.

>
> > While I admire your guts to implement this in DSSSL, I still think
> > DSSSL plus external preformatting is more suitable for this task. This
> > is not beautiful in any sense, but it appears to work. The strategy in
> > my RefDB package is like this (I use DocBook tag names, but I assume
> > TEI is not too different):
>
> Yes this will work. But it is *not* necessary. For example, the commercial
> product EndNote does not store external formatting, but it can return
> formatted data for inclusion in a Word or WordPerfect document.
>

Maybe I don't get your point here. RefDB does not store any external
formatting; the datasets are as raw as can be. The RefDB bibliography
tool does preformatting, though: it creates the proper character
sequence for each element (e.g. author name formatting: F.M. Last or
Last, F. M. or Last,F.M. or Last FM or whatever) and the proper element
sequence with the proper punctuation in between. This preformatting is
performed on the fly whenever a bibliography is requested, based on the
requested reference style.

> the reference is a mnemonic primary key (usually author, date, and part of a
> title).
>
> In both cases if you imagine that the users work off a mammoth shared
> knowledge base then use of abstract IDs becomes cumbersome. It would be much
> better to use some natural primary key (or approximate primary key), like
> authors + date + title.
>
> [ [
> In fact authors + date + title will be an alternate primary key. The
> knowledge base will actually use an id number (probably an accession number)
> as its internal primary key.
> ] ]
>
> This is canonical database engineering. Never force the end-user to use
> non-meaningful primary keys (like ID numbers) to access the database.
>

This could be implemented, although it raises a few questions, e.g.
how do you know the key in advance? If it uses a part of the title,
which part? How is capitalization handled? What happens if you know
only one of several authors? etc. My experience with citing is that you
have to look it up in the database anyway. In that case, I prefer to
enter three or four digits into my xref element instead of author,
date, title. If you really need a hint as to which publication that is,
why not add it in an SGML comment?

> Up to this point I do not think I have over-simplified the problem too much.
>
> The part where I did oversimplify is in describing the application or
> applications that use the bibliographic database to create in-text citations
> and reference lists that conform to the style manual of a given journal.
> (Any number of given journals, really)
>
> > We have to write an XML document for each bibliography style (i.e. for
> > each supported journal) that contains all formatting and punctuation
> > rules for the in-text citations and the bibliography. These styles are
> > stored in a SQL database for easy access.
>
> If we have a universal citation formatting tool (and that *is* the goal),
> then it needs to know what style manual we are using (and the rules for that
> style). It will also need to be told or need to infer the type of each
> citation. We assume it already knows what base document it will be working
> on.

The type of the citation must be in the reference dataset in the
database (this is how RefDB handles it). Nothing else would reasonably
work.

>
> It is reasonable to store the style source code in a database or document
> base.
>
> > The references themselves are stored in another SQL database. They can
> > contain any additional information like keywords, notes, abstracts to
> > retrieve them easily.
>
> Agreed. (Except for the SQL part. But SQL and full Relational competence is
> a big plus.)
>

SQL is for practical reasons only. I don't claim any theoretical
advantage here.
SQL implementations are widely available, and the software could be
made implementation-independent. RefDB currently handles only MySQL,
but support for other databases will be added shortly.

>
> I envision a somewhat different sequence. First I consider auto generating
> non-interactive text for printing. I describe a two pass process. Purists
> can merge the two passes if they want.
>
> ---
>
> Use an appropriate query and transform tool (eg OpenJade) for a first pass to
> convert Pre-Press marked up document A[raw citations] to A[cooked citations].
>
> Extract the xrefs from the text, whether or not they are real xrefs or
> logical primary keys.
> Some references may be 1) dangling with no referent. 2) be ambiguous with
> more than one referent. Note these in the exception log(s). [This is
> synchronization]
>
> [Begin pre-formatting]
>
> Pull the structured bibliography data from the knowledge base. Pull the
> collation data from the designated style sheet.
> Internal sort authors, editors, etc. for each entry
> 'External' sort the entries.
> Log errors and warnings.
>
> [End pre-formatting. Begin transform]
>
> Pull the reference style data from the stylesheet.
> Transform the references to cooked references.
> Log errors and warnings.
>
> Cook the in-text citations.
> Log errors and warnings.
> Log summary statistics.
>
> [End transform]
>
> Phase two: Use a styling tool to make the next step to hardcopy. (If we use
> OJ and DSSSL then obviously we have TeX --> DVI --> PS | PDF)
>
> --------
>
> For HTML you replace the to-text styling tool with another transform phase.
>

If I understand you correctly, RefDB does pretty much what you suggest.

> Instead of hyper-linking the in-text citations to entries in the master
> database I would make them internal links to the long citation in the
> bibliography. (If the bibliography knowledge base is a public or corporate
> resource sophisticated links might go from there to the bibliography
> knowledge base browser ...
> or whatever.)
>

I'm afraid you took me all wrong here. The hyperlinks go from the
in-text citation to the corresponding entry in the bibliography, i.e.
to another location in the same document. The final document is
self-contained: you can walk away from the SQL database and all RefDB
tools and process the document like any other SGML document. The
printable or HTML output is also self-contained with respect to the
citation/reference stuff, as no hyperlinks to locations outside of the
current document are created.

> How much of a practical advantage is it to trade in style manual transform
> programs for that many variables?

What exactly does "in style manual transform programs" mean in this
context? I'm afraid I don't understand.

>
> What do you mean by helper script?
>
> Does it just put the values for the variables in the OJ command line?
>

Exactly. This is one of two solutions to get the variable values into
the stylesheet. The other solution is to create a customized stylesheet
on the fly with the appropriate values. Neither solution is
exceptionally elegant, so I used the one that is easier to implement.
The downside is that this does not work with good ol' Jade (you can
only set variables to "true" but not to a specific value), so I'll
probably have to implement the other solution as well.

regards,
Markus

--
Markus Hoenicka
hoenicka_markus@xxxxxxxxxxxxxx
http://ourworld.compuserve.com/homepages/hoenicka_markus/

DSSSList info and archive: http://www.mulberrytech.com/dsssl/dssslist
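
To illustrate the author-name preformatting described in the message above (F.M. Last, Last, F. M., Last,F.M., Last FM), here is a minimal Python sketch. It is not RefDB code: the function name `format_author` and the four style labels are hypothetical and chosen only for this example, and it assumes each author has at least one given name.

```python
from typing import Sequence


def format_author(last: str, given: Sequence[str], style: str) -> str:
    """Render one author name in a requested style.

    The style labels are hypothetical; they only mirror the four
    example layouts mentioned in the message.
    """
    # Reduce each given name to its upper-cased initial.
    letters = [name[0].upper() for name in given if name]

    if style == "initials-first":       # F.M. Last
        return "".join(c + "." for c in letters) + " " + last
    if style == "last-comma-spaced":    # Last, F. M.
        return last + ", " + " ".join(c + "." for c in letters)
    if style == "last-comma-tight":     # Last,F.M.
        return last + "," + "".join(c + "." for c in letters)
    if style == "last-bare":            # Last FM
        return last + " " + "".join(letters)
    raise ValueError("unknown author style: " + style)


if __name__ == "__main__":
    for style in ("initials-first", "last-comma-spaced",
                  "last-comma-tight", "last-bare"):
        print(format_author("Last", ["First", "Middle"], style))
```

In a setup like the one described, the style label would come from the requested bibliography style rather than being hard-coded, so the raw dataset stays free of formatting.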