From <@PUCC.PRINCETON.EDU:owner-etextctr@RUTVM1.RUTGERS.EDU> Tue Jul 26 09:52:07 1994
Received: from pucc.Princeton.EDU (smtpc@pucc.Princeton.EDU [128.112.129.99]) by mail.csi.UOttawa.CA (8.6.9/8.6.9) with SMTP id JAA17510 for <terry@CSI.UOTTAWA.CA>; Tue, 26 Jul 1994 09:51:57 -0400
Message-Id: <199407261351.JAA17510@mail.csi.UOttawa.CA>
Received: from PUCC.PRINCETON.EDU by pucc.Princeton.EDU (IBM VM SMTP V2R2)
   with BSMTP id 7818; Tue, 26 Jul 94 09:48:59 EDT
Received: from PUCC.PRINCETON.EDU (NJE origin LISTSERV@PUCC) by PUCC.PRINCETON.EDU (LMail V1.1d/1.7f) with BSMTP id 5194; Tue, 26 Jul 1994 09:48:59 -0400
Date:         Tue, 26 Jul 1994 09:45:06 -0400
Reply-To: Discussion Group on Electronic Text Centers
              <ETEXTCTR%RUTVM1.BITNET@PUCC.PRINCETON.EDU>
Sender: Discussion Group on Electronic Text Centers
              <ETEXTCTR%RUTVM1.BITNET@PUCC.PRINCETON.EDU>
From: "Lisa R. Horowitz" <RHOROWITZ%ZODIAC.BITNET@PUCC.PRINCETON.EDU>
Subject:      Alex:  A Catalogue of Electronic Texts on the Internet
To: Multiple recipients of list ETEXTCTR <ETEXTCTR%RUTVM1.BITNET@PUCC.PRINCETON.EDU>
Status: OR

The following e-mail message was posted to several listservs, but Mary
Mallery suggested that I post it to this list as well.  The catalog
described might be of interest to the subscribers of ETEXTCTR.  I left
the message about the cataloging also for your interest; please note
however that the author requests that discussions about that topic be
held on AUTOCAT.
                                                  -Lisa Horowitz
                                                   Acting Moderator

_________________________________________________________________

Date: Sat, 23 Jul 1994 18:33:29 +0100
From: econhkm@vax.ox.ac.uk
Subject: Alex: A Catalogue of Electronic Texts on the Internet


      Alex: A Catalogue of Electronic Texts on the Internet
                          July 23, 1994

     This is to announce the availability of a new service, "Alex:
A Catalogue of Electronic Texts on the Internet."  Alex can be
found at gopher://rsl.ox.ac.uk:70/11/lib-corn/hunter, or by
pointing at gopher.ox.ac.uk, choosing "The World", then
"Gopherspace", and then "Alex".

     Alex allows users to find and retrieve the full text of
documents on the Internet.  It currently indexes over 700 books
and shorter texts by author and title, incorporating texts from
Project Gutenberg, Wiretap, the On-line Book Initiative, the
Eris system at Virginia Tech, the English Server at Carnegie
Mellon University, and the on-line portion of the Oxford Text
Archive.  For now it includes no serials.  Alex does include an
entry for itself.

     New publications at the sites above will be detected
automatically.  However, pointers to other sites, specific
suggestions for additions to the catalogue, and notifications of
errors are welcome, particularly those with full bibliographic
information and URLs.  The email address for Alex is
alex@rsl.ox.ac.uk.

     Gopher space for Alex is graciously provided by the
Radcliffe Science Library of Oxford University, which bears no
responsibility for its content.  World-Wide Web and WAIS
interfaces for Alex are forthcoming, as are tools to identify
bad URLs and to sort copy records in reverse order of server
response time.  The Alex database is copyright 1994 and released
for noncommercial use.  No claims are made concerning the
copyright status or quality of texts referenced in the database.

Hunter Monroe

A Note for Catalogers

     Alex is not MARC-based; it was not created by a
professional cataloguer.  However, names conform to Library of
Congress authorities, and every attempt has been made to
facilitate future upgrade of records to MARC standards by
professional catalogers.  A menu of resources relating to
cataloging and the Internet can be found in the Alex directory.

     The development of a MARC-based catalogue of Internet
resources offers the opportunity to rethink the process of
cataloging.  Existing OPACs and bibliographic systems do not
adhere to the basic tenet of database design--that each item of
information should be stored authoritatively in one and only one
place, and that other occurrences of that item should mirror the
authoritative original--even those in other OPACs.  Furthermore,
ISBNs/ISSNs fail dismally as unique identifiers, with the same
ISBN applying to distinct editions or even distinct books.  The
result is that there are multiple MARC records for the same
book, catalogers spend an inordinate amount of time and money
downloading and editing records from outside databases, search
hits on ISBNs must be manually verified, enhancements or
corrections to a MARC record in one OPAC are not immediately
reflected in other OPACs, and a new MARC record created in one
OPAC takes many months or years to show up in the OPACs of
libraries holding the relevant document.  Users are confronted
with incomplete OPACs, and catalogers are distracted from their
real work-- cataloging books from scratch and improving existing
MARC records.

     Consider by contrast an ideal world in which all cataloging
was performed on a single database to a single set of
authorities, and in which local OPACs contained only copy records
attached to a unique number for each book (an URN), in turn
attached to a local image of the authoritative record in the
central database.  Each book would be catalogued once and only
once.  Unskilled staff could determine if a book had been
catalogued by scanning a bar-coded URN, and could immediately
attach the book's shelf mark to the best MARC record for that
book in the world, with no typing of search keys, no need to
verify hits, and no editing of the record obtained.  Cataloging
new books would require a manual search for similar preexisting
records, and if necessary the creation of a new record from
scratch.  Mistakes corrected in one OPAC would be reflected
immediately in every other OPAC.
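The arrangement described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the class
names, the "urn:example:" identifiers, and the simplification of
having each local OPAC read the central record directly (rather than
keep a local image of it) are all hypothetical, not part of any
existing system.

```python
from dataclasses import dataclass, field


@dataclass
class BibRecord:
    """Authoritative bibliographic record, held only by the central catalogue."""
    urn: str
    author: str
    title: str


@dataclass
class CentralCatalogue:
    records: dict[str, BibRecord] = field(default_factory=dict)

    def catalogue(self, record: BibRecord) -> None:
        # Each book is catalogued once and only once.
        if record.urn in self.records:
            raise ValueError(f"{record.urn} is already catalogued")
        self.records[record.urn] = record

    def correct_title(self, urn: str, title: str) -> None:
        # A correction is made in exactly one place.
        self.records[urn].title = title


@dataclass
class LocalOpac:
    central: CentralCatalogue
    # URN -> shelf marks of the local copies (the "copy records").
    shelf_marks: dict[str, list[str]] = field(default_factory=dict)

    def attach_copy(self, scanned_urn: str, shelf_mark: str) -> bool:
        """Attach a shelf mark to an existing record.

        Returns False when the URN is unknown, meaning the book still
        needs to be catalogued from scratch.
        """
        if scanned_urn not in self.central.records:
            return False
        self.shelf_marks.setdefault(scanned_urn, []).append(shelf_mark)
        return True

    def display(self, urn: str) -> str:
        rec = self.central.records[urn]
        return f"{rec.author}. {rec.title} [{', '.join(self.shelf_marks[urn])}]"
```

Because a local OPAC stores nothing but the URN and its own shelf
marks, a call to correct_title on the central catalogue changes what
every local OPAC displays, with no editing on the local side.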

     Unfortunately, the stock of books and MARC records without
unique identifiers and the nonexistence of a unified world OPAC
make the above system impractical in the real world.  Yet even
a first approximation to it--batch searching by ISBN of
bibliographic databases, using software such as CAT ME for OCLC
and CURLBAT for CURL, followed by manual editing--has led to
substantial gains in cataloging productivity in some libraries.
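As a rough illustration of batch searching by ISBN, the sketch below
sorts a batch of searches into single hits, multiple hits, and
misses.  It does not reflect how CAT ME or CURLBAT actually work; the
function name and the (isbn, title) pairs standing in for an outside
database are assumptions made for the example.

```python
from collections import defaultdict


def batch_search_by_isbn(wanted_isbns, database_records):
    """Split a batch of ISBN searches into single hits, multiple hits, misses.

    database_records is an iterable of (isbn, title) pairs standing in
    for records in an outside bibliographic database (hypothetical).
    """
    by_isbn = defaultdict(list)
    for isbn, title in database_records:
        by_isbn[isbn].append(title)

    single_hit, multiple_hits, no_hit = {}, {}, []
    for isbn in wanted_isbns:
        hits = by_isbn.get(isbn, [])
        if len(hits) == 1:
            # Only one candidate, though it may still be the wrong edition.
            single_hit[isbn] = hits[0]
        elif hits:
            # Same ISBN on several records: must be resolved by hand.
            multiple_hits[isbn] = hits
        else:
            no_hit.append(isbn)
    return single_hit, multiple_hits, no_hit
```

The multiple-hit bucket is where the weakness of ISBNs as identifiers
shows up most plainly, and it is the part that manual editing would
still have to handle.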

     Furthermore, the design of a catalogue of objects on the
Internet can adhere to this ideal standard.  It has been
proposed by the IETF that a central mapping be maintained
between Universal Resource Numbers (URNs) and Universal Resource
Locators (URLs).  However, the URL subfield of a MARC record for
an Internet object is not the only one subject to change: other
obviously fluid elements include the file date, file size, and
version number.  Consider a hypothetical catalogue record for
the Usenet newsgroup comp.infosystems.www--recently it would
have needed changes in the name of the group, it would have required
an additional LCSH to reflect changing topics of discussion
(World-Wide Web--Clients--Mosaic), and would need an indication
that the group's FAQ had become available as a periodic
posting held in FAQ archives.  Since almost all elements of
a MARC record are potentially fluid, an URN should be mapped to
the whole MARC record, not just one subfield of the 856 field.
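The point can be made concrete with a small comparison of two states
of a record.  The field names and placeholder values below are
hypothetical (only the Mosaic subject heading is taken from the
example above); the sketch simply lists which elements differ.

```python
def changed_fields(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Hypothetical before/after states of a record for a newsgroup.
before = {
    "name": "<earlier group name>",
    "url": "<unchanged locator>",
    "subjects": ["<existing heading>"],
    "note": "",
}
after = {
    "name": "<later group name>",
    "url": "<unchanged locator>",
    "subjects": ["<existing heading>", "World-Wide Web--Clients--Mosaic"],
    "note": "FAQ posted periodically; available in FAQ archives",
}

changes = changed_fields(before, after)
print(sorted(changes))
```

In this example the locator is the one element that did not change, so
a mapping from URN to URL alone would register nothing, while a
mapping from URN to the whole record would pick up all three
differences.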

     Consider therefore the practical requirements of creating
and operating a catalogue of the Internet by the ideal standard.

o    The finalization by MARBI of standards for Internet
     cataloging, and selection of a single flavor of MARC as
     well as a single set of authorities.

o    The need for a central OPAC providing gopher, WWW, and WAIS
     interfaces, potentially mirrored in several locations.

o    The selection of catalogers to be given cataloging
     authority on that central OPAC.

o    The development of a system by which local OPACs providing
     active outgoing links to the Internet can load MARC records
     from the central OPAC and update the local copy when the
     central one changes.

o    A central system for assigning URNs to ensure that they are
     unique when catalogers think they should be (see for
     instance Project Gutenberg document numbers #22 and #132).

o    The collection of existing MARC records for objects on the
     Internet, including: records produced by the OCLC-Library
     of Congress Internet Resources project, records for the
     Oxford Text Archive, CETH records at RLIN, and records at
     OCLC under "710 Project Gutenberg", etc. (suggestions for
     additions to this list are welcome).

To get started with the last of these, the Alex catalogue will
incorporate MARC records that are emailed to alex@rsl.ox.ac.uk,
as long as these records are submitted in ASCII form, they are
submitted by their copyright holder, and the copyright holder
explicitly consents to public distribution.  Discussion of these
ideas should be directed to AUTOCAT@UBVM.CC.BUFFALO.EDU.
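
As one possible reading of the fourth requirement above (local OPACs
loading records from the central OPAC and updating their copies when
the central one changes), here is a minimal sketch.  The version
counter, the class names, and the pull-style refresh are all
assumptions made for illustration; no such protocol is specified
anywhere in this note.

```python
from dataclasses import dataclass, field


@dataclass
class CentralOpac:
    # URN -> (version, record text); the version rises on every change.
    records: dict[str, tuple[int, str]] = field(default_factory=dict)

    def put(self, urn: str, record: str) -> None:
        version = self.records.get(urn, (0, ""))[0] + 1
        self.records[urn] = (version, record)

    def changed_since(self, known: dict[str, int]) -> dict[str, tuple[int, str]]:
        """Return records newer than the versions the caller already holds."""
        return {urn: entry for urn, entry in self.records.items()
                if urn in known and entry[0] > known[urn]}


@dataclass
class LocalImage:
    central: CentralOpac
    copies: dict[str, tuple[int, str]] = field(default_factory=dict)

    def load(self, urn: str) -> None:
        # Take a local image of the authoritative record.
        self.copies[urn] = self.central.records[urn]

    def refresh(self) -> list[str]:
        """Pull newer versions of the records held locally; return their URNs."""
        known = {urn: version for urn, (version, _) in self.copies.items()}
        updates = self.central.changed_since(known)
        self.copies.update(updates)
        return sorted(updates)
```

A local OPAC would call load once for each resource it links to and
refresh on whatever schedule suits it; only records it already holds
are updated, so the local catalogue never grows on its own.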