The catalog described might be of interest to the subscribers of ETEXTCTR. I have also left in the note about cataloging for your interest; please note, however, that the author requests that discussions about that topic be held on AUTOCAT.

-Lisa Horowitz
Acting Moderator

_________________________________________________________________

Date: Sat, 23 Jul 1994 18:33:29 +0100
From:
Subject: Alex: A Catalogue of Electronic Texts on the Internet

Alex: A Catalogue of Electronic Texts on the Internet
July 23, 1994

This is to announce the availability of a new service, "Alex: A Catalogue of Electronic Texts on the Internet." Alex can be found at gopher://, or by pointing at, choosing "The World", then "Gopherspace", and then "Alex".

Alex allows users to find and retrieve the full text of documents on the Internet. It currently indexes over 700 books and shorter texts by author and title, incorporating texts from Project Gutenberg, Wiretap, the On-line Book Initiative, the Eris system at Virginia Tech, the English Server at Carnegie Mellon University, and the on-line portion of the Oxford Text Archive. For now it includes no serials. Alex does include an entry for itself.

New publications at the sites above will be detected automatically. However, pointers to other sites, specific suggestions for additions to the catalogue, and notifications of errors are welcome, particularly those with full bibliographic information and URLs. The email address for Alex is

Gopher space for Alex is graciously provided by the Radcliffe Science Library of Oxford University, which bears no responsibility for its content. World-Wide Web and WAIS interfaces for Alex are forthcoming, as are tools to identify bad URLs and to sort copy records in reverse order of server response time.

The Alex database is copyright 1994 and released for noncommercial use. No claims are made concerning the copyright status or quality of texts referenced in the database.
Hunter Monroe

A Note for Catalogers

Alex is not MARC-based; it was not created by a professional cataloguer. However, names conform to Library of Congress authorities, and every attempt has been made to facilitate future upgrade of records to MARC standards by professional catalogers. A menu of resources relating to cataloging and the Internet can be found in the Alex directory.

The development of a MARC-based catalogue of Internet resources offers the opportunity to rethink the process of cataloging. Existing OPACs and bibliographic systems do not adhere to the basic tenet of database design--that each item of information should be stored authoritatively in one and only one place, and that other occurrences of that item, even those in other OPACs, should mirror the authoritative original. Furthermore, ISBNs/ISSNs fail dismally as unique identifiers, with the same ISBN applying to distinct editions or even distinct books. The result is that there are multiple MARC records for the same book; catalogers spend an inordinate amount of time and money downloading and editing records from outside databases; search hits on ISBNs must be manually verified; enhancements or corrections to a MARC record in one OPAC are not immediately reflected in other OPACs; and a new MARC record created in one OPAC takes many months or years to show up in the OPACs of other libraries holding the relevant document. Users are confronted with incomplete OPACs, and catalogers are distracted from their real work--cataloging books from scratch and improving existing MARC records.

Consider by contrast an ideal world in which all cataloging was performed on a single database to a single set of authorities, and in which local OPACs contained only copy records attached to a unique number for each book (a URN), in turn attached to a local image of the authoritative record in the central database. Each book would be catalogued once and only once.
Unskilled staff could determine whether a book had been catalogued by scanning its bar-coded URN, and could immediately attach the book's shelf mark to the best MARC record for that book in the world, with no typing of search keys, no need to verify hits, and no editing of the record obtained. Cataloging new books would require a manual search for similar preexisting records and, if necessary, the creation of a new record from scratch. Mistakes corrected in one OPAC would be reflected immediately in every other OPAC.

Unfortunately, the stock of books and MARC records without unique identifiers, and the nonexistence of a unified world OPAC, make the above system impractical in the real world. Yet even a first approximation to it--batch searching of bibliographic databases by ISBN, using software such as CAT ME for OCLC and CURLBAT for CURL, followed by manual editing--has led to substantial gains in cataloging productivity in some libraries.

Furthermore, the design of a catalogue of objects on the Internet can adhere to this ideal standard. It has been proposed by the IETF that a central mapping be maintained between Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). However, the URL subfield of a MARC record for an Internet object is not the only element subject to change: other obviously fluid elements include the file date, file size, and version number. Consider a hypothetical catalogue record for the Usenet newsgroup comp.infosystems.www: recently it would have needed changes in the name of the group, it would have required an additional LCSH to reflect changing topics of discussion (World-Wide Web--Clients--Mosaic), and it would have needed an indication that the group's FAQ had become available as a periodic posting and in FAQ archives. Since almost all elements of a MARC record are potentially fluid, a URN should be mapped to the whole MARC record, not just to one subfield of the 856 field.
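To make the ideal arrangement concrete, the following is a minimal Python sketch, not a MARC implementation. It assumes records can be modelled as plain dictionaries keyed by MARC tag (245 title, 650 topical subject, 856 electronic location), and that a URN resolves to the whole record held centrally while local OPACs store only copy records (shelf mark to URN). The class names, URN, shelf marks, and record contents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CentralCatalogue:
    """Hypothetical central database: one authoritative record per URN."""
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def catalogue(self, urn: str, record: dict[str, str]) -> None:
        # Each item is catalogued once and only once.
        if urn in self.records:
            raise ValueError(f"{urn} is already catalogued")
        self.records[urn] = dict(record)

    def correct(self, urn: str, tag: str, value: str) -> None:
        # Any field may change (title, subjects, 856 location), not just the URL.
        self.records[urn][tag] = value

    def resolve(self, urn: str) -> dict[str, str]:
        # The URN resolves to a copy of the whole current record.
        return dict(self.records[urn])


@dataclass
class LocalOPAC:
    """Hypothetical local OPAC: stores only copy records (shelf mark -> URN)."""
    central: CentralCatalogue
    copies: dict[str, str] = field(default_factory=dict)

    def attach_copy(self, scanned_urn: str, shelf_mark: str) -> bool:
        # Scanning the URN is enough: no search keys, no hits to verify.
        if scanned_urn not in self.central.records:
            return False  # not yet catalogued anywhere: catalogue from scratch
        self.copies[shelf_mark] = scanned_urn
        return True

    def display(self, shelf_mark: str) -> dict[str, str]:
        # Always shows the current authoritative record for this copy.
        return self.central.resolve(self.copies[shelf_mark])


central = CentralCatalogue()
central.catalogue("urn:example:0001", {
    "245": "comp.infosystems.www",       # title statement
    "856": "news:comp.infosystems.www",  # electronic location and access
})

library_a = LocalOPAC(central)
library_b = LocalOPAC(central)
library_a.attach_copy("urn:example:0001", "A-1")
library_b.attach_copy("urn:example:0001", "B-7")

# One correction at the centre (an added LCSH in a 650 field) ...
central.correct("urn:example:0001", "650", "World-Wide Web--Clients--Mosaic")
# ... is immediately visible in every local OPAC holding a copy record.
print(library_a.display("A-1")["650"])
print(library_b.display("B-7")["650"])
```

Because the local OPACs never store their own edited copy of the record, there is nothing to download, re-key, or reconcile; the trade-off, under this assumption, is that every display depends on reaching the central database.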
Consider therefore the practical requirements of creating and operating a catalogue of the Internet by the ideal standard:

o The finalization by MARBI of standards for Internet cataloging, and selection of a single flavor of MARC as well as a single set of authorities.
o The need for a central OPAC providing gopher, WWW, and WAIS interfaces, potentially mirrored in several locations.
o The selection of catalogers to be given cataloging authority on that central OPAC.
o The development of a system by which local OPACs providing active outgoing links to the Internet can load MARC records from the central OPAC and update their local copies when the central records change.
o A central system for assigning URNs to ensure that they are unique when catalogers think they should be (see for instance Project Gutenberg document numbers #22 and #132).
o The collection of existing MARC records for objects on the Internet, including records produced by the OCLC-Library of Congress Internet Resources project, records for the Oxford Text Archive, CETH records at RLIN, and records at OCLC under "710 Project Gutenberg", etc. (suggestions for additions to this list are welcome).

To get started with the last of these, the Alex catalogue will incorporate MARC records that are emailed to, as long as these records are submitted in ASCII form, they are submitted by their copyright holder, and the copyright holder explicitly consents to public distribution.

Discussion of these ideas should be directed to AUTOCAT@UBVM.CC.BUFFALO.EDU.
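The requirement that local OPACs load records from the central OPAC and update their copies when the central records change could work along the lines of this Python sketch. It assumes, hypothetically, that the central OPAC stamps each record with a version number incremented on every edit; a local OPAC keeps an image of each record it holds and, on each sync, refreshes only the images whose version has advanced. The names `CentralOPAC`, `LocalImage`, `sync`, and the example URNs and URLs are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class CentralOPAC:
    """Hypothetical central OPAC holding versioned authoritative records."""
    records: dict[str, dict[str, str]] = field(default_factory=dict)
    versions: dict[str, int] = field(default_factory=dict)

    def put(self, urn: str, record: dict[str, str]) -> None:
        # Every edit bumps the version so local images can detect the change.
        self.records[urn] = dict(record)
        self.versions[urn] = self.versions.get(urn, 0) + 1

    def changed_since(self, known: dict[str, int]) -> dict[str, tuple[int, dict[str, str]]]:
        # Return (version, record) for each URN the caller holds that is now newer.
        return {
            urn: (self.versions[urn], dict(self.records[urn]))
            for urn, seen in known.items()
            if self.versions.get(urn, 0) > seen
        }


@dataclass
class LocalImage:
    """Hypothetical local OPAC keeping images of central records."""
    images: dict[str, dict[str, str]] = field(default_factory=dict)
    versions: dict[str, int] = field(default_factory=dict)

    def load(self, central: CentralOPAC, urn: str) -> None:
        # Initial load of a record from the central OPAC.
        self.images[urn] = dict(central.records[urn])
        self.versions[urn] = central.versions[urn]

    def sync(self, central: CentralOPAC) -> list[str]:
        # Refresh only the images whose central version has advanced.
        updated = central.changed_since(self.versions)
        for urn, (version, record) in updated.items():
            self.images[urn] = record
            self.versions[urn] = version
        return sorted(updated)


central = CentralOPAC()
central.put("urn:example:0001", {"245": "Hypothetical title", "856": "gopher://old.example/"})
central.put("urn:example:0002", {"245": "Another hypothetical title"})

local = LocalImage()
local.load(central, "urn:example:0001")
local.load(central, "urn:example:0002")

# The location changes centrally; only the affected image is refreshed.
central.put("urn:example:0001", {"245": "Hypothetical title", "856": "gopher://new.example/"})
print(local.sync(central))                      # ['urn:example:0001']
print(local.images["urn:example:0001"]["856"])  # gopher://new.example/
print(local.sync(central))                      # []
```

Polling by version number is only one possible design; a central OPAC could instead push change notices to subscribing local OPACs, which trades simpler clients for more state at the centre.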