Digital libraries/automated gazetteers...

Tom Moritz tmoritz at CAS.CALACADEMY.ORG
Tue Oct 24 14:51:51 CDT 1995

At a conference in San Francisco this weekend I saw an early prototype of
a compiled world-wide gazetteer which will (putatively) be available on
the Web in November.  Following are 1) a description of the project under
which the resource will be offered and 2) a brief description of the
gazetteer aspect of the project...  Visit their WWW site for more details
and latest updates...
                Tom Moritz  Academy Librarian
                California Academy of Sciences
                Golden Gate Park
                San Francisco, California  94118
                415-750-7101 -- VOICE
                415-750-7106 -- FAX
                Internet: TMoritz at



> Alexandria Digital Library: Project Description
> The development of ADL commenced in late 1994 as part of a national
> Digital Library Initiative sponsored by NSF, ARPA, and NASA. The
> Alexandria Project at the University of California at Santa Barbara
> (UCSB) is one of six projects supported under the Initiative. These
> projects are viewed by their sponsors as the cornerstone in a national
> effort to develop digital libraries. The remaining five projects are
> located at Carnegie Mellon University, the University of Illinois at
> Urbana-Champaign, the University of Michigan, the University of
> California at Berkeley, and Stanford University. There is significant
> cooperation among the six projects.
> The Alexandria Project involves a consortium of several groups.
> Academic groups from UCSB include: the Map and Imagery Laboratory, the
> Department of Computer Science, the Department of Electrical and
> Computer Engineering, the National Center for Geographic Information
> and Analysis (NCGIA), and the Center for Remote Sensing and
> Environmental Optics. This team is augmented by researchers from the
> NCGIA at SUNY at Buffalo and the University of Maine at Orono.
> Libraries participating in the project include the UC Center for
> Library Automation, the library of SUNY (Buffalo), the Library of
> Congress, the library of the United States Geological Survey, and the
> St. Louis Public Library. Other partners include AT&T, Digital
> Equipment Corporation (DEC), Environmental Systems Research Institute
> (ESRI), E-Systems, Lockheed, the San Diego Supercomputer Center, the
> US Navy, Xerox, and ConQuest.
> Last modified on 1995-05-03 at 20:28 GMT by the Alexandria Web Team


> Catalog Implementation
> The second extension permits retrieving ADL holdings based on overlaps
> between the footprints of collection items and footprints of named
> instances of various classes of features, such as towns or rivers. The
> WP employs precomputed representations of footprints that may be found
> in available gazetteers. A gazetteer is basically a list of
> feature classes (sometimes hierarchically organized), a set of their
> named instances, and a footprint of each instance.
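
The retrieval described above can be sketched in Python as follows. This is
only an illustration, not the ADL implementation: it assumes every footprint
is a simple bounding box, and the Box class, the GAZETTEER and HOLDINGS
tables, the holdings_for function, and all coordinates are hypothetical
names and values made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in degrees (an assumed, simplified footprint)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap when their extents intersect on both axes.
        # Boxes that cross the 180-degree meridian are not handled here.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


# Hypothetical gazetteer: (feature class, name) -> footprint.
# Coordinates are rough illustrative values, not taken from GNIS or BGN.
GAZETTEER = {
    ("town", "Santa Barbara"): Box(-119.86, 34.39, -119.63, 34.47),
    ("river", "Santa Ynez River"): Box(-120.61, 34.50, -119.45, 34.72),
}

# Hypothetical collection items with their own footprints.
HOLDINGS = {
    "map-001": Box(-120.0, 34.3, -119.5, 34.6),
    "photo-017": Box(-118.7, 33.7, -118.1, 34.3),
}


def holdings_for(feature_class: str, name: str) -> list[str]:
    """Return ids of holdings whose footprint overlaps the named feature's."""
    footprint = GAZETTEER[(feature_class, name)]
    return sorted(item for item, box in HOLDINGS.items()
                  if box.overlaps(footprint))


print(holdings_for("town", "Santa Barbara"))
```

A gazetteer that stores only a point for each feature fits the same sketch as
a box of zero width and height, which is one way to see why point footprints
can miss items that overlap the feature's true extent.
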
> We are using both the Geographic Names Information System (GNIS)
> digital gazetteer of US features and the Board on Geographic Names
> (BGN) digital gazetteer of worldwide features. The GNIS gazetteer
> contains about 1.8M names of US features, organized hierarchically
> into 15 classes of features, while the BGN gazetteer contains
> approximately 4.5M names of land and undersea features. Specific
> issues that we are addressing include: 1) ingesting non-digital
> gazetteers (e.g. historic place names); 2) merging gazetteers of
> different feature classes, formats, and accuracies; 3) constructing
> meaningful footprints for entities; 4) organizing the feature classes
> into meaningful hierarchies.
> The issue of footprints is the most troubling since those of existing
> gazetteers are often point locations, rather than sets of points that
> define an area. When a gazetteer employs a single point to represent
> the footprint of a feature possessing non-zero extent, it is not
> always clear how the points were chosen. For example, they may be
> centroids, corners, or arbitrary points.
> In constructing a gazetteer for the WP, we are having to choose
> appropriate footprints for features and to extract these footprints.
> The definition of an appropriate footprint for many classes of
> features is difficult. One must in general decide whether a single
> point, a simple polygon, a complex polygon, or some other
> representation is the most appropriate. Furthermore, there may be no
> unambiguous definition of the footprint of some feature (where does a
> mountain begin and end?). This ambiguity and fuzziness are inherent in
> a person's notion of the spatial extent of a feature, and are
> particularly difficult to specify. Finally, extracting and entering
> footprints into a gazetteer is expensive. Footprints may be generated
> manually, or from existing digital data, or from other ancillary
> information. An associated problem is finding and correcting errors in
> existing gazetteers.
> Database support for the gazetteer information is currently provided
> by the ConQuest text-retrieval engine. A significant feature of
> ConQuest is its ability to handle fuzziness in the feature
> specifications. While the WP metadata are currently stored in the
> Sybase RDBMS, we are also implementing our metadata in the O2 object
> database, since many of the range values in informal interpretive
> mappings of interest are best represented as structured objects.
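
To give a rough feel for tolerant matching of feature names, here is a small
Python sketch. It does not use or imitate ConQuest, whose interface is not
described here; it substitutes the standard library's difflib string
similarity as a stand-in, and the NAMES list and fuzzy_lookup function are
hypothetical.

```python
from difflib import get_close_matches

# Hypothetical gazetteer names; a real gazetteer would hold millions.
NAMES = ["Santa Barbara", "Santa Cruz", "San Francisco", "Buffalo"]


def fuzzy_lookup(query, names=NAMES, limit=3, cutoff=0.6):
    """Return up to `limit` names resembling `query`, best match first."""
    # Compare in lower case, then map the hits back to the original spelling.
    by_key = {name.lower(): name for name in names}
    keys = get_close_matches(query.lower(), list(by_key), n=limit, cutoff=cutoff)
    return [by_key[key] for key in keys]


print(fuzzy_lookup("santa barbra"))
```

Raising the cutoff makes the match stricter; character similarity of this
kind is only one narrow notion of fuzziness and says nothing about fuzzy
spatial extents.
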
> As the size of the metadatabase grows, it is critical to provide
> efficient support for different types of queries over the footprints
> with the use of appropriate spatial indexing methods. We have been
> exploring new methods of indexing multiply-nested spatial data [9]
> such as footprints. In particular, we have extended B-trees to
> "IB-trees" for handling data objects that span a range of values
> (intervals) rather than single-valued points in the data space. This
> allows two distinct approaches for indexing multidimensional
> hierarchical data. The first decomposes the d-dimensional data objects
> into d intervals, one per dimension, and indexes the intervals in each
> dimension separately. The second organizes all data objects at the
> same level, using standard spatial indexing methods.
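
The first approach above (one interval index per dimension) might look
roughly like the Python sketch below. It is not an IB-tree, whose structure
is not given here: each dimension uses a plain sorted list searched with
bisect as a stand-in, and the class names IntervalIndex1D and
PerDimensionIndex, along with the sample boxes, are hypothetical.

```python
from bisect import bisect_right


class IntervalIndex1D:
    """Intervals in one dimension, sorted by lower bound (not an IB-tree)."""

    def __init__(self, intervals):
        # intervals: {object_id: (low, high)}
        self._entries = sorted((low, high, oid)
                               for oid, (low, high) in intervals.items())
        self._lows = [low for low, _, _ in self._entries]

    def overlapping(self, low, high):
        """Return ids of stored intervals that intersect [low, high]."""
        # Only entries whose lower bound is <= high can intersect the query.
        end = bisect_right(self._lows, high)
        return {oid for _, hi, oid in self._entries[:end] if hi >= low}


class PerDimensionIndex:
    """Decompose d-dimensional boxes into d intervals, indexed separately."""

    def __init__(self, boxes):
        # boxes: {object_id: ((low_0, high_0), ..., (low_d-1, high_d-1))}
        dims = len(next(iter(boxes.values()), ()))
        self._indexes = [
            IntervalIndex1D({oid: box[k] for oid, box in boxes.items()})
            for k in range(dims)
        ]

    def query(self, box):
        """Return ids of boxes that intersect `box` in every dimension."""
        hits = None
        for index, (low, high) in zip(self._indexes, box):
            found = index.overlapping(low, high)
            hits = found if hits is None else hits & found
        return hits if hits is not None else set()


boxes = {
    "a": ((0, 2), (0, 2)),
    "b": ((1, 5), (3, 6)),
    "c": ((4, 8), (0, 1)),
}
index = PerDimensionIndex(boxes)
print(sorted(index.query(((1, 3), (2, 4)))))
```

Intersecting the per-dimension candidate sets is what turns d independent
one-dimensional searches into a single d-dimensional overlap test; the cost
of that intersection is the usual trade-off against a single spatial index
over all objects.
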
> Terence R. Smith
> Mon Jul 31 17:29:50 PDT 1995
