database integrity

Thu May 6 12:21:26 CDT 1993

Barry Roth writes:

> Colleague Rob Guralnick's posting makes me think there
> is a natural law that might be expressed like this:

> information degrades in proportion to distance from its
> originator.

I think time is a more important independent variable in this equation
than distance.  More information has surely been lost in the desk of (or,
now, on the diskettes of) the original investigator than was ever degraded
in the hands of others.  Most scientists are notoriously poor data
stewards.  They view measurement data as an intermediate and expendable
step between observation and analysis.  Consequently, they document their
field measurements only to the extent required for their analysis.  The
longer these measurements are left in an incompletely documented state,
the more of the metadata - the information needed by others to judge the
quality and appropriateness of the measurements for their application - is
irretrievably lost.  Couple this practice with the widely held belief that
measurement data are proprietary, and you have the all-too-common
situation of scientists being loath to share their data with others or to
trust the measurements made by others.

Please don't flame me for using "scientists" collectively.  I'm well aware
that some scientists, in all disciplines, are very good data stewards - as
evidenced by discussion groups such as this one and by the incredible data
resources that are available out there in gopherspace.  When I was younger I
used to think the problem of poor data stewardship was an age-related
phenomenon.  I reasoned that older scientists were less likely to view
data husbandry as a necessary part of their work, because they came of age
professionally when science was done on a small scale, with much lower
data acquisition rates and no simple means to disseminate field
measurements widely, and at a time when computer literacy was rare among
naturalists.  Unfortunately, I see the same attitude
among both young and old scientists today, scientists who are otherwise
quite adept with information technology.  Conservation of measurement data
is evidently a matter of attitude, not of technology.  Looking back now, some of
those older scientists with their Write-in-the-Rain notebooks and 3x5
cards were doing a much better job conserving their field measurements
with the tools they had than many scientists are doing today with notebook
computers and relational databases.

Returning to the question of data degradation, I think degradation occurs
more from lack of use than from misuse; the data "rust out" rather than
"wear out".  Thus, to Barry's question...

> Should management of biodiversity/natural heritage
> databases be kept in the hands of systematists and
> collection managers, who have a personal stake in the
> quality of the information?  With the sanctions of peer
> review to keep us honest?

... I think the answer is "Yes" - because the systematists and collection
managers have both the expertise and the motivation to maintain the
quality of the databases.  Other scientists, as users of the data, have
the motivation but lack the expertise.  But data dumped to a database and
never used will degrade, regardless of who's maintaining the database.

--Gerry Key

Computer Sciences Corporation
4045 Hancock Street
San Diego, CA  92110-5164
Internet:       key at (NeXTMail OK)
FAX:            619.226.0462
Voice:          619.225.2504

More information about the Taxacom mailing list