[Taxacom] disappearing data

Richard Pyle deepreef at bishopmuseum.org
Thu Nov 20 16:43:02 CST 2008

> We need a technology general enough to capture any kind of
> information, structured, unstructured, mixed. Able to give 
> the creator of information the tools to define any mixture of 
> syntax, semantics, or uncertainty. XML can do most of this 
> (with a weak spot on uncertainty), but it is just a format 
> and not a technology framework defining software. XML is 
> great for specific software, but it made only limited (but 
> certainly most welcome!) progress towards a long term, self 
> documenting, human understandable data format.

XML defines the "structured" part; binary data and its transliteration via
UTF-8 represent the technology framework part.  The information survives as
long as we can persist three things: 
1) that information is represented by an ordered sequence of binary values; 
2) how that ordered sequence of binary values can be transliterated into
human-decipherable glyphs through the UTF-8 standard; and
3) how those series of glyphs can be interpreted as structured information
via XML (mostly associated with special-case representations of "<", ":",
"/", ">" and perhaps a few others, and their meaning as delimiters; but also
a bit more meta-information about how XML documents are structured).
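
As a rough illustration of those three layers (a minimal sketch, not a
claim about how any particular archive works), the Python below starts from
raw byte values, decodes them as UTF-8, and then parses the resulting text
as XML. The function name read_layers and the sample <taxon> record are
made up for the example; only the standard library is used.

```python
import xml.etree.ElementTree as ET


def read_layers(raw: bytes):
    """Walk a byte sequence up through the three layers (hypothetical helper)."""
    # Layer 1: the information is an ordered sequence of binary values.
    values = list(raw)
    # Layer 2: the UTF-8 standard maps those values to glyphs.
    text = raw.decode("utf-8")
    # Layer 3: XML delimiters ("<", "/", ">", ...) give the glyphs structure.
    root = ET.fromstring(text)
    return values, text, root


# Invented sample record; "\u00fc" (u with umlaut) needs two bytes in UTF-8.
record = '<taxon rank="genus" author="M\u00fcller">Centropyge</taxon>'.encode("utf-8")

values, text, root = read_layers(record)
print(values[:6])               # the first few raw byte values
print(len(values), len(text))   # byte count vs. glyph count
print(root.tag, root.attrib["rank"], root.text)
```

The byte count is one larger than the glyph count because of the single
non-ASCII character, which is exactly the kind of detail that item 2 has to
keep documented.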

In the context of electronic documentation of information, we have a model
for items 1 & 2 that is almost as old as computer technology, and which
continues to persist today: ASCII.

As a basic framework for information persistence, we have a nearly
four-billion-year-old model to serve as an analog: 
1) the information is an ordered sequence of quaternary (A/G/T/C) rather
than binary values;
2) that ordered sequence of quaternary values is transliterated into
amino acids; and
3) those series of amino acids function as structured proteins.
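
Steps 1 and 2 of the analog can be sketched the same way. The Python below
is a simplified illustration that assumes the standard genetic code, reads
a DNA string three bases at a time, and stops at the first stop codon; the
names CODON_TABLE and translate are invented for the example, and step 3
(how the protein actually folds and functions) is well beyond a few lines
of code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Transliterate a DNA string into one-letter amino acid codes."""
    protein = []
    # Read complete three-base codons only; ignore any trailing remainder.
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3].upper()]
        if amino == "*":  # stop codon ends the protein
            break
        protein.append(amino)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))
```

The lookup table plays the role that the UTF-8 standard plays for bytes: it
is the piece of "documentation" that has to persist alongside the sequence
for the sequence to mean anything.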

If unthinking molecules can persist information for nearly four billion
years, I would like to imagine that (semi-)thinking humans should at least
be able to manage it for a few centuries or millennia -- if not into

I would view everything else (computers, hard drives, databases, wikis,
blah, blah, blah) as the ephemeral preferences "du-jour" of the (nearly)
hairless apes that want to persist the information.

In the context of the current "jour", however, I completely agree with the
following:

> However, I do deplore that within the biodiversity community, 
> we do not have a real big, generally accepted Mediawiki 
> platform, to solve
> *together* some of the problems that could be solved *there*. 
> Without forcing people to create a new piece of fragile 
> software for every project, inventory, list of biodiversity 
> projects, list of software, physical or information collections, etc.

(Going back to my mounting obligations associated with creating new pieces
of fragile software for one particular project....)

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
