[Taxacom] [iczn-list] Sorry, but you are out-of-line

Stephen Gaimari SGaimari at cdfa.ca.gov
Mon Nov 15 13:59:42 CST 2010

Doug Yanega wrote:
>You're raising two points here, and they're not really linked. Your 
>first point, if I read it right, is that taxonomy *needs* a stagnant 
>original form on file somewhere, but it's not clear what form you are 
>referring to: is it the *paper* form, or a digital representation OF 
>that paper form? If the former, I don't think it's fair to say we NEED 
>the paper hard copy once we have created a securely-archived digital 
>version. It's *better* to have a hard copy, it's *desirable* to have a 
>hard copy, but I wouldn't use the word "need" once there is a secure 
>digital version; at that point, the hard copy is effectively 
>superfluous, in the same way that there is no longer a NEED for the 
>metal meter stick that was THE standard of reference for a meter (at 
>first there was a metal bar - the "hard copy" - then in 1960 it became 
>"1,650,763.73 wavelengths of the orange-red emission line in the 
>electromagnetic spectrum of the krypton-86 atom in a vacuum", and then 
>in 1983 it became "the length of the path travelled by light in vacuum 
>during a time interval of 1/299792458 of a second" - and I haven't 
>heard of any physicists objecting on the grounds that we may someday 
>lose the technology that allows us to measure wavelengths or laser 

Digital representations are only that - digital representations. I don't worry about these, and I don't worry about their needing to have file format upgrades, or changing with changing technologies, because they are just that - digital representations. They are not the stagnant original form, which paper copies are. Digital representations are not the original - they can change, in whatever way you might imagine, given sufficient time and numerous upgrades and new systems of data management on the horizon over the next 200 years. Since taxonomy is a field that relies upon historical documentation, bastardizing that historical information does have a clear effect, especially since we have no way to predict what the future will bring with regard to digital archiving. We just do not know. You seem very sure. I am not. We know it would be secure for now, but we don't know what that security will look like in the future. So to put all of our faith in things turning out as you envision, in my mind, is taking a chance with the future of taxonomy. It is not a chance I am willing to take, since the last 250 years are enough of a problem that needs fixing now to get a full grasp of it.

I have no objection to digital archiving, but that does not preclude the need for the perpetual existence of a usable original copy. That is the only securely-archived original version. Your analogy with the meter standard doesn't work either. The meter is based upon calculations from the physical world - it does not rely upon being able to go back and check 150 years later, which taxonomy does rely upon. The length of a meter is a physical constant that can be calculated and demonstrated over and over and over. Regardless of technology to make these measurements, the constant is still the constant. It is inherently retrievable. That is not the case with original taxonomic information.

Doug Yanega wrote:
>If you're referring to the digital version of the hard copy not being 
>maintained in perpetuity, that's literally trivial; if GenBank asked 
>authors for PDFs of the papers in which their sequences were cited, 
>then do you honestly believe that GenBank's archives would somehow be 
>inadequate to the task? Digital is digital as far as storage, and as 
>far as format, PDF is NOT proprietary, and if the technology ever 
>"migrates", then the migration can be fully automated. Remember, a 
>centralized archive is NOT stagnant; it isn't "storage" in the 
>conventional sense of something being set aside and left untouched and 
>then retrieved at some later, indeterminate point, which is how 
>*private* archives work (and why private archives decay or become 
>obsolete - and why I've been harping on private archives as irrelevant 
>to the discussion); that sort of "storage" would only really apply to 
>backups and mirrors - the main archive, however, is *dynamic*, with all 
>of its elements up and running perpetually, constantly updating, error-
>checking, and so forth - there are no hiding places where some bit of 
>data (e.g., a PDF) can slip through the cracks and NOT be converted to 
>a different format when a different format upgrade is initiated. Again, 
>you can't think of a centralized archive as "storage" - the entire 
>archive changes every second of every day, and calling that "storage of 
>data" is like saying that a guy juggling three balls is "storing" them. 
>The bottom line is that any PDF of a paper is just as secure, 
>permanent, and easy to archive and migrate as "simple data".

You say it's trivial, I disagree. I don't object in any way to digital versions - whether they DO in fact last into perpetuity is irrelevant to the discussion in 2010 - we have no way of knowing whether they WILL in fact last into perpetuity, and to throw all of our taxonomic eggs into that digital basket IS taking a risk. Suggesting digital is digital does trivialize the need for taxonomists to refer to original descriptions and nomenclatural acts. If you are suggesting a paradigm where this is NOT a need for taxonomists, then I will have to respectfully but strongly disagree on that point. Your references to the ease of data security are pure speculation in my view, since digital data security is a relatively young field, and I don't find it compelling to rely on the need for the perpetual upgrading of outdated file formats to ensure the future of taxonomy.

Doug Yanega wrote:
>Your second point is the one I *don't* have a simple answer for, and as 
>such, is of greater general concern; "will taxonomy have the money and 
>resources that the field of molecular biology has?" To some extent, I 
>think we may be selling our commodity short a bit there; when you put 
>ALL of taxonomy together, and consider how essential taxonomy is to the 
>rest of the scientific community, it's not of trivial importance. Just 
>one observation alone can suffice to make the general point: none of 
>the data in GenBank are legitimately valuable to anyone if they are not 
>linked to an organism, and that link is taxonomy. 
>True, the bulk of GenBank is of common organisms whose taxonomy is 
>absolutely stable (like "Homo sapiens"), but there's a lot of stuff in 
>there for which taxonomy is crucial. Another part of this is that we 
>have never gotten together AS a community and said "We are unanimous in 
>our desire to have a permanent centralized archive - will you fund it?" 
>- how can we expect or imagine being given money if we haven't shown we 
>can work together or agree on anything? Consider that there *has* been 
>money and resources given to taxonomy (in the broad sense) - repeatedly 
>- to a number of different initiatives, each with slightly different 
>goals and approaches. Is it possible that overlap and/or competition 
>between these initiatives has created an environment such that nothing 
>that even *smells* like a cataloguing effort will attract new funding 
>(because "so-n-so is already doing that")? The bottom line here is that 
>your second point deals far, far more with politics than anything else, 
>and - as such - is less about logic or practicality, and accordingly 
>almost completely unpredictable. There are no easy, obvious answers, 
>aside from this one: we won't have a centralized archive if we abandon 
>the idea without even trying. THIS is the topic we most badly need to 
>be discussing, instead of the technical stuff. I agree that the analogy 
>to GenBank fails in *this* matter - the *politics* behind it - but 
>previous iterations of the discussion, including my original statement 
>of the analogy - were in reference to the *technical* side, which is 
>what people were worrying about, and the analogy still holds there.

I am willing to get on board and say it would be highly desirable to have a permanent centralized and distributed archive. This is a pragmatic and useful thing to have available. However, I do NOT think this is the matter that is most important. Archiving is data storage, regardless of whether it is stagnant or dynamically upgraded, in PDF format, in XML, whatever. The data are useful in their own right, and it would be a positive thing for the world to have access to these data. However, this archive should be just that - a historical archive to serve the needs of the scientific community - but NOT serve as THE sole container for the "original". That is the inherent problem with e-only publication. There is this assumption that it will somehow remain "original" into perpetuity, or even last into perpetuity at all. I just don't see the former as being so, and I have my doubts on the latter.

Doug Yanega wrote:
>As for a solution to the political dilemma, one idea I have raised 
>before, to limited and at best half-hearted response, is that - if 
>creating our own GenBank-like archive seems genuinely beyond our means 
>(either practically or politically) - we might consider riding on 
>GenBank's coattails; approach them and see if they would be willing to 
>incorporate taxonomic data in their archives. Then, in the best-case 
>scenario, the only funding *we* would need is for the process of 
>getting the data uploaded, and perhaps designing a taxonomist-friendly 
>interface; the actual infrastructure (otherwise a significant expense) 
>would be GenBank's, and already in place. As Donat has already 
>demonstrated, our data are NOT very different from their data, as seen 
>through the proverbial eyes of a computer. They already have several 
>orders of magnitude more sequences archived than there are 
>nomenclatural acts in all of recorded history; we would not make much 
>of a dent in their dataspace.

That would be a fine option for taxonomic data. No problem - archive all the data desired - it would be very beneficial if EVERYTHING were parsed and digitized and available to the world. The problem is when this digital archive is looked at as being equivalent to the original - it is akin to viewing a regional catalog as being equivalent to the original taxonomic data. For a taxonomist, catalogs are great tools, but you STILL need to consult the primary sources if you are going to do rigorous taxonomic research. And I don't think reliance upon a fluid data structure is going to achieve the stability needed by our field. Regarding what Donat had demonstrated - he demonstrated how data can be parsed and machine-read in an equivalent way to GenBank - that's all fine and desirable - but that is decidedly not the same thing as e-only publication, where that's ALL there is to a nomenclatural act.


Dr. Stephen D. Gaimari
Program Supervisor (Entomology)
Plant Pest Diagnostics Center
California Department of Food and Agriculture
3294 Meadowview Road
Sacramento, CA 95832, USA
Tel. 916-262-1131, Fax 916-262-1190
E-mail sgaimari at cdfa.ca.gov
