[Taxacom] ICZN 5th Edition Code (was "Taxon Filter" (was Re: Electronic publication))

Richard Pyle deepreef at bishopmuseum.org
Thu Jan 12 13:15:33 CST 2017


Hi Doug, All,

I've once again changed the subject line, as I hope to steer this conversation in a particular direction (crafting the 5th Edition of the Code). This didn't start out as an epic post, but it became one, so I apologize for its length.  If you don't want to read the whole thing, please skip down to the "*******" divider and start reading at the sentence that begins "The key point I want to make in this post..."  Everything between here and there is important contextual information, so if you're not intimately familiar with the history of this debate, you will probably benefit from reading the whole thing.

Doug and I (both ICZN Commissioners) have been discussing these ideas for the past 15-20 years (including within a couple of publications*), and in summary our discussions recognized four fundamental models for making names available:

1. Published=Available (how it's been done since the time of Linnaeus)
2. Published+Registered=Available (the current model for electronic publication)
3. Registered=Published=Available (what Doug is advocating)
4. Registered=Available (what I am advocating)

An important point to remember here is that the word "Published" in this context means explicitly "Published in the sense of the ICZN Code".

The first model has worked relatively well for about 250 years, but mostly in the context of paper-printed publications.  Throughout most of the history of scientific publication, works produced on paper involved costly print runs, such that in the vast majority of cases, numerous identical copies were produced in batches.  Slight alterations would have required another costly print run, and thus were rarely produced.  Reprints represented a bit of a possible problem (i.e., there are often discrepancies between the volume-printed version of a work and a reprint), but for the most part the Taxonomic community collectively agreed that reprints didn't really "count" (although in some cases, one could challenge this via a strict interpretation of the Code).  Under this model, there had been some ambiguity about whether some works were published in the sense of the Code, and perhaps a bit more ambiguity on exactly when (completely specified date) they were published (requiring reviews of library receipt date stamps and such), but for the most part problems associated with "Publication" were kept to a minimum.

Things started to change during the first decade of this century when publishers began to more routinely produce electronic PDF versions of scientific articles.  At first there wasn't really much of an issue, because the Code made it clear that works produced as electronic signals were not published in the sense of the Code.  But as demand by the taxonomic community increased to allow electronic editions of works (either made obtainable months earlier than the paper edition, or in some cases when only electronic editions were produced with no paper edition) to be considered published in the sense of the Code, the Commission began drafting an Amendment to accommodate works published electronically.  This was in 2008, which coincided with the initial launch of ZooBank, so it was natural that the second model (Published+Registered=Available) emerged from that process.  The Amendment was publicly debated for four years, and went into effect in 2012. During those years, most of the problems that have come up were anticipated, but no one knew which problems would be the most troublesome.  Thus, the Amendment was viewed as a grand experiment to discover what works and what doesn't for this model, both in terms of designing the registration system and in view of the highly evolving nature of electronic scientific publication.  The idea was to learn from this grand experiment, so that a more robust and lasting solution, developed with the benefit of hindsight, would be implemented in the 5th Edition of the Code.

So... here we are at the dawn of 2017, five years after the Amendment went into effect, and the Commission is currently ramping up the process for drafting the 5th Edition of the Code.  One of the biggest outstanding questions on how to craft that Code is the issue we are discussing here (hence the epic emails, and my hope and gratitude that at least some people are actually reading them).  Obviously, one of the things we need to take into account is the lessons learned from the past four years of experience with the "Published+Registered=Available" model.  There is a subcommittee of ICZN Commissioners who have specifically addressed the problems (so-called "Limbo names") in electronic works, and it boils down to six types of problem, summarized as:
1) The work was first issued prior to 2012 (Art. 8.5.1);
2) The work itself does not state the date of publication (Art. 8.5.2);
3) The work is not registered in ZooBank (Art. 8.5.3);
4) The work itself does not contain evidence of registration (Art. 8.5.3);
5) A name and internet address of an organization other than the publisher that is intended to permanently archive the work is not indicated in ZooBank (Art. 8.5.3.1);
6) An ISBN for the work or an ISSN for the journal containing the work is not indicated in ZooBank (Art. 8.5.3.2).

The first is not common. The second is what started this thread via John Noyes' post.  The third and fourth are probably the biggest source of problems, but we have no effective way to assess the scale of these problems. Number 5 is one of the more apparent problems, and number 6 much less so. The fundamental problem introduced in the Published+Registered=Available model is that we now have two different places (and times) where information must exist in order for a name to be available (the publication, and the registration record).
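
To make the six "Limbo name" conditions concrete, here is a minimal sketch in Python of how they might be checked mechanically against a single record. The class and field names are hypothetical and do not reflect ZooBank's actual data model; the sketch only restates the list above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ElectronicWork:
    # Hypothetical fields; none of these mirror a real ZooBank schema.
    first_issued: date
    stated_publication_date: Optional[date]  # date given in the work itself
    registered_in_zoobank: bool
    evidence_of_registration_in_work: bool
    archive_in_zoobank: Optional[str]        # archiving organization name and address
    isbn_or_issn_in_zoobank: Optional[str]


def limbo_problems(work: ElectronicWork) -> List[str]:
    """Return which of the six listed problems apply to this work."""
    problems = []
    if work.first_issued.year < 2012:
        problems.append("1: first issued prior to 2012 (Art. 8.5.1)")
    if work.stated_publication_date is None:
        problems.append("2: no date of publication stated in the work (Art. 8.5.2)")
    if not work.registered_in_zoobank:
        problems.append("3: work not registered in ZooBank (Art. 8.5.3)")
    if not work.evidence_of_registration_in_work:
        problems.append("4: no evidence of registration in the work (Art. 8.5.3)")
    if not work.archive_in_zoobank:
        problems.append("5: no archive indicated in ZooBank (Art. 8.5.3.1)")
    if not work.isbn_or_issn_in_zoobank:
        problems.append("6: no ISBN or ISSN indicated in ZooBank (Art. 8.5.3.2)")
    return problems
```

Note that such a check can only confirm that a date is stated in the work, not that the stated date is accurate.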

Separate from these six issues that can cause names to be technically unavailable, there is also the (growing) problem of pinpointing exactly when an electronic work is published in the sense of the Code.  Unlike paper-printed publications, electronic works can be updated and re-issued at almost no cost.  As such, when problems are discovered (e.g., the work isn't registered, or evidence of registration is not included within the work), it's often difficult to ascertain exactly when a work was published.  Although we do have Article 8.5.2, this article does not require that the date indicated in the work itself be *accurate*.  There is no consistency among publishers in representing this date within the electronic work.  In short, the current situation of Published+Registered=Available, while perfectly functional in the vast majority of cases, has revealed several key problems that are compounded by the rapidly evolving nature of electronic scientific publication in general.

So.... returning now to the recent thread, Doug's recent post, and the direction I hope this conversation leads....

As I mentioned, the Commission is currently starting the process of drafting the 5th Edition, wherein we hope to resolve the existing problems while simultaneously preventing the creation of new problems.

One option is that the 5th Edition can just patch a few holes, add a few band-aids, and otherwise perpetuate the status quo.  I, and many other Commissioners, have been advocating a much more bold and progressive approach to drafting the 5th Edition, particularly where it concerns new names established after the 5th Edition goes into effect. This ties back to Hinrich's comments about getting ahead of problems and being more proactive.  There are many aspects to how the new Code could be restructured to make things better and easier, but perhaps the largest and most significant of these is deciding which of the four models Doug and I outlined (at the top of this message) to follow for all new names (not just those in electronic publications).

The "band-aid" solution would likely implement refined versions of models 1 & 2 ("Published=Available" and "Published+Registered=Available"), perhaps extending the latter to apply to all names (not just those within electronic works).  However, the more progressive/proactive solution would focus on models 3 & 4 ("Registered=Published=Available" and "Registered=Available").

*******
The key point I want to make in this post (sorry it took me so long to get here, but I wanted to make sure we were all on the same contextual page) is that the model that I advocate (Registered=Available) and the model Doug advocates (Registered=Published=Available) are actually both the SAME model, but framed in different contexts.  The reason I say this is that I prefer to think of "robust registration" (i.e., a registration system that is much more comprehensive than the existing ZooBank) as something that is different from "Publication" in the traditional sense (even though both achieve the same ends: sharing of information publicly); whereas Doug prefers to maintain most aspects of the traditional publication process as embedded within the Registration system.  But regardless of how you label it (i.e., with or without the word "Published"), they both involve the same set of goals and questions:

Goals:
We want a system of establishing new names/acts that:
- makes relevant information publicly accessible for free and easily discoverable in real-time as it is produced;
- minimizes ambiguity about availability;
- eliminates ambiguity about priority (i.e., unambiguously time-stamped);
- minimizes or eliminates ambiguity about typification;
- minimizes or eliminates future homonymy;
- minimizes bogus/shoddy names (taxonomic noise, such as heterotypic synonyms, incorrectly formed names, "vandalism", etc.)
- includes other useful stuff that I'm sure most people would agree with
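
Two of these goals (unambiguous time-stamping and avoiding future homonymy) lend themselves to a small illustration. The following is only a sketch in Python under loud assumptions: the Registry class is hypothetical, the species name is invented, and "homonym" is simplified here to mean an identical name string, which is much cruder than the Code's actual definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    name: str
    author: str
    registered_at: datetime  # assigned by the registry, in UTC


class Registry:
    """Hypothetical single registry: one timestamp source, one name index."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Record] = {}

    def register(self, name: str, author: str,
                 now: Optional[datetime] = None) -> Record:
        key = " ".join(name.split())  # normalize whitespace only
        if key in self._by_name:
            # Simplified homonymy check: the identical name already exists.
            raise ValueError(f"{key!r} is already registered")
        stamp = now if now is not None else datetime.now(timezone.utc)
        record = Record(key, author, stamp)
        self._by_name[key] = record
        return record

    def date_of_availability(self, name: str) -> datetime:
        # In this sketch, availability dates from registration alone.
        return self._by_name[" ".join(name.split())].registered_at
```

Because the registry itself assigns the timestamp, priority between two records does not depend on when, where, or whether either work later appears in print.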

The outstanding questions on how best to achieve these goals are very much open to debate, and how you answer these questions largely determines toward which end of the spectrum (my view vs. Doug's view) you tend to fall.

Questions (for the 5th Edition of the Code):
1) Should registration be required for all new names & acts (not just those published electronically)?
2) How much of the existing metrics for Code compliance can be algorithmically verified (i.e., without the need for subjective human review)? [and therefore embedded within the registration system]
3) What should constitute the new *MINIMUM* standard for information necessary for establishing a new name (e.g., absent any external source, such as publication)?
4) What additional *OPTIONAL* (but not required for availability) information should be supported within the registry?
5) Should the registry include aggregations of new names/acts within a defined set (analogous to how single publications may include multiple new names/acts), or should the new registry simply be structured around names/acts individually?
6) If aggregations of new names/acts are desired, then what kinds of information do we want to capture for the aggregate set of names/acts separate from the names/acts themselves?
7) How explicitly should typification be asserted? [Currently only the name/location of the type depository is required; but perhaps there should be an electronic link to a specific specimen analogous to how gene sequences are registered in GenBank]
8) How can/should we improve quality control through human subjective review as part of the REQUIRED process for making names/acts available? [This is the big one.]

I suspect that if Doug and I sat down, we'd come to a mutual agreement on the first seven of these items, so really the only difference in our approach is how to answer #8. In Doug's model, the process would still very much resemble the current publication process, except that reviews would be open and non-anonymous, and it would all happen within a single "Journal".  I actually support much of this as well, but I don't think of it as a "publication" -- rather as a review process for registration records (with the registration system taking the place of the single "Journal").  In fact, reading Doug's fifth paragraph below (that begins "Author X would submit their manuscript..."), if you replace his word "manuscript" with my term "registration record", I think I could agree with almost everything in that paragraph.

The only real differences between what Doug and I are advocating are:
1) Where to draw the line between Mandatory and Optional information necessary to confer availability through the registration process; and
2) The extent to which the process resembles a "publication" (i.e., PDF that is cited and contributes to tenure, etc.)

We're probably not too far apart on #1; but for #2, I strongly advocate a relatively "thin" registration system that simply fulfills the minimum requirements for establishing an available name, and does not in itself hold much prestige for the author(s).  In my model, the "science" (taxonomy) continues to happen among the pages of widely diverse journals, and the only real difference from the existing (and evolving) status quo is that all information necessary to confer availability is consolidated within the registration process (completely divorced from the traditional publication process) -- in much the same way that historically it was all consolidated within the paper-published work.

The concern regarding my approach is the fear that certain unscrupulous individuals would exploit the "thin" registration system to perpetrate taxonomic "vandalism". My counterpoint to this has always been that the bar is already extremely low as it currently is with the existing system (which is why vandalism exists), so we wouldn't be any worse off than we are now.  But I also concede that Doug would like to find a way to dramatically improve the situation to prevent or at least minimize vandalism, and hence the need for *some* form of subjective peer review.  I now tend to agree that such review should be included within the Registration process, but the devil lies in the details.  I wholeheartedly support the model of review that Doug has outlined, but my main concerns are that:
1) There should be a process that arbitrates the reviews as objectively as possible (not left to the subjective opinion of a single editor);
2) There needs to be a time limit (and/or system for extending the time limit) for the review period;
3) There needs to be a clear system of arbitration when it is not possible to objectively determine whether a registration record is accepted.

All the objective criteria can be embedded algorithmically within the registration system.  But there is a need for human subjective review to minimize vandalism, and to help promote collaboration as Doug has described.

OK, there went my morning.  And I doubt many (any?) of you are still reading.  But in case you are, thank you for indulging me.  Above all, I hope we can distill this conversation down to the main issues (e.g., the 8 questions I listed above, plus I'm sure many others that I didn't think of just now).

Aloha,
Rich

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences | Associate Zoologist in Ichthyology | Dive Safety Officer
Department of Natural Sciences, Bishop Museum, 1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html



*Publications relevant to this:
Polaszek, A., M. Alonso-Zarazaga, P. Bouchet, D.J. Brothers, N. Evenhuis, F.-T. Krell, C.H.C. Lyal, A. Minelli, R.L. Pyle, N.J. Robinson, F.C. Thompson, & J. van Tol. 2005. ZooBank: the open-access register for zoological taxonomy: Technical Discussion Paper. Bulletin of Zoological Nomenclature. 62(4):210–220.

Polaszek, A., R. Pyle & D. Yanega. 2008. Animal names for all: ICZN, ZooBank, and the New Taxonomy. pp. 129–142. In: Wheeler, Q.D. (Ed.). The New Taxonomy. CRC Press, Boca Raton. 237 pp.

Pyle, R.L. & E. Michel. 2008. ZooBank: Developing a nomenclatural tool for unifying 250 years of biological information. Pp. 39–50. In: Minelli, A., Bonato, L. & Fusco, G. (eds.) Updating the Linnaean Heritage: Names as Tools for Thinking about Animals and Plants. Zootaxa, 1950, 1–163.


> -----Original Message-----
> From: Taxacom [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf
> Of Doug Yanega
> Sent: Wednesday, January 11, 2017 8:55 AM
> To: taxacom at mailman.nhm.ku.edu
> Subject: [Taxacom] "Taxon Filter" (was Re: Electronic publication)
> 
> For the record, the "Taxon Filter" concept is one that Hinrich largely
> borrowed from a model I have been proposing for some time now, and he
> and I had extensive discussions prior to his adopting that name. There is a
> very important aspect of the process which does not seem to have been
> clearly explained, and I think it is highly relevant to several of the issues
> raised in this present thread. Allow me to give a hypothetical example to
> illustrate:
> 
> Suppose author X wants to describe a new species of bumblebee (the genus
> Bombus, family Apidae, order Hymenoptera, class Insecta). They have three
> female specimens, from two localities in Mexico.
> 
> Under the status quo, they submit a manuscript to journal Y and it is seen by
> three anonymous referees plus a subject editor before being accepted, with
> minor revision, and published. Under the status quo, it is ALSO possible that
> one of the referees (let's call them Z) might realize "Oh, I have some
> specimens of this new species myself!" and they could quickly publish their
> OWN description, and "scoop" author X, usurping their discovery. Author X is
> furious, but cannot prove that Z was one of the referees, because they are
> shielded by anonymity.
> 
> Under the model that I and Hinrich have been advocating (at least my version
> of which is NOT a "minimal requirements" model), this would be quite
> different.
> 
> Author X would submit their manuscript (in my model, the *whole* thing) to
> THE single official venue - most likely ZooBank - that acts as a registration
> portal for all new nomenclatural acts. The instant it is submitted there, an
> automated message goes out to every taxonomist who is a registered user
> of that venue AND who has self-selected any of the following key words:
> Bombus, Apidae, Hymenoptera, Insecta, Mexico, "new species" (among
> others). Instead of just 3 anonymous referees, the manuscript is thereby
> opened to review online by *hundreds* of people, including the majority of
> the world's experts on bumblebees. If a person like Z is among the reviewers,
> they CANNOT do anything to usurp the taxon as their own, because now
> there is ONLY ONE VENUE. They would have to submit their manuscript for
> registration to the *same* place, and have it seen by the *same* reviewers,
> as author X - and *including* author X! It would be immediately obvious that
> the two works referred to the same taxon, and since author X had submitted
> first, their registration could still be approved, but Z's would definitely be
> rejected. The BEST scenario that Z could hope for here is that X would agree
> to add them as a co-author. In fact, that could turn what might otherwise
> have been a bitter rivalry into a cooperative, win-win venture. Especially if
> author Z had specimens from different localities, or male specimens to help
> better characterize the new species. For that matter, ANY of the hundreds of
> reviewers might have additional specimens or data that they could
> contribute (with or without co-authorship), and the resulting species
> description could be *vastly* improved over the one produced under the
> traditional publishing model. Open review makes a level of collaboration
> possible that is NOT part of the present competitive publishing model. So,
> we'd see not only a BETTER end product, but one that is immune to being
> usurped, and - once registered - can be submitted for publication wherever
> the author wishes, and it won't
> make a difference how long it actually takes to get into print, because the
> date of registration (and availability) of the name is *already established*
> and NOT dependent on the date of printing.
> 
> That last clause is EXTREMELY significant in regards to the present debate
> over dating, pre-prints, digital versus paper, and such: if the date a name
> becomes *available* is the date it is *registered*, then it makes no
> difference at all when the formal publication takes place, or where, or
> whether it is e-only, or hard copy only, or privately printed, or printed on
> demand, etc.
> 
> I have been arguing for some 20 years now that adopting this approach
> would be to everyone's collective benefit, for many, many reasons. No
> longer having to worry about the date of publication (or digital versus paper),
> is just one of those many reasons.
> 
> --
> 
> P.S.: I can imagine several of you immediately leaping forward with
> questions like "But what if author X takes the manuscript after it is
> registered, and changes it before it is published?" "What if they never
> formally publish it?" - and while those are fair questions,
> superficially, I also think you'll see a few things: first, by virtue of
> the open review process, there is virtually no reason that there WOULD
> be any changes between registration and publication. After all, most (if
> not all!) of the potential referees for the print version will have
> *already* reviewed the work. No errors should slip through that would
> require fixing; e.g., if someone's proposed new names are synonyms, or
> homonyms, or there is some other error regarding Code-compliance
> (failure to state type depository, etc.), that would *all* *get worked
> out* before the work could be approved for registration. Second, if
> changes *are* made, there are several options that could render this a
> non-problem. Which option people prefer could be a separate topic for
> discussion, but off the top of my head, (1) declare that only the
> official registered version of the work has *nomenclatural* standing.
> I'm not talking about minor changes in the final published version,
> which would be irrelevant, but something like altering the composition
> of the type series, changing the spelling of a name, etc. Note that this
> would, in effect, make the archived registered work the functional
> equivalent of a digital publication. As such, even if it never got
> "published" anywhere else, the names and acts in it would still be
> available, *and* possible to cite. This is one of the main reasons that
> I advocate that the *entire works* be registered, rather than just a
> minimalist template. (2) If the author insists that a revised published
> version *needs* to replace the previously-registered version (e.g., they
> found a better specimen to be selected as holotype), then - thanks to
> the entire process being archived - everyone who was involved in the
> original registration approval could be sent a follow-up e-mail asking
> whether or not they approve of the revised version; if so, then the
> revised version replaces the original. This could only be done ONCE, in
> conjunction with the first post-registration publication. (3) Some
> publishers might be convinced to accept the registered version *as is*
> and just print it without any further review or editorial process.
> 
> This all could work, and work well.
> 
> Sincerely,
> 
> --
> Doug Yanega      Dept. of Entomology       Entomology Research Museum
> Univ. of California, Riverside, CA 92521-0314     skype: dyanega
> phone: (951) 827-4315 (disclaimer: opinions are mine, not UCR's)
>               http://cache.ucr.edu/~heraty/yanega.html
>    "There are some enterprises in which a careful disorderliness
>          is the true method" - Herman Melville, Moby Dick, Chap. 82
> 
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> The Taxacom Archive back to 1992 may be searched at:
> http://taxacom.markmail.org
> 
> 
> Nurturing Nuance while Assaulting Ambiguity for 30 Years, 1987-2017.