[Taxacom] Article 8 compliance

Richard Pyle deepreef at bishopmuseum.org
Fri Apr 7 15:27:15 CDT 2017

Hi John,

MANY thanks for taking the time to read my epic post and provide such a detailed and thoughtful response.

> I still prefer my model, in which e-publications (specifically PDF/A) are treated the same
> as print publications, with no mandatory registration on ZooBank. I think it is
> much simpler and neater with the only caveat being that we must get publishers
> on board with regards to exact dates and pagination. Personally I cannot see a
> better format of electronic publication than PDF and I cannot see why this
> should change for a very long time to come. This format has been in existence
> for at least 20 years and has changed very little in that time. Thus it could be
> specified that the only form of electronic publication that is acceptable as a
> means of publishing taxonomic acts is by means of PDF/A.

I actually would see what you propose above as a better solution than one that requires actions in two separate places (in the work itself, and in ZooBank) to confer nomenclatural availability; so in one sense I agree that what you outline may be better than the status quo for e-publications.  But I still strongly believe this would be inferior to a system that separated the legalistic, Code-compliance aspects of nomenclature from the increasingly "messy" aspects of what we think of as "publication", and consolidated the latter into an online registration system (ZooBank on steroids).

> I agree with you that, even registering 400 names on ZooBank would take little
> time in comparison to the effort that has gone into arriving at that point. I
> actually did a little trial with a possible 7 fields which may be the minimum
> requirement [Superfamily, Family, Taxon Name, Author, Primary type, Type
> depository, Diagnosis] for a name registration in ZooBank. In this case I included
> the full description plus a differential diagnosis in the Diagnosis field. It took less
> than 1 minute 30 seconds to upload the data for each taxon. That would be 10
> hours solid work for 400 names, let's say two days' concentrated work. That is
> acceptable if you consider that it probably has taken 5-8 years to reach that
> point. 

Actually, I don't find that acceptable.  I would like to see the time investment limited to something less than 30 seconds per name.  For example, what if you already had your MS written in a word processor file, and the seven bits of information were well-formatted in that file?  You could have the MS on the left side of your screen and an Excel file on the right side, and simply go through each of the names, copying and pasting the information (including diagnosis) from left to right, in tabular form.  Then you would upload the Excel file as one batch, and all names would be registered.
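To make the batch idea concrete, here is a minimal sketch of checking such a table before upload. It assumes the table has been saved as CSV with one column per field; the column names, the sample rows, and the `load_batch` helper are all hypothetical and do not describe any existing ZooBank import format.

```python
import csv
import io

# The seven fields from the trial quoted above. The exact column names
# are an assumption for this sketch, not a ZooBank format.
REQUIRED_FIELDS = [
    "Superfamily", "Family", "Taxon Name", "Author",
    "Primary type", "Type depository", "Diagnosis",
]

def load_batch(csv_text):
    """Split a tabular batch into complete rows and (row number, missing fields) problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ready, problems = [], []
    for number, row in enumerate(reader, start=1):
        # A field counts as missing if its cell is absent or blank.
        missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if missing:
            problems.append((number, missing))
        else:
            ready.append({f: row[f].strip() for f in REQUIRED_FIELDS})
    return ready, problems

# Invented placeholder names, for illustration only.
SAMPLE = (
    "Superfamily,Family,Taxon Name,Author,Primary type,Type depository,Diagnosis\n"
    "Exampleoidea,Exampleidae,Exemplum primum,A. Author,Holotype X-1,Example Museum,Differs from congeners in ...\n"
    "Exampleoidea,Exampleidae,Exemplum secundum,A. Author,,Example Museum,Differs from E. primum in ...\n"
)

ready, problems = load_batch(SAMPLE)
print(len(ready), "row(s) ready;", len(problems), "row(s) with gaps:", problems)
```

The point of the sketch is only that completeness of the seven fields can be checked mechanically for the whole batch at once, so the per-name effort is limited to the copy/paste step.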

Alternatively, as journals move increasingly towards XML templates that are used to generate the PDF files, we could make it much easier for publications to offer this service to attract authors (as Pensoft currently does). Because the Journal would need to reformat your MS in the XML template anyway, as long as the XML template included your seven defined fields, then there would be no need for anyone to do any copy/pasting.  The Journal would simply submit the XML file to ZooBank as part of the publication process for automatic registration.  This actually already exists, and is in use right now.
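As an illustration of why no copy/pasting would be needed, the sketch below pulls the seven fields out of a marked-up article. The element names (`new-taxon`, `taxon-name`, and so on) are invented for this example; they are not taken from Pensoft's templates or from any ZooBank schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are invented for this sketch and
# are not taken from any real journal template or ZooBank schema.
FIELD_TAGS = {
    "superfamily": "Superfamily",
    "family": "Family",
    "taxon-name": "Taxon Name",
    "author": "Author",
    "primary-type": "Primary type",
    "type-depository": "Type depository",
    "diagnosis": "Diagnosis",
}

def extract_registrations(xml_text):
    """Return one dict of the seven fields per <new-taxon> element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for taxon in root.iter("new-taxon"):
        record = {}
        for tag, label in FIELD_TAGS.items():
            # findtext returns None when the child element is absent.
            value = taxon.findtext(tag)
            record[label] = value.strip() if value else None
        records.append(record)
    return records

# Invented placeholder content, for illustration only.
ARTICLE = """
<article>
  <new-taxon>
    <superfamily>Exampleoidea</superfamily>
    <family>Exampleidae</family>
    <taxon-name>Exemplum primum</taxon-name>
    <author>A. Author</author>
    <primary-type>Holotype X-1</primary-type>
    <type-depository>Example Museum</type-depository>
    <diagnosis>Differs from congeners in ...</diagnosis>
  </new-taxon>
  <new-taxon>
    <superfamily>Exampleoidea</superfamily>
    <family>Exampleidae</family>
    <taxon-name>Exemplum secundum</taxon-name>
    <author>A. Author</author>
    <primary-type>Holotype X-2</primary-type>
    <type-depository>Example Museum</type-depository>
  </new-taxon>
</article>
"""

records = extract_registrations(ARTICLE)
for record in records:
    gaps = [label for label, value in record.items() if value is None]
    print(record["Taxon Name"], "missing:", gaps)
```

Any field that the template leaves out shows up as a gap that the submitting journal could be asked to fill before registration.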

The point is, we don't need to change how we actually do business (indeed, I would HOPE that the science aspect of what we do would continue as per community standards).  We just need to shift where the Code-governed nomenclatural actions actually happen.  Currently, that is either entirely within a paper-printed publication, or in a mixture of the Registry and electronic document for electronic publications.  The two major problems we face with modern names are: 1) the increasing ambiguities and inconsistencies of what constitutes a "publication" (with associated dates), and 2) the problems that happen when discrepancies exist between the publication and the associated registration.  Consolidating all requirements for nomenclatural availability within an electronic publication/PDF (as you propose) eliminates the latter, but not the former.  Consolidating all requirements for nomenclatural availability within an online registration system (ZooBank on steroids) eliminates both.

> My problem here is that the full description and differential diagnosis by
> themselves are not enough to unambiguously define a species (although it really
> should be enough to define any taxon above that level). 

Fair enough.... but where is the line between taxonomy and nomenclature?  This is not a problem with the system I am proposing, this is a "problem" with the existing Code.  At the moment, the only requirement the Code has for the description and diagnosis is that it must be "a description or definition that states in words characters that are purported to differentiate the taxon".  That leaves a lot of latitude, regardless of whether that information is captured in a paper publication, a PDF publication, or a robust registration system.  So, is this point about my proposed system, or is it about the Code requirements in general?

> If the images and key are omitted then the usefulness of any
> diagnosis included with a registration on ZooBank will be seriously impaired. 
> I think for this model to work then it should be possible to use any accompanying
> diagnosis to fairly confidently identify any species at least as well as you would
> with a formally published taxonomic work. 

ABSOLUTELY Agreed!  The registration system would definitely ALLOW for the inclusion of anything that an author wished to include as part of the description/diagnosis.  This would include things that cannot easily be included within a PDF (e.g., video).  The question would be what is the minimum necessary information that is required for nomenclatural availability; and that question is equally problematic for paper and PDF publications as it is for the system I advocate.

> If there is a requirement for images
> in the system then it really would be a daunting task, and it should probably also
> include a key to species somewhere. On the other hand, if you keep it simple
> with only a diagnosis required then there is danger of it becoming equivalent to
> the Latin diagnosis that was the requirement in botany [rumour has it that many
> Latin diagnoses were about as useful as a chocolate teapot].

No argument here, but again this has little or no bearing on the relative merits of the system you propose (treating PDFs the same as paper publications) vs. what I propose; except that the system I propose allows us more direct control over the nature of content that is required and/or optional.

> Another problem is that recording and linking taxonomic acts to any taxonomic
> database could become a bit of a nightmare. The database will have to either
> import information direct from ZooBank or have a URL reference or an
> embedded link direct to the registration. 

On this point, the system I propose is VASTLY superior to the system you propose, for reasons too numerous to list.  Other electronic online databases would have a MUCH MUCH MUCH easier time linking to and incorporating information from the system I propose than from myriad PDFs scattered over servers across the planet.  So here you make another point in favor of what I advocate.

> Then, assuming that authors may/will
> include minimal information in ZooBank to make their names available there will
> also have to be a second entry to the relevant published article (assuming there
> is one) as a bibliographic reference, URL or direct link. 

Again, this is much easier to accomplish through the system I am proposing than the one you propose.  The link to the publication would only need to happen once (within the robust ZooBank), and then all other databases on the planet would have direct access to this information.

> In many cases, and I really
> think in many cases, authors will put their main taxonomic contribution on web
> sites because they will consider that once the names are available having been
> registered on ZooBank there is no need to go to the expense and tedium of
> publishing formally in journals, etc. I would not regard this as taxonomic
> vandalism but as a logical thing to do. These sites will have a limited "shelf life".
> After a time we would end up with a good proportion of taxonomic names
> without access to accompanying practical diagnoses. Maybe that would be no
> worse than what we have at present. I do not know. 

This is the one issue that I think has real merit.  Let me rephrase it this way:
Do most taxonomists develop their reputations and careers through the Code-compliant names they establish?  Or do they develop their reputations and careers through the body of scientific work they publish?  In my own case, and in the case of the vast majority of my colleagues, it is unambiguously the latter.  But if, in fact, it is the former (notable examples have been discussed on this forum), then there could indeed be a steady decline in overall taxonomic quality as new researchers trend toward "lazy" and only go as far as the minimalist requirements to establish new names, without feeling the need to do the hard work of taxonomic science through a publication.

My main counterpoint to this is that such lazy taxonomists ALREADY have the tools at their disposal to accomplish this with the existing rules (again, notable examples have been discussed previously). But MOST taxonomists don't do the minimalist approach to producing available names, because, I think, most taxonomists develop their reputations and careers through the body of scientific work they publish, not just through the Code-compliant names they establish.

> However, the increased
> logistical problems associated with ZooBank registration and maintaining an up
> to date taxonomic catalogue are quite serious.

Again, the opposite is true.  This applies to the status quo, and would likewise apply under your proposal for treating PDFs just like paper publications.  The system I propose is by far the superior solution to maintaining an up-to-date taxonomic catalogue.  In fact, that's one of the primary virtues of the system I propose over and above the existing paradigm.

> My final reservation I have brought up previously. That is how can the future of
> ZooBank be guaranteed? Under the current system if ZooBank suddenly failed
> we would lose very little. But what is the legacy for the future if we follow the
> registration=available model. Using this system we would be putting all our eggs
> in one basket. Failure of ZooBank would be catastrophic for nomenclature and
> thus taxonomy. It may take only $1.5m to get ZooBank working in the way that
> you would like but I still believe that we need some sort of cast-iron guarantee
> that ZooBank will be here well into the future. That could be quite costly in
> terms of maintenance, moving to a new platform every few years, security, etc.
> I think that guarantee would require a seriously large endowment because so
> much would be hanging on it.

This is perhaps the MOST legitimate concern about the system I propose -- and believe me, I have spent many, many, many hours thinking about it.  I have a fair bit of empirical data going back nearly ten years on these exact issues.  ZooBank was first officially launched on January 1, 2008.  Regular and long-time users will know that the road has not been perfectly smooth.  The main problems that have been encountered so far include periods lasting up to several days when ZooBank was offline (this happened several times early on, but not so much recently), intermittent periods when the server was very slow, and a few other hiccups.  The good news is that it has steadily improved in reliability.  Just last week we moved the entire system (website and database) to a new server, which now seems to be more stable and reliable (and faster) than the one it was on previously. Better still, no records have been lost during this time (that I am aware of). These ten years have been an extremely useful learning process, which is why we now have a very clear roadmap of how to get from the current system to the one I envision ("ZooBank on steroids").

But let me address a few specific aspects of your very legitimate concerns.  

The first is financial cost. Up until last week, ZooBank was located on a commercially hosted server in Arizona (GoDaddy).  This server cost $300/month, which Bishop Museum has been paying as a service to the ICZN and broader taxonomic community.  Recently the Museum upgraded its internal server architecture, and as a consequence we decided to move ZooBank to one of our internal servers, which is both more powerful and more robust than the server ZooBank had been on, and costs the institution less money (basically it is a trivial fraction of the money the Museum already spends on maintaining a computer server system).  Essentially, the only ongoing financial maintenance costs for ZooBank are the hard drive space (about 100GB, most of which is in the form of local backups, out of a system that has ~100TB of storage space; so 0.1% of the total storage capacity at the Museum), the electricity (essentially zero, given that the Museum needs to power the servers anyway), and the internet bandwidth (again, ZooBank's share is immeasurably trivial compared to the rest of the Museum bandwidth usage).  In summary, the WORST case scenario for financial costs of ZooBank is $300/month, which is small enough that my own institution has absorbed it, and certainly within a plausible funding scenario for ICZN.  However, the current state (better, faster, more robust server) costs almost nothing in terms of cash, and a very small amount in terms of disk space (essentially 10% of a 1TB disk, which one can now purchase for about US$50) and trivial bandwidth compared with what most institutions serve.

The second is time.  I receive about 5-10 emails per week requesting assistance with ZooBank issues.  Most of these require about 5-10 minutes of my time (some longer, some shorter).  I'm more than happy to address these in my spare time, and even if I weren't, certainly these issues could be absorbed by the ICZN Secretariat and/or any number of volunteer taxonomists with enough technical expertise to know how to resolve them.  In addition to this "customer service" aspect of ZooBank, I spend maybe about 3-5 full days per year doing things like migrating the website to a new server (last week was the second time we've done this since 2008, and it gets easier each time), and/or fixing bugs.  Sometimes I like to add new features when time permits, but that is really best done as part of the major overhaul which we've already budgeted at about $1.5M, and which would include adding all the features I describe for making it "ZooBank on steroids", as well as importing enormous historical names content from sources like Catalogue of Life, GBIF, NCBI, Sherborn, Systema Dipterorum, and many other similar databases that have expressed a strong willingness to share content.  It would also include developing better tools to allow publishers to conduct registrations as part of their publication workflow (as Pensoft already does), and also better tools for data cleanup and maintenance, cross-linking to other databases, and a bunch of other stuff.

The third is reliability.  By "reliability", I mean several things.  The first (short-term reliability) is that the website continues to function on a day-to-day basis.  I already discussed that above.  The trend has been towards increasing reliability of this sort, and I have high expectations that the recent server migration will bump this up even more.  As technology improves, this will continue to become better and better (e.g., moving it to a cloud service), and cheaper and cheaper. The existing ZooBank master copy (where new records are added and existing records are edited) has always been on a single server.  However, the "ZooBank on Steroids" option would exist on many (perhaps dozens of) servers around the world, which are replicated in real time.  This is along the lines of what airlines use to manage reservation systems.  We already have been testing this since 2012, and we've had as many as 11 copies on 4 continents running in parallel with 7-second latency (meaning that changes made to one copy are reflected in the other copies within 7 seconds). With such a system, day-to-day reliability goes WAY up because we can set it up with an automatic failover system such that when one server goes down, traffic to that server is automatically redirected to another server, and the end user is never the wiser.  The only financial outlay required to implement this is that more institutions need to be willing to host a copy on their institutional servers as Bishop Museum already does.  With new tools we're developing to allow direct integration of the database behind ZooBank with specimen databases, there is NO doubt in my mind that many institutions would be willing to shoulder the trivial costs (lost in the background noise of normal server maintenance costs) of hosting a local copy of ZooBank.
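The failover behaviour described above can be sketched in a few lines. This is only an illustration of the general pattern (try the next replica when one is unreachable), not a description of how the replicated ZooBank copies are actually configured; the mirror names and the `fake_fetch` stand-in are hypothetical, and a real deployment would make network requests and would typically redirect traffic at the server or DNS level.

```python
def fetch_with_failover(mirrors, fetch):
    """Try each mirror in order; return (mirror, result) from the first that answers."""
    errors = []
    for mirror in mirrors:
        try:
            return mirror, fetch(mirror)
        except ConnectionError as exc:
            # Remember why this mirror failed, then move on to the next one.
            errors.append((mirror, str(exc)))
    raise ConnectionError(f"all mirrors failed: {errors}")

# Hypothetical mirror names and a stand-in fetch function for illustration.
MIRRORS = ["mirror-a.example", "mirror-b.example", "mirror-c.example"]
DOWN = {"mirror-a.example"}

def fake_fetch(mirror):
    """Pretend to request a record; mirrors listed in DOWN are unreachable."""
    if mirror in DOWN:
        raise ConnectionError(f"{mirror} is unreachable")
    return {"served_by": mirror, "record": "example-record"}

served_by, result = fetch_with_failover(MIRRORS, fake_fetch)
print("request answered by", served_by)
```

With the first mirror marked as down, the request is answered by the second one, and the caller sees only the result.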

But the bigger (biggest, in my mind) issue is the long-term reliability question.  In developing the original ZooBank, my thought process was on the scale of centuries.  That is, I made decisions about implementing certain aspects of the database that would lend themselves towards a system most likely to endure the no-doubt radical changes that will come to electronic information systems in the coming decades.  This is a very open question not just for ZooBank, but for ALL systems that rely on electronic data being perpetuated.  Because we humans are so "new" to electronic information technology (only about half a century old), we haven't really developed the track record that is enjoyed by, for example, papyrus (think Dead Sea scrolls).  Indeed, this was one of the main arguments against allowing electronic publication of nomenclatural acts in the first place.  

I don't have the perfect answer to this long-term reliability question.  But I am confident that the system I propose has FAR better prospects for long-term/perpetual accessibility (e.g., via dozens or even hundreds of synchronized replicated copies on servers around the world) than your proposal for allowing scattered PDF files to serve as the basis for nomenclatural acts.  Indeed, I am reasonably confident that as long as human society exists, and electricity is relatively available, then ZooBank will continue to be accessible.  If either of these fails (civilization or available electricity), then I suspect the people of that era will have bigger issues to solve.  And besides, they wouldn't have access to the PDF files in that circumstance either. Just to show you that we have considered this, it would be relatively easy to set up a system to regularly print copies of ZooBank additions and edits on archival paper and store them in safe places.  For example, if a dozen or so of the institutions did this (printed monthly update reports on paper and stored them in their respective institutional libraries), then we'd have a reasonably reliable paper-based backup even in the event of the collapse of civilization.

> I am not sure that I like any form of peer review, unless you mean mandatory
> fields in the registration process that must be completed to make a name
> available.

Fair enough.  I'm not sure I like it either.  I was just explaining that IF the community did move towards peer-review as a requirement (one of the reasons people often cite for preferring names and acts to be established within publications), then it would be far easier to implement in the registration system I envision.

> I cannot see any way that publishers will move away from PDFs for electronic
> publishing. It is just too convenient and the fact that it is used so much for
> storing data means that it surely will be used for a very long time to come
> because so much depends on it.

I agree completely!  But this has no bearing on my argument.  It does support your proposal for adopting PDFs as a de facto standard in electronic publication.  My only comment is that when drafting the 4th Edition of the Code, the Commissioners apparently felt that "laser disks" were likewise going to be a stable medium indefinitely.  That mistake was dealt with in the 2012 Amendment.  I would predict that PDF formatted files have better prospects for longevity than "laser disks", but in keeping with the centuries-long vision for information availability, I would lean more towards standards like UTF-8 encoding of characters, and perhaps XML markup.  But that's another discussion entirely....

> >- Never need to wonder if a work was issued for the purpose of
> >providing a public and permanent scientific record, and met all the
> >other criteria for publication
> This would also be the case with treating e-publications the same as print
> publications.

Right -- paper and PDF publications both suffer the same need to ascertain whether Art. 8.1.1 has been met.  In my system, we don't need Art. 8.1.1 anymore.

> >- ZERO confusion or ambiguity about when a name was established (for
> >determining priority)
> I agree, but is this a bit of a red herring? In the grand scheme of things, in how
> many cases is knowing the exact day of publication important.

In terms of "exact", I agree with you.  But in the current publication system (especially electronic), the discrepancy might be on the order of months. Even that may not be an issue in the grand scheme of things.  But people sure do talk a lot about it (what with all the talk of "version of record" and such).  So apparently some people think it's important.  A non-trivial component of discussion about ambiguities of establishing new names involves the date of publication for purposes of nomenclatural priority, so it's not completely a red herring.

> >- Free/open access to all the core criteria necessary for establishing
> >ALL new names (type designation, etc.)
> Yes, but will it be all that useful? Encouragement of making all taxonomic
> publications freely available and archiving them on ZooBank would be hugely
> more beneficial.

Perhaps, but again, whether or not the information that accompanies an original description is useful is completely independent of what we're discussing.  This is a separate issue that applies to paper publications, PDF publications, and registered=available registration system equally.  The fact that the information that DOES accompany the original description (whether robust or minimal) would ALWAYS be open-access in all contexts in the system I propose definitely stands as an advantage.

> - Single resource for all new names
> Yes, but what about other nomenclatural acts such as typification?

ALL nomenclatural acts would come through the "ZooBank on steroids" system.  If it's required by the Code, it would happen via the registration system.

> - Much easier to implement new rules as the community demands them
> Treating PDF/A equally as print would not require any new rules for a very long
> time to come.

Maybe -- but the devil is in the details.  You'd need to propose precise wording for how the 5th Edition (or another Amendment to the 4th Edition) would define a PDF publication, and that would likely be just as fraught with ambiguity as defining a paper-based publication currently is.  All that goes away entirely with the system I propose.  My point is that even if you make the PDF definition the same as the paper definition, you still have issues to sort out that wouldn't exist if we divorced nomenclatural actions from publication entirely.

> >- Elimination of homonymy
> A very minor problem not really worth considering.

My understanding is that about 10% of all genus-group names suffer some form of homography (either proper homonymy, or other forms of text-strings duplicated through misspellings).  Maybe 10% is "minor", but it's not trivial.  And it's still an advantage, even if a small one.
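A registration system could flag such duplicates at the moment a name is entered. The sketch below is a plain string comparison only, assuming a simple in-memory list of registered genus-group names; it is not the Code's definition of homonymy, and the names, the `check_new_name` helper, and the similarity cutoff are all invented for illustration.

```python
import difflib

def check_new_name(candidate, registered, cutoff=0.85):
    """Classify a candidate name as 'homograph', 'similar', or 'ok' against a name list."""
    key = candidate.strip().lower()
    by_key = {name.lower(): name for name in registered}
    if key in by_key:
        # Identical spelling (ignoring case): flag as a homograph.
        return "homograph", [by_key[key]]
    # Near-identical spellings, e.g. possible misspellings of an existing name.
    near = difflib.get_close_matches(key, list(by_key), n=5, cutoff=cutoff)
    if near:
        return "similar", [by_key[k] for k in near]
    return "ok", []

# Invented placeholder names, for illustration only.
REGISTERED = ["Exemplum", "Fictus", "Specimenia"]

for name in ["exemplum", "Exemplun", "Novogenus"]:
    print(name, "->", check_new_name(name, REGISTERED))
```

Whether a "similar" result should block registration or merely prompt the author to confirm is a policy question that the sketch deliberately leaves open.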

> >- Reduction of ambiguity in linking a name to a type
> Is there any?


> >- Elimination of new names that are unavailable on technical grounds
> Rare. Apart from the current confusion over availability of some e-publications
> there are some cases of non-specifying primary type depositories  but these can
> be easily solved. In any case these are rare.

Maybe.  I'd like to see some actual analysis to get a sense for how rare.  And even if they are rare, they command a disproportionate amount of time (taxonomists, Commissioners) to resolve.

> - Elimination of the VAST majority of other nomenclatural headaches we still
> need to deal with
> I don't think so. That would need a complete rewrite of the Code to make it
> clearer, less subjective and less contradictory. On the other hand if we got rid of
> gender agreement . . . .

That's PRECISELY what I am advocating (complete rewrite of the Code).

> > If e-publications are accepted without the impediment of registration
> > then it could get very much cheaper and quicker to publish taxonomy.
> >>How so?
> Self-publication. Cheaper. Faster. Better. No impediment of copyright, therefore
> better distribution to those that need it or are interested.

Hmmm.... you may find some in our community would disagree with your assertion that "Self-publication" is necessarily "better" (at least in terms of average taxonomic quality).  Also, just because something is self-published doesn't mean it's free of copyright.

In summary, although I don't agree with some of the concerns you raised (in that they apply equally or even more so to the existing system or the PDF-as-paper system you propose, than they apply to the system I propose), I certainly do agree with several of them (as detailed above).  And my overall conclusion comes back to my first reply to your email on this thread:

Do the benefits of ZooBank (and the new system I propose) exceed the costs?  As I said originally, and as we have demonstrated through these lengthy posts, it's an exceedingly complex question.  But the more I work on that calculus, the more obvious it is to me what the path forward should be.


Richard L. Pyle, PhD
Database Coordinator for Natural Sciences | Associate Zoologist in Ichthyology | Dive Safety Officer
Department of Natural Sciences, Bishop Museum, 1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252 email: deepreef at bishopmuseum.org
