[Taxacom] Global biodiversity databases

Hi Tony,
Interesting, and thanks for taking the time to reply (I have had one other, off-list, reply so far). Some brief comments on what you have said:
(1) It is perhaps a little inappropriate to call Wikispecies a "secondary aggregator". As you note, it aggregates partly from primary sources (and is to that extent a primary aggregator), and to the extent that it is also a secondary aggregator (aggregating from primary aggregators), it does not do so uncritically. That is unlike the true secondary aggregators such as CoL: they may critically select their data providers, but that means little, as their standards are pretty low, and they then uncritically accept all the data from the chosen providers;
(2) My question about "expert validation" vs. user verifiability was perhaps badly worded ... of course BOTH would be best, but my main point was that user verifiability (against *primary sources*) is surely crucial? Any user who just settles for "expert validation" is a mug, IMHO ... (and so little of what is on CoL, WoRMS, etc. is verifiable against the primary sources);
(3) Wikipedia and Wikispecies really shouldn't be viewed as alternatives; rather, they should be used together, since their strengths and weaknesses are complementary and together add up to a new strength. Although both WS and WP are currently incomplete, a taxon page done properly on WS should refer the reader to all the most important references, even if it does not spell out the information in those references in quite the same way as a well-constructed WP page on the same taxon. WS is, however, much better than WP at keeping track of the basics, and so should always be checked for consistency against the corresponding WP page. In cases where their content differs, it comes down to referencing and user verifiability to decide between them ...

Hi Stephen,

Those who know me might appreciate that I have some interest in this area, e.g. see a couple of recent presentations:


Without knowing the subtext to your question(s), here are the answers I would give if pressed...

> Question 1: Do you expect a comprehensive and reliable GBD to exist in
> the foreseeable future (or do you think that one or more already
> exist)? If so, do you think it is likely to come from an existing
> initiative, and if so, which one(s)?

I think you have to split this into short term vs. medium/longer term. The short-term answer is that currently you have to mix and match across the best curated resources for specific groups: examples being Eschmeyer's Catalog of Fishes for fishes (extant species and genera), Index Fungorum/Species Fungorum for fungi, Systema Dipterorum for Diptera, etc.; notable cross-group compilations being Catalogue of Life (which is really a collation/fusion of 100+ "expert curated" systems), WoRMS (similar, for some 20 contributing components) for marine species, and so on. For higher plants there is The Plant List, for algae AlgaeBase, for prokaryotes the List of Prokaryotic names with Standing in Nomenclature (LPSN) plus CyanoDB, and for viruses the ICTV database. I would term these (with the exception of the composite CoL dataset and The Plant List) "primary aggregators", which ideally are the realm of experts in their respective fields (my 2 cents, anyway).

Medium to longer term there is the hope/wish/desire to move to an environment where as many as possible of these resources agree to collaborate in a common infrastructure, currently termed the Global Names Architecture or simply GN for Global Names. A recent meeting in Hawaii aimed to address some of the challenges to doing this, see http://www.globalnames.org/taxonomy/term/169/0 .

Meanwhile, while we wait for GN to deliver the "holy grail", there are secondary aggregators, of which my project, ITIS, Wikispecies, Wikipedia and others might be cited as examples, taking material from the primary aggregators and original sources to build something more complete than any single source. Speaking from experience, I do this without necessarily having taxonomic expertise in any particular area, but hopefully with some ability to judge which source to use, or how to weight sources, when information conflicts. In some cases these secondary aggregators may also have a slightly different remit from the primary ones, e.g. fleshing out names with images, descriptive information or distributions that are absent from purely nomenclatural or bare-bones species lists. Whether these will move into the GN space as discrete entities maintained separately forever, or will coalesce into a few larger units, remains to be seen.

> Question 2: Which would you prefer, (A) data verified by "experts"; or
> (B) data verifiable by the user (via referencing)?

Answer would be both (see also examples given below). If an expert has made a call, that saves me (the user) from doing the same! At the same time, the more of the evidence behind that call is included, the better, so that one can assess the currency and quality/credibility of the information and, as needed, decide whether to use it unchanged (for example, new taxonomic information such as a change in placement, synonymy or name may have been published since the call was made).

> Question 3: What kinds of data do you want to be able to access from a
> GBD?

A previous Taxacom post from Rod Page suggested the following:


Very simple questions are being asked:

1. Is this a name?
2. Is this the correct way to write it?
3. Is this name currently in use?
4. What other names are related to this name (e.g., synonyms, lexical variants)?
5. Where was this name published? Can I see that publication?


I would extend this a bit further:

6. What is the current (and also past) taxonomic placement of this name, and according to whom?
7. What are its parent and children in the selected current taxonomic hierarchy?
8. What other names are lexically related to this name (homonyms, near-homonyms/candidate "did you mean")?
9. What do we know of the type specimen, i.e. when/where collected, where deposited, geologic age, associated habitat etc.?
10. What do we know of the taxon to which this name applies: ecological info, distribution in space and time, common names, descriptive characters, significant literature treatments?
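As a rough illustration of how some of the lookup questions above (1, 3, 4, 6, 7 and 8) might map onto a data structure, here is a minimal Python sketch. The record layout, the class and method names and the sample entries are all hypothetical, loosely based on the Bythites hollisi example quoted below; the "did you mean" part simply uses the standard library's difflib similarity ratio, not any algorithm used by an existing service.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from typing import Dict, List, Optional


@dataclass
class NameRecord:
    """Hypothetical minimal record for one scientific name."""
    name: str
    author: str
    accepted_name: Optional[str] = None  # None: the name is itself accepted
    parent: Optional[str] = None         # genus (or other parent) it is placed in


@dataclass
class NameIndex:
    """Toy in-memory index; not modelled on any real service's API."""
    records: Dict[str, NameRecord] = field(default_factory=dict)

    def add(self, rec: NameRecord) -> None:
        self.records[rec.name] = rec

    def is_name(self, name: str) -> bool:                # question 1
        return name in self.records

    def current_name(self, name: str) -> Optional[str]:  # question 3
        rec = self.records.get(name)
        if rec is None:
            return None
        return rec.accepted_name or rec.name

    def synonyms(self, name: str) -> List[str]:          # question 4
        accepted = self.current_name(name)
        if accepted is None:
            return []
        return sorted(r.name for r in self.records.values()
                      if r.accepted_name == accepted)

    def placement(self, name: str) -> Optional[str]:     # question 6 (current only)
        accepted = self.current_name(name)
        rec = self.records.get(accepted) if accepted else None
        return rec.parent if rec else None

    def children(self, parent: str) -> List[str]:        # question 7 (accepted names only)
        return sorted(r.name for r in self.records.values()
                      if r.parent == parent and r.accepted_name is None)

    def did_you_mean(self, name: str, n: int = 3) -> List[str]:  # question 8
        # difflib similarity ratio; the 0.8 cutoff is an arbitrary choice
        return get_close_matches(name, list(self.records), n=n, cutoff=0.8)


# Hypothetical sample entries, loosely based on the Bythites hollisi example
idx = NameIndex()
idx.add(NameRecord("Thermichthys hollisi", "(Cohen, Rosenblatt & Moser 1990)",
                   parent="Thermichthys"))
idx.add(NameRecord("Bythites hollisi", "Cohen, Rosenblatt & Moser 1990",
                   accepted_name="Thermichthys hollisi", parent="Bythites"))
idx.add(NameRecord("Gerhardia hollisi", "(Cohen, Rosenblatt & Moser 1990)",
                   accepted_name="Thermichthys hollisi", parent="Gerhardia"))
```

For instance, `idx.current_name("Bythites hollisi")` resolves to the current combination, and a misspelling such as "Bythites holisi" is offered "Bythites hollisi" as a candidate. Questions 2, 5, 9 and 10 (spelling rules, publication, type specimen, taxon-level data) would need much richer records than this toy.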

For a "straw man" here is an example species-level name treatment from Eschmeyer's online Catalog of Fishes, for the name Bythites hollisi (now a syn. of Thermichthys hollisi):

hollisi, Bythites Cohen [D. M.], Rosenblatt [R. H.] & Moser [H. G.] 1990:270, Figs. 1-8 [Deep-Sea Research v. 37 (no. 2); ref. 14223] Hydrothermal vent (Mussel Bed) on Galápagos Rift Zone, 0°47.894'N, 86°9.210'W, depth 2500 meters. Holotype (unique): SIO 88-97. •Valid as Bythites hollisi Cohen, Rosenblatt & Moser 1990 -- (Geistdoerfer 1999:9 [ref. 23832], Nielsen & Cohen in Nielsen et al. 1999:98 [ref. 24448], Machida & Hashimoto 2002:1 [ref. 25949], Chernova & Geistdorfer 2003:153 [ref. 26887]). •Valid as Gerhardia hollisi (Cohen, Rosenblatt & Moser 1990) -- (Nielsen & Cohen 2002:50 [ref. 26528]). •Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990) -- (Nielsen & Cohen 2005:395 [ref. 28470]). Current status: Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990). Bythitidae: Bythitinae. Distribution: Southeastern Pacific. Habitat: marine.

(Note also that all the statements are keyed to a references table, which can be searched independently.)
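To make the point about referenced statements concrete, here is a small Python sketch that pulls each "Valid as ..." statement, with the reference numbers cited for it, out of an entry like the one above. The regular expressions are fitted only to the punctuation of this one (abridged) excerpt; they are an assumption, not a documented Catalog of Fishes format, and real entries may well vary.

```python
import re
from typing import List, Optional, Tuple

# Abridged excerpt of the entry above (only two of the four references
# cited for the first statement are kept).
ENTRY = (
    "Valid as Bythites hollisi Cohen, Rosenblatt & Moser 1990 -- "
    "(Geistdoerfer 1999:9 [ref. 23832], "
    "Nielsen & Cohen in Nielsen et al. 1999:98 [ref. 24448]). "
    "Valid as Gerhardia hollisi (Cohen, Rosenblatt & Moser 1990) -- "
    "(Nielsen & Cohen 2002:50 [ref. 26528]). "
    "Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990) -- "
    "(Nielsen & Cohen 2005:395 [ref. 28470]). "
    "Current status: Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990)."
)

# A binomial, anything up to " -- (", then the parenthesised list of sources.
STATEMENT = re.compile(r"Valid as (\w+ \w+) .*? -- \((.*?)\)\.")
REF_NO = re.compile(r"\[ref\. (\d+)\]")


def placements(entry: str) -> List[Tuple[str, List[str]]]:
    """Return (combination, [reference numbers]) for each 'Valid as' statement."""
    return [(m.group(1), REF_NO.findall(m.group(2)))
            for m in STATEMENT.finditer(entry)]


def current_status(entry: str) -> Optional[str]:
    """Return the binomial given under 'Current status', if present."""
    m = re.search(r"Current status: Valid as (\w+ \w+)", entry)
    return m.group(1) if m else None


for name, refs in placements(ENTRY):
    print(name, refs)
print("Current:", current_status(ENTRY))
```

Because every combination comes back paired with its reference numbers, this is essentially the "according to whom" half of question 6, answerable from the record alone.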

You can assess for yourself how many of my suggestions above are covered here. Some that are not may be covered by the equivalent entry in FishBase, see


(Actually this page is pretty bare compared with many in FishBase, but you will get the idea).

For fossil taxa I think PaleoDB has pretty much the right approach; as an example, see this page for the genus Tyrannosaurus:

http://paleodb.org/cgi-bin/bridge.pl?a=basicTaxonInfo&taxon_no=38613 (a lot more information is available via "more details")

> Question 4: Which existing initiative currently comes closest to what
> you would ideally like to see?

See some examples above for particular groups (there are many more out there). Across all groups, either build your own (as I do) or, for now, use Google Scholar and Nomenclator Zoologicus (for animals) as a surrogate for the literature, backed up by other internet/print resources as available (a personal library is still invaluable, especially for the more substantial texts). Wikipedia is surprisingly useful for recent updates on treatments of some groups and for the more "charismatic" taxa in general (the value of crowdsourcing, I guess), but beware of inaccuracies and inconsistencies between treatments on different pages; it is also very incomplete, since minor taxa are presumably not considered sufficiently "notable". (Why Wikipedia as opposed to Wikispecies? I typically want more than the "bare bones" taxonomic placement, and Wikispecies supplies only the latter.)

That's my take; maybe not quite what you are asking for, but hopefully there is something useful there.

Regards - Tony

> Dear Taxacomers,
> I have created a short questionnaire (below) for which I would
> appreciate greatly any replies. It concerns global biodiversity
> databases (GBDs) ("databases" in the broadest possible sense).
> Cheers, Stephen
> Question 1: Do you expect a comprehensive and reliable GBD to exist in
> the foreseeable future (or do you think that one or more already
> exist)? If so, do you think it is likely to come from an existing
> initiative, and if so, which one(s)?
> Question 2: Which would you prefer, (A) data verified by "experts"; or
> (B) data verifiable by the user (via referencing)?
> Question 3: What kinds of data do you want to be able to access from a
> GBD?
> Question 4: Which existing initiative currently comes closest to what
> you would ideally like to see?
