[Taxacom] Filling the world with biodiversity data errors

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Fri Dec 4 02:44:34 CST 2020

I'm not quite sure how this helps, Donat?? GBIF has avoided the Iceland reference simply by giving no location information at all for that specimen. The fact remains that anyone using Plazi will see Iceland, not N.Z., as the ONLY locality for Zenascus luniger and will have to check the original treatment to see that this is an error. Many potential users will simply have no reason to doubt the Iceland reference and so will accept it uncritically.
Stephen
    On Friday, 4 December 2020, 09:11:42 pm NZDT, Donat Agosti <agosti at amnh.org> wrote:  
Here is also the record of this article in GBIF: https://www.gbif.org/dataset/11a122a9-8f3e-4a1c-a8d3-5ef3bea09b59
Here is an overview of the figures in this article: https://ocellus.info/images.html?q=%2210.11646/zootaxa.4889.1.1%22&size=100&page=1&communities=biosyslit
Each figure cites its source, the treatment that cites it, and more.
The important part is to look at the comparison with the GBIF and COL taxonomic backbones, as well as the access to treatments and figures.
All the embedded links can only be achieved by machine.
From: Donat Agosti 
Sent: Friday, December 4, 2020 8:38 AM
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>; 'Taxacom' <taxacom at mailman.nhm.ku.edu>; Richard Pyle <deepreef at bishopmuseum.org>
Subject: RE: [Taxacom] Filling the world with biodiversity data errors
Hi Stephen
Thanks for pointing this out. Our stats (you can also access them at http://plazi.org/api-tools/statistics/) say that the machine detected 15 treatments: 10 with country = New Zealand, 1 with country = Iceland, and 4 with no country detected (because the subsequent materials citations are incomplete and provide only the data that differ from the previous one).
If you look at the materials citations themselves, then there is in fact ONE materials citation that has the country wrong, and 88 with or without country = New Zealand:
http://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.uuid+doc.articleUuid+matCit.verbatimMatCit+matCit.country&groupingFields=doc.articleUuid+matCit.country&FP-doc.articleUuid=FF9DFFECFFDEFFE16F124B66712FFFBC&format=HTML
And here is some more detail:
In fact, you don't see country = Iceland in the stats above, because it has already been fixed.
You can report errors via Taxacom, https://github.com/plazi/community/issues, or GBIF.
Errors are taken seriously and we have quality control tools in place to minimize them. They could be avoided altogether if we stopped publishing this important data in closed-access PDF prisons. We feel, however, that it is more important that we make this data accessible. Currently the Plazi workflow is the only way to get data about new and already known species into GBIF and BLR close to the day of publication, so that it can be widely used. For ca. 45,000 new species, the only records of the species exist because the data has been liberated from publications.
In our view, a very impressive example of what machine, quality control and a human eye can do is this most recent EJT article with almost 1,700 materials citations: https://doi.org/10.5852/ejt.2020.725.1167, which is now accessible at https://zenodo.org/record/4298139 or http://treatment.plazi.org/GgServer/summary/FFCBFF8B214DFFE68B01FFBC3556FF96 and finally in GBIF: https://www.gbif.org/dataset/9e062836-3946-4ac6-8910-304cadff0d4b
From: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Sent: Friday, December 4, 2020 5:50 AM
To: Donat Agosti <agosti at amnh.org>; 'Taxacom' <taxacom at mailman.nhm.ku.edu>; Richard Pyle <deepreef at bishopmuseum.org>
Subject: Re: [Taxacom] Filling the world with biodiversity data errors
Well Rich, it rather depends on the frequency of such errors, which I don't have any grasp of, at present. If the frequency of such errors is low, say 1%, then there isn't much of a problem (unless you particularly want data on Zenascus luniger!) However, if the frequency of such errors is high, which might be the case, as far as I know, then there is more of a problem, even if it is the best of the available alternatives. Assuming that the data does actually get used by somebody for some purpose, a high proportion of such errors isn't necessarily better than nothing! Imagine if the species was added to the list of protected threatened species of Iceland, on the basis that there were so few known specimens and subsequent surveys turned up none! The mythical extinct Icelandic aderid! Anyway, I don't see anything wrong with pointing out that there is room for improvement!
On Friday, 4 December 2020, 05:38:52 pm NZDT, Richard Pyle <deepreef at bishopmuseum.org> wrote: 
Two guys walking through the woods.  A huge bear starts to charge them.  The first guy sits down to put on his running shoes.  The second guy says, "Are you crazy?!? You'll never out-run that bear!" The first guy says, "I don't have to out-run the bear. I just have to out-run YOU."

Moral of the story: You don't need to be perfect.  You just need to be better than the alternative.  From where I sit, what PLAZI is doing is far, FAR better than the alternative!

Keep those running shoes on, Donat!


Richard L. Pyle, PhD
Senior Curator of Ichthyology | Director of XCoRE
Bernice Pauahi Bishop Museum
1525 Bernice Street, Honolulu, HI 96817-2704
Office: (808) 848-4115;  Fax: (808) 847-8252
eMail: deepreef at bishopmuseum.org
Our Mission: Bishop Museum inspires our community and visitors through the exploration and celebration of the extraordinary history, culture, and environment of Hawaiʻi and the Pacific.

> -----Original Message-----
> From: Taxacom <taxacom-bounces at mailman.nhm.ku.edu> On Behalf Of
> Stephen Thorpe via Taxacom
> Sent: Thursday, December 3, 2020 5:27 PM
> To: Donat Agosti <agosti at amnh.org>; Taxacom
> <taxacom at mailman.nhm.ku.edu>
> Subject: [Taxacom] Filling the world with biodiversity data errors
> Hi Donat and Taxacom,
> The attempt to automate data harvesting, without human scrutiny, continues
> to fill the world with biodiversity data errors. Specifically, on zenodo, for
> Zenascus luniger, if you click on 'specimens', you get trash:
> http://tb.plazi.org/GgServer/html/03A48794FFF8FFC66F8549E97070F8F1
> It is garbage for the following reasons:
> (1) The species is endemic to N.Z., but the 4 specimens recorded are 2 from
> Iceland and the other 2 unknown, leaving Iceland as the only location on the
> map!
> (2) Of the 4 specimens, 2 are supposedly lectotypes (no, there is no
> synonymy)!
> (3) The treatment upon which this data is based only mentions 2 specimens
> (lectotype and paralectotype), both from New Zealand.
> I'm not sure how widespread such problems are on zenodo, but there seems to
> be little reason to think that this is a rare case, is there?
> Cheers, Stephen

> _______________________________________________
> Taxacom Mailing List
> Send Taxacom mailing list submissions to: taxacom at mailman.nhm.ku.edu For
> list information; to subscribe or unsubscribe, visit:
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> You can reach the person managing the list at: taxacom-
> owner at mailman.nhm.ku.edu The Taxacom email archive back to 1992 can be
> searched at: http://taxacom.markmail.org
> Nurturing nuance while assaulting ambiguity for about 33 years, 1987-2020. 
