[Taxacom] Filling the world with biodiversity data errors

Donat Agosti agosti at amnh.org
Fri Dec 4 02:11:36 CST 2020


Here is also the record of this article in GBIF: https://www.gbif.org/dataset/11a122a9-8f3e-4a1c-a8d3-5ef3bea09b59

Here is an overview of the figures in this article: https://ocellus.info/images.html?q=%2210.11646/zootaxa.4889.1.1%22&size=100&page=1&communities=biosyslit
Each figure cites its source, the treatment that cites it, and more.

The important part is to look at the comparison with the GBIF and COL taxonomic backbones, as well as the access to treatments and figures.

All the embedded links can only be achieved by machine.

Cheers
Donat


From: Donat Agosti
Sent: Friday, December 4, 2020 8:38 AM
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>; 'Taxacom' <taxacom at mailman.nhm.ku.edu>; Richard Pyle <deepreef at bishopmuseum.org>
Subject: RE: [Taxacom] Filling the world with biodiversity data errors

Hi Stephen

Thanks for pointing this out. Our stats (which you can also access at http://plazi.org/api-tools/statistics/) say that the machine detected 15 treatments: 10 with country = New Zealand, 1 with country = Iceland, and 4 with no country detected (because the subsequent materials citations are incomplete and give only the data that differs from the previous one).

http://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.articleUuid+matCit.country&groupingFields=doc.articleUuid+matCit.country&FP-doc.articleUuid=FF9DFFECFFDEFFE16F124B66712FFFBC&format=HTML


If you look at the materials citations themselves, there is in fact ONE materials citation that has the country wrong, and 88 that either have country = New Zealand or have no country detected: http://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.uuid+doc.articleUuid+matCit.verbatimMatCit+matCit.country&groupingFields=doc.articleUuid+matCit.country&FP-doc.articleUuid=FF9DFFECFFDEFFE16F124B66712FFFBC&format=HTML
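
As a rough check on scale, the counts quoted above can be turned into an error share for this one article. This is only a sketch: it assumes the 88 remaining citations count as not wrong (some of them simply have no country detected), and it says nothing about other articles or other fields.

```python
# Counts quoted in this message for the one article under discussion.
wrong_country = 1   # materials citations with the wrong country
other = 88          # citations with country = New Zealand or with no country

total = wrong_country + other
share = wrong_country / total  # fraction of citations with a wrong country

print(f"{wrong_country} of {total} materials citations = {share:.1%}")
```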


And here is some more detail:

http://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.uuid+doc.articleUuid+doc.articleDoi+matCit.verbatimMatCit+matCit.country&groupingFields=doc.uuid+doc.articleUuid+doc.articleDoi+matCit.verbatimMatCit+matCit.country&FP-doc.articleUuid=FF9DFFECFFDEFFE16F124B66712FFFBC&format=HTML

In fact, you no longer see country = Iceland in the stats above, because it has already been fixed.
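
The stats links in this message all follow one pattern, so they can be assembled rather than typed by hand. The sketch below rebuilds the first link from its parts. The endpoint, parameter names and article UUID are copied from the links; the function name `build_stats_url` is made up for illustration, and reading the `FP-` prefix as a filter on the named field is an assumption drawn from the links, not from documentation.

```python
from urllib.parse import urlencode

# Endpoint copied from the links in this message.
STATS_ENDPOINT = "http://tb.plazi.org/GgServer/srsStats/stats"


def build_stats_url(output_fields, grouping_fields, article_uuid, fmt="HTML"):
    """Assemble a stats query URL of the same shape as the links above."""
    params = {
        # Field lists are joined with spaces; urlencode renders each space as "+".
        "outputFields": " ".join(output_fields),
        "groupingFields": " ".join(grouping_fields),
        # Assumption: "FP-" restricts the result to one article UUID.
        "FP-doc.articleUuid": article_uuid,
        "format": fmt,
    }
    return STATS_ENDPOINT + "?" + urlencode(params)


fields = ["doc.articleUuid", "matCit.country"]
url = build_stats_url(fields, fields, "FF9DFFECFFDEFFE16F124B66712FFFBC")
print(url)
```

Adding fields such as `matCit.verbatimMatCit` to the two lists gives the more detailed links.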

You can report errors via Taxacom, https://github.com/plazi/community/issues, or GBIF.

Errors are taken seriously and we have quality control tools in place to minimize them. They could be avoided altogether if we stopped publishing this important data in closed-access PDF prisons. We feel, however, that it is more important to make this data accessible. Currently the Plazi workflow is the only way to get data about new and already known species into GBIF and BLR close to the day of publication, so that it can be widely used. For ca. 45,000 new species, these are the only records of the species, and they exist because the data has been liberated from publications.

In our view, a very impressive example of what machine, quality control and a human eye can do is this most recent EJT article with almost 1,700 materials citations: https://doi.org/10.5852/ejt.2020.725.1167 which is now accessible at https://zenodo.org/record/4298139 or http://treatment.plazi.org/GgServer/summary/FFCBFF8B214DFFE68B01FFBC3556FF96 and finally in GBIF at https://www.gbif.org/dataset/9e062836-3946-4ac6-8910-304cadff0d4b.


Cheers
Donat


From: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Sent: Friday, December 4, 2020 5:50 AM
To: Donat Agosti <agosti at amnh.org>; 'Taxacom' <taxacom at mailman.nhm.ku.edu>; Richard Pyle <deepreef at bishopmuseum.org>
Subject: Re: [Taxacom] Filling the world with biodiversity data errors

Well Rich, it rather depends on the frequency of such errors, which I don't have any grasp of, at present. If the frequency of such errors is low, say 1%, then there isn't much of a problem (unless you particularly want data on Zenascus luniger!) However, if the frequency of such errors is high, which might be the case, as far as I know, then there is more of a problem, even if it is the best of the available alternatives. Assuming that the data does actually get used by somebody for some purpose, a high proportion of such errors isn't necessarily better than nothing! Imagine if the species was added to the list of protected threatened species of Iceland, on the basis that there were so few known specimens and subsequent surveys turned up none! The mythical extinct Icelandic aderid! Anyway, I don't see anything wrong with pointing out that there is room for improvement!

Stephen

On Friday, 4 December 2020, 05:38:52 pm NZDT, Richard Pyle <deepreef at bishopmuseum.org> wrote:


Two guys walking through the woods.  A huge bear starts to charge them.  The first guy sits down to put on his running shoes.  The second guy says, "Are you crazy?!? You'll never out-run that bear!" The first guy says, "I don't have to out-run the bear. I just have to out-run YOU."

Moral of the story: You don't need to be perfect.  You just need to be better than the alternative.  From where I sit, what PLAZI is doing is far, FAR better than the alternative!

Keep those running shoes on, Donat!

Aloha,
Rich

Richard L. Pyle, PhD
Senior Curator of Ichthyology | Director of XCoRE
Bernice Pauahi Bishop Museum
1525 Bernice Street, Honolulu, HI 96817-2704
Office: (808) 848-4115;  Fax: (808) 847-8252
eMail: deepreef at bishopmuseum.org
BishopMuseum.org
Our Mission: Bishop Museum inspires our community and visitors through the exploration and celebration of the extraordinary history, culture, and environment of Hawaiʻi and the Pacific.

> -----Original Message-----
> From: Taxacom <taxacom-bounces at mailman.nhm.ku.edu> On Behalf Of
> Stephen Thorpe via Taxacom
> Sent: Thursday, December 3, 2020 5:27 PM
> To: Donat Agosti <agosti at amnh.org>; Taxacom
> <taxacom at mailman.nhm.ku.edu>
> Subject: [Taxacom] Filling the world with biodiversity data errors
>
> Hi Donat and Taxacom,
>
> The attempt to automate data harvesting, without human scrutiny, continues
> to fill the world with biodiversity data errors. Specifically, on zenodo, for
> Zenascus luniger, if you click on 'specimens', you get trash:
> http://tb.plazi.org/GgServer/html/03A48794FFF8FFC66F8549E97070F8F1
>
> It is garbage for the following reasons:
> (1) The species is endemic to N.Z., but the 4 specimens recorded are 2 from
> Iceland and the other 2 unknown, leaving Iceland as the only location on the
> map!
> (2) Of the 4 specimens, 2 are supposedly lectotypes (no, there is no
> synonymy)!
> (3) The treatment upon which this data is based only mentions 2 specimens
> (lectotype and paralectotype), both from New Zealand.
>
> I'm not sure how widespread such problems are on zenodo, but there seems to
> be little reason to think that this is a rare case, is there?
>
> Cheers, Stephen

> _______________________________________________
> Taxacom Mailing List
>
> Send Taxacom mailing list submissions to: taxacom at mailman.nhm.ku.edu
> For list information; to subscribe or unsubscribe, visit:
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> You can reach the person managing the list at:
> taxacom-owner at mailman.nhm.ku.edu
> The Taxacom email archive back to 1992 can be searched at:
> http://taxacom.markmail.org
>
> Nurturing nuance while assaulting ambiguity for about 33 years, 1987-2020.


