[Taxacom] An aside re: Wikidata (was Re: Forcing ORCID on researchers)

Douglas Yanega dyanega at gmail.com
Tue Dec 8 17:32:49 CST 2020

On 12/8/20 11:24 AM, Tony Rees via Taxacom wrote:
> Perhaps I can put what I mean another way. ORCID and Google Scholar (two
> current exemplars in the "IDs for researchers" and "IDs for articles"
> space, but by no means the only ones) currently work because "someone" is
> throwing the required substantial amounts of funding around to support the
> dozens of full time staff (in each case) required to support these tasks,
> or at least make a reasonable dent on them (the dreaded commercial model as
> described/disparaged above, even though at least ORCID is a
> not-for-profit). However I do not see equivalent funders / full time
> positions to repeat the same exercise(s) for a community-owned project such
> as Wikidata...

To divert the discussion back to taxonomy a little, I have a wee rant to 

Speaking as someone who makes routine use of resources including 
Wikispecies, Wikipedia, and the Wikimedia Commons, there is something 
fundamentally different in how Wikidata operates, and not in a good way. 
There is a logic, and a transparency, to the operations and interactions 
in the first three resources that is largely absent from Wikidata. 
Relative to the other Wiki resources, it is extremely hard to make sense 
of the Wikidata interface if one wishes to edit an entry there, and 
there's a variety of interactions between a Wikidata record and a number 
of *external elements* that are not easy to keep aligned properly, 
*especially* when there are taxonomic and nomenclatural issues involved 
(e.g., if there are two Wikidata records, and one is a junior synonym of 

The point I'm getting at is this: Wikispecies, Wikipedia, and the 
Wikimedia Commons are all resources that are genuinely crowd-sourced, 
because they allow people to contribute to them, and maintain them, with 
a minimal learning curve (so there are thousands upon thousands of 
contributing editors), and it is correspondingly easy for the 
self-policing mechanisms to function as intended, because there is a 
critical mass of people who know and can enforce the official policies. 
Feedback in Wikipedia, especially, is *extremely* rapid. This does not 
appear to be true of Wikidata. There are very few active editors, a 
poorly-defined policing mechanism, and the overwhelming majority of 
edits are via automated scripts ("bots"), whose actions may conflict 
with manual edits. There is no easy way to solicit feedback (even though 
items have "discussion" spaces allocated, most of them are blank, and it 
is doubtful if a comment posted there would attract anyone's attention). 
One of the immediate consequences is that if there is an error in a 
taxonomic Wikidata record, especially an error *originating in an 
external source* (such as a misspelling, or the use of a junior 
synonym), it is *extremely* difficult to fix it and not have it reverted 
back automatically by a bot, or by an editor who understands nothing 
about taxonomy. A database that is designed in such a way that errors in 
it cannot be easily reported, or fixed, is a badly-designed database.

Having a permanent, unique ID number for every taxon in the world is a 
great idea, but taxonomy is by no means permanent. Opinions change, and 
taxa merge, or split, or change rank, or change spellings, literally 
every single day, but Wikidata is not designed to accommodate this. If 
someone publishes a new paper in which a taxon listed in Wikidata 
changes in its rank, spelling, or delimitation, it could be *years* 
before the change is reflected there, if ever (e.g., Wikidata has an 
entry for the bee family Anthophoridae, a name that was synonymized in 
1990, and has no indication that the name is not valid); however, I can 
change essentially ANYTHING regarding a taxon in a matter of *minutes* 
in Wikispecies, Wikipedia, or the Wikimedia Commons. In fact, one of the 
first things one has to do when modifying a Wikipedia entry in this way 
is to delete any links to Wikidata, because the Wikidata record no 
longer matches the new parameters of the taxon, cannot be altered to 
match the new parameters, and would mislead readers by pointing to the 
OLD rank, spelling, or delimitation.

I can't honestly see what possible advantage Wikidata offers as a 
taxonomic resource when it has such an incredibly limited capacity for 
modification, and so little community engagement. It acts like one of 
the large and arbitrary data aggregators such as EOL, or ITIS, or GBIF, 
and is similarly inflexible and rapidly outdated. To be perfectly 
honest, the only online resource that offers all the tools that taxonomy 
requires to make taxa and their data visible *and* stay updated in a 
timely fashion is Wikipedia, because it allows for extensive text and 
inclusion of *multiple* external sources (despite having a single 
backbone classification). That is, Wikidata literally limits each record 
to a single external source link, so if there is a taxonomic dispute, 
*only Wikipedia* can link to publications and evidence from both sides 
of the dispute, to explain how and why the backbone being displayed 
might not represent a unanimous opinion. Wikispecies and the Wikimedia 
Commons both use a single taxonomic backbone, as well, but they are not 
designed to incorporate extensive text that might explain alternative 
taxonomic opinions, or track historical usage; in Wikispecies, this is 
at least *possible*, in theory, though it'd be cumbersome to accomplish, 
while in the Wikimedia Commons, there is literally no way to indicate 
alternative taxonomic opinions, nor to link to external sources other 
than Wikidata. At least for the arthropod groups I work with, the 
ranking from fewest to most outdated or erroneous taxonomy-related 
entries is Wikipedia (best), then Wikispecies, then Wikimedia Commons, 
then Wikidata (worst). 
I don't think it's a coincidence that this is the same ranking in terms 
of the relative ease of editing. If you want a resource to stay updated, 
it needs to allow people to update it easily.


Doug Yanega      Dept. of Entomology       Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314     skype: dyanega
phone: (951) 827-4315 (disclaimer: opinions are mine, not UCR's)
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82
