[Taxacom] An aside re: Wikidata (was Re: Forcing ORCID on researchers)
Douglas Yanega
dyanega at gmail.com
Tue Dec 8 17:32:49 CST 2020
On 12/8/20 11:24 AM, Tony Rees via Taxacom wrote:
> Perhaps I can put what I mean another way. ORCID and Google Scholar (two
> current exemplars in the "IDs for researchers" and "IDs for articles"
> space, but by no means the only ones) currently work because "someone" is
> throwing the required substantial amounts of funding around to support the
> dozens of full time staff (in each case) required to support these tasks,
> or at least make a reasonable dent on them (the dreaded commercial model as
> described/disparaged above, even though at least ORCID is a
> not-for-profit). However I do not see equivalent funders / full time
> positions to repeat the same exercise(s) for a community-owned project such
> as Wikidata...
To divert the discussion back to taxonomy a little, I have a wee rant to
share:
Speaking as someone who makes routine use of resources including
Wikispecies, Wikipedia, and the Wikimedia Commons, there is something
fundamentally different in how Wikidata operates, and not in a good way.
There is a logic, and a transparency, to the operations and interactions
in the first three resources that is largely absent from Wikidata.
Relative to the other Wiki resources, it is extremely hard to make sense
of the Wikidata interface if one wishes to edit an entry there, and
there's a variety of interactions between a Wikidata record and a number
of *external elements* that are not easy to keep aligned properly,
*especially* when there are taxonomic and nomenclatural issues involved
(e.g., if there are two Wikidata records, and one is a junior synonym of
the other).
The point I'm getting at is this: Wikispecies, Wikipedia, and the
Wikimedia Commons are all resources that are genuinely crowd-sourced,
because they allow people to contribute to them, and maintain them, with
a minimal learning curve (so there are thousands upon thousands of
contributing editors), and it is correspondingly easy for the
self-policing mechanisms to function as intended, because there is a
critical mass of people who know and can enforce the official policies.
Feedback in Wikipedia, especially, is *extremely* rapid. This does not
appear to be true of Wikidata. There are very few active editors, a
poorly-defined policing mechanism, and the overwhelming majority of
edits are via automated scripts ("bots"), whose actions may conflict
with manual edits. There is no easy way to solicit feedback (even though
items have "discussion" spaces allocated, most of them are blank, and it
is doubtful whether a comment posted there would attract anyone's attention).
One of the immediate consequences is that if there is an error in a
taxonomic Wikidata record, especially an error *originating in an
external source* (such as a misspelling, or the use of a junior
synonym), it is *extremely* difficult to fix it and not have it reverted
back automatically by a bot, or by an editor who understands nothing
about taxonomy. A database that is designed in such a way that errors in
it cannot be easily reported, or fixed, is a badly-designed database.
Having a permanent, unique, ID number for every taxon in the world is a
great idea, but taxonomy is by no means permanent. Opinions change, and
taxa merge, or split, or change rank, or change spellings, literally
every single day, but Wikidata is not designed to accommodate this. If
someone publishes a new paper in which a taxon listed in Wikidata
changes in its rank, spelling, or delimitation, it could be *years*
before the change is reflected there, if ever (e.g., Wikidata has an
entry for the bee family Anthophoridae, a name that was synonymized in
1990, and has no indication that the name is not valid); however, I can
change essentially ANYTHING regarding a taxon in a matter of *minutes*
in Wikispecies, Wikipedia, or the Wikimedia Commons. In fact, one of the
first things one has to do when modifying a Wikipedia entry in this way
is to delete any links to Wikidata, because the Wikidata record no
longer matches the new parameters of the taxon, cannot be altered to
match the new parameters, and would mislead readers by pointing to the
OLD rank, spelling, or delimitation.
I can't honestly see what possible advantage Wikidata offers as a
taxonomic resource when it has such an incredibly limited capacity for
modification, and so little community engagement. It acts like one of
the large and arbitrary data aggregators such as EOL, or ITIS, or GBIF,
and is similarly inflexible and rapidly outdated. To be perfectly
honest, the only online resource that offers all the tools that taxonomy
requires to make taxa and their data visible *and* stay updated in a
timely fashion is Wikipedia, because it allows for extensive text and
inclusion of *multiple* external sources (despite having a single
backbone classification). That is, Wikidata literally limits each record
to a single external source link, so if there is a taxonomic dispute,
*only Wikipedia* can link to publications and evidence from both sides
of the dispute, to explain how and why the backbone being displayed
might not represent a unanimous opinion. Wikispecies and the Wikimedia
Commons both use a single taxonomic backbone, as well, but they are not
designed to incorporate extensive text that might explain alternative
taxonomic opinions, or track historical usage; in Wikispecies, this is
at least *possible*, in theory, though it'd be cumbersome to accomplish,
while in the Wikimedia Commons, there is literally no way to indicate
alternative taxonomic opinions, nor to link to external sources other
than Wikidata. At least for the arthropod groups I work with, the
ranking, from the fewest outdated or erroneous taxonomy-related entries
to the most, is Wikipedia (best), then Wikispecies, then Wikimedia
Commons, then Wikidata (worst).
I don't think it's a coincidence that this is the same ranking in terms
of the relative ease of editing. If you want a resource to stay updated,
it needs to allow people to update it easily.
Peace,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (disclaimer: opinions are mine, not UCR's)
https://faculty.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82