[Taxacom] Chameleons, GBIF, and the Red List

James Macklin james.macklin at gmail.com
Tue Aug 26 13:11:52 CDT 2014


Hi Rod,

Okay, I started this message about 20 messages ago but hope it helps...

I was pleased to see your response to Chuck using the TripAdvisor example.
I now believe that for the most part what you envision is very similar to
the goals of FilteredPush but with slightly different focus. Our greatest
challenge by far has been to "map" the accepted annotations back to source
collection management databases (concatenation vs. atomization, etc.). We
do have some solutions based on specific clients but the investment is
significant. For some, this investment (an API and some potential
manipulation of interfaces) will be worth it. As I noted earlier, there is
no doubt that some institutions may be overwhelmed by the number of
annotations pushed to them and have little hope of processing them given
their current resources. We have tried to address this issue in a couple of
ways. Our web client does allow for some powerful filtering so a given data
manager can quickly search for annotations that they wish to review. We
also have considered a trust mechanism here so that an annotator may be
given an "accept" status allowing all of their annotations to automatically
go into the collection management system. The opposite can also be applied
;-)  This may be especially useful to curators and researchers associated
with the institution who are working on a project which uses an FP client
for management, for example Symbiota, where all the annotations relevant to
their home institution are immediately reflected in their collection
database. This is also beneficial to those working remotely where no access
to the collection database is possible but annotations could be made
through a Morphbank client, for example, and be automatically updated back
at the source.

Of course, a major issue, which has been raised in different ways during
this discussion, is consensus. What if I get pushed 7 different new
georeferences for a given record over a month's time (not that
unrealistic)? My collection database stores a single georeference with an
ability to put "old" versions in a comments field. So, I can store what I
was pushed in an unstructured way but which annotation should be the
current structured one? Unlike taxonomic annotations where we have a long
history of the latest annotation wins (naive as that is sometimes...), this
does not hold for georeferences. The data manager has no easy way to judge
which to choose. We do have some algorithms in our analysis component of FP
which could help (compare against Geolocate result for example) but in the
end some consensus method is necessary.
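
To make the georeference problem concrete, here is a minimal sketch of one
possible consensus rule. It is not FP's analysis component: the names
Georeference, distance_km and pick_consensus, the 10 km threshold, and the
choice of a medoid are all assumptions made for illustration. The reference
point stands in for an independent result such as one from Geolocate, passed
in as plain coordinates.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Georeference:
    """One pushed georeference (hypothetical structure, for illustration)."""
    annotator: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def distance_km(a, b):
    """Great-circle (haversine) distance between two georeferences."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))


def pick_consensus(candidates, reference=None, max_km=10.0):
    """Return (choice, rejected) for a list of candidate georeferences.

    If a reference is given, candidates farther than max_km from it are
    set aside. Among the rest, the medoid is chosen: the candidate with
    the smallest total distance to all other kept candidates.
    """
    if reference is not None:
        kept = [c for c in candidates if distance_km(c, reference) <= max_km]
    else:
        kept = list(candidates)
    rejected = [c for c in candidates if c not in kept]
    if not kept:
        return None, rejected
    choice = min(kept, key=lambda c: sum(distance_km(c, o) for o in kept))
    return choice, rejected
```

A rule like this only proposes a candidate for the single structured field;
the set-aside georeferences would still need to be kept somewhere, and a data
manager may well disagree with the medoid.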

Now, stepping back to Rod's concept of a store or stores that hold the
annotations/comments about specimens we can see some real advantages. One
thing I have learned along the way is that it is unrealistic and
counterproductive to expect institutions and their collection management
systems to handle all of this information/knowledge coming from experts
and/or more automated curation processes. These systems are necessary for
internal management, no doubt, but are likely to no longer be
"authoritative" in the sense that all knowledge about the specimen is
housed there. I think this is what Rod is also suggesting. There has, of
course, traditionally been some information that is not associated with the
specimen record, i.e., the research done on the specimens and recorded in
the literature, but thankfully we are slowly doing better at connecting
these. Thus, we do need stores which are outside of the institutions
themselves which house all of the annotations/comments/information
captured. This immediately raises the sustainability flag but that is
another story (note this may involve GBIF). In FP, we do have to maintain
an annotation store and in some cases a document store (required for
reference to what is being annotated). As Rod points out, these could be
mined for a few purposes, which divide between curation and data quality,
and research. These stores can in many cases be made public and lead to
further annotation through a wide range of clients. In fact, we now talk
about annotation conversations about one-to-many objects/specimens as this
is what we see occurring between experts. If we want to be more "social"
then we just need lightweight annotation services that interact with your
favorite social outlet, Facebook, Wikipedia, Twitter, LinkedIn... We have
demonstrated a general ability to do this (
https://sourceforge.net/p/filteredpush/svn/HEAD/tree/trunk/FP-DataEntry/).
Let's also not forget the importance of an annotation framework to provide
at least minimal structure and metadata to annotations being passed and
stored which we now have at least recognized as part of the W3C Open
Annotation Consortium standard. This is key to attribution for contributors
and knowing who/what can be trusted.
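
As a rough illustration of what "minimal structure and metadata" could look
like, the sketch below defines an annotation record and routes it by the
annotator's trust status, as in the accept mechanism described earlier. The
field names loosely echo the Open Annotation notions of target, body,
annotator and motivation, but this is not a serialization of that standard
and not FP's actual schema; Trust, Annotation, route and the example
identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    ACCEPT = "accept"  # annotations go straight into the collection database
    REVIEW = "review"  # default: queue for a data manager to look at
    REJECT = "reject"  # annotations are declined without review


@dataclass
class Annotation:
    target: str      # identifier of the specimen record being annotated
    body: dict       # proposed values, e.g. {"decimalLatitude": 42.38}
    annotator: str   # who made the assertion (needed for attribution)
    created: str     # ISO 8601 timestamp
    motivation: str  # why it was made, e.g. "correction" or "comment"


def route(annotation, trust_by_annotator):
    """Decide what happens to an incoming annotation.

    Annotators with no recorded status fall back to REVIEW.
    """
    trust = trust_by_annotator.get(annotation.annotator, Trust.REVIEW)
    actions = {
        Trust.ACCEPT: "apply",
        Trust.REVIEW: "queue",
        Trust.REJECT: "decline",
    }
    return actions[trust]
```

Because every record carries its annotator and timestamp, the same structure
supports attribution and later mining of the store, whichever way a given
institution chooses to route it.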

The "who can be trusted" component of the mix is a significant challenge.
Yes, there are models coming from many "social" clients and apps but their
application to research and curation is still understudied and perhaps
underappreciated. I was hoping to get a workshop going on this as it is key to
these annotation systems but have not made it that far yet. Of course, the
greatest challenge here is our own colleagues who are very used to a
certain way of doing business and assessing trust and we need to involve
them and find ways to make them feel comfortable...

Finally, I really believe in the "data cleaning data" model or perhaps
rephrased "data informing data" to be a little more semantically grounded,
but the reality so far is that we actually are not big enough! We need much
more data to be made available through digitization efforts, opening up
research silos, mining literature, etc., and yes, we need aggregators like
GBIF to put the data in a place, make it discoverable and perhaps even to
help us interpret it.

I hope you will see that FP and the many folks we interact with have moved
well beyond sticky notes ;-)

Best,  James

P.S. Completely agree with Rich's three points, especially #3



On Tue, Aug 26, 2014 at 5:48 AM, Roderic Page <Roderic.Page at glasgow.ac.uk>
wrote:

> Hi Lyubo,
>
> Yes, you could have structured annotations and free text for comments. But
> what I’d like to avoid is the idea that simply putting a comment system
> onto a database is the answer. It’s easy to do that (I use Disqus on my
> blog, and on BioStor and BioNames, for example), but that doesn’t get us
> terribly far (useful as people’s comments are).
>
> If we focus on the structured annotations, and the idea that multiple
> authorities can write about the same thing, then this leads inevitably to
> the idea of having a single, global annotation store that holds data about
> all kinds of things that we care about.
>
> This was what I was trying to articulate in
> http://iphylo.blogspot.co.uk/2014/03/rethinking-annotating-biodiversity-data.html
> It can be done in a way that makes almost no demands on data providers at
> all, and if it’s done cleverly it will (as a side effect) provide a
> potentially powerful database we can use to ask some of the bigger
> questions about biodiversity data.
>
> Regards
>
> Rod
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email:  Roderic.Page at glasgow.ac.uk
> Tel:  +44 141 330 4778
> Skype:  rdmpage
> Facebook:  http://www.facebook.com/rdmpage
> LinkedIn:  http://uk.linkedin.com/in/rdmpage
> Twitter:  http://twitter.com/rdmpage
> Blog:  http://iphylo.blogspot.com
> ORCID:  http://orcid.org/0000-0002-7101-9767
> Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
>
>
> On 26 Aug 2014, at 08:56, Lyubomir Penev <lyubo.penev at gmail.com> wrote:
>
> Rod,
>
> Your proposal is indeed elegant, but what about using two separate general
> types of annotations, e.g.: (1) Corrections/Additions and (2) Comments
> ("sticky notes")?
>
> The Type 1 (Annotations=Corrections/Additions) can go straight to
> improve/amend the data (after approval or not is a different story), for
> example adding geocoordinates for a well-known locality or adding a
> collector's name missing on label data but known from other source, etc.
>
> Annotations Type 2 (Annotations=Comments) could be associated with the
> original data or with Type 1 Annotations, e.g., "velvet worms cannot live
> in the ocean, correct geocoordinates for this locality", or "this locality
> might be wrongly spelled", etc.
>
> No need to explain that Type 2 annotations could be based on a rich
> controlled vocabulary of statements, besides the free text option, which
> will allow machine processing of a part of the process, including automated
> verification of the original data in some particular cases. For example,
> annotation of the kind  "Now in different genus"  could automatically query
> a trusted taxonomy source (CoL, GNUB, etc,) and display all possible
> versions and validity status of that name and its combinations.
>
> Regards,
> Lyubomir
>
>
>
>
> On Mon, Aug 25, 2014 at 11:32 PM, Chuck Miller <Chuck.Miller at mobot.org>
> wrote:
> Rod,
> I see what you are thinking.
>
> I'm thinking too much about the alternate views that seem to always come
> up in comments about taxonomic data, even right here on Taxacom.  So, an
> annotation thread could be something like:
> "Now in different genus"
> "No way, respectfully what century are you from? It's not even a species."
> "Maybe they meant Africa."
> "Yeh, in Namibia it was split and revised to Y, I think."
> "The phylogeny for Y is totally different. It's now in Z. Does no one read
> my work? Taxonomic tyranny!"
> "Consider this DOI that refers to new collections and it seems to
> contradict all y'all."
>
> What does "social" annotation turn into?  Lots of useful information but
> if it starts being contradictory or overlapping, can it be databased and
> which contradiction is the primary?  Hopefully, annotation of localities
> could be straightforward as in your example.
>
> Chuck
>
>
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
> The Taxacom Archive back to 1992 may be searched at:
> http://taxacom.markmail.org
>
> Celebrating 27 years of Taxacom in 2014.
>


