[Taxacom] GBIF & BHL

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Mon Oct 13 15:04:55 CDT 2014


>From my perspective we spend a lot of effort making things that are attractive to users, but neglect to make them also appealing to machines<
Really? I've not seen much that I find attractive or appealing! Perhaps the priority should be not how it looks or feels (to man or machine), but how reliable the data is?

Stephen

--------------------------------------------
On Tue, 14/10/14, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:

 Subject: Re: [Taxacom] GBIF & BHL
 To: "taxacom at mailman.nhm.ku.edu" <taxacom at mailman.nhm.ku.edu>
 Cc: "Donald Hobern" <dhobern at gbif.org>
 Received: Tuesday, 14 October, 2014, 8:18 AM
 
 Dear Wolfgang,
 
 A couple of quick thoughts in response to your intriguing post (I should
 declare an interest: I’m currently chair of the GBIF Science Committee, and
 that committee is keenly aware of issues of data quality).
 
 BHL has indeed become an associate participant, although I’m not aware of
 plans to import BHL content directly into GBIF. Personally I think there is
 a lot of value in doing this, but as you note there are issues with
 combining approximate name recognition and approximate name matching.
 
 If you are generating clean lists of names and/or occurrences, then one way
 forward would be to add those directly to GBIF. There are mechanisms for
 directly publishing to GBIF, or via data journals such as Biodiversity Data
 Journal. That said, if the goal is to annotate and build on data available
 in the GBIF portal I agree we could make a lot better use of existing
 taxonomic expertise. One challenge is finding ways to do this that make it
 rewarding for people to invest the time needed to clean data.
 
 Regarding the species PDFs, one immediate concern is that PDFs are not
 ideal for further reprocessing. For example, the PDF you mention on the web
 site ( http://carabidfauna.net/ChaudoirM.pdf ) has a list of papers that I
 would like to extract and add to BioStor (which means the articles listed
 would then within a week or so become identified as “parts” in BHL). As an
 example, I added http://biostor.org/reference/143864 based on your PDF.
 This could be done much more efficiently if, for example, you provided the
 metadata in a machine readable format, such as the Reference Manager (RIS)
 format. Indeed, if you were willing to do this, a lot of these articles
 could be quickly added to BioStor, and hence to BHL.
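
 To illustrate what "machine readable" could mean here, the following is a
 minimal sketch of writing one reference as a RIS record. The function name
 to_ris, the dictionary keys and the placeholder values are assumptions made
 for illustration only; they are not taken from the PDF or from BioStor. The
 tags (TY, AU, TI, JO, VL, SP, EP, PY, ER) are standard RIS tags.

```python
def to_ris(ref):
    """Format one journal-article reference (a plain dict) as a RIS record.

    The dict keys used here are hypothetical; adapt them to your own database.
    """
    lines = ["TY  - JOUR"]  # record type: journal article
    for author in ref.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    # Map the illustrative dict keys to standard RIS tags.
    for key, tag in [
        ("title", "TI"),
        ("journal", "JO"),
        ("volume", "VL"),
        ("start_page", "SP"),
        ("end_page", "EP"),
        ("year", "PY"),
    ]:
        value = ref.get(key)
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)


# Placeholder values, not a real citation.
example = {
    "authors": ["Example, A."],
    "title": "A placeholder title",
    "journal": "A placeholder journal",
    "volume": "1",
    "start_page": "1",
    "end_page": "10",
    "year": 1900,
}
print(to_ris(example))
```

 One record per article, separated by blank lines in a single text file,
 would probably be enough for bulk import, though the exact fields a given
 tool expects may differ.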
 
 From my perspective we spend a lot of effort making things that are
 attractive to users, but neglect to make them also appealing to machines -
 resulting in a missed opportunity to build upon these efforts and create
 even more useful products.
 
 Regards
 
 Rod
 
 ---------------------------------------------------------
 Roderic Page
 Professor of Taxonomy
 Institute of Biodiversity, Animal Health and Comparative Medicine
 College of Medical, Veterinary and Life Sciences
 Graham Kerr Building
 University of Glasgow
 Glasgow G12 8QQ, UK
 
 Email:  Roderic.Page at glasgow.ac.uk
 Tel:  +44 141 330 4778
 Skype:  rdmpage
 Facebook:  http://www.facebook.com/rdmpage
 LinkedIn:  http://uk.linkedin.com/in/rdmpage
 Twitter:  http://twitter.com/rdmpage
 Blog:  http://iphylo.blogspot.com
 ORCID:  http://orcid.org/0000-0002-7101-9767
 Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
 ResearchGate:  https://www.researchgate.net/profile/Roderic_Page
 
 
 On 13 Oct 2014, at 10:05, Wolfgang Lorenz <faunaplan at googlemail.com> wrote:
 
 Dear All,
 a few weeks ago, BHL joined GBIF as an associate. Good news!
 And doesn't it underline, again, the urgency for more control by users?
 BHL's automatic name recognition is a fantastic tool when used with
 caution, but in combination with GBIF's "fuzzy taxon matches" it might
 produce so many more errors...
 
 GBIF does have excellent data! Sadly, many users will not see it because it
 takes an awful lot of time.
 'Manual work doesn't scale' is an often used argument for automatic data
 aggregation.
 Okay, but when it's available, why not use it?
 
 GBIF's official portal launch was in July 2007. Since then, I have been
 trying to follow GBIF's progress on the megadiverse family of ground
 beetles. That is:
 1) by the end of each year, download a complete dataset on Carabidae
    (almost 1 million records last December);
 2) compare original verbatim names provided by the data providers with my
    own names database in order to spot & correct errors in GBIF's name
    interpretation;
 3) group all georeferenced records into squares (grid cells) on the WGS-84
    grid which I can display on a simple map for comparison with overview
    maps provided by GBIF.
 It takes time but it's not too difficult to do all that with my simple
 tools and limited programming skills.
 My latest results can be seen here: http://carabidfauna.com/CarabMap.php
 (download of Dec 2013).
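
 Step 3 above can be sketched roughly as follows. This is only a minimal
 illustration, assuming each record is a dict with decimal-degree "lat" and
 "lon" values and assuming one-degree cells; the function names, the record
 layout and the cell size are all hypothetical and not a description of the
 tools actually used.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size=1.0):
    """Return the south-west corner of the size-degree cell containing a point."""
    # floor() rather than int() so that negative coordinates fall into the
    # correct cell instead of being rounded towards zero.
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def group_by_cell(records, size=1.0):
    """Group records (dicts with 'lat' and 'lon' keys) by grid cell."""
    cells = defaultdict(list)
    for rec in records:
        cells[grid_cell(rec["lat"], rec["lon"], size)].append(rec)
    return dict(cells)


# Made-up coordinates, for illustration only.
records = [
    {"lat": 47.9, "lon": 11.3},
    {"lat": 47.2, "lon": 11.8},
    {"lat": -0.5, "lon": -0.5},
]
for cell, recs in sorted(group_by_cell(records).items()):
    print(cell, len(recs))
```

 Cells defined in degrees are not equal in area (they narrow towards the
 poles), which may or may not matter for an overview map.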
 
 Putting all data into easy-to-use geospatial "boxes" has several
 advantages. E.g., I can get a checklist for each gridcell. And it might
 help in organizing a sort of data stewardship by users who know a region
 well and can spot errors earlier than others.
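
 A per-gridcell checklist could be derived along these lines. Again this is
 just a sketch under assumptions: each record is taken to be a dict that
 already carries a "cell" identifier and a "name"; both keys, the function
 name and the sample values are invented for illustration.

```python
from collections import defaultdict


def checklists(records):
    """Return {cell: sorted list of distinct names recorded in that cell}."""
    names = defaultdict(set)
    for rec in records:
        names[rec["cell"]].add(rec["name"])  # the set removes duplicates
    return {cell: sorted(found) for cell, found in names.items()}


# Placeholder cells and names, not real occurrence data.
records = [
    {"cell": (47, 11), "name": "Genus beta"},
    {"cell": (47, 11), "name": "Genus alpha"},
    {"cell": (47, 11), "name": "Genus alpha"},
    {"cell": (48, 11), "name": "Genus gamma"},
]
for cell, names in sorted(checklists(records).items()):
    print(cell, names)
```

 Someone who knows a region well could then review only the lists for the
 cells covering that region.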
 
 Finally, taxon specialists might want to set up species-pages by putting
 together what they have: nomenclature, literature citations with BHL-links
 and an overview map
 (e.g.: http://carabidfauna.net/Orthotrichus_gilvipes.pdf ).
 Such species pages with authorship and time-stamp could then serve other
 users as a background for vetting data that are accessible through GBIF.
 Why not set up a persistent archive for such species PDFs?
 
 Best regards,
 Wolfgang
 
 ---------------------------------------
 
 Wolfgang Lorenz, Tutzing, Germany
 _______________________________________________
 Taxacom Mailing List
 Taxacom at mailman.nhm.ku.edu
 http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
 The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org
 
 Celebrating 27 years of Taxacom in 2014.
 
 


