Dear Wolfgang,

A couple of quick thoughts in response to your intriguing post. (I should declare an interest: I’m currently chair of the GBIF Science Committee, and that committee is keenly aware of issues of data quality.)

BHL has indeed become an associate participant, although I’m not aware of plans to import BHL content directly into GBIF. Personally I think there is a lot of value in doing this, but as you note there are issues with combining approximate name recognition and approximate name matching.

If you are generating clean lists of names and/or occurrences, then one way forward would be to add those directly to GBIF. There are mechanisms for publishing directly to GBIF, or via data journals such as Biodiversity Data Journal. That said, if the goal is to annotate and build on data available in the GBIF portal, I agree we could make a lot better use of existing taxonomic expertise. One challenge is finding ways to do this that make it rewarding for people to invest the time needed to clean data.

Regarding the species PDFs, one immediate concern is that PDFs are not ideal for further reprocessing. For example, the PDF you mention on the web site ( http://carabidfauna.net/ChaudoirM.pdf ) has a list of papers that I would like to extract and add to BioStor (which means the articles listed would then, within a week or so, become identified as “parts” in BHL). As an example, I added http://biostor.org/reference/143864 based on your PDF. This could be done much more efficiently if, for example, you provided the metadata in a machine-readable format, such as the Reference Manager (RIS) format. Indeed, if you were willing to do this, a lot of these articles could be quickly added to BioStor, and hence to BHL.
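
To make the RIS suggestion concrete, here is a minimal sketch in Python of how a list of papers could be written out as RIS records. The function name `to_ris`, the dictionary keys, and the sample reference are all placeholders of mine, not real data and not anything BioStor prescribes; the tags used (TY, AU, TI, JO, VL, SP, EP, PY, ER) are standard RIS tags for a journal article.

```python
def to_ris(ref):
    """Serialise one journal-article record (a plain dict) as an RIS entry.

    The dict keys used here are hypothetical; adapt them to your own database.
    """
    lines = ["TY  - JOUR"]  # record type: journal article
    for author in ref.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    # Map the remaining (assumed) dict keys to RIS tags, skipping empty fields.
    for key, tag in (("title", "TI"), ("journal", "JO"), ("volume", "VL"),
                     ("start_page", "SP"), ("end_page", "EP"), ("year", "PY")):
        if ref.get(key):
            lines.append(f"{tag}  - {ref[key]}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines) + "\n"


# Placeholder reference, for illustration only.
example = {
    "authors": ["Author, A."],
    "title": "Example title",
    "journal": "Example Journal",
    "volume": "1",
    "start_page": "10",
    "end_page": "20",
    "year": "1900",
}
print(to_ris(example))
```

One text file with a record like this per paper would be far easier to process than the same list inside a PDF.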

From my perspective we spend a lot of effort making things that are attractive to users, but neglect to make them also appealing to machines - resulting in a missed opportunity to build upon these efforts and create even more useful products.



Dear All,
A few weeks ago, BHL joined GBIF as an associate. Good news!
And doesn't it underline, again, the urgency for more control by users?
BHL's automatic name recognition is a fantastic tool when used with
caution, but in combination with GBIF's "fuzzy taxon matches" it might
produce so many more errors...

GBIF does have excellent data! Sadly, many users will not see it because it
takes an awful lot of time.
'Manual work doesn't scale' is an often used argument for automatic data
Okay, but when it's available, why not use it?

GBIF's official portal launch was in July 2007. Since then, I have been trying to
follow GBIF's progress on the megadiverse family of ground beetles. That is:
1) by the end of each year, download a complete dataset on Carabidae
(almost 1 million last December);
2) compare original verbatim names provided by the data providers with my
own names database in order to spot & correct errors in GBIF's name
3) group all georeferenced records into squares (grid cells) on the WGS-84
grid which I can display on a simple map for comparison with overview maps
provided by GBIF.
It takes time but it's not too difficult to do all that with my simple
tools and limited programming skills. My latest results can be seen here:
http://carabidfauna.com/CarabMap.php (download of Dec 2013).
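
Steps 2 and 3 above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the tooling actually used: the field names `scientificName`, `decimalLatitude` and `decimalLongitude` are assumed to match the columns of the download, the 1-degree cell size is an arbitrary choice, and the sample records are placeholders.

```python
import math
from collections import Counter

CELL_SIZE = 1.0  # cell edge in degrees; an assumed value, not from the post


def grid_cell(lat, lon, size=CELL_SIZE):
    """Return the (row, col) index of the cell containing a WGS-84 coordinate.

    Rows are counted from latitude -90, columns from longitude -180.
    """
    return (math.floor((lat + 90.0) / size), math.floor((lon + 180.0) / size))


def unmatched_names(records, known_names):
    """Count verbatim names that are absent from the reference name list."""
    return Counter(r["scientificName"] for r in records
                   if r["scientificName"] not in known_names)


def records_per_cell(records):
    """Count georeferenced records per grid cell, skipping missing coordinates."""
    counts = Counter()
    for r in records:
        lat, lon = r.get("decimalLatitude"), r.get("decimalLongitude")
        if lat is None or lon is None:
            continue
        counts[grid_cell(lat, lon)] += 1
    return counts


# Placeholder records, for illustration only.
records = [
    {"scientificName": "Genus alpha", "decimalLatitude": 47.9, "decimalLongitude": 11.3},
    {"scientificName": "Genus alhpa", "decimalLatitude": 47.2, "decimalLongitude": 11.8},
    {"scientificName": "Genus beta", "decimalLatitude": None, "decimalLongitude": None},
]
known = {"Genus alpha", "Genus beta"}

print(unmatched_names(records, known))  # names to check by hand
print(records_per_cell(records))        # record counts per cell, ready for mapping
```

Exact matching against the reference list is deliberately strict here: anything that does not match is flagged for a human to look at rather than guessed at.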

Putting all data into easy-to-use geospatial "boxes" has several
advantages. E.g., I can get a checklist for each gridcell. And it might
help in organizing a sort of data stewardship by users who know a region
well and can spot errors earlier than others.
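
As a rough illustration of the per-cell checklist idea, a few lines of Python are enough once each record carries a name and a coordinate. The function name, the tuple layout, the 1-degree cell size and the sample records are all assumptions made for this sketch.

```python
import math
from collections import defaultdict


def checklists_by_cell(records, size=1.0):
    """Group (name, lat, lon) records into grid cells and list the names per cell.

    Each cell is keyed by the floor of its latitude and longitude divided by
    the cell size, i.e. by its south-west corner for a 1-degree grid.
    """
    cells = defaultdict(set)
    for name, lat, lon in records:
        cell = (math.floor(lat / size), math.floor(lon / size))
        cells[cell].add(name)  # a set removes duplicate records of one name
    return {cell: sorted(names) for cell, names in cells.items()}


# Placeholder records, for illustration only.
sample = [
    ("Genus alpha", 47.9, 11.3),
    ("Genus beta", 47.2, 11.8),
    ("Genus alpha", 47.5, 11.1),
    ("Genus gamma", -0.5, 11.3),
]
for cell, names in sorted(checklists_by_cell(sample).items()):
    print(cell, names)
```

A regional specialist could then be handed the checklist for "their" cells and asked to flag names that should not occur there.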

Finally, taxon specialists might want to set up species-pages by putting
together what they have: nomenclature, literature citations with BHL-links
and an overview map
(e.g.: http://carabidfauna.net/Orthotrichus_gilvipes.pdf ).
Such species pages with authorship and time-stamp could then serve other
users as a background for vetting data that are accessible through GBIF.
Why not set up a persistent archive for such species PDFs?

Best regards,


Wolfgang Lorenz, Tutzing, Germany
