[Taxacom] GBIF progress

Dora dora at cria.org.br
Tue Jan 6 13:32:16 CST 2009

Dear All,

I believe that when discussing GBIF's role and the results it has 
achieved, one shouldn't focus only on data quality, even though that is 
an extremely important issue. I'll try to explain what I mean. 
Unfortunately, Brazil has not yet joined GBIF, but I have followed its 
developments rather closely, and I am part of a team responsible for 
structuring a similar facility in Brazil, the speciesLink network 
(http://splink.cria.org.br/). Therefore, what I intend to discuss is 
based on my experience in Brazil.

The first important role of such initiatives (GBIF, speciesLink, and 
many others) is to promote a cultural change toward free and open 
sharing of biodiversity data. Discussions about data quality, fitness 
for use and so on are only possible because there has been a cultural 
shift: biological collections and researchers in general are more and 
more willing to share biodiversity data openly. In Brazil, the State of 
São Paulo research foundation funded the speciesLink project from 2001 
to 2005, giving some support to specific collections to share their 
data and to CRIA to develop the network. The project ended in October 
2005 with 36 collections and about 710,000 records available online. In 
the last 3 years there have been small grants for specific collections 
and for IT development (we have received small grants from our Ministry 
of Science and Technology, Brazilian agencies, GBIF, and the JRS 
Biodiversity Foundation), but nothing specific for the network's core 
maintenance or its overall development, nor a national program to 
support the participation of biological collections. In spite of this, 
we ended 2008 with 158 participating collections and subcollections 
sharing 2.9 million records. In my view, this shows that there is a 
cultural change toward open and free data sharing through the Internet 
with anyone interested. It also suggests that data providers are seeing 
benefits in data sharing, with the concrete possibility of 
acknowledgment and feedback from users.

The second issue concerns recent IT developments that allow truly 
distributed systems, where data providers receive full acknowledgment 
and, what I believe to be most important, keep full control over their 
data. Determining which data are sensitive and which can be openly 
shared, at least within the speciesLink network, is the responsibility 
of each data provider. This control, in my opinion, greatly helps 
promote data sharing. Also, by using tools and applications, it is 
possible to produce data-cleaning reports that help providers identify 
possible errors and help users assess data quality. You can find such 
reports at http://splink.cria.org.br/dc by selecting the desired 
collection acronym.
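As an illustration of the kind of check such a report might contain, here is a minimal Python sketch. It assumes hypothetical field names (`catalog_number`, `country`, `latitude`, `longitude`) and rough, illustrative country bounding boxes; it is not the speciesLink implementation, and a real check would test against country polygons rather than rectangles.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Approximate bounding boxes (min_lat, max_lat, min_lon, max_lon) in decimal
# degrees. Illustrative values only; real checks would use country polygons.
COUNTRY_BOXES = {
    "Brazil": (-33.8, 5.3, -74.0, -34.7),
    "Namibia": (-29.0, -16.9, 11.7, 25.3),
}


@dataclass
class Record:
    # Hypothetical field names, not a real speciesLink or GBIF schema.
    catalog_number: str
    country: str
    latitude: Optional[float]
    longitude: Optional[float]


def check_record(rec: Record) -> List[str]:
    """Return possible problems for one record (empty list = nothing flagged)."""
    if rec.latitude is None or rec.longitude is None:
        return ["not georeferenced"]
    if not (-90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180):
        return ["coordinates out of valid range"]
    box = COUNTRY_BOXES.get(rec.country)
    if box is None:
        return []  # no reference box for this country, so nothing to compare
    min_lat, max_lat, min_lon, max_lon = box
    if not (min_lat <= rec.latitude <= max_lat and min_lon <= rec.longitude <= max_lon):
        return ["coordinates fall outside the stated country"]
    return []


def cleaning_report(records: List[Record]) -> Dict[str, List[str]]:
    """Map catalog numbers to flagged problems; the records themselves are not modified."""
    report = {}
    for rec in records:
        problems = check_record(rec)
        if problems:
            report[rec.catalog_number] = problems
    return report


if __name__ == "__main__":
    sample = [
        Record("A-1", "Brazil", -22.9, -47.1),
        Record("A-2", "Brazil", -24.5, 17.0),  # coordinates actually in Namibia
        Record("A-3", "Namibia", None, None),
    ]
    for number, problems in cleaning_report(sample).items():
        print(number, problems)
```

A bounding box is a deliberately crude test (it misses errors near borders, and Brazil's box also covers parts of neighbouring countries), but it catches the grossest cases, such as a record from one hemisphere carrying coordinates from another. The report only flags; correcting the record stays with the provider.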

Georeferencing legacy data is an enormous job, and it is fundamental 
for data usage. It is a process that requires sound, long-term support 
and the help of IT developments. One example is the BioGeomancer project 
(http://www.biogeomancer.org/), whose goal was to maximize the quality 
and quantity of biodiversity data that can be mapped in support of 
scientific research, planning, conservation, and management. We have 
also done something here at CRIA. We didn't alter the original data, 
but we added new fields to the database to include automatic 
georeferencing at the municipality level. If this satisfies the user's 
needs, it becomes good-quality data. This option is available on our 
search page (http://splink.cria.org.br/centralized_search). Our system 
also flags suspect records to users when they download data.
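The approach described above, deriving coordinates into new fields while leaving the original data untouched, could look roughly like this minimal Python sketch. The gazetteer entries (approximate municipal-seat coordinates), the field names (`auto_lat`, `auto_lon`, `auto_precision`, `suspect`) and the 50 km threshold are all hypothetical choices for illustration, not CRIA's actual schema or rules.

```python
import math
import unicodedata
from typing import Dict, Optional, Tuple

# Hypothetical gazetteer: (state, normalized municipality) -> approximate
# coordinates of the municipal seat, in decimal degrees.
GAZETTEER: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("SP", "campinas"): (-22.91, -47.06),
    ("SP", "sao paulo"): (-23.55, -46.63),
}


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace so 'São  Paulo' matches 'sao paulo'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, assuming a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def add_auto_georef(record: dict, max_km: float = 50.0) -> dict:
    """Return a copy of `record` with derived fields added; the input dict is not changed."""
    enriched = dict(record)
    key = (record.get("state", "").upper(), normalize(record.get("municipality", "")))
    seat: Optional[Tuple[float, float]] = GAZETTEER.get(key)
    enriched["auto_lat"] = seat[0] if seat else None
    enriched["auto_lon"] = seat[1] if seat else None
    enriched["auto_precision"] = "municipality" if seat else None
    # Flag as suspect when the original coordinates lie far from the stated municipality.
    lat, lon = record.get("latitude"), record.get("longitude")
    enriched["suspect"] = bool(
        seat and lat is not None and lon is not None
        and haversine_km(lat, lon, seat[0], seat[1]) > max_km
    )
    return enriched


if __name__ == "__main__":
    original = {"state": "SP", "municipality": "Campinas", "latitude": -23.55, "longitude": -46.63}
    print(add_auto_georef(original))
    print(original)  # unchanged: no auto_* fields were written into it
```

A single distance threshold is a simplification: Brazilian municipalities vary enormously in area, so a real system would likely scale the tolerance to the size of each municipality.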

The third aspect is data usage. I don't know GBIF's usage statistics, 
but our experience, at least, is that usage keeps increasing, even 
though we are dealing with data that does have quality problems but, 
most importantly, data that IS AVAILABLE in a free and usable way.

All this is to say that I think GBIF's role in continuously increasing 
data availability is fundamental and shouldn't stop or slow down so 
that data quality can be addressed first. Both are processes that have 
to be stimulated and must grow simultaneously. It is important to note 
that when considering GBIF, one must consider the role of the 
Secretariat, of each country node, and of each data provider. Each one 
is important and has a role to play.

All the best,

> Date: Tue, 06 Jan 2009 09:16:50 +0200
> From: John Irish <jirish at mweb.com.na>
> Subject: Re: [Taxacom] GBIF progress
> To: taxacom at mailman.nhm.ku.edu
> I think GBIF is great, but it is suffering from a bad case of GIGO. It 
> may be better to first fix what they have before piling on ever more 
> records.
> A lot of the current records are not georeferenced (e.g. Namibia: 224000 
> records, only 17000, or about 8%, georeferenced). But it gets worse:
> A lot of the 'georeferenced' records are crap. I recently did a GBIF 
> search on a rectangle in inland southern Namibia, and turned up 6840 
> records. After examination, only 192 proved to be useful. The rest were 
> from stated localities in other countries in different hemispheres, but 
> with coordinates in Namibia; from parts of Namibia nowhere near my 
> search rectangle, but with coordinates inside it; marine taxa, but my 
> search rectangle was not in the sea; records identified to genus, family 
> or order only, and not very useful for species-level analysis; records 
> of species that are well-known and definitely do not occur in the area 
> (misidentifications?); apparent nomina nuda - names that I have not been 
> able to find anywhere else; fossils - not very useful for what GBIF is 
> usually used for.
> GBIF's strength, and weakness, is that it indiscriminately serves what 
> museums offer. Fix the museum collection databases and GBIF's *useful* 
> holdings will increase. Simply add more collections and the garbage will 
> increase to the point where it is no longer worth the effort to use GBIF 
> (at a rate of fewer than 3 useful records per 100, as above, it may already be).
> Fixing collection databases is a tall order, I know. About 2 years ago, 
> I asked for copies of Namibian data holdings from a number of large 
> mammal collections that are served on both MaNIS and GBIF. The 
> georeferencing was as poor as usual, but I spent time fixing it and gave 
> it all back to the curators involved. They thanked me very politely, but when I 
> did the GBIF search mentioned above, there were some of the same 
> mistakes still unchanged. As an ex-curator, I know full well that 
> museums are understaffed and overworked, so this is no surprise.
> So, yes! Get more data into GBIF. BUT: make sure it is properly 
> georeferenced. And figure out some way to also fix the legacy data that 
> is already in there. (And as an aside: have the fixing done by someone 
> who lives in the country and speaks the language - georeferencing from a 
> distance is what caused the problem in the first place).
> John
Dora Ann Lange Canhos
Centro de Referência em Informação Ambiental
URL: www.cria.org.br 
Tel.: +55 19 3288-0466
Fax: +55 19 3249-0960
