[Taxacom] GBIF progress
dora at cria.org.br
Tue Jan 6 13:32:16 CST 2009
I believe that when discussing GBIF's role and the results it has
achieved, one shouldn't focus only on data quality, even though that is
an extremely important issue. I'll try to explain what I mean.
Unfortunately, Brazil has not yet joined GBIF, but I have followed
its development rather closely, and I am part of a team responsible for
structuring a similar facility in Brazil, the speciesLink network
(http://splink.cria.org.br). Therefore, what I intend to discuss is
based on my experience in Brazil.
The first important role of such initiatives (GBIF, speciesLink, and
many others) is to promote a cultural change toward free and open
sharing of biodiversity data. Discussions about data quality, fitness
for use, and so on are only possible because there has been a cultural
shift, and biological collections and researchers in general are
increasingly willing to share biodiversity data openly. In Brazil, the
São Paulo State research foundation funded the speciesLink project
from 2001 to 2005, giving some support to specific collections to share
their data and to CRIA to develop the network. The project ended in
October 2005 with 36 collections and about 710 thousand records
available online. In the last three years there have been small grants
for specific collections and for IT development (we have received small
grants from our Ministry of Science and Technology, Brazilian agencies,
GBIF, and the JRS Biodiversity Foundation), but nothing specific for the
network's core maintenance or overall development, nor a national
program to support the participation of biological collections. In spite
of this, we ended 2008 with 158 participating collections and
subcollections sharing 2.9 million records. In my view, this shows that
there is a cultural change toward free and open data sharing through
the Internet with anyone interested. It also suggests that data
providers are seeing benefits in sharing data, including the concrete
possibility of acknowledgment and feedback from users.
The second issue concerns recent IT developments that allow truly
distributed systems in which data providers receive full acknowledgment
and, what I believe to be most important, retain full control over their
data. Determining which data are sensitive and which can be openly
shared, at least within the speciesLink network, is the role of each
data provider. In my opinion, this control greatly helps to promote data
sharing. In addition, tools and applications make it possible to produce
data cleaning reports that help providers identify possible errors and
help users assess data quality. You can find such reports at
http://splink.cria.org.br/dc by selecting the desired collection acronym.
Georeferencing legacy data is an enormous job, and it is fundamental
for data usage. It is a process that requires sound, long-term support
and the help of IT developments. One example is the BioGeomancer
project (http://www.biogeomancer.org/), whose goal was to maximize the
quality and quantity of biodiversity data that can be mapped in support
of scientific research, planning, conservation, and management. We have
also done something here at CRIA. We did not alter the original data;
instead, we added new fields to the database to hold automatic
georeferencing at the municipality level. If this satisfies the user's
needs, it becomes good-quality data. This option is available on our
search page (http://splink.cria.org.br/centralized_search). Our system
also flags suspect records to users when they download data.
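To make the idea concrete, here is a minimal Python sketch of one way municipality-level georeferencing and suspect-record flagging could work. It is not CRIA's actual implementation: the gazetteer, the derived field names (auto_lon, auto_lat, flag), the flag labels, and the coordinates are all hypothetical, and a real system would test points against municipality polygons rather than rectangular bounding boxes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Municipality:
    # Hypothetical gazetteer entry: centroid as (lon, lat) and a
    # rectangular bounding box as (min_lon, min_lat, max_lon, max_lat).
    centroid: Tuple[float, float]
    bbox: Tuple[float, float, float, float]


@dataclass
class Record:
    # Original specimen record; these fields are never modified.
    catalog_number: str
    municipality: str
    lon: Optional[float] = None
    lat: Optional[float] = None


def in_bbox(lon: float, lat: float, bbox: Tuple[float, float, float, float]) -> bool:
    """True if the point lies inside the rectangle (edges included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def derive_fields(rec: Record, gazetteer: Dict[str, Municipality]) -> dict:
    """Return new, separate fields; the original record stays untouched."""
    derived = {"auto_lon": None, "auto_lat": None, "flag": None}
    muni = gazetteer.get(rec.municipality)
    if muni is None:
        derived["flag"] = "unknown_municipality"
        return derived
    # Automatic municipality-level georeference: the municipality centroid.
    derived["auto_lon"], derived["auto_lat"] = muni.centroid
    if rec.lon is None or rec.lat is None:
        derived["flag"] = "no_original_coordinates"
    elif not in_bbox(rec.lon, rec.lat, muni.bbox):
        # Stated municipality and supplied coordinates disagree.
        derived["flag"] = "suspect_outside_municipality"
    return derived
```

For example, with an illustrative gazetteer entry `{"Example Town": Municipality((-47.0, -22.9), (-47.2, -23.1, -46.8, -22.7))}`, a record from "Example Town" with coordinates (17.0, -26.0) would be flagged as suspect, while a record with no coordinates would receive the centroid in the new fields. Keeping the derived values in separate fields mirrors the point above that the original data are not altered.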
The third aspect is data usage. I don't know what GBIF's usage
statistics are, but our experience, at least, is that usage keeps
increasing, even though we are dealing with data that does have quality
problems. What matters most is that the data IS AVAILABLE in a free and
usable way.
All this is to say that I think GBIF's role in continuously increasing
data availability is fundamental and should not stop or slow down so
that data quality can be addressed first. Both are processes that must
be stimulated and must grow simultaneously. It is important to note that
when considering GBIF, one must consider the roles of the Secretariat,
of each country node, and of each data provider. Each one is important
and has a role to play.
All the best,
> Date: Tue, 06 Jan 2009 09:16:50 +0200
> From: John Irish <jirish at mweb.com.na>
> Subject: Re: [Taxacom] GBIF progress
> To: taxacom at mailman.nhm.ku.edu
> I think GBIF is great, but it is suffering from a bad case of GIGO. It
> may be better to first fix what they have before piling on ever more data.
> A lot of the current records are not georeferenced (e.g. Namibia: 224000
> records, only 17000, or 13%, georeferenced). But it gets worse:
> A lot of the 'georeferenced' records are crap. I recently did a GBIF
> search on a rectangle in inland southern Namibia, and turned up 6840
> records. After examination, only 192 proved to be useful. The rest were
> from stated localities in other countries in different hemispheres, but
> with coordinates in Namibia; from parts of Namibia nowhere near my
> search rectangle, but with coordinates inside it; marine taxa, but my
> search rectangle was not in the sea; records identified to genus, family
> or order only, and not very useful for species-level analysis; records
> of species that are well-known and definitely do not occur in the area
> (misidentifications?); apparent nomina nuda - names that I have not been
> able to find anywhere else; fossils - not very useful for what GBIF is
> usually used for.
> GBIF's strength, and weakness, is that it indiscriminately serves what
> museums offer. Fix the museum collection databases and GBIF's *useful*
> holdings will increase. Simply add more collections and the garbage will
> increase to the point where it is no longer worth the effort to use GBIF
> (at a rate of 1 useful record per 1000, as above, it may already be).
> Fixing collection databases is a tall order, I know. About 2 years ago,
> I asked for copies of Namibian data holdings from a number of large
> mammal collections that are served both on MaNIS and GBIF. The
> georeferencing was as usual, but I spent time to fix it and gave all
> back to the curators involved. They thanked me very politely, but when I
> did the GBIF search mentioned above, there were some of the same
> mistakes still unchanged. As an ex-curator, I know full well that
> museums are understaffed and overworked, so this is no surprise.
> So, yes! Get more data into GBIF. BUT: make sure it is properly
> georeferenced. And figure out some way to also fix the legacy data that
> is already in there. (And as an aside: have the fixing done by someone
> who lives in the country and speaks the language - georeferencing from a
> distance is what caused the problem in the first place).
Dora Ann Lange Canhos
Centro de Referência em Informação Ambiental
Tel.: +55 19 3288-0466
Fax: +55 19 3249-0960