paddy at eol.org
Wed Sep 16 08:15:55 CDT 2009
I know, Jim, it's easy to run into catch phrases, and those of us who have been at work on this stuff have been remiss in getting the thinking into press.
Some unpacking might help here.
1. Names-based. Names of organisms (interpreting 'names' loosely) are associated with almost every piece of information out there. That is, if we can manage names, we have a device to manage the information associated with them. Almost every database that deals with biodiversity contains a field for names. This approach was explicitly adopted by EOL, and seems to be working (yes, there are glitches, and they will be addressed). Now the task is to 'externalize' that thinking so that the architecture and devices are widely available, and can be applied by anyone to any data located anywhere on the internet. The task sounds grand but, as folk like Rod will attest, the architectural parts are not the big challenge. The bigger challenges lie in digitizing legacy information and engaging those who can make this work.
2. Infrastructure. The concept of an infrastructure is that of a framework around which our knowledge can be organized. In today's terms, this usually means a cyberinfrastructure, i.e. one that can be called upon by computers, web sites, or developers. For biology, this needs an ability to organize information taxonomically (of which the names component forms a large part); requires informatics thinking in terms of standards, protocols, ontologies and the like to allow metadata or data to be moved around and organized; needs hardware to provide core services, and this is where the dancing initiatives may have a role to play; and needs content ...
3. Semantic. There is a myriad of initiatives out there that in one way or another carry names, and information about names. Those initiatives are the source of the content for the infrastructure, content that the infrastructure can in turn organize. All exist because they provide value. It is silly to reinvent things that already exist, so what we want to do is to link these initiatives together so we can take advantage of the value that is out there. If, say, we wanted to see where the sequences of a particular group of marine taxa came from, we could connect WoRMS to GenBank to Catalog of Life to OBIS or GBIF. Now, we can all do this by hand. Folk like Rod demonstrate that it is feasible to automate this process. We also want to do this in such a way that when new sequences get added to GenBank, or the taxonomy of the holothuria is improved, the results are automatically updated. The intent of semanticization is to set in place the standards and protocols that allow computers to collaborate in responding to more complex queries (or analyses or visualizations) without anyone having to put the links together by hand. These components should be freely available.
4. Integrated biology. Biology is still a very fragmented discipline, certainly from an informatics point of view. We are facing issues of a global scale that need us to understand the relationships among atmospheric and climate changes, biological changes, economies and social structure. This, in my view, mandates a commitment to explore how best to take all of the insights, currently distributed across a myriad of publications and in the minds of tens of thousands of biologists, and unify them. To date, history and sociology have done this; the vision is that we can supplement this with new informatics tools.
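To make points 1 and 3 a little more concrete, here is a minimal sketch of what "linking by name" could look like. Everything in it is an assumption made for illustration: the source labels, the record fields, the placeholder identifiers and the normalize/build_index/lookup helpers are all hypothetical, and none of it reflects the real interfaces of WoRMS, GenBank, OBIS or any other project. The only idea it tries to show is that a loosely normalized name string can act as the shared key across otherwise unrelated databases.

```python
def normalize(name):
    """Loosely normalize a name string: collapse whitespace, ignore case."""
    return " ".join(name.split()).lower()


def build_index(sources):
    """Group records from every source under their normalized name."""
    index = {}
    for source_name, records in sources.items():
        for record in records:
            key = normalize(record["name"])
            index.setdefault(key, {}).setdefault(source_name, []).append(record)
    return index


def lookup(index, name):
    """Return {source_name: [records]} for a name, or {} if it is unknown."""
    return index.get(normalize(name), {})


# Invented toy records standing in for three independent databases.
SOURCES = {
    "taxonomy_list": [{"name": "Holothuria atra", "rank": "species"}],
    "sequence_db": [{"name": "holothuria atra", "accession": "EXAMPLE-0001"}],
    "occurrence_db": [{"name": "Holothuria  atra", "locality": "placeholder"}],
}

INDEX = build_index(SOURCES)

# One query by name pulls together what each source holds.
for source_name, records in lookup(INDEX, "Holothuria atra").items():
    print(source_name, records)
```

Rebuilding the index after a source adds records is what would keep the answers current in this toy version. A real federation would presumably query live services instead of holding copies.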
I trust this helps a little.
----- Original Message -----
From: "Jim Croft" <jim.croft at gmail.com>
To: "David Patterson" <paddy at eol.org>
Cc: dipteryx at freeler.nl, taxacom at mailman.nhm.ku.edu
Sent: Wednesday, September 16, 2009 8:13:00 AM GMT -05:00 US/Canada Eastern
Subject: Re: [Taxacom] globalnames?
What does 'a biology integrated through semantic names-based
infrastructure' actually mean, if we were to try and explain it to,
say, your average public service bureaucrat? Can it be done in simple
language that someone like my mother might understand? I do not think
I understand it well enough yet to have a go...
On Wed, Sep 16, 2009 at 9:49 PM, David Patterson <paddy at eol.org> wrote:
> Before you can fix problems, you need to know what they are.
> This listing serves minimally two masters, the taxonomist and the informatician. The informatician must be aware of every 'string' that has been used as a name and therefore points to some potentially useful piece of information. Within the totality of all strings lies a subset of interest to the taxonomist. Team taxonomy (the sum of all active taxonomists past and present) has a suite of rules to follow, but within them there remains considerable latitude. The GN thing is intended to develop into an infrastructure that will serve all users equally well, and because of this, it has to be inclusive of different points of view and solutions. It can achieve that through modularity. At the core lies the GNI module - the list of all names. Additional modules (in the form mostly of web services that call upon more specialist listings and projects) can be used to (for example) use validated taxonomic lists to show only the names that taxonomists like, or provide editing environments to improve the quality of the data, or perhaps offer parsing algorithms that will break names into their component parts and reassemble them into different forms to suit different users. Both through modularity and through federation, a names infrastructure can be designed to pick out only those subsets of information that are needed by particular classes of users, and then to present that information in a form that suits individual users.
> The progress from raw material to a structure that meets all our needs will be a long haul, will take much time, good will, and participation. But, the benefits of a biology integrated through a semantic names-based infrastructure make the walk well worth while, as are the conversations that accompany the promenade.
> David Patterson
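As an aside on the parsing module mentioned in the quoted message ("break names into their component parts and reassemble them into different forms"), the following is a deliberately simplified sketch. It assumes a name string of the shape "Genus epithet Authorship" and handles nothing else (no subgenera, hybrids, infraspecific ranks and so on). The ParsedName type and the parse_name, canonical and abbreviated helpers are hypothetical names chosen for this example, not part of GNI or of any existing parser.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    genus: str
    epithet: Optional[str]
    authorship: Optional[str]


# Assumed shape: capitalized genus, optional lowercase epithet,
# optional trailing authorship (everything that follows).
_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<epithet>[a-z]+))?"
    r"(?:\s+(?P<authorship>\S.*))?$"
)


def parse_name(name_string):
    """Split a name string into parts; return None if it does not fit."""
    match = _PATTERN.match(name_string.strip())
    if match is None:
        return None
    return ParsedName(match["genus"], match["epithet"], match["authorship"])


def canonical(parsed):
    """Reassemble the name without authorship."""
    if parsed.epithet is None:
        return parsed.genus
    return f"{parsed.genus} {parsed.epithet}"


def abbreviated(parsed):
    """Reassemble the name with the genus reduced to its initial."""
    if parsed.epithet is None:
        return parsed.genus
    return f"{parsed.genus[0]}. {parsed.epithet}"


parsed = parse_name("Homo sapiens Linnaeus, 1758")
print(canonical(parsed))
print(abbreviated(parsed))
```

The same parsed record feeds both output forms, which is the sense in which one underlying list could be presented differently to different classes of user.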
Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
... in pursuit of the meaning of leaf ...
... 'All is leaf' ('Alles ist Blatt') - Goethe
David J Patterson
Senior Taxonomist, EOL
Marine Biological Laboratory
Woods Hole, Massachusetts 02543, USA.
(+) (1) 508 289 7260
dpatterson at mbl.edu