[Taxacom] human involvement (was Re: BioNames)
deepreef at bishopmuseum.org
Tue Jun 4 04:30:14 CDT 2013
> Again, it's the genius of 'and' - we do both.
> So, why can't we do this? Why not strive for clean, structured
> data (i.e., clean names linked to relevant nomenclatural events),
> but at the same time (AND IN THE SAME PLACE) give people what
> we currently have so they have a fighting chance of coming away
> with some information?
Thank you for providing the perfect description of the GNA approach!
Indeed, Rod, why not strive for clean, structured data (i.e., clean names
linked to relevant nomenclatural events AND clean literature citations)
...but at the same time (AND IN THE SAME PLACE) give people what we
currently have [HINT: GNI for dirty bucket names, and RefBank for dirty
bucket literature citations].
I can indeed say "IN THE SAME PLACE" now because we have a live, functioning
mirror of GNI on the same server as the master GNUB DB, and I also just
installed an instance of RefBank: http://zoobank.org/RefBank (thanks largely
to Guido Sautter).
And while we're at it, let's throw in about 42 million page images
[HINT: BHL -- which is an integral part of the GNA effort].
The cool/exciting thing is that we are now in the process of more tightly
integrating these things so that they are more dynamically linked to each
other via APIs. For example, GNUB uses APIs at GNI and BHL to dynamically
show content (e.g., search ZooBank for your favorite Linnaeus name -- or any
one of tens of thousands of other names already linked to BHL and GNI). BHL
and GNUB are currently working together to develop tighter integration
between the two respective literature databases, and there are also plans to
harvest TNUs from BHL data (and cross-link BHL pages back to ZooBank). And
there is other work in progress to use GNI name recognition services against
BHL OCR content, and also for GNI to use the new GNUB "GNIE" service to
expand its reconciliation algorithms to homotypic and heterotypic synonyms.
And let's also not forget the ongoing work between Pensoft, IPNI, Index
Fungorum, and ZooBank to develop a mechanism for automatic submission of
content from publishers as part of the publication workflow, to capture
Obviously, there is still a long way to go to get all the pieces functioning
properly, but the pace is definitely accelerating. We need creative/clever
efforts like BioNames to invent how this will all eventually work. The
really critical part right now is to start connecting the dots between the
different projects (which really are different -- either in scope of content
or scope of function -- in the vast majority of cases). This means
developing clever *and* useful web services/APIs to allow the cross-links to
be made in real time.
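To make the "connecting the dots" idea a bit more concrete, here is a minimal
sketch of what a real-time cross-link might look like from the caller's side.
Everything in it is hypothetical: the three in-memory indexes merely stand in
for separate services (names, literature citations, page images), and the
identifiers, the normalize() rule, and the cross_link() function are invented
for illustration. None of it reflects the actual GNI, RefBank, BHL, or GNUB
APIs.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-ins for three independent services.
# A real deployment would query each service over HTTP; the records and
# identifiers below are invented for illustration only.
NAME_INDEX: Dict[str, List[str]] = {"homo sapiens": ["name:1"]}
CITATION_INDEX: Dict[str, List[str]] = {"homo sapiens": ["ref:10"]}
PAGE_INDEX: Dict[str, List[str]] = {"homo sapiens": ["page:100", "page:101"]}

Lookup = Callable[[str], List[str]]


def normalize(name: str) -> str:
    """Lower-case the name string and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def make_lookup(index: Dict[str, List[str]]) -> Lookup:
    """Wrap an in-memory index so it looks like a service call."""
    def lookup(name: str) -> List[str]:
        return list(index.get(normalize(name), []))
    return lookup


def cross_link(name: str, services: Dict[str, Lookup]) -> Dict[str, List[str]]:
    """Ask every service about the same name and merge the answers.

    A service that fails contributes an empty list, so one broken link
    does not hide the results from the others.
    """
    links: Dict[str, List[str]] = {}
    for label, lookup in services.items():
        try:
            links[label] = lookup(name)
        except Exception:
            links[label] = []
    return links


SERVICES: Dict[str, Lookup] = {
    "names": make_lookup(NAME_INDEX),
    "citations": make_lookup(CITATION_INDEX),
    "pages": make_lookup(PAGE_INDEX),
}

if __name__ == "__main__":
    print(cross_link("Homo  sapiens", SERVICES))
```

The point of the sketch is only that each project keeps its own scope and its
own data, and the cross-link is assembled at request time from whatever each
service returns.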