[Taxacom] formation of zoological names with Mc, Mac, etc.

Tony.Rees at csiro.au
Tue Sep 1 01:08:51 CDT 2009

Rich Pyle wrote:


> The question is whether one can have a machine 
> look at a text string and *decide* which link should be used, 
> rather than having a human *tell* it (a "time consuming 
> process"); this is the crux of Francisco's point:

Yes, and also the crux of the services being developed around the Global
Names Index, which are based on existing efforts to implement "fuzzy
matching" [Tony Rees -- this is your cue...]

That's not my forte, but what I'd like to see emerge from such services is
not that the computers *decide* how to link, but rather use some sort of
standard metric of likelihood, which the Human can use to make the final
call.  Thus, though it may involve a human, I think the services could cut
down the "time consuming" part dramatically (with access to the right
databases, page images, etc.)  This is slightly different from how Doug
described it, but ultimately it would end up being the same thing.  The easy
matches (only one, high-probability match) would probably be accepted
automatically in most cases; but the not-so-certain matches would be flagged
for human scrutiny.  But even in those cases, if all the necessary resources
are one mouse-click away (e.g., page images of original literature through
BHL), then the job would be a LOT easier than it is now.


Okay, okay...

Actually I do this all the time, as a part of the process of adding names to my IRMNG genera compilation - in essence I wish to sort "incoming" names according to whether or not they are already on the master list, and add them if new.

To do this I need to match genus (and in some cases specific) names exactly or fuzzily, and likewise cited authorities and years, while also taking into account taxonomic placement where available. What I may end up with is, e.g., a list of exact or near matches on scientific name pairs, with the degree of author similarity on a 0-1 scale computed using the "compare_auth" portion of taxamatch (you can test this yourself at http://www.cmar.csiro.au/datacentre/taxamatch-tests.htm if interested), plus other info as available, e.g. higher taxonomy, cited publication info and nomenclatural notes. There seems to be a numeric threshold in the computed authority similarity, at around 0.2-0.4, below which *most* of the pairs appear likely to be different and above which *most* appear likely to be acceptably the same to a human eye: see e.g. the sample output at the end of this message. Actually the threshold is not really fixed; it seems to vary according to the characteristics of the data being compared too, but that's par for this particular course I guess.
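
To make the idea of a 0-1 authority score with a rough cut-off concrete, here is a minimal sketch in Python. It is NOT the taxamatch "compare_auth" algorithm: it swaps in the generic string similarity from Python's difflib, and the function names and the default threshold of 0.3 (simply a value inside the 0.2-0.4 range mentioned above) are assumptions for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize_authority(auth: str) -> str:
    """Lowercase, drop four-digit years and punctuation, collapse whitespace."""
    auth = re.sub(r"\b\d{4}\b", " ", auth)          # remove the year
    auth = re.sub(r"[^\w\s&]", " ", auth.lower())   # keep letters, digits, '&'
    return " ".join(auth.split())


def authority_similarity(a: str, b: str) -> float:
    """Stand-in similarity on a 0-1 scale (generic difflib ratio, not compare_auth)."""
    return SequenceMatcher(None, normalize_authority(a), normalize_authority(b)).ratio()


def classify(a: str, b: str, threshold: float = 0.3):
    """Return (score, label); the threshold is an assumed, data-dependent cut-off."""
    score = authority_similarity(a, b)
    label = "probably same" if score >= threshold else "probably different"
    return score, label


if __name__ == "__main__":
    pairs = [
        ("Trappe, Castellano & Amaranthus, 1996",
         "J.M. Trappe, M.A. Castellano & M.P. Amaranthus, 1996"),
        ("Junius ex Linnaeus, 1753", "Persoon, 1801"),
    ]
    for a, b in pairs:
        score, label = classify(a, b)
        print(f"{score:.2f}  {label}:  {a}  |  {b}")
```

Because the cut-off varies with the data, it is probably best treated as a sorting aid rather than a decision rule: the score orders the list, and a human still makes the call near the threshold.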

Basically, in this case the machine does not replace the human reader, but the pre-sorting that the algorithm can do makes the list a lot easier to scan, and the exceptions easier to spot.

(I do similar things when searching for near matches on scientific names: I use the algorithm to (1) determine whether any near matches are present, and (2) present candidates for scrutiny in pre-sorted groups, which are then easy for a human to eyeball and accept or reject.)
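
The two steps just described (detect near matches, then present candidates in pre-sorted groups) could be sketched roughly as follows. This again uses Python's difflib as a stand-in for the real name-matching algorithm; the function name, the cutoff of 0.8 and the example names are illustrative assumptions, not taken from IRMNG.

```python
from difflib import get_close_matches


def presort(incoming, master, cutoff=0.8):
    """Sort incoming names into 'exact', 'near' (with candidates) and 'new' groups."""
    master_set = set(master)
    groups = {"exact": [], "near": [], "new": []}
    for name in incoming:
        if name in master_set:
            groups["exact"].append(name)        # already on the master list
            continue
        # Up to three master-list names whose similarity is at least `cutoff`.
        candidates = get_close_matches(name, master, n=3, cutoff=cutoff)
        if candidates:
            groups["near"].append((name, candidates))   # flag for human scrutiny
        else:
            groups["new"].append(name)          # no near match: candidate new name
    return groups


if __name__ == "__main__":
    master = ["Agaricus", "Boletus", "Amanita"]      # illustrative master list
    incoming = ["Agaricus", "Boletis", "Xylaria"]    # "Boletis" is a deliberate misspelling
    for group, members in presort(incoming, master).items():
        print(group, members)
```

Only the "near" group needs a careful look; the other two groups can usually be scanned quickly for surprises.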

I can expand on these further as desired, but this is probably enough except for those who may be gluttons for punishment!

Regards - Tony


Here's some sample output of my "authority comparison" values of real genus name pairs with identical spelling (in this case, for fungi), i.e. either homonyms, or duplicates for which I only need one instance (where the authors are sufficiently different I deem it a new name, i.e. a homonym, and will upload it as such). In general the software should expand known author abbreviations to the full version, but in at least one case below the abbreviation concerned (Bat.) is either not on my list, or associated with a different name; in fact (on checking) it is on the list twice, first associated with Batenburg and second with Batista. This is obviously undesirable, and I am not sure what is the best way forward in this instance.


Junius ex Linnaeus, 1753
Persoon, 1801

Meschinelli, 1892
A. Straus, 1950

Tode, 1790
E.M. Fries, 1832

Meschinelli, 1898
T.N. Hermann in B.S. Sokolov, 1979

Arthur & Bisby, 1921
B. Renault, 1894

(Persoon) Roussel, 1806
S.F. Gray, 1821

Linnaeus, 1753
E.M. Fries, 1822

Renault, 1896
D. Ellis, 1916

Sprengel, 1827
Bosc ex E.M. Fries, 1829

P. Micheli ex Persoon, 1794
E.M. Fries, 1832

Tode, 1790
Tode ex Palisot de Beauvois in F. Cuvier, 1805

Nees, 1816
C.G.D. Nees ex A.T. Brongniart in F. Cuvier, 1824

Link, 1815
Link ex Brongniart in Willdenow in F. Cuvier, 1824

Nees, 1816
C.G.D. Nees ex Brongniart in F. Cuvier, 1824

C.C. Chen ex W.H. Ko, H.S. Chang, H.J. Su, C.C. Chen & L.S. Leu, 1978
C.-C. Chen, 1961

Sowerby, 1803
Persoon, 1822

Nees & T. Nees, 1818
C.G.D. Nees ex Leman in F. Cuvier, 1821

Link, 1809
Link ex Wallroth in Bluff & Fingerhuth, 1833

Fell, Statzell, I.L. Hunter & Phaff, 1970
Fell et al., 1969

Tode, 1790
Tode ex Kunze & J.K. Schmidt, 1823

Link, 1816
Link ex A.T. Brongniart in F. Cuvier, 1824

Batista, 1960

Kunze, 1817
G. Kunze & J.K. Schmidt ex E.M. Fries, 1832

Ozkose, B.J. Thomas, D.R. Davies, G.W. Griff. & Theodorou, 2001
E. Ozkose et al., 2001

Persoon, 1797
(Persoon ex E.M. Fries) S.F. Gray, 1821

Sherwood, 1986
M.A. Sherwood-Pike in F. Candoussau, K. Katomoto & M.A. Sherwood-Pike, 1986

Haller, 1768
[Haller] E.M. Fries, 1821

Korf, 1978
R.P. Korf in R.P. Korf, R.N. Singh & V.P. Tewari, 1978

(Persoon) Roussel, 1806
Persoon ex S.F. Gray, 1821

Nees, 1816
C.G.D. Nees ex S.F. Gray, 1821

Kunze, 1817
(Kunze ex Persoon) Steudel, 1824

Nees, 1816
C.G.D. Nees ex S.F. Gray, 1821

Kunze, 1817
[Kunze] E.M. Fries, 1821

Schulzer, 1866
S. Schulzer von Müggenburg in S. Schulzer von Müggenburg, A. Kanitz & Knapp, 1866

Tode, 1790
Tode ex A.J.C. Corda, 1837

Morais, Batista & Massa, 1966
Falcão de Morais et al., 1966 (Approved Lists, 1980)

Nees, 1816
C.G.D. Nees ex E.M. Fries, 1832


Trappe, Castellano & Amaranthus, 1996
J.M. Trappe, M.A. Castellano & M.P. Amaranthus, 1996

Chesters & Greenhalgh, 1964
C.G.C. Chesters & G.N. Greenhalgh, 1964

Mougeot & E.M. Fries, 1825
Mougeot & E.M. Fries ex E.M. Fries, 1825

D. Hawksworth & R. Santesson, 1990
D.L. Hawksworth & R. Santesson in H.M. Jahns, 1990

Berkeley & Broome, 1870
M.J. Berkeley & Broome, 1875

Penzig & Saccardo, 1898
Penzig & P.A. Saccardo, 1897

Nylander, 1885
Nylander in Hue, 1885

F. Stevens, 1923
F.L. Stevens, 1924

P. Karsten, 1870
P.A. Karsten, 1871

(Nannenga-Bremekamp) Nannenga-Bremekamp, 1975
(N.E. Nanninga-Bremekamp) N.E. Nannenga-Bremekamp, 1974

K.D. Hyde & Nakagiri, 1991
K.D. Hyde & Nakagiri In K.D. Hyde, 1991
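
On the duplicated abbreviation mentioned before the sample output (Bat. listed against both Batenburg and Batista), one possible way forward, offered only as a sketch, is to index the abbreviation list so that an abbreviation is expanded only when it maps to exactly one full name, and is otherwise left alone and flagged for a human. The function names and the status labels are hypothetical.

```python
from collections import defaultdict


def build_abbreviation_index(pairs):
    """Map each abbreviation to the set of full names it is listed against."""
    index = defaultdict(set)
    for abbr, full in pairs:
        index[abbr].add(full)
    return index


def expand(abbr, index):
    """Return (text, status): expand only when the abbreviation is unambiguous."""
    names = index.get(abbr, set())
    if len(names) == 1:
        return next(iter(names)), "expanded"
    if len(names) > 1:
        return abbr, "ambiguous"    # leave as-is and flag for human review
    return abbr, "unknown"          # not on the list at all


if __name__ == "__main__":
    pairs = [("Bat.", "Batenburg"), ("Bat.", "Batista"), ("Pers.", "Persoon")]
    index = build_abbreviation_index(pairs)
    for abbr in ("Pers.", "Bat.", "Xyz."):
        print(abbr, "->", expand(abbr, index))
```

This does not resolve which Bat. is meant; it only stops a silent wrong expansion and surfaces the duplicates so the list itself can be cleaned up.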

