[Taxacom] Propagation of bad sameAs statements

joel sachs jsachs at csee.umbc.edu
Wed Sep 8 09:42:45 CDT 2010


I'd like to catalog sources of biodiversity information and misinformation 
on the semantic web, and am trying to determine the genesis of some 
unfortunate owl:sameAs statements.

According to sameas.org:

<http://dbpedia.org/resource/Invasive_species>
    <owl:sameAs>
       <http://dbpedia.org/resource/Invasive_plant>
       <http://dbpedia.org/resource/Invasive_animal>
       <http://dbpedia.org/resource/Invasive_organism>
       <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000007de24>
      (many other concepts)

Checking out the dbpedia resources that are the objects of the sameAs 
assertions, we see that each redirects to 
http://dbpedia.org/resource/Invasive_species. But other than 
dbpedia:Invasive_species including a sameAs link to 
freebase:Invasive_species, no dbpedia page, afaict, makes the sameAs 
assertions listed above.

However, http://rdf.freebase.com/rdf/guid.9202a8c04000641f800000000007de24 
does assert:

<http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000007de24>
    <owl:sameAs>
       <http://dbpedia.org/resource/Invasive_species>
       <http://dbpedia.org/resource/Invasive_plant>
       <http://dbpedia.org/resource/Invasive_organism>
       <http://dbpedia.org/resource/Invasive_animal>
       etc.


The direction of propagation is not explicit. One possibility is that 
sameas.org is inferring that "A sameAs B" based on "A redirects to B", and 
that these assertions are making their way into freebase. Another is that 
a freebase contributor is making the sameAs inferences, and that they are 
being picked up by sameas.org. (Similar cycles of sameAs can be found for 
"habitat", "introduced_species", and many other concepts.)

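To illustrate why the direction is hard to recover from the bundles alone, here is a minimal Python sketch. It treats sameAs as symmetric and transitive and groups URIs into equivalence classes. The function name `bundles` and the two hypothesis lists are illustrative assumptions, not a description of how sameas.org or freebase actually work. The URIs and redirects are the ones quoted above.

```python
from collections import defaultdict

DBP = "http://dbpedia.org/resource/"
FB = "http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000007de24"

# Redirects as described in this message: each URI resolves to Invasive_species.
redirects = {
    DBP + "Invasive_plant": DBP + "Invasive_species",
    DBP + "Invasive_animal": DBP + "Invasive_species",
    DBP + "Invasive_organism": DBP + "Invasive_species",
}


def bundles(pairs):
    """Group URIs into equivalence classes (hypothetical helper).

    sameAs is treated as symmetric and transitive, so the direction of
    each input pair is lost in the result.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return {frozenset(g) for g in groups.values()}


# Hypothesis 1 (assumed): an aggregator reads "A redirects to B" as
# "A sameAs B", and adds dbpedia's one explicit link to freebase.
h1 = list(redirects.items()) + [(DBP + "Invasive_species", FB)]

# Hypothesis 2 (assumed): freebase asserts sameAs to each dbpedia URI.
h2 = [(FB, DBP + name) for name in
      ("Invasive_species", "Invasive_plant",
       "Invasive_organism", "Invasive_animal")]

print(bundles(h1) == bundles(h2))  # prints True
```

Both starting points collapse into the same five-member bundle, which is why the bundle by itself cannot say which source made the inference first.
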
So, a request for the sameas.org folks: Would it be possible to include a 
provenance column for all sameAs assertions you keep track of?  In cases 
where the sameAs assertion isn't actually asserted on the web, you could 
mark the provenance as "inferred". Also, have you published the 
heuristics you use (if any) to infer sameAs relations?
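
As a rough sketch of what such a provenance column might look like, the Python below tags each aggregated pair either with the source that states it or with "inferred". The function name `with_provenance` and the short source labels "dbpedia" and "freebase" are assumptions made for illustration. A real column would presumably carry the URL of the asserting document. The statements are limited to the ones quoted in this message.

```python
DBP = "http://dbpedia.org/resource/"
FB = "http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000007de24"

# Statements actually found on the web, keyed by an assumed source label.
asserted = {
    "dbpedia": {(DBP + "Invasive_species", FB)},
    "freebase": {
        (FB, DBP + "Invasive_species"),
        (FB, DBP + "Invasive_plant"),
        (FB, DBP + "Invasive_organism"),
        (FB, DBP + "Invasive_animal"),
    },
}


def with_provenance(pairs, asserted):
    """Return (subject, object, provenance) rows (hypothetical helper).

    Provenance lists every source that states the pair in exactly this
    direction, or "inferred" if no source does.
    """
    rows = []
    for s, o in pairs:
        sources = sorted(name for name, stmts in asserted.items()
                         if (s, o) in stmts)
        rows.append((s, o, ", ".join(sources) if sources else "inferred"))
    return rows


# Two of the pairs listed by the aggregator for dbpedia:Invasive_species.
aggregated = [
    (DBP + "Invasive_species", DBP + "Invasive_plant"),
    (DBP + "Invasive_species", FB),
]

rows = with_provenance(aggregated, asserted)
for row in rows:
    print(*row, sep="\t")
```

Here the first pair comes out as "inferred" and the second as "dbpedia". The match is deliberately direction-sensitive, so a statement made only in the reverse direction still counts as inferred.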

And questions for freebase contributors: Are any of you running a script 
that either a) loads in assertions from sameas.org, or b) deduces sameAs 
relations from dbpedia redirection behaviour?

Thanks!
Joel.





