[Taxacom] iDigBio Demo Webinar: Visualize Your Text Data Using OCR Output

Deb Paul dpaul at fsu.edu
Tue Jan 14 11:47:32 CST 2014


What: Visualize Your Text Data Using OCR Output
Why: Fast access to your data, reveal the unexpected
When: Wednesday 10 AM EST 22 January 2014
Where: http://idigbio.adobeconnect.com/augmentocr
Who: All Are Welcome!

Note: Headsets are recommended for the best experience with AdobeConnect 
<https://www.idigbio.org/wiki/index.php/Web_Conferencing>. Please log 
in 15 minutes early if this is your first time using AdobeConnect.

Twitter: @iDigBio #citscribe #ocrviz

See your data in a whole new way! Museum specimen labels, note cards, 
field notebooks, ledgers and other primary source materials are being 
imaged in many digitization projects. Other projects plan to OCR their 
materials or have questions about what they can do with the output.

OCR text output from these sources opens a window to your data, /before/ 
the data elements are entered into the database fields. It gives you 
unprecedented, fast access to your data, revealing insights to 
facilitate research, data validation, and public participation in 
science. Come see a demonstration of how you might do this with OCR 
output. As part of the recent iDigBio CITScribe Hackathon 
<https://www.idigbio.org/content/citscribe-hackathon>, the LlLl team 
<https://www.facebook.com/photo.php?fbid=645283398848943&l=bbbcf70f3b> 
demonstrated one technique to do this visualization with Carrot^2 
<http://search.carrot2.org/stable/search> and Google charts 
<https://developers.google.com/chart/> using OCR output indexed 
<http://www.techopedia.com/definition/1210/index-idx-database-systems> 
by Apache Solr <http://lucene.apache.org/solr/> and highlighting OCR 
errors using an n-gram model, a probabilistic model for estimating the 
likelihood that a string is a valid word. Find what you want, fast, and 
/discover/ unexpected, informative search terms. The same approach can 
be used to guide what needs to be validated using crowdsourcing outputs, 
on a per-field basis. All are welcome.
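
For readers who want a feel for the n-gram idea mentioned above, here is 
a minimal sketch in Python. It is not the hackathon team's code. It 
assumes a character-bigram model with add-one smoothing, trained on a 
tiny, hypothetical word list (SAMPLE_WORDS), and it flags OCR tokens 
whose average log-probability falls below a threshold you choose. All 
function names are illustrative only.

```python
import math
from collections import Counter

# Hypothetical training vocabulary; a real model would use a large word list.
SAMPLE_WORDS = [
    "herbarium", "specimen", "collector", "county", "florida", "flora",
    "forest", "river", "museum", "label", "locality", "collected",
]


def train_char_bigrams(words):
    """Count character bigrams, using ^ and $ as word-boundary markers."""
    pair_counts = Counter()   # counts of (char, next_char)
    first_counts = Counter()  # counts of char appearing as the first of a pair
    alphabet = set()
    for word in words:
        padded = "^" + word.lower() + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts, len(alphabet)


def avg_log_prob(token, model):
    """Average log-probability per character transition (higher = more word-like)."""
    pair_counts, first_counts, vocab_size = model
    padded = "^" + token.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        # Add-one smoothing so unseen transitions get a small, nonzero probability.
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + vocab_size)
        total += math.log(p)
    return total / (len(padded) - 1)


def flag_suspect_tokens(text, model, threshold):
    """Return whitespace-separated tokens scoring below the threshold."""
    return [t for t in text.split() if avg_log_prob(t, model) < threshold]
```

To try it, build a model with train_char_bigrams(SAMPLE_WORDS), compare 
avg_log_prob for a known word and a garbled string, and pick a threshold 
between the two scores. A suitable threshold depends entirely on the 
training list and the material being OCRed.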

See you there! Yes, please share the link, spread the word, and yes, it 
will be recorded.
Andrea M, Jason B, Miao C, Sylvia O, Reed B, William U and @idbdeb from 
the @idigbio #citscribe LlLl Team 
<https://www.facebook.com/photo.php?fbid=645283398848943&l=bbbcf70f3b>, 
et al from the iDigBio CITScribe Hackathon and iDigBio

NB. Work inspired by a Biodiversity Information Standards (TDWG) 2013 
talk, "The use of OCR in the digitisation of herbarium specimens" 
<http://www.tdwg.org/fileadmin/2013conference/slides/Drinkwater_OCRforHerbaria.pptx>, 
by Robyn Drinkwater, Robert Cubey, and Elspeth Haston, RBGE.

keywords: OCR ML NLP SOLR GoogleCharts CARROT2

-- 
Upcoming iDigBio Events https://www.idigbio.org/outreach-events-sidebar
--Deborah Paul
iDigBio Technology Specialist
Institute for Digital Information, 234 LSB
Florida State University
Tallahassee, Florida 32306
850-644-6366



