Convert PDF or Images through Optical Character Recognition to text (in Google Docs)

The email informing me that in a few months' time my Google Groups will no longer be supported or even available (ouch, that hurts) contained a link to a page on Google Docs. That page suggested that files uploaded to Google Docs can be scanned through OCR (Optical Character Recognition) and converted into text. That certainly is an interesting feature, one that deserves a trial run.

I captured a screenshot, saved it as a PNG file and uploaded it to Google Docs with the ‘process through OCR’ checkbox turned on. The results? Pretty good. The next test would be a hand-written document.

Below is the screenshot – based on a webpage (community.oraclepressbooks.com/profile.php?aid=122&action=works)

authorProfile

I saved it as a PNG file. Then I uploaded it to Google Docs:

uploadPNG

The result is a Google Docs document with the original image and the extracted text:

ocredDoc

The text that was extracted reads (with exact indentation):

Oracle SOA Suite 11g Meets Oracle Business Process Management 11g
More Info Presentation by Lucas Jeltema Why and How to Engage a Complex Event Processor from a Java Web Application (2010) More Info Presentation by Lucas Jellema for PUSQL: lnfusing Java Best Practices and Design Patterns (2010) Mgrg Infg Presentation by Lucas Jellema and Alex
Forms2Future: Journey into the Future for organizations on the Oracle Platform (2010)
Name: Lucas Jellema Location: Zoetermeer, Netherlands Subjects: Middleware; Databases; Regions: Australia; Europe; USIX-“vast Coast;

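It has a few issues: ‘Jeltema’ instead of ‘Jellema’, ‘PUSQL’, ‘lnfusing’, ‘Mgrg Infg’ instead of ‘More Info’, text cut off after ‘Alex’ (marked ??? below) and the garbled region ‘USIX-“vast Coast’: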

Oracle SOA Suite 11g Meets Oracle Business Process Management 11g
More Info Presentation by Lucas Jeltema Why and How to Engage a Complex Event Processor from a Java Web Application (2010) More Info Presentation by Lucas Jellema for PUSQL: lnfusing Java Best Practices and Design Patterns (2010) Mgrg Infg Presentation by Lucas Jellema and Alex ???
Forms2Future: Journey into the Future for organizations on the Oracle Platform (2010)
Name: Lucas Jellema Location: Zoetermeer, Netherlands Subjects: Middleware; Databases; Regions: Australia; Europe; USIX-“vast Coast;
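
Spotting such misreads by eye gets tedious with longer documents. Below is a minimal sketch in Python of how the comparison could be automated, assuming a trusted reference text is available to compare against. The function name ocr_word_diffs is hypothetical, and the reference strings are my assumption of what the page actually said; only the OCR strings are taken from the output above.

```python
import difflib


def ocr_word_diffs(ocr_text, reference_text):
    """Return (ocr_words, reference_words) pairs for every spot where the texts differ."""
    ocr_words = ocr_text.split()
    ref_words = reference_text.split()
    # Compare word by word instead of character by character.
    matcher = difflib.SequenceMatcher(a=ocr_words, b=ref_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2])))
    return diffs


# OCR fragments copied from the extracted text; references are assumed.
samples = [
    ("More Info Presentation by Lucas Jeltema",
     "More Info Presentation by Lucas Jellema"),
    ("Mgrg Infg Presentation by Lucas Jellema and Alex",
     "More Info Presentation by Lucas Jellema and Alex"),
]

for ocr, reference in samples:
    for wrong, right in ocr_word_diffs(ocr, reference):
        print(f"{wrong!r} should probably read {right!r}")
```

Comparing whole words rather than single characters keeps the report readable: each misread shows up as one pair, such as ‘Jeltema’ against ‘Jellema’.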

All in all an interesting feature. Killing Google Groups is not cool by the way (well, removing all custom pages).
