What is, in your experience, the best OCR software nowadays?
Thread poster: Ivan Rocha, CT

Ivan Rocha, CT
Canada
English to Portuguese
+ ...
May 12, 2011

Hello.

The company I work for is considering purchasing OCR software.

What is, in your personal experience, the best software in the field? And what would you recommend (the files we work with are usually PDFs with tons of tables and text, as well as some graphs)?

Thanks in advance for your input.

Regards,

Ivan


 

Natalie  Identity Verified
Poland
Local time: 13:21
Member (2002)
English to Russian
+ ...

Moderator of this forum
Finereader May 12, 2011

Without any doubt!

 

Stanislaw Czech, MCIL  Identity Verified
United Kingdom
Local time: 12:21
Member (2006)
English to Polish
+ ...
Most likely ABBYY FineReader May 12, 2011

I like it; however, I have not tried every program on the market, so I cannot be sure.
You can try a free demo (15 days, 50 pages) to see if it is good enough.
S


 

Daniel Grau  Identity Verified
Argentina
English to Spanish
Abbyy for OCR May 13, 2011

However, no PDF converter will convert complex files (tables in particular) flawlessly. PDF files were designed as a delivery mechanism, not as working files.

To see what to expect from Abbyy, do some tests here (20 free pages per month):

• https://www.ocrterminal.com

I like Abbyy because it stays away from floating text boxes and frames, which most other converters use heavily and which are a nightmare for the translator.

If your PDF files contain no imaged text (so no OCR is required) and you just want to convert the PDF text into a Word file (actually RTF files, although they are named .doc), this one is extremely good at tables:

• http://www.pdftoword.com

And it's free. Go figure.

Bear in mind that sending files over the web raises confidentiality issues.


 

Eileen Cartoon  Identity Verified
Local time: 13:21
Italian to English
Nuance May 13, 2011

I have Nuance and it works pretty well. However, I haven't used ABBYY, so I can't compare.

 

Tomás Cano Binder, BA, CT  Identity Verified
Spain
Local time: 13:21
Member (2005)
English to Spanish
+ ...
Good experiences with ABBYY FineReader May 13, 2011

Although it is far from being perfect!

It works well with scanned pages, and the Word documents it produces are generally OK. However, for complex documents it is sometimes better to scan the text and format it yourself, since the host of tiny boxes created by FineReader is really cumbersome to work with as a translator.


 

Peter Linton  Identity Verified
Local time: 12:21
Swedish to English
+ ...
OmniPage May 13, 2011

I use OmniPage 17 very successfully.

In a computer magazine test last year, OmniPage and ABBYY both came out on top.


 

José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 10:21
English to Portuguese
+ ...
In memoriam
InFix Pro - NOT an OCR software May 13, 2011

If it's a "distilled" (i.e. not scanned) PDF, InFix is the way to go. It's a PDF editor with DTP-like resources. It lets you export tagged text to XML, translate it with your favorite tool, and then import it back into the (equally tagged) PDF, preserving all formatting.

Of course, you'll have issues with partially-embedded fonts in the PDF and text swelling in translation. Yet the program lets you manage and solve them. My workflow is described in more detail here.

If it's a scanned PDF, I use an old but satisfactory version (14) of OmniPage and, after translation, I rebuild the whole publication using PageMaker, editing/adjusting the illustrations with PhotoImpact. Obviously I charge the client for the DTP work too.


 

Jo Macdonald  Identity Verified
Spain
Member (2005)
Italian to English
+ ...
Omnipage 14 May 13, 2011

Been using Omnipage for years, quite happy with it, didn't cost much either.
Great with clean Pdfs and other electronic text, not so good with dirty scans/images. It will convert these but the results are often more time-consuming to work with than typing the translation from scratch.

Just tried Pdf-to-word with a dirty scan Pdf, took about 30 mins to receive a mail saying:
Failed to convert your document - Sorry, the result converted document is too large to be sent.

Omnipage took less than a minute to convert this file and the resulting Word doc was about 1.2 MB. I didn't actually end up using this file; I typed the translation while reading the scan, which imo is less time-consuming than converting, correcting, processing in a CAT tool, checking against the scan, etc.

I've only had a few instances of files that made Omnipage crash.
No experience with Abbyy.


 

esperantisto  Identity Verified
Local time: 15:21
Member (2006)
English to Russian
+ ...
None May 13, 2011

Ivan Rocha wrote:
And what would you recommend (the files we work with are usually PDFs with tons of tables and text, as well as some graphs)?


Avoid such clients. Or charge per hour for the OCR work. Whichever OCR program you choose, documents of the kind you describe will be a pain in the neck anyway.


 

Ivan Rocha, CT
Canada
English to Portuguese
+ ...
TOPIC STARTER
Can't do it... May 13, 2011

esperantisto wrote:

Ivan Rocha wrote:
And what would you recommend (the files we work with are usually PDFs with tons of tables and text, as well as some graphs)?


Avoid such clients. Or charge per hour for the OCR work. Whichever OCR program you choose, documents of the kind you describe will be a pain in the neck anyway.


I have an in-house position, so I can't (and don't want to) "avoid" this client.

To everyone else who answered my question, thanks for your contributions.


 

Henning Holthusen  Identity Verified
Philippines
Local time: 20:21
English to German
+ ...
OCR (especially tables) May 13, 2011

I would also recommend ABBYY FineReader, but you have to be realistic about the results when you are talking about tables or just poor copies.
Tables are almost always a disaster and must be either extensively reformatted or simply retyped.

I currently live in the Philippines, and have a few Filipinos working for me when I need OCR documents cleaned up. I pay them a couple of euros a day (the average wage here is maybe 100 euros a month). They know English (an official language in the Philippines) and can type and do layout.

Feel free to message me if you are interested in outsourcing some of the retyping, layout or reviewing work. OCR jobs are a major time drain and hideously expensive at Western wage levels; it is much better to get them done in the developing world.


 

John Robinson
United States
Local time: 07:21
ABBYY for conversion May 13, 2011

ABBYY for PDF conversion. I find it handles tables quite nicely, and if it cannot reconstruct a table accurately, you can always draw the borders yourself, which can be very helpful.

 

Daniel Grau  Identity Verified
Argentina
English to Spanish
@Jo Macdonald May 13, 2011

Like I said, PDFtoWord does not work with OCR.

Submit a PDF with accessible text and complex tables, and it will do a good job of exporting to Word.


 

Artem Vakhitov  Identity Verified
Estonia
English to Russian
+ ...
ABBYY PDF Transformer Sep 12, 2011

ABBYY PDF Transformer. Try it on the ABBYY site. It gives excellent results on any PDF, including scanned ones, even in automatic mode (though I mostly draw the boxes by hand for full control).

The current version is 3.0, though I personally liked 2.0 better. I have one legitimate ABBYY PDF Transformer 3.0 box I don't use. Send me a personal message if you're interested in getting it for not much money (the license is transferable).


 

