Converting a PDF document to .DOC format

From ProZ.com Wiki



Introduction

We increasingly receive source documents for translation in PDF format. Although it is preferable to receive the source document in an editable format (after all, the document existed in an editable format, such as Word, QuarkXPress or FrameMaker, before the PDF was created), sometimes there is no alternative and we have to work with the PDF anyway. The following instructions cover how to deal with PDF source documents.

PDF in image format

A PDF in image format is built from images, for example a scanned document or a fax, so its text cannot be selected with the text selection tool in Adobe Acrobat Reader.

The words in an image PDF cannot be counted until the text has been extracted with OCR (optical character recognition). In fact, OCR is the only way to process the text in such a PDF, short of retyping the entire document manually to obtain an editable copy.
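Once the text has been extracted (by OCR, or by exporting a text PDF), a rough word count is straightforward. The following is only a minimal Python sketch: the function name `count_words` is hypothetical, and the rule that a word is any run of non-whitespace characters is an assumption, not how Word or any particular CAT tool counts.

```python
def count_words(text: str) -> int:
    """Rough word count for text extracted from a PDF (by OCR or export).

    Assumption: a "word" is any run of non-whitespace characters.
    Word processors and CAT tools apply their own rules for numbers,
    hyphenated compounds and punctuation, so their totals may differ.
    """
    # str.split() with no argument splits on any whitespace
    # (spaces, tabs, line breaks) and ignores leading/trailing runs.
    return len(text.split())
```

For example, `count_words("Counting words in an image PDF\nrequires OCR first.")` returns 9. Since no OCR software converts 100% correctly, the count is only as reliable as the extracted text.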

PDF in text format

A PDF in text format is the most common type of PDF. Its text can be selected with the text selection tool in Adobe Acrobat Reader.

With the free Adobe Acrobat Reader, you can save the text of a PDF document as plain text, but none of the formatting is retained. With an Adobe Acrobat Standard or Professional license, you can also export to other formats, such as Word.

In all cases, you can manually select part or all of the text in a PDF document using the text selection tool, and then copy it using the copy command (usually Ctrl + C). However, depending on the original formatting of the PDF document, the copied text may lose some or all of its formatting.

A classic symptom of text copied from a PDF document is that every line ends with a line break (carriage return). Once pasted into another editor, the text is therefore split into short, illogical chunks, and all the unnecessary line breaks have to be deleted manually. Alternatively, you can use a free tool called AutoUnbreak by Hollmén Digital, which removes unnecessary line breaks, retains most basic formatting (boldface, bullets, etc.) and produces an RTF version of the copied text that you can paste into Word or any other word processor that supports Rich Text Format.
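To illustrate what removing unnecessary line breaks involves, here is a minimal Python sketch. The function `unbreak` and its rules are hypothetical assumptions for illustration, not AutoUnbreak's actual logic: a blank line is taken as a paragraph boundary, a line starting with a bullet or list number starts a new line, and every other line break is treated as a wrap and replaced by a space.

```python
import re

# A line starting with a bullet character or "1." / "1)" begins a new item.
BULLET = re.compile(r"^\s*(?:[•\-*]|\d+[.)])\s+")


def unbreak(text: str) -> str:
    """Join hard-wrapped lines copied from a PDF into whole paragraphs.

    Heuristics (assumptions, not AutoUnbreak's rules):
    - a blank line marks a paragraph boundary and is kept;
    - a line starting with a bullet or list number starts a new line;
    - any other line break is a wrap and becomes a single space.
    """
    # Split into paragraphs on blank lines (also handles \r\n endings).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        items = []
        for line in para.splitlines():
            line = line.strip()
            if not line:
                continue
            if items and not BULLET.match(line):
                items[-1] += " " + line  # wrapped line: join to previous
            else:
                items.append(line)  # first line or a new bullet item
        result.append("\n".join(items))
    return "\n\n".join(result)
```

For example, `unbreak("One classic symptom is that\neach line ends with a break.")` returns the sentence on a single line. Unlike AutoUnbreak, this sketch produces plain text only, so boldface and other formatting are lost, and it does not rejoin words hyphenated at line ends, because it cannot tell those apart from genuine hyphenated compounds.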


Discussion related to this article




psicutrinius Identity Verified
Spain
Member (2008)
English to Spanish
PDF image conversion (Jul 2, 2011)

Which OCR applications are most suitable for this? More specifically, is anyone converting to Word in particular, either directly or through an intermediate step?

 

Bogdan Doicin (X)
English to Romanian
Answer (Jul 2, 2011)

ABBYY FineReader 10 is very good software, if you have the time and patience to fix the fine details, since no OCR software converts 100% correctly. Alternatively, there is Adobe Acrobat Professional, which can export PDF directly to Word. In the latter case I can help you, because I have the software installed. PM me for more details.

 

Susan Welsh Identity Verified
United States
Member (2008)
Russian to English
See archives (Jul 2, 2011)

This topic has been discussed many, many times. Just search in the forums for "pdf" and "word" or "pdf conversion."
There is no great solution, I'll tell you that.


 

Vadim Kadyrov Identity Verified
Ukraine
Member (2011)
English to Russian
Solid PDF Converter (Jul 2, 2011)

Solid PDF Converter is the best one I have come across (provided your PDF is not just a collection of images).

Try it!


 

Sergei Tumanov Identity Verified
English to Russian
See the link (Jul 3, 2011)

http://finereader.abbyy.com/

 

bergazy Identity Verified
Croatia
Croatian to Italian
Try this (Jul 3, 2011)

If you don't want to spend money right now, online conversion may be a good idea:

PDF to Word Online — 100% Free PDF Converter to Word Format

http://www.pdfonline.com/pdf-to-word-converter/
PDF to Word Online is a free PDF converter to editable Word format.
PDF to Word Converter — 100% Free

http://www.pdftoword.com/
Use Nitro's industry-leading PDF-to-Word converter to create better quality ... easily create editable DOC/RTF files, making it a cinch to re-use PDF content in ... Our free online service is based on the industry-leading PDF-to-Word ...


Regards

M.


 

James (Jim) Davis Identity Verified
Seychelles
Italian to English
Thanks Bergazy, online PDF conversion almost perfect (Jun 23, 2013)

I had made a poor conversion of a file with Nuance PDF Converter Professional 5. I **was** looking at signing up for Adobe's online conversion service, at what seems to be 16 euros or pounds sterling a month, or buying the latest version (8) of the Nuance product (around 60 dollars), when pdfonline (http://www.pdfonline.com/pdf-to-word-converter/) converted my document almost perfectly. The document is a PDF assembled from Word files, Excel files and some images of Excel files. Everything is perfect except the images, which it did not attempt to convert but simply copied in as images.
I downloaded the Adobe Acrobat Pro trial version and converted the same PDF; the result was identical to the free pdfonline conversion, and it also left the images as images. I tried various settings and searched for a solution. The options were set to run OCR where necessary, but it would not convert the images to editable text. I also tried ABBYY FineReader, which is pretty good, but it treats the whole PDF as an image when only part of it is one, so there is a risk of introducing errors into text that could simply have been copied and pasted.

[Edited at 2013-06-23 11:39 GMT]


 

mazharm
India
pdfonline (Sep 24, 2014)

Thanks, James. I tried pdfonline and it worked fine; you can make many changes to your PDFs there.

 
