Extracting text from PDFs in Acrobat
Thread poster: Mary Worby
Mary Worby
Mary Worby  Identity Verified
United Kingdom
Local time: 04:39
German to English
+ ...
Nov 4, 2002

Folks,



More and more of my work is arriving in PDF format. At the moment that means either copying and pasting the text into Word, with all the associated rigmarole of reformatting the text, getting rid of paragraph marks, etc., or just printing the thing out and starting from scratch. Neither is a particularly time-effective solution.



What I would like is a system that allows me to extract the text directly into a Word or PDF format. It does not have to be perfectly formatted, but it would be nice to have flowing text which is all in the right order.



I'm tempted to get the full version of Acrobat, which allegedly allows you to save in RTF format. My question is whether this actually works! Does it do what it says on the tin, or are there reasons why this would not be the right way to go?



Thanks in advance for any suggestions!



Mary


 
Joeri Van Liefferinge
Joeri Van Liefferinge  Identity Verified
Belgium
Local time: 05:39
English to Dutch
+ ...
Full version of Acrobat is not the ideal solution either Nov 4, 2002

The full version allows you to save in RTF format, but it's not marvellous either: there's a hard return after each line, and if there are columns in your document, everything gets mixed up.

The best solution is to ask for the original text, but I know that clients often say that they don't have access to it.



I for one treat PDF files like texts I receive on paper or by fax, which means that I charge extra for them.



fwiw





Joeri


 
E-nauta
E-nauta  Identity Verified
Spain
Local time: 05:39
Member (2002)
English to Spanish
+ ...
A few global replacements Nov 4, 2002

Hi All,



I assume that you know how to copy and paste the entire text into Word. From that point, my personal way of getting text that flows reasonably accurately is to make three global replacements:



1- Replace every period followed by a paragraph mark with a unique tag like ZZZ.

2- Replace every remaining paragraph mark with nothing.

3- Replace ZZZ with a period followed by a paragraph mark.



And that's all I do globally. After that, I guess you have to take care of the remaining 5% (or whatever) by hand.



Please note that this may not be a good idea if there are many places where a paragraph naturally ends without a period. The accuracy may vary a lot.
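
For anyone who would rather script this than run it in Word, the three replacements above can be sketched roughly as follows in Python. This is only an illustration under some assumptions: the text has already been copied out of the PDF into a plain string, a paragraph mark is a newline, and the tag ZZZ does not occur in the text itself. The function name `reflow` is made up for this example. It also departs from step 2 in one respect: the leftover paragraph marks are replaced with a space rather than with nothing, so that words on adjacent lines are not glued together.

```python
import re

PLACEHOLDER = "ZZZ"  # unique tag; assumed not to occur in the text itself


def reflow(text: str) -> str:
    """Join hard-wrapped lines, keeping only breaks that follow a period."""
    # Treat "\n" as the only paragraph mark and drop spaces before it.
    text = text.replace("\r\n", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Step 1: protect every period followed by a paragraph mark.
    text = text.replace(".\n", PLACEHOLDER)
    # Step 2: remove every remaining paragraph mark (a space is used here
    # instead of "nothing", which is a deviation from the recipe above).
    text = text.replace("\n", " ")
    text = re.sub(r" {2,}", " ", text)
    # Step 3: restore the protected period + paragraph mark.
    return text.replace(PLACEHOLDER, ".\n")


if __name__ == "__main__":
    sample = "This is a line\nthat wraps.\nSecond paragraph starts\nhere.\n"
    print(reflow(sample))
```

As with the manual version, a heading or bullet point that does not end in a period is merged into the following line, so the result still needs checking by hand.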



Best regards,

Juan Pablo



 
Al Gallo
Al Gallo
English to Spanish
+ ...
I use Adobe Acrobat 5.0 Nov 5, 2002

Hi Mary,

First of all I click on the small T in the toolbar, then press Ctrl and Alt together and with the mouse I select each full column independently, pasting each column into Word, where I have inserted a 2 column table. I paste the original language on the left. When all is translated, I reformat to imitate the original.

Luck

Al


 
monitor
monitor  Identity Verified
Local time: 05:39
English to German
+ ...
Professional Tool «Gemini Solo» Nov 5, 2002

If you have a recurring need to extract text and graphics from PDFs, go to www.iceni.com and have a look at what they offer.

Instead of spending the money on Adobe Acrobat, you would do better to buy Gemini Solo.

It solves virtually exactly that problem.

Kind Regards

Marcel


 
Ann VDP
Ann VDP
Local time: 05:39
French to Dutch
+ ...
Wordfast Nov 5, 2002

I usually extract the text by means of Wordfast. You can download Wordfast for free at http://www.champollion.net. Just follow the guidelines to install it, open the PDF file, open a new Word document and start the Wordfast session (by clicking on the Wordfast button). Normally Wordfast automatically detects that a PDF file is open and asks whether you want to import it. Simply click yes and wait until Wordfast has imported the entire file. You will have to double-check the document, though, since the layout tends to change (titles, columns, etc. appear in a different place), but it is definitely a lot easier than copying, and you don't have the annoying hard returns at the end of each line.



Hope it helps!



Kind regards,



Anneken


 
Nathalie M. Girard, ALHC (X)
Nathalie M. Girard, ALHC (X)  Identity Verified
English to French
+ ...
F.Y.I. Wordfast is no longer *free* Nov 5, 2002

Good morning Anneken



I just wanted to make a little correction on your post, as this change is rather recent:



Wordfast is unfortunately no longer *free*.



You can see the pricing details on the website...



Have a great day everyone!

Nathalie



 
mckinnc
mckinnc  Identity Verified
Local time: 05:39
French to English
+ ...
Just tried what you suggested in Acrobat Nov 5, 2002

I converted a simple Word file without tables into PDF, then saved it as .rtf. Unfortunately, I lost a lot of formatting information (line breaks, page breaks, etc.).



I then tried it on a typical file that I translate, including tables, footnotes and side boxes overlaid on pages. It was not too bad with a standard Word table, but it didn't cope properly with a lot of these other things at all.



So it might work for straightforward texts, provided you do some reformatting afterwards. It should, of course, be taken as read that clients provide you with the source files. Anything else is patently stupid.


 
Evert DELOOF-SYS
Evert DELOOF-SYS  Identity Verified
Belgium
Local time: 05:39
Member
English to Dutch
+ ...
Readiris Pro 8 Nov 5, 2002

should do the trick.



Opens PDF documents (even read-only!), and converts them into editable files you can send directly to your favorite application:



http://www.irislink.com/opt/uk/products/readiris/pc/features/index.html



Good luck





[ This Message was edited by: on 2002-11-05 12:08 ]


 
Mary Worby
Mary Worby  Identity Verified
United Kingdom
Local time: 04:39
German to English
+ ...
TOPIC STARTER
So there is no answer! Nov 5, 2002

Thanks to you all for your suggestions. It would appear that there is no easy answer (and there I was hoping that Acrobat would solve all my problems).



I've tried the demo version of Gemini Solo in the past, and found the results less than satisfactory. Obviously, a lot depends on how well the document was created in the first place! But on the short documents I tried, I would have had to do almost as much reformatting as when I've simply copied and pasted the text.



I've also used the global replace methods before, but have found them, as you say, to be effective only for texts in normal paragraphs. If a text has a lot of bullet points or other formatting, they're not much use.



And yes, the answer would be to get the customer to supply the document in the right format. It's especially frustrating when you're translating something which is patently a Word file converted into PDF, and they claim that there is no original document. Customers, eh, who'd 'ave 'em



Thanks again for all your suggestions. It looks like I may have to head back to the drawing board.



Regards



Mary


 
Karin Adamczyk (X)
Karin Adamczyk (X)  Identity Verified
Canada
Local time: 23:39
French to English
No original files not possible Nov 5, 2002

Quote:




And yes, the answer would be to get the customer to supply the document in the right format. It's especially frustrating when you're translating something which is patently a Word file converted into PDF, and they claim that there is no original document. Customers, eh, who'd 'ave 'em







You probably already know this by now, but it is not even possible that there are no original files because Acrobat cannot create files on its own. All PDF documents are generated from some other format. That's the whole idea behind Acrobat -- the resulting documents are intended to be distributed to people who do not have the program that created the original files.



Here is the general description of Acrobat from the Adobe site:



Whether you create business plans, spreadsheets, graphically rich brochures, or Web sites, Adobe® Acrobat® 5.0 software lets you convert any document to an Adobe Portable Document Format (PDF) file. Anyone can open your document across a broad range of hardware and software, and it will look exactly as you intended — with layout, fonts, links, and images intact.



HTH,

Karin Adamczyk


 
Mary Worby
Mary Worby  Identity Verified
United Kingdom
Local time: 04:39
German to English
+ ...
TOPIC STARTER
So true ... Nov 5, 2002

Quote:


You probably already know this by now, but it is not even possible that there are no original files because Acrobat cannot create files on its own.



I know, that's what makes the whole thing so bloomin' frustrating!



Even worse was the one I had recently which was 'the customer has the original files but doesn't want you to have them'! Talk about not making life easy - luckily the job didn't come to anything!



I was just hoping there would be an answer which didn't involve nagging the customer for the original files ...



Evert - thanks for the link. Do you have any experience of the software?



Regards



Mary









[ This Message was edited by: on 2002-11-05 13:47 ]


 
Karin Adamczyk (X)
Karin Adamczyk (X)  Identity Verified
Canada
Local time: 23:39
French to English
You don't need to nag Nov 5, 2002

Quote:


I was just hoping there would be an answer which didn't involve nagging the customer for the original files ...



All you need to do is inform your customer of your hourly rate for the extra time involved in extracting and formatting the text.



It's actually quite hilarious how quickly they manage to come up with the original files then!! (This works well for faxed documents too, although with faxes there sometimes really are no original documents, and some clients will decide to type them up themselves.)



Good luck,

Karin


 

