Extracting text from PDFs in Acrobat Thread poster: Mary Worby
| Mary Worby United Kingdom Local time: 04:39 German to English + ...
Folks,
More and more of my work is arriving in PDF format. This currently means either copying and pasting the text into Word, with all the associated rigmarole of reformatting the text, getting rid of paragraph marks, etc., or just printing the thing out and starting from scratch. Neither is a particularly time-effective solution.
What I would like is a system that allows me to extract the text directly into a Word or PDF format. It does not have to be perfectly formatted, but it would be nice to have flowing text which is all in the right order.
I'm tempted to get the full version of Acrobat, which allegedly allows you to save in RTF format. My question is whether this actually works! Does it do what it says on the tin, or are there reasons why this would not be the right way to go?
Thanks in advance for any suggestions!
Mary

Full version of Acrobat is not the ideal solution either | Nov 4, 2002 |
The full version allows you to save in RTF format, but it's not marvellous either: there's a hard return after each line, and if there are columns in your document, everything is mixed up. The best solution is to ask for the original text, but I know that clients often say that they don't have access to that.
I, for one, treat PDF files like texts I receive on paper or by fax, which means that I charge extra for them.
fwiw
Joeri

E-nauta Spain Local time: 05:39 Member (2002) English to Spanish + ... A few global replacements | Nov 4, 2002 |
Hi All,
I assume that you know how to copy and paste the entire text into Word. From that point, my personal solution for getting reasonably well-flowing text is to make 3 global replacements:
1- Replace every period followed by a paragraph mark with a unique tag like ZZZ.
2- Replace every remaining paragraph mark with nothing.
3- Replace ZZZ with a period followed by a paragraph mark.
And that's all I do globally. Then I guess you have to take care of the remaining 5% (or whatever) by hand.
Please note that this may not be a good idea if there are many places where a natural paragraph mark occurs without a period. The accuracy can vary a lot.
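For anyone who would rather script this than run the replacements in Word, the three steps can be sketched in Python. This is only a rough illustration under some assumptions: the pasted text is available as a plain string, a paragraph mark is a single "\n", and the function name `reflow` and its `tag` parameter are made up for this example. One deliberate deviation: step 2 inserts a space instead of nothing, so that the last word of one line is not glued to the first word of the next.

```python
def reflow(text: str, tag: str = "ZZZ") -> str:
    """Join hard-wrapped lines, keeping breaks that follow a period.

    `reflow` and `tag` are illustrative names, not part of any tool.
    Assumes paragraph marks are single "\n" characters.
    """
    # The tag must be unique, otherwise step 3 would corrupt the text.
    if tag in text:
        raise ValueError("tag already occurs in the text; pick another one")

    # Step 1: protect every period followed by a paragraph mark.
    text = text.replace(".\n", tag)

    # Step 2: remove every remaining paragraph mark.
    # Assumption: a space is used instead of nothing to keep words apart.
    text = text.replace("\n", " ")

    # Step 3: restore the protected period plus paragraph mark.
    return text.replace(tag, ".\n")


if __name__ == "__main__":
    pasted = "The cat sat\non the mat.\nIt was\nhappy.\n"
    print(reflow(pasted))
```

The same caveat applies as with the manual method: headings, bullet points and other lines that end without a period get merged into the following text and still have to be fixed by hand.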
Best regards, Juan Pablo
I use Adobe Acrobat 5.0 | Nov 5, 2002 |
Hi Mary, first of all I click on the small T in the toolbar, then press Ctrl and Alt together and, with the mouse, select each full column independently, pasting each column into Word, where I have inserted a 2-column table. I paste the original language on the left. When all is translated, I reformat to imitate the original. Luck, Al
monitor Local time: 05:39 English to German + ... Professional Tool «Gemini Solo» | Nov 5, 2002 |
If you have a recurring need to extract text and graphics from PDFs, go to www.iceni.com and have a look at what they offer. Instead of spending the money on Adobe Acrobat, you'd be better off buying Gemini Solo. It solves virtually exactly that problem. Kind regards, Marcel

Ann VDP Local time: 05:39 French to Dutch + ...
I usually extract the text by means of Wordfast. You can download Wordfast for free at http://www.champollion.net. Just follow the guidelines to install it, open the PDF file, open a new Word document and start the Wordfast session (by clicking on the Wordfast button). Normally Wordfast automatically detects that a PDF file is open and then asks whether you want to import it. Simply click yes and wait until Wordfast has imported the entire file. You will have to double-check the document, though, since the layout tends to change (titles, columns, etc. appear in a different place), but it is definitely a lot easier than copying, and you don't have the annoying hard returns at the end of each line.
Hope it helps!
Kind regards,
Anneken

Nathalie M. Girard, ALHC (X) English to French + ... F.Y.I. Wordfast is no longer *free* | Nov 5, 2002 |
Good morning Anneken
I just wanted to make a little correction on your post, as this change is rather recent:
Wordfast is unfortunately no longer *free*.
You can see the pricing details on the website...
Have a great day everyone! Nathalie
mckinnc Local time: 05:39 French to English + ... Just tried what you suggested in Acrobat | Nov 5, 2002 |
I converted a simple Word file without tables into PDF, then saved it as .rtf. Unfortunately, I lost a lot of formatting information (line breaks, page breaks, etc.).
I then tried it on a typical file that I translate, including tables, footnotes and side boxes overlaid on pages. It was not too bad with a standard Word table but didn't cope with a lot of these other things properly at all.
So it might work for straightforward texts, provided you do some reformatting afterwards. It should, of course, be taken as read that clients provide you with the source files. Anything else is patently stupid.
| Mary Worby United Kingdom Local time: 04:39 German to English + ... TOPIC STARTER So there is no answer! | Nov 5, 2002 |
Thanks to you all for your suggestions. It would appear that there is no easy answer (and there I was hoping that Acrobat would solve all my problems).
I've tried the demo version of Gemini Solo in the past, and found the results less than satisfactory. Obviously, a lot depends on how well the document was created in the first place! But on the short documents I tried, I would have had to do almost as much reformatting as when I've simply copied and pasted the text.
I've also used the global replace methods before, but have found this, as you say, only to be effective for texts in normal paragraphs. If a text has a lot of bullet points or other formatting, it's not much use.
And yes, the answer would be to get the customer to supply the document in the right format. It's especially frustrating when you're translating something which is patently a Word file converted into PDF, and they claim that there is no original document. Customers, eh, who'd 'ave 'em
Thanks again for all your suggestions, it looks like I may have to head back to the drawing board.
Regards
Mary

Karin Adamczyk (X) Canada Local time: 23:39 French to English No original files not possible | Nov 5, 2002 |
Quote:
And yes, the answer would be to get the customer to supply the document in the right format. It's especially frustrating when you're translating something which is patently a Word file converted into PDF, and they claim that there is no original document. Customers, eh, who'd 'ave 'em
You probably already know this by now, but it is not even possible that there are no original files because Acrobat cannot create files on its own. All PDF documents are generated from some other format. That's the whole idea behind Acrobat -- the resulting documents are intended to be distributed to people who do not have the program that created the original files.
Here is the general description of Acrobat from the Adobe site:
Whether you create business plans, spreadsheets, graphically rich brochures, or Web sites, Adobe® Acrobat® 5.0 software lets you convert any document to an Adobe Portable Document Format (PDF) file. Anyone can open your document across a broad range of hardware and software, and it will look exactly as you intended — with layout, fonts, links, and images intact.
HTH, Karin Adamczyk

Mary Worby United Kingdom Local time: 04:39 German to English + ... TOPIC STARTER
Quote: You probably already know this by now, but it is not even possible that there are no original files because Acrobat cannot create files on its own.
I know, that's what makes the whole thing so bloomin' frustrating!
Even worse was the one I had recently which was 'the customer has the original files but doesn't want you to have them'! Talk about not making life easy - luckily the job didn't come to anything!
I was just hoping there would be an answer which didn't involve nagging the customer for the original files ...
Evert - thanks for the link. Do you have any experience of the software?
Regards
Mary
Karin Adamczyk (X) Canada Local time: 23:39 French to English You don't need to nag | Nov 5, 2002 |
Quote: I was just hoping there would be an answer which didn't involve nagging the customer for the original files ...
All you need to do is inform your customer of your hourly rate for the extra time involved in extracting and formatting the text.
It's actually quite hilarious how quickly they manage to come up with the original files then!! (This works well for faxed documents too, although with faxes there sometimes really are no original documents, and some clients will decide to type them up themselves.)
Good luck, Karin