Creating a translation memory from PDF documents
Thread poster: Elisa Fernández Vic
Hello all!

So, I have the following ingredients:
- A number of PDFs in English and Spanish.
- A Mac computer.
- OmegaT 3.1.8 (updating to 3.1.9 right now).
- No idea what I'm doing.

I want to create a translation memory for this project based on the PDF documents. How can I do this? Any help will be much appreciated. Thanks in advance!

Susan Welsh (United States)
First you have to convert the PDFs into .DOCX or .ODT format. I do this with ABBYY FineReader, which is software you have to buy. There are others that do the same thing, but that's what I use. (Maybe someone else will suggest something cheaper.)

Then you have to align the two files. LF Aligner is a good tool, and free: https://sourceforge.net/projects/aligner/ There are many others. That will give you your TM.

esperantisto: ABBYY PDF Transformer | Jul 1, 2015
Susan Welsh wrote:
"I do this with ABBYY FineReader, which is software you have to buy. There are others that do the same thing, but that's what I use. (Maybe someone else will suggest something cheaper.)"

If all you need is to extract text from PDF files, ABBYY PDF Transformer may be a solution. It's actually a trimmed-down version of FineReader, so its price is lower.

Cheaper like free | Jul 1, 2015
Susan Welsh wrote:
"...Maybe someone else will suggest something cheaper."

Casualtextractor should do the trick, especially if you extract to plain text, which is good enough for creating a TMX file. And it's free. It doesn't work for scanned ("dead") PDFs, though.

And yes, nothing can beat LF Aligner, but I'm afraid it doesn't have a graphical interface in the Mac version, so you'll need the Terminal. The instructions Andras provides are very clear, though.

And then there's YouAlign, a free web service that also processes PDFs, so you'd only have to upload the PDFs. It's very good, but perhaps not wise to use if you have signed any NDAs.

Cheers,
Hans
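For anyone curious about what the aligner actually hands you at the end: a TMX file is an XML document in which each translation unit (tu) holds one segment per language (a tuv element with an xml:lang attribute and a seg inside). The Python sketch below, using only the standard library, turns already-aligned (English, Spanish) pairs into a minimal TMX 1.4 file. It assumes the alignment itself has already been done and checked (for example in LF Aligner); build_tmx, read_tab_separated, write_tmx and the placeholder creationtool value are hypothetical names for illustration, and the "source, tab, target" input layout is an assumption, so check what your aligner really exports.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="es"):
    """Build a minimal TMX 1.4 tree from (source, target) segment pairs."""
    root = ET.Element("tmx", version="1.4")
    # TMX 1.4 requires these header attributes; the tool name is a placeholder.
    ET.SubElement(root, "header", {
        "creationtool": "tmx-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # skip pairs where one side is empty (alignment gaps)
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # &, < and > are escaped on output
    return root


def read_tab_separated(path):
    """Read 'source<TAB>target' lines (assumed layout of a tab-delimited export)."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            columns = line.rstrip("\n").split("\t")
            if len(columns) >= 2:  # ignore lines without a target column
                pairs.append((columns[0], columns[1]))
    return pairs


def write_tmx(pairs, path, src_lang="en", tgt_lang="es"):
    """Write the pairs to a UTF-8 TMX file with an XML declaration."""
    tree = ET.ElementTree(build_tmx(pairs, src_lang, tgt_lang))
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

For example, write_tmx(read_tab_separated("aligned.txt"), "aligned.tmx") would turn a hypothetical tab-separated file named aligned.txt into aligned.tmx. Skipping half-empty pairs is a design choice: a segment with no translation adds nothing to a TM.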
Dan Lucas (United Kingdom): Depends on the PDFs | Jul 1, 2015
Elisa Fernández Vic wrote:
"- A number of PDFs in English and Spanish."

If the PDFs are image-only PDFs, you will have to OCR them as described by others. OCR is not much fun, whatever software you use. Check the output files very carefully for errors.

However, machine-readable PDFs can usually be saved as plain text files. How do you know if it's a machine-readable file? If you can select text with the mouse, it's machine-readable. Sometimes the file is protected from copying or exporting, in which case you're out of luck.

If it's machine-readable and not protected, you can use the entirely free Sumatra PDF and simply choose "Save As..." from the File menu to save the text only. The screenshot below shows me doing just that with a publicly available Japanese document.

If the formatting is not too complex, saving to text might be both quicker and less effort than OCR.

Regards,
Dan

Didier Briel (France)
Elisa Fernández Vic wrote:
"So, I have the following ingredients:
- A number of PDFs in English and Spanish.
- A Mac computer.
- OmegaT 3.1.8 (updating to 3.1.9 right now).
- No idea what I'm doing.
I want to create a translation memory for this project based on the PDF documents. How can I do this?"

What you need is an aligner. You can use LF Aligner: https://sourceforge.net/projects/aligner/

If your PDFs contain text (not images), you will be able to align directly from the PDF files.

Didier

Elisa Fernández Vic (Spain, topic starter): LF Aligner issues | Jul 1, 2015
Hello all,

Thank you very much for your valuable information. I have managed to convert the files into .txt with UTF-8 encoding and to download LF Aligner. But when I try to align the two files, there is an error that I don't know how to solve. I will copy it as it shows, only changing the client's and file's names for privacy reasons:

ERROR: Input file not found (No such file or directory) at line 52066 (file: /Users/elisafernandezvic/Desktop/TRADUCCIÓN/CLIENTES/CLIENT/MATERIAL\ DE\ REFERENCIA\ INGLÉS/\(583153876\)\ 3020\ File\ name\ EN.txt) Try again!

What can I do to solve it? Thank you very much in advance.
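The backslashes in that path (a backslash before each space, and around the parentheses) are how the macOS Terminal escapes special characters when a file is dragged into its window. One plausible cause of the error, which the reply from Milan also points to, is that the escaped characters or the accented letters in TRADUCCIÓN and INGLÉS were not resolved to the real file. A workaround in that spirit is to copy each file to a short, ASCII-only path before aligning. The Python sketch below does this with the standard library; ascii_safe_name, copy_to_short_path and the destination folder are hypothetical, and it only creates renamed copies, leaving the originals untouched.

```python
import re
import shutil
import unicodedata
from pathlib import Path


def ascii_safe_name(filename):
    """Turn e.g. '(583153876) 3020 File name EN.txt' into an ASCII-only name."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = filename, ""
    # Decompose accented letters (Ó -> O + combining accent), then drop non-ASCII.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Replace runs of spaces, parentheses, backslashes, etc. with one underscore.
    ascii_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_stem).strip("_") or "file"
    return f"{ascii_stem}.{ext}" if ext else ascii_stem


def copy_to_short_path(src, dest_dir):
    """Copy src into dest_dir under an ASCII-only name and return the new path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / ascii_safe_name(Path(src).name)
    shutil.copy2(src, dest)
    return dest
```

For example, copy_to_short_path(original, Path.home() / "align") would place a copy named 583153876_3020_File_name_EN.txt in a hypothetical "align" folder in your home directory; that short path is what you would then give to LF Aligner. Choose a folder whose own name is also plain ASCII, otherwise the problem simply moves up one level.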
Short path and short file name in ASCII | Jul 1, 2015
Elisa Fernández Vic wrote:
"ERROR: Input file not found (No such file or directory) at line 52066 (file: /Users/elisafernandezvic/Desktop/TRADUCCIÓN/CLIENTES/CLIENT/MATERIAL\ DE\ REFERENCIA\ INGLÉS/\(583153876\)\ 3020\ File\ name\ EN.txt) Try again! What can I do to solve it? Thank you very much in advance."

Elisa,

Try a short path such as C:\name\EN.txt, and the same for the second .txt file.

Possible issues: TRADUCCIÓN/, INGLÉS/\(583153876\)\

Milan

Elisa Fernández Vic (Spain, topic starter): Success!! And now... how to merge TMX files together? | Jul 1, 2015
Thank you! I have managed to create my first translation memory, and it seems to work properly! Do I get a cookie?

Next on the list: as I said, I have a bunch of texts to align. With this method, I will end up with a bunch of aligned TMX files. Do I just move them all to the TM folder in OmegaT, or do I have to merge them somehow? Sorry if this is a stupid question; as I said, it's my first time trying to create my own TM from files.

Auto sub-folder | Jul 1, 2015
Elisa Fernández Vic wrote:
"Next on the list: as I said, I have a bunch of texts to align. With this method, I will end up with a bunch of aligned TMX files. Do I just move them all to the TM folder in OmegaT,"

Elisa,

Put all your relevant TMs into the tm\auto\ folder, then look at "files in project"; you will see whether the TMXs are relevant or not. There is no need to merge TMXs.

Milan

Elisa Fernández Vic (Spain, topic starter): Thank you very much! | Jul 2, 2015
Milan Condak wrote:
"Elisa, put all your relevant TMs into the tm\auto\ folder, then look at 'files in project'; you will see whether the TMXs are relevant or not. There is no need to merge TMXs."

So it was actually this easy! Thank you very much for your help!
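With many aligned files, copying them into the project by hand gets tedious. The Python sketch below follows Milan's advice literally: it copies every .tmx file from one folder into the tm/auto subfolder of an OmegaT project, one by one and without merging them. The function name and both folder paths in the example are hypothetical; point them at your own aligner output folder and OmegaT project folder.

```python
import shutil
from pathlib import Path


def copy_tmx_into_project(tmx_dir, project_dir, subfolder="tm/auto"):
    """Copy each .tmx file in tmx_dir into <project_dir>/<subfolder>.

    Files are copied individually, not merged. A file with the same name
    already in the target folder is overwritten. Returns the new paths.
    """
    target = Path(project_dir) / subfolder
    target.mkdir(parents=True, exist_ok=True)  # create tm/auto if it is missing
    copied = []
    # Case-sensitive on macOS/Linux: matches ".tmx", not ".TMX".
    for tmx in sorted(Path(tmx_dir).glob("*.tmx")):
        dest = target / tmx.name
        shutil.copy2(tmx, dest)  # copy2 also keeps the modification time
        copied.append(dest)
    return copied
```

For example, copy_tmx_into_project(Path.home() / "aligned", Path.home() / "OmegaT" / "MyProject") would copy every TMX from a hypothetical "aligned" folder into that project's tm/auto folder, leaving any other files (such as the intermediate .txt exports) behind.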
Thx for the topic | Jul 3, 2015
...and for the answers! I'm working on my TM, and most of the documents are in PDF. This made it so much easier and faster. Thank you all!