OmegaT/InDesign compatibility
Thread poster: Roy Williams
Roy Williams
Austria
German to English
Oct 20, 2008

Hello all,

I've been using Wordfast up until now (for MS Office formats) and have started experimenting with OmegaT so that I can work with other file formats. There was no mention of this in any of the documentation I've looked through, but would anyone know if OmegaT can be used with InDesign and/or PDF formats as well?


 
Didier Briel
France
English to French
InDesign through Rainbow Oct 20, 2008

WilRoy wrote:
I've been using Wordfast up until now (for MS Office formats) and have started experimenting with OmegaT so that I can work with other file formats. There was no mention of this in any of the documentation I've looked through, but would anyone know if OmegaT can be used with InDesign

First, InDesign should be exported to the INX format.
Then Rainbow (Okapi) can be used to create an OmegaT project using an intermediate format.

and/or PDF formats as well?

What do you call "PDF formats"?
OmegaT cannot read PDF files directly; the content must first be extracted or converted (by OCR).

Didier


 
Roy Williams
Austria
German to English
TOPIC STARTER
INX Oct 21, 2008

By PDF format I meant PDF files. What is INX?

 
Didier Briel
France
English to French
INX is an export/import format Oct 21, 2008

WilRoy wrote:
By PDF format I meant PDF files.

PDF files can contain either text or images.
If they contain text, it can be extracted by copying and pasting into Word, for instance. Some reformatting will usually be needed to get rid of the excess linefeeds.
If they contain images, no CAT tool can translate them. The images must first be converted to text using OCR software.
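To give an idea of what getting rid of the excess linefeeds involves, here is a minimal Python sketch. It is only an illustration, not part of OmegaT or any CAT tool. The function name `unwrap_lines` is made up, and it assumes that real paragraphs in the pasted text are separated by a blank line, which is not always the case with PDF extractions.

```python
import re

def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines; keep blank lines as paragraph breaks."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate real paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, every run of whitespace (including linefeeds)
    # becomes a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)

sample = (
    "OmegaT cannot read\nPDF files directly.\n\n"
    "The content must be\nextracted first.\n"
)
print(unwrap_lines(sample))
```

Words hyphenated across a line break are not rejoined by this sketch; that would need an extra rule and a manual check.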

What is INX?

An XML-based interchange format that lets you export documents from InDesign and import them back in.

Didier


 
Samuel Murray
Netherlands
English to Afrikaans
Erm... Oct 21, 2008

WilRoy wrote:
By PDF format I meant PDF files.


I know of no CAT tool that can translate PDF files. Not even the mighty Trados can do it. You may be able to translate text extracted from PDF files, and if you're clever you can put the text back yourself using a PDF editor (search the forums), but I know of no CAT tool that offers both extraction and putting it back.

What is INX?


Tell me, how do you translate InDesign files at the moment?


 
esperantisto
English to Russian
Neither do I, but… Oct 21, 2008

Samuel Murray wrote:

I know of no CAT tool that can translate PDF files.


I vaguely remember some bright new wannabe program claiming that it supports PDF as input, or something like that. Maybe they implemented PDF-to-something conversion on the fly? In any case, that program did not look interesting, and I can't even remember its name.


 
Roy Williams
Austria
German to English
TOPIC STARTER
Pdf Oct 21, 2008

The Wordfast documentation claims to be able to translate PDFs, but also states that this is "uncertain" because PDF was designed to be write-protected. I reasoned that if Wordfast could make such a claim, maybe there was a better tool. I have not had to work with PDFs, so I don't know whether WF can actually do it.

As for InDesign, the company where I work has only recently started using it. At present most of the documentation is still in .doc files, from which PDFs are created after translation. So to answer your question, Sam: at the moment I don't translate InDesign files. But with its increasing use, I thought it would be prudent to find a tool to process such files.


 
Samuel Murray
Netherlands
English to Afrikaans
Not Wordfast Oct 21, 2008

WilRoy wrote:
The Wordfast documentation claims to be able to translate PDFs, but also states that this is "uncertain" because PDF was designed to be write-protected. I reasoned that if Wordfast could make such a claim, maybe there was a better tool. I have not had to work with PDFs, so I don't know whether WF can actually do it.


The Wordfast manual makes no such claims. Can you quote from it? The PlusTools manual does have a section on its PDF conversion functionality. I quote it here in full:

PDF

This pane offers two features: 1. extract textual contents from a PDF document currently opened with Acrobat Reader in the background, and 2. convert text from a currently opened document (typewriter-style, where all lines end with a paragraph mark) into regular text with whole paragraphs.

Both tasks are uncertain. The PDF format was created at first to be a read-only format, this is why it is CAT tool-unfriendly. Extracting text from Acrobat Reader is therefore uncertain.

Re-creating whole paragraphs in a document where each line ends with a paragraph mark (carriage return) is also an uncertain task for a machine, since it supposes an understanding of the text. A 90% success rate is usually achieved, however.


As for InDesign, the company where I work has only recently started using it. ... So to answer your question, Sam: at the moment I don't translate InDesign files.


Get your hands on a copy of it and find out how to export and import INX files. Then show the graphic people how to do it.


 
Roy Williams
Austria
German to English
TOPIC STARTER
OK PlusTools then Oct 22, 2008

Samuel Murray wrote:

PDF

This pane offers two features: 1. extract textual contents from a PDF document currently opened with Acrobat Reader in the background, and 2. convert text from a currently opened document (typewriter-style, where all lines end with a paragraph mark) into regular text with whole paragraphs.

Both tasks are uncertain. The PDF format was created at first to be a read-only format, this is why it is CAT tool-unfriendly. Extracting text from Acrobat Reader is therefore uncertain.


The text you quoted is actually what I was referring to when talking about PDF files. I keep the PlusTools and Wordfast documentation and the training manual in a folder I call "wordfast docs". Because it seems like a somewhat time-intensive process with "uncertain" results, I haven't tried it. So I thought that if one tool had a method of working with PDF, however uncertain, perhaps there was another one that could do it with solid results.

Didier, thanks for answering; the information you provided has proven most useful.


[Edited at 2008-10-22 05:51]


 
Samuel Murray
Netherlands
English to Afrikaans
Okay, let me put it this way Oct 22, 2008

WilRoy wrote:
So I thought that if one tool had a method of working with PDF, however uncertain, perhaps there was another one that could do it with solid results.


It is my understanding that any non-OCR method of extracting text from a PDF will be flawed, because the paragraph reorderiser has to guess, based on certain rules made up by the programmer.
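To make the guessing concrete, here is a small Python sketch of the kind of rule such a tool might apply. This is not how PlusTools works internally (I have not seen its code). The function name `rebuild_paragraphs`, the `width_ratio` parameter and the rule itself are all assumptions made up for illustration.

```python
def rebuild_paragraphs(lines, width_ratio=0.8):
    """Guess paragraph breaks in text where every line ends with a hard return."""
    lines = [ln.strip() for ln in lines]
    longest = max((len(ln) for ln in lines), default=0)
    paragraphs, current = [], []
    for ln in lines:
        if not ln:
            # An empty line is treated as a certain paragraph break.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln)
        ends_sentence = ln.endswith((".", "!", "?", ":"))
        is_short = len(ln) < width_ratio * longest
        # The guess: a short line that ends in sentence punctuation
        # probably closes a paragraph.
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs

sample = [
    "The PDF format was created at first to be a",
    "read-only format.",
    "Extracting text from it is therefore an",
    "uncertain task for a machine.",
]
for paragraph in rebuild_paragraphs(sample):
    print(paragraph)
```

The weakness is easy to see: a paragraph whose last line happens to fill the full width is merged with the next one, and a short line ending in an abbreviation is split wrongly. That is why the output always needs checking by hand.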

I have used the PlusTools method a number of times and I'm quite happy with the results, especially when the PDF is fairly simple. For shorter documents, I prefer to select and copy text by hand, for more control.


 
Roy Williams
Austria
German to English
TOPIC STARTER
Hmm Oct 23, 2008

OK, so I tried extracting text from a PDF with PlusTools, and the extraction itself is not as time-intensive as I thought after reading the manual. The problem, though, is that none of the formatting was preserved; all text (table of contents, text from tables, etc.) was simply left-justified. Is that the limit of PlusTools' ability for this particular task?

 
Samuel Murray
Netherlands
English to Afrikaans
Tables etc Oct 23, 2008

WilRoy wrote:
The problem, though, is that none of the formatting was preserved; all text (table of contents, text from tables, etc.) was simply left-justified. Is that the limit of PlusTools' ability for this particular task?


Yes. PlusTools may in certain circumstances retain the character formatting, but not layout formatting. That is too difficult to guess correctly. Tables etc... forget about it. If you want a file with tables intact, pay your $9 per month here:

http://www.freepdfconvert.com/membership.asp

But even there you still need to do some post-formatting (e.g. removing superfluous tabs, superfluous hard returns, etc.).
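That kind of post-formatting can be partly scripted. The Python sketch below is only a rough illustration, with a made-up function name `tidy_converted_text`. It assumes the text has been saved as plain text and that every run of tabs is superfluous, so it should not be run over real tab-separated tables.

```python
import re

def tidy_converted_text(text: str) -> str:
    """Remove superfluous tabs, spaces and hard returns from converted text."""
    # Replace each run of tabs and spaces with a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces left at the start or end of each line.
    text = re.sub(r" *\n *", "\n", text)
    # Collapse three or more hard returns into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

sample = "Price\t\t\t9 USD  \n\n\n\nper\tmonth\n"
print(repr(tidy_converted_text(sample)))
```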


 

