How to handle segments with too many tags?
Thread poster: Michael Mestre
Michael Mestre
France
English to French
Sep 13, 2010

Dear colleagues,

I am currently working on a PPTX document.
I was delighted to see that OmegaT's latest version (2.1.8_1) now aggregates the hundreds of tags that XML files are riddled with.
Everything now seems more manageable.

However, I still have many segments that look like this fictional example:

"
What[t0/] [t1/]is the[t2/] Internal Management Sales Program (IMSP) [t3/]and[t4/] [t5/]how[t6/] [t7/]does[t8/] [t9/]the[t10/] [t11/]co-operation[t12/] [t13/]with[t14/] internal [t15/]elements[t16/] [t17/]work[t18/]?
"
[edit: I replaced the less-than/greater-than characters (< and >) with [ and ] so that the tags can be displayed in the forum]

I wanted to ask you how such segments could be handled.
Is it a good strategy to dump all the tags at the beginning or end of the segment?
Or should I first insert the tags into the empty target segment, then fill in the blanks while respecting the structure as much as possible?

I understand that the first strategy might result in a different layout, while the second one makes proofreading inside OmegaT almost impossible (and increases the probability of missing words from the source segment).

Also, as I do not own a recent version of Microsoft Office, I do not have the option of modifying the source document (and doing this in OpenOffice would probably alter the layout).

What would you do in such a case?

Thank you!
Michael


[Edited at 2010-09-13 10:11 GMT]

[Edited at 2010-09-13 10:13 GMT]


 
Didier Briel
France
English to French
Take care of real formatting tags and group the others Sep 13, 2010

(You would get quicker answers by asking in the Yahoo support group.)

Michael Mestre wrote:
However, I still have many segments that look like this fictional example:

"
What[t0/] [t1/]is the[t2/] Internal Management Sales Program (IMSP) [t3/]and[t4/] [t5/]how[t6/] [t7/]does[t8/] [t9/]the[t10/] [t11/]co-operation[t12/] [t13/]with[t14/] internal [t15/]elements[t16/] [t17/]work[t18/]?
"
I wanted to ask you how such segments could be handled.
Is it a good strategy to dump all the tags at the beginning or end of the segment?

If the tags are not formatting tags, yes.

To tell whether the tags are formatting tags, keep the original document open in PowerPoint alongside OmegaT.
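The "dump the tags at the end" approach can be sketched in a few lines of Python. This is a hypothetical illustration, not an OmegaT feature: it assumes OmegaT-style tags such as `<t0/>`, `<f0>` and `</f0>` (shown as `[t0/]` in the posts above), and `append_source_tags` is a made-up helper name. Following the advice above, it only makes sense for tags you have confirmed are not real formatting tags.

```python
import re

# Assumption: tags look like <t0/>, <f0> or </f0> (letters followed by a number).
TAG_RE = re.compile(r"</?[a-zA-Z]+\d+/?>")

def append_source_tags(source: str, translation: str) -> str:
    """Hypothetical helper: append every source tag, in source order,
    to a tag-free translation ("dump the tags at the end")."""
    tags = TAG_RE.findall(source)
    return translation.rstrip() + "".join(tags)

source = "What<t0/> <t1/>is the<t2/> IMSP <t3/>and<t4/>?"
print(append_source_tags(source, "Qu'est-ce que l'IMSP ?"))
# → Qu'est-ce que l'IMSP ?<t0/><t1/><t2/><t3/><t4/>
```

Because the tags keep their source order, paired tags such as `<f0>` and `</f0>` stay correctly nested, even though they no longer surround any text.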

Or should I first insert the tags into the empty target segment, then fill in the blanks while respecting the structure as much as possible?

To work like that (which is often what I do), I find it more efficient to have OmegaT populate the target with the source text ('The source text' option under Options/Editing Behaviour).


I understand that the first strategy might result in a different layout, while the second one makes proofreading inside OmegaT almost impossible (and increases the probability of missing words from the source segment).

A small add-on from Marc Prior lets you display the target segment without any tags.


Also, as I do not own a recent version of Microsoft Office, I do not have the option of modifying the source document (and doing this in Openoffice would probably alter the layout).

What would you do in such a case?

I would at least get the free converters from Microsoft. They don't allow editing the new .pptx features (which are converted to images in the .ppt version, but kept intact when you convert back), but you can at least change the basic layout and see what your target document will look like.

Didier


 
Michael Mestre
France
English to French
Thanks for your answer! Sep 13, 2010

Thanks a lot for your answer, Didier.

I find the Tcl scripts very useful. I will use them; they solve most of my problems.
Then I can try to visually inspect the PPT to detect the "important" formatting tags and just dump the rest at the end of the segments.

Actually, a nice feature addition would be for OmegaT to highlight (in different colors for instance) the formatting tags so that the user can understand right away that their location has to be preserved.

Best regards,
Michael


 
Didier Briel
France
English to French
OmegaT cannot differentiate the tags Sep 13, 2010

Michael Mestre wrote:
Actually, a nice feature addition would be for OmegaT to highlight (in different colors for instance) the formatting tags so that the user can understand right away that their location has to be preserved.

Differentiating the tags is a nice idea, but it would involve a lot of work (I'm not talking about the display, just about working out the role of each tag) for little result, since most tags are invisible or useless formatting.

Furthermore, since we now aggregate tags, most of the time we are mixing useless and useful ones within a single aggregated tag.

Didier


 
Michael Mestre
France
English to French
True... Sep 13, 2010

...I had not thought about this detail!

But maybe some modules that know about the meaning of the tags for specific formats (ODT, PPTX, etc.) could help identify the most common ones (such as text style changes).
As for the grouping, it is indeed an issue unless some grouping rules could be defined: that way, the "special" tags identified by various means could be excluded from the grouping (and unless the document is badly messed up, the number of such tags should be low).

Another issue that I often have with tags is that they are numbered, which makes it impossible to change their order (is that true?)
Example of what usually happens:

English sentence: "I only like Indian food".
French sentence: "J'aime uniquement la nourriture indienne"

(Impossible to swap bold and italic).


 
Didier Briel
France
English to French
Various tag questions Sep 13, 2010

Michael Mestre wrote:
But maybe some modules that know about the meaning of the tags for specific formats (ODT,

OmegaT knows the meaning of .odt tags.
E.g., [f] is for formatting. I was talking about OpenXML ("MS 2007") tags.

PPTX, etc.) could help identify the most common ones (such as text style changes).

Possibly, although that would bring a lot of other (development) trouble.
As usual, (nearly) everything is possible given sufficient time and resources.


Another issue that I often have with tags is that they are numbered, which makes it impossible to change their order (is that true?)
Example of what usually happens:

English sentence: "I only like Indian food".
French sentence: "J'aime uniquement la nourriture indienne"

(Impossible to swap bold and italic).

It is perfectly possible to do it (*except* for OpenXML); see the manual section on tag handling.
You will get a warning with Ctrl+T, but that won't prevent you from opening the target document.
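To make the error/warning distinction concrete, here is a minimal Python sketch of a tag check that treats missing or extra tags as errors but only a change of order as a warning. It is a hypothetical illustration, not OmegaT's actual validation code: `check_tags` is an invented name, and it assumes tags of the form `<f0>`, `</f0>` or `<t0/>`. The example uses French adjective order, where "red car" becomes "voiture rouge".

```python
import re
from collections import Counter

# Assumption: tags look like <f0>, </f0> or <t0/> (letters followed by a number).
TAG_RE = re.compile(r"</?[a-zA-Z]+\d+/?>")

def check_tags(source: str, target: str) -> tuple[list[str], list[str]]:
    """Hypothetical check: return (errors, warnings) for a translated segment."""
    src_tags = TAG_RE.findall(source)
    tgt_tags = TAG_RE.findall(target)
    errors: list[str] = []
    warnings: list[str] = []
    # Counter subtraction keeps only positive counts, in source (or target) order.
    missing = Counter(src_tags) - Counter(tgt_tags)
    extra = Counter(tgt_tags) - Counter(src_tags)
    if missing:
        errors.append("missing tags: " + " ".join(missing.elements()))
    if extra:
        errors.append("extra tags: " + " ".join(extra.elements()))
    # Same tags but a different sequence: only a warning, not an error.
    if not errors and src_tags != tgt_tags:
        warnings.append("tags are in a different order")
    return errors, warnings

print(check_tags("<f0>red</f0> <f1>car</f1>", "<f1>voiture</f1> <f0>rouge</f0>"))
# → ([], ['tags are in a different order'])
print(check_tags("<f0>red</f0> car", "voiture rouge"))
# → (['missing tags: <f0> </f0>'], [])
```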

Didier


 
Susan Welsh
United States
Russian to English
automatic spell-checking? Sep 14, 2010

Marc suggests that since just about every word is tagged, automatic spell-checking may be causing the problem. Try turning it off in PowerPoint, if possible.

 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Add-on to remove tags Sep 14, 2010

Didier Briel wrote:
A small add-on from Marc Prior lets you display the target segment without any tags.


I see in a later post that the OP finds Marc's script useful, but allow me to mention mine. If you have Windows, you can use my TextTagRem script (or add-on, as Didier calls it) to remove all tags (the text remains) or to remove all text (the tags remain). My script is a little less sophisticated than Marc's: it works on whatever is currently in the target field of the current segment, so to use it you have to have OmT automatically copy the source to the target. It will also work on fuzzy matches that you have inserted.
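For readers who are not on Windows, the two operations described above can be approximated with a regular expression. This is a minimal Python sketch, not Samuel's TextTagRem script or Marc's add-on; it assumes OmegaT-style tags such as `<t0/>` and `<f0>...</f0>`, and the function names are hypothetical.

```python
import re

# Assumption: tags look like <t0/>, <f0> or </f0> (letters followed by a number).
TAG_RE = re.compile(r"</?[a-zA-Z]+\d+/?>")

def remove_tags(segment: str) -> str:
    """Remove all tags; the text remains."""
    return TAG_RE.sub("", segment)

def remove_text(segment: str) -> str:
    """Remove all text; the tags remain, in their original order."""
    return "".join(TAG_RE.findall(segment))

seg = "What<t0/> <t1/>is the<t2/> IMSP<t3/>?"
print(remove_tags(seg))  # → What is the IMSP?
print(remove_text(seg))  # → <t0/><t1/><t2/><t3/>
```

Like the script described above, this only operates on the text you pass in, so it would be applied to the target field after the source has been copied into it.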


 
Michael Mestre
France
English to French
Answers Sep 14, 2010

@Didier: thanks for the suggestion, I will have a look at the manual.

@Susan: yes, this will probably improve things a lot. I will try it next time.

@Samuel: thanks for this link; I (un)fortunately do not own Windows, but I am sure that our colleagues will benefit from your script.


 
Marina Herrera
United States
French to English
Google Translate puts in the tags Nov 4, 2010

I was having a hard time with the tags, but I recently discovered that when you use Ctrl+M to insert Google Translate's suggested translation, it puts in the tags. While it does a lousy job of translating words broken up by tags, you can use about 50-70% of its other suggested terms (En to Es) and work them around the tags. It has been a great discovery for me and a great time saver.
Marina


 
Didier Briel
France
English to French
There are other ways to insert the tags Nov 4, 2010

Marina Herrera wrote:
I was having a hard time with the tags, but I recently discovered that when you use Ctrl+M to insert Google Translate's suggested translation, it puts in the tags. While it does a lousy job of translating words broken up by tags, you can use about 50-70% of its other suggested terms (En to Es) and work them around the tags.

Google Translate is not the only way. To insert the tags automatically, you can:
- Use the 'The source text' option in Options/Editing Behaviour. That way, the source text is automatically copied into the target segment, and you can work around the tags.
- Use the Edit/Insert Source Tags menu entry. It inserts all the source tags into the current segment. (It has been available since version 2.1.2.)

Didier


 
Ronja Addams-Moring
Finland
Finnish to Swedish
Now we have Project > Properties > Remove Tags Aug 1, 2013

This is for others who may also be confused because the "Remove tags" feature in OmegaT seemed to have disappeared.


How to use "Remove Tags" in OmegaT now (I've tested this in version 2.6.3, at least):

"Remove Tags" is now a project property rather than a global option, so use:

Project > Properties > Remove Tags
or
Ctrl+E and check Remove Tags.


What happened to cause my confusion:

Didier's post here: http://sourceforge.net/p/omegat/feature-requests/755/

says that in OmegaT 2.5 there is a Tools > Remove tags menu choice (I have also seen this functionality described that way elsewhere, on and off ProZ).

However, the Remove tags functionality is not under that menu anymore, as I found out from this post: http://tech.dir.groups.yahoo.com/group/OmegaT/message/26504


THANK YOU, dear OmegaT team, for making this a standard feature; my newest project would have been such a pain without this option.

[Edited at 2013-08-01 17:20 GMT]