XLIFF mixed language -- any tool/process to input into a translation workflow?
Thread poster: SoerenB

SoerenB  Identity Verified
Denmark
Local time: 01:21
English to Danish
+ ...
Jul 26

Sorry if this has been discussed in the past (I would have thought so, but I could not find anything).

The normal process for XLIFFs is to get:
either a delta-only file - with untranslated translation units with the new/changed source
or a full, untranslated file - with all units in the source language (and then use a memory to 'translate' the unchanged stuff).

But what if you get from a customer a bunch of XLIFF files with a mix of translated units (translations in the target element of the translation unit) and untranslated units (in this case English, which is copied into the target element)?

In the CAT tools I know of this could never work. Because the already translated values would be considered (unknown) source. Unless there is some attribute info for those units of unchanged content.

Or am I just unaware of a trick in this or that CAT tool that accepts such mixed-language XLIFFs and still makes it easy to see exactly which units are new/changed and have source to translate?

So far I have told the customer that this is a no-go, and requested either delta-only or full source and memory. But am I ignorant of a way to handle mixed-language XLIFFs?


 

Rodolfo Raya  Identity Verified
Local time: 21:21
English to Spanish
Wrong tools? Jul 29

Today most modern tools can handle partially translated XLIFF files without problems.

Having an XLIFF file that has target text in some segments and TM matches in others is quite common. You haven't mentioned what tools you use but it looks like they are outdated or not mainstream apps.

Regards,
Rodolfo M. Raya


Jorge Payan
 

Samuel Murray  Identity Verified
Netherlands
Local time: 01:21
Member (2006)
English to Afrikaans
+ ...
Some do, some don't Jul 29

Rodolfo Raya wrote:
Having an XLIFF file that has target text in some segments and TM matches in others is quite common. You haven't mentioned what tools you use but it looks like they are outdated or not mainstream apps.


I agree with Rodolfo that there are definitely CAT tools that can deal with the type of file you mention (in which some text is translated and some is not, and the untranslated text may or may not be pre-segmented or pre-translated). For example, as far as I know, Wordfast Pro 3 and vanilla OmegaT can't handle such files, but Wordfast Pro 5 and OmegaT with the Okapi plugin can. Trados and memoQ can also handle such files.

--

SoerenB wrote:
The normal process for XLIFFs is to get:
either a delta-only file - with untranslated translation units with the new/changed source
or a full, untranslated file - with all units in the source language (and then use a memory to 'translate' the unchanged stuff).


I don't think that this is the only "normal" process. A small number of my clients send me what you refer to as "delta files", i.e. files that contain only segments whose source text has changed. But most of my clients who send me XLIFF files send either 100% untranslated files or partially translated files.

In the CAT tools I know of this could never work. Because the already translated <target> values would be considered (unknown) source.


I've read this several times, but I don't see the logic of that... unless you're using a CAT tool that reads the target field as if it contains the source text (some CAT tools with more primitive XLIFF filters do that, yes).

Unless there is some attribute info for those units of unchanged content.


I suspect that what you're talking about is specifically software localization translation, where the originating systems can distinguish between changed and unchanged content, and in which each segment has a unique key (so that multiple identical segments can have non-identical translations).

Not all XLIFF workflows work on the principle of changed vs. unchanged content, however. Some clients create XLIFF files by converting the entire new version of a file (or project) to XLIFF, and then pre-translating unchanged content against the TM. In other words, they don't distinguish between changed and unchanged content when creating the XLIFF file, and they rely on in-context-matching instead of segment keys to ensure that the right translation sits with the right source text.

So far I told the customer, that this is a no-go, and requested either delta-only or full source and memory.


Well, one thing you should not do is use the term "delta". (-:

[Edited at 2019-07-29 16:16 GMT]


 

SoerenB  Identity Verified
Denmark
Local time: 01:21
English to Danish
+ ...
TOPIC STARTER
For XLIFF without 'state' or similar to flag which TUs store existing translations Jul 30

Thank you for all comments. I confess lack of knowledge about some modern CAT tools, and I should have been more precise.

I know that you can add your own information to an XLIFF file, so that e.g. CAT tools can flag an internal status (known only to that tool).

My question relates to some overly simple XLIFFs where the TUs look like this:

<trans-unit id="1586926" datatype="plaintext" size-unit="char" maxwidth="1000">
<source>Electric</source>
<target>Elektric</target>
</trans-unit>

I.e. there is no 'state' attribute or internal attribute to tell the CAT tool or the translator which target elements hold existing translations and which hold a copy of the source that needs translation.

To me there is nothing in that syntax that tells me or the CAT tool which of the hundreds of thousands of target element strings need translation, and which are just the existing translated values.

[Edited at 2019-07-30 08:43 GMT]


 

Rodolfo Raya  Identity Verified
Local time: 21:21
English to Spanish
Default values Jul 30

In a translation unit like the one in your example, you start with the default attribute values.

Any translation present in ‹target› is just a hint unless the "approved" attribute of ‹trans-unit› is added and set to "yes". If the attribute is missing, the segment needs review before the translation can be used. CAT tools are used to set the attribute value.
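
To make the default concrete, here is a minimal Python sketch (not taken from any CAT tool) that reads the "approved" attribute and falls back to "no" when it is missing. The sample file, its ids and the Danish target text are invented for illustration, and the XLIFF namespace is left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real XLIFF 1.2 files normally declare a namespace.
SAMPLE = """<xliff version="1.2">
  <file original="demo" source-language="en" target-language="da" datatype="plaintext">
    <body>
      <trans-unit id="1" approved="yes">
        <source>Electric</source>
        <target>Elektrisk</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Manual</source>
        <target>Manual</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def approval_by_id(xliff_text):
    """Map each trans-unit id to True if approved="yes", else False."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iter("trans-unit"):
        # A missing attribute is treated as "no", i.e. the target is only a hint.
        result[unit.get("id")] = unit.get("approved", "no") == "yes"
    return result


print(approval_by_id(SAMPLE))  # {'1': True, '2': False}
```

Unit 2 has a target but no "approved" attribute, so under this reading its target still needs review.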

Regards,
Rodolfo

[Edited at 2019-07-30 10:20 GMT]


 

SoerenB  Identity Verified
Denmark
Local time: 01:21
English to Danish
+ ...
TOPIC STARTER
Thanks for help - must get back to the customer Jul 30

Thank you, Rodolfo and Samuel, for your valuable insights into this.

So when I only get thousands of files in such a simple structure, with an obvious mix of already translated units and units requiring translation, but no trace of a state or approved attribute/value, then I am surely missing something from the customer. Or they must regenerate the XLIFFs to include those attributes.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 01:21
Member (2006)
English to Afrikaans
+ ...
Yes Jul 30

Rodolfo Raya wrote:
In a translation unit like the one in your example, you start with the default attribute values. ... Any translation present in ‹target› is just a hint unless the "approved" attribute of ‹trans-unit› is added and set to "yes". If the attribute is missing, the segment needs review before the translation can be used.


That is my interpretation as well (though I'm not an expert at this at all).

If the XLIFF file was created by a tool that implements the XLIFF specification correctly, or is to be edited by such a tool, then a ‹trans-unit› with no "approved" attribute automatically has the value "no". However, if the client's XLIFF tool does not implement the specification correctly, e.g. it uses some other way to determine whether a segment is approved, then it may not matter what the XLIFF file itself declares about it.

An example (a poor example, perhaps, but an example nonetheless) is Virtaal. The XLIFF files created by the Virtaal utilities do not rely on the value of "approved" in the ‹trans-unit›, but instead rely on the "state" of the ‹target›. If the state is not mentioned, then Virtaal assumes the state is "needs work" (whatever that means), and if the segment is translated, then Virtaal sets the "state" of the ‹target› to e.g. "translated" or "reviewed", without editing the "approved" attribute of the ‹trans-unit›. This goes against the specification, but as long as translators who work on XLIFF files that were created by the Virtaal utilities use Virtaal (or the Virtaal utilities) to work on those files, everything will work out all right.
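
The logic described above (read "state" from the ‹target›, fall back when it is absent) could be sketched in Python roughly like this. It only illustrates the behaviour as described here and is not Virtaal's actual code; the fallback label "needs work" is simply the wording used above rather than a value from the XLIFF specification, and the sample units are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical units, without the XLIFF namespace for brevity.
UNIT_XML = """<body>
  <trans-unit id="1"><source>Electric</source><target state="translated">Elektrisk</target></trans-unit>
  <trans-unit id="2"><source>Manual</source><target>Manual</target></trans-unit>
  <trans-unit id="3"><source>Help</source></trans-unit>
</body>"""


def target_state(unit, fallback="needs work"):
    """Return the state declared on <target>, or the fallback if none is declared."""
    target = unit.find("target")
    if target is None:
        return fallback  # no target element at all
    return target.get("state", fallback)


states = {u.get("id"): target_state(u) for u in ET.fromstring(UNIT_XML).iter("trans-unit")}
print(states)  # {'1': 'translated', '2': 'needs work', '3': 'needs work'}
```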

A pragmatic CAT tool might make assumptions behind the user's back. For example, in the absence of "approved" in the ‹trans-unit›, the developers may have decided not to assume "approved=no" (as the specification says) but instead to assume things like: (a) if the ‹target› is empty, then the segment is untranslated, (b) if the ‹target› is the same as the ‹source›, then the segment is untranslated, and (c) if the ‹target› is different from the source, then it is translated.

One CAT tool might assume that if the source and target text are identical, then the target text is not translated but simply contains the untranslated source text, whereas another might assume the opposite: if the target text is not empty, then whatever is in the target text is considered the final translation upon delivery.
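
These competing assumptions can be written down as one small Python function with a switch for the disputed case. This is only a sketch of the heuristics described above; the function name and the `identical_means_untranslated` parameter are made up for illustration and do not come from any CAT tool.

```python
def needs_translation(source, target, identical_means_untranslated=True):
    """Guess whether a unit still needs translating when no state/approved info exists."""
    if target is None or target.strip() == "":
        return True  # missing or empty target: nothing to reuse
    if target == source:
        # Tools disagree here: a copy of the source may be a placeholder,
        # or it may be the intended final text.
        return identical_means_untranslated
    return False  # target differs from source: assumed to be an existing translation


print(needs_translation("Electric", "Elektric"))  # False
print(needs_translation("Electric", "Electric"))  # True
print(needs_translation("Electric", "Electric", identical_means_untranslated=False))  # False
print(needs_translation("Electric", None))  # True
```

The third call shows why the same file can be read differently by two tools: only the policy flag changed, not the data.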

SoerenB wrote:
So when I only get thousands of files in such simple structure, and with obvious mix of already translated and requiring translation -- but no trace of state or approved attribute/value -- then I am surely missing something from the customer.


Well, one of the first things you can do is to ask the client what XLIFF program they're using. You may be able to figure out what assumptions their system makes if you know what the client's XLIFF tool is. Another thing you can do is ask the client (assuming he is knowledgeable enough to know what you're asking), e.g. "how can I distinguish between segments that need no review and segments that do need review", etc. or ask questions that lead you to the answer, e.g. "do you want me to review existing translations".

You can also make some assumptions yourself. For example, if I were given such a stateless file and asked to "translate" and not to review (e.g. ignore existing translations, or not get paid for 100% matches), I would assume that any segment that has no ‹target›, an empty ‹target›, or a ‹target› identical to the source text must be translated by me, and that any other ‹target› should be considered final and not to be touched by me.
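
With hundreds of thousands of strings, that assumption is easier to apply with a small script than by eye. Below is a rough Python sketch that lists the ids of the units I would treat as needing translation. The sample file is invented (only the id 1586926 and its source/target come from the example earlier in this thread), and the rule itself is just my assumption, not something the XLIFF specification says.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file for illustration.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo" source-language="en" target-language="da" datatype="plaintext">
    <body>
      <trans-unit id="1586926"><source>Electric</source><target>Elektric</target></trans-unit>
      <trans-unit id="2"><source>Manual</source><target>Manual</target></trans-unit>
      <trans-unit id="3"><source>Settings</source><target/></trans-unit>
      <trans-unit id="4"><source>Help</source></trans-unit>
    </body>
  </file>
</xliff>"""


def local(tag):
    """Strip a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def units_to_translate(xliff_text):
    """Return ids of units with no target, an empty target, or target equal to source."""
    todo = []
    for unit in ET.fromstring(xliff_text).iter():
        if local(unit.tag) != "trans-unit":
            continue
        children = {local(child.tag): child for child in unit}
        source = "".join(children["source"].itertext())
        target_el = children.get("target")
        target = "".join(target_el.itertext()) if target_el is not None else ""
        if target.strip() == "" or target == source:
            todo.append(unit.get("id"))
    return todo


print(units_to_translate(SAMPLE))  # ['2', '3', '4']
```

Anything not in the returned list would be left alone as an existing translation, which is exactly the part worth confirming with the client first.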

Remember, the fact that text has a certain status in the XLIFF file does not necessarily mean that that is the state that the client believes the text has. If a client uses his XLIFF tool in an "incorrect" way, or if the client's XLIFF tool implements the specification in a non-standard way, then what matters is what the client says, and not what the XLIFF specification says.



[Edited at 2019-07-30 13:02 GMT]


 

