Tags diminishing machine translation results
Thread poster: Thijs Vissia

Thijs Vissia
Netherlands
Mar 2

I was wondering about the results I’m getting from Machine Translation through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated properly when tags sit in between the words that make up the idiom.

For example, I was translating the following sentence (in Dutch):

“Ieder instituut gaat beschikken over meer geld en meer gebouwen.”

With some tags/formatting strewn in, this became: “Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.”

It contains the widespread idiom “beschikken over” (meaning “have access to”, “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but such a common idiom should be recognised and translated properly.

When the tags were in there, this was returned as:
“Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.”

The connection between “ieder” and “instituut” was broken by the tag, so instead of “every institute” the MT rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.
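To illustrate the point, here is a small Python sketch of what the sentence looks like if a service treats each tag as a boundary. Whether MyMemory actually does this internally is an assumption; the tag pattern is guessed from the `<f0>`-style tags OmegaT shows for this segment, and `split_on_tags` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed shape of OmegaT-style tags: <f0>, </f0>, <br1/> and similar.
TAG_PATTERN = re.compile(r"</?[A-Za-z]+\d+/?>")


def split_on_tags(segment: str) -> List[str]:
    """Split a segment at every tag and drop empty or whitespace-only pieces."""
    pieces = TAG_PATTERN.split(segment)
    return [piece.strip() for piece in pieces if piece.strip()]


tagged = (
    "Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
    "<f2>over meer geld</f2> en meer gebouwen."
)

fragments = split_on_tags(tagged)
for fragment in fragments:
    print(repr(fragment))

# The idiom never appears whole inside a single fragment.
print(any("beschikken over" in fragment for fragment in fragments))
```

Seen this way, “beschikken” ends one fragment and “over” starts another, which would be consistent with the verb getting lost in the output.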

However, after creating a new file without tags, it came back as:

“Every institute will have more money and more buildings.”

This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags get sent out to the machine translation services in the first place? Is it so that all the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the strings of the tags before sending the query to the MT service?
Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.
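A minimal sketch of the stripping step suggested above, in Python. It assumes the tags look like the `<f0>`, `</f0>` examples from this segment (letters followed by a number, optionally self-closing); `TAG_PATTERN` and `strip_tags` are hypothetical names and not part of OmegaT or MyMemory.

```python
import re

# Assumed shape of OmegaT-style tags: <f0>, </f0>, <br1/> and similar.
TAG_PATTERN = re.compile(r"</?[A-Za-z]+\d+/?>")


def strip_tags(segment: str) -> str:
    """Remove tags and collapse the whitespace they leave behind."""
    without_tags = TAG_PATTERN.sub("", segment)
    return re.sub(r"\s+", " ", without_tags).strip()


tagged = (
    "Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
    "<f2>over meer geld</f2> en meer gebouwen."
)

# This plain string is what would be sent to the MT service.
print(strip_tags(tagged))
```

The trade-off is the one raised in the question: the MT service sees a clean sentence, but the formatting tags then have to be put back by hand.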


[Edited at 2019-03-03 10:37 GMT]


 

Milan Condak  Identity Verified
Local time: 09:38
English to Czech
Translator can remove tags before translation Apr 1

Thijs Vissia wrote:

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


A translator can remove tags before pretranslating against a TMX or with MT,

http://www.condak.cz/nove/2019-03/31/en/00.html

and put them back after pretranslation.

Milan


 

Thijs Vissia
Netherlands
TOPIC STARTER
ah Apr 1

Milan Condak wrote:

A translator can remove tags before pretranslating against a TMX or with MT, (...)
and put them back after pretranslation.

Milan


Hi Milan,
Ah, thank you for the clarification. I didn't realize you could put them back afterwards by toggling the option again, but of course the source file isn't changed. I somehow assumed this worked the same way as tagwipe, which does affect the source file.

I think the documentation could be a bit clearer about this, or even the option in Preferences: 'Remove tags' sounds rather definitive.

But clearly this solves my problem: I can translate with MT and then put the tags back manually.

cheers,
Thijs
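When tags are put back by hand after translating, it is easy to forget one. The following Python sketch lists the source tags that are still absent from a target segment. It is only an illustration under the same assumed `<f0>`-style tag pattern; `missing_tags` is a hypothetical helper and not an OmegaT function.

```python
import re
from collections import Counter
from typing import List

# Assumed shape of OmegaT-style tags: <f0>, </f0>, <br1/> and similar.
TAG_PATTERN = re.compile(r"</?[A-Za-z]+\d+/?>")


def missing_tags(source: str, target: str) -> List[str]:
    """Return the source tags, in source order, not yet present in the target."""
    remaining = Counter(TAG_PATTERN.findall(target))
    missing = []
    for tag in TAG_PATTERN.findall(source):
        if remaining[tag] > 0:
            remaining[tag] -= 1  # this occurrence is accounted for
        else:
            missing.append(tag)
    return missing


source = (
    "Ieder <f0>instituut gaat beschikken</f0><f1> </f1>"
    "<f2>over meer geld</f2> en meer gebouwen."
)
draft = "Every <f0>institute will have</f0> more money and more buildings."

# Tags that still need to be reinserted into the draft translation.
print(missing_tags(source, draft))
```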


 

Samuel Murray  Identity Verified
Netherlands
Local time: 09:38
Member (2006)
English to Afrikaans
Fixed post (your membership fee will never buy fixed forum software) Apr 2

Thijs Vissia wrote:
I was wondering about the results I’m getting from Machine Translation through OmegaT. Currently I’m using one service that is available free of charge, MyMemory (Machine). I’m noticing that many idioms are not recognised and translated properly when tags sit in between the words that make up the idiom.

For example, I was translating the following sentence (in Dutch):
Ieder instituut gaat beschikken over meer geld en meer gebouwen.

With some tags/formatting strewn in, this became:
Ieder <f0>instituut gaat beschikken</f0><f1> </f1><f2>over meer geld</f2> en meer gebouwen.

It contains the widespread idiom “beschikken over” (meaning “have access to”, “have at one’s disposal”). I don’t know about the quality of the machine translation at MyMemory, but such a common idiom should be recognised and translated properly.

When the tags were in there, this was returned as:
Everyone <f0> institute </f0><f1></f1><f2> about more money </f2> and more buildings.

The connection between “ieder” and “instituut” was broken by the tag, so instead of “every institute” the MT rendered this as “everyone institute”. Similarly, the composite verb “beschikken over” was interrupted by a tag, so the MT treated each piece separately and apparently left out the verb entirely.

However, after creating a new file without tags, it came back as:
Every institute will have more money and more buildings.
This may not be my phrasing of choice, but it is otherwise a fine translation.

So I was wondering why the tags get sent out to the machine translation services in the first place? Is it so that all the formatting doesn’t need to be put back in manually afterwards? Wouldn’t it be almost as easy to strip the strings of the tags before sending the query to the MT service?

Considering that OmegaT already needs to recognise tags as such (to treat them differently in the editor pane), wouldn't it be possible to make sending them to the MT service optional?

It seems to me that sending the tags along is seriously reducing the quality of the MT results.


[Edited at 2019-04-02 05:54 GMT]


 

