When a 50% match isn't a 50% match?
Thread poster: Chris S

Chris S  Identity Verified
United Kingdom
Swedish to English
+ ...
Nov 16, 2018

I did a rare CAT job today and noticed this:

Segment in TM:
Hvis der udtages biologisk materiale til en forskningsbiobank:

Segment to be translated:
Samtykke til at udtage og opbevare biologisk materiale i forbindelse med forsøget (en forskningsbiobank)

This came up as a 50% match.

Since when do four words out of 14 make a 50% match?

On the other hand, those four words do make up 50% of the segment in the TM. Is that what is happening? Is that normal?!
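The two ways of counting can be checked with a few lines of Python. This is only a sketch of naive exact-word overlap, not any CAT tool's real scoring; the helper name `words` and the regex tokenization are assumptions for illustration. Note that an exact-form count also picks up "til", so it finds five shared words rather than four.

```python
import re

def words(segment):
    """Lowercase the segment and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", segment.lower())

tm = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
new = ("Samtykke til at udtage og opbevare biologisk materiale "
       "i forbindelse med forsøget (en forskningsbiobank)")

tm_words, new_words = words(tm), words(new)

# Words that occur in both segments in exactly the same form.
shared = set(tm_words) & set(new_words)

# The same overlap looks very different depending on the denominator.
share_of_tm = len(shared) / len(tm_words)
share_of_new = len(shared) / len(new_words)

print(sorted(shared))
print(f"{share_of_tm:.1%} of the TM segment, {share_of_new:.1%} of the new segment")
```

With five shared words, the overlap is 62.5% of the 8-word TM segment but only 35.7% of the 14-word new segment, so the percentage depends heavily on which segment is used as the base.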


 

Thomas T. Frost  Identity Verified
Member (2014)
Danish to English
+ ...
Report it to support Nov 16, 2018

I would report it to the CAT tool provider's support.

I don't give reductions below 75%, though, as fuzzy matches below that threshold are too unreliable (at least in MemoQ) and don't generally justify any reduction.


 

Lincoln Hui  Identity Verified
Hong Kong
Local time: 21:40
Member
Chinese to English
+ ...
50% match Nov 16, 2018

Most CAT tools don't even display matches below 60% by default. I sometimes adjust them so that they actually have a better chance of showing certain things.

 

Roman Karabaev  Identity Verified
Russian Federation
Local time: 17:40
Member (2010)
English to Russian
+ ...
Well, it's normal nowadays Nov 16, 2018

MemoQ

A screenshot from MemoQ. Zero words out of two make... a 68% match.


 

megane_wang  Identity Verified
Spain
Local time: 14:40
English to Spanish
+ ...
I don't think I ever trusted even an 80% match.... Nov 16, 2018

.... can you imagine a "50%"? I don't care about them at all.

CAT tools are a help, not THE ultimate tool (fortunately for us translators)

Ruth


 

Jean Dimitriadis  Identity Verified
France
Local time: 14:40
Member
English to French
+ ...
Which CAT tool? Nov 16, 2018

Out of curiosity, which CAT tool did you use?

Mine gives a 33% match, also counting the word "til".


 

Chris S  Identity Verified
United Kingdom
Swedish to English
+ ...
TOPIC STARTER
A program that clearly can't count Nov 16, 2018

It is a large agency's own system.

I got paid 100% for this, so that's not the issue.

What bothers me is how a computer can possibly add up wrong...

And how much else it might get wrong...


 

Nadia Silva Castro
United States
Member (2017)
German to Portuguese
+ ...
almost funny Nov 16, 2018

Roman Karabaev wrote:

MemoQ

A screenshot from MemoQ. Zero words out of two make... a 68% match.


(I)nstruction -> (co)nstruction....it's weird how your MemoQ "thinks"!
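One plausible explanation is character-level scoring. The sketch below uses plain Levenshtein edit distance on the word pair mentioned here; it is not MemoQ's actual algorithm, the function names are made up for illustration, and it does not try to reproduce the 68% figure, since the full segments in the screenshot are not known.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def char_match(a, b):
    """Similarity in [0, 1]: 1 minus the edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

distance = levenshtein("instruction", "construction")
print(distance, f"{char_match('instruction', 'construction'):.0%}")
```

The two words are only two edits apart, so a character-based score rates them as highly similar even though, counted as whole words, they do not match at all.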


 

Nadia Silva Castro
United States
Member (2017)
German to Portuguese
+ ...
Configuration Nov 16, 2018

Jean Dimitriadis wrote:

Out of curiosity, which CAT tool did you use?

Mine gives a 33% match, also counting the word "til".



Most CAT tools allow you to set your preferred minimum match rate. I personally set it to 75%; below that, it usually feels easier just to translate from scratch.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 14:40
Member (2006)
English to Afrikaans
+ ...
It's a bit of science and a bit of magic Nov 16, 2018

Chris S wrote:
I did a rare CAT job today and noticed this:
...
This came up as a 50% match.


By character, 65% of the segment in the TM matches 40% of the segment in the source text.

I believe CAT tools that can't do proper morphological stemming/tokenization may try to strike a balance between word matching and character matching. My own CAT tool, WFC, favours character matching when the segment is short, and word matching when the segment is long. This leads to things similar to Roman's construction/instruction.
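A rough sketch of what such a length-dependent balance could look like. This is not WFC's actual algorithm: the function names, the linear weighting and the `short_len`/`long_len` cut-offs are all assumptions made for illustration, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever matcher a real CAT tool uses.

```python
import re
from difflib import SequenceMatcher

def tokens(segment):
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", segment.lower())

def char_similarity(a, b):
    """Similarity computed over characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def word_similarity(a, b):
    """Similarity computed over whole words."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()

def blended_similarity(a, b, short_len=3, long_len=12):
    """Hypothetical blend: character-based for short segments, word-based for long ones."""
    n = max(len(tokens(a)), len(tokens(b)))
    # Weight on the word score: 0 at or below short_len, 1 at or above long_len.
    w = min(1.0, max(0.0, (n - short_len) / (long_len - short_len)))
    return (1 - w) * char_similarity(a, b) + w * word_similarity(a, b)

# One-word segments: the word score is 0, but the blend follows the character score.
print(word_similarity("instruction", "construction"))
print(blended_similarity("instruction", "construction"))
```

For very short segments a blend like this follows the character score, which is how a pair with no matching words can still come out as a high fuzzy match.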

Since when do four words out of 14 make a 50% match?


Is the proposed translation 50% useful to you? If yes, then it is a true 50% match. If not, then they didn't get the magic quite right, but magic isn't precise anyway.

Jean Dimitriadis wrote:
Out of curiosity, which CAT tool did you use?
Mine gives a 33% match, also counting the word "til".


Without any morphological analysis (i.e. default tokenizer, "language unknown"), OmegaT says it's below the match threshold (i.e. below 30%). With an English tokenizer, OmegaT says it's a 38% match. But with the Danish tokenizer, OmegaT says it's a 50% match.
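To illustrate why the tokenizer matters, here is a toy sketch. `toy_stem` is a made-up suffix stripper, not a real Danish stemmer and not OmegaT's tokenizer, and the sketch makes no attempt to reproduce OmegaT's percentages. It only shows that reducing words to stems lets "udtages" and "udtage" count as the same word.

```python
import re

def tokens(segment):
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", segment.lower())

def toy_stem(word):
    """Hypothetical stemmer: strip one of a few endings, keeping at least 3 letters."""
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def shared_words(a, b, stem=None):
    """Set of tokens (optionally stemmed) that occur in both segments."""
    normalise = stem if stem is not None else (lambda w: w)
    return {normalise(w) for w in tokens(a)} & {normalise(w) for w in tokens(b)}

tm = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
new = ("Samtykke til at udtage og opbevare biologisk materiale "
       "i forbindelse med forsøget (en forskningsbiobank)")

plain = shared_words(tm, new)
stemmed = shared_words(tm, new, stem=toy_stem)
print(len(plain), sorted(plain))
print(len(stemmed), sorted(stemmed))
```

Without stemming there are five shared words; with the toy stemmer there are six, because both verb forms reduce to "udtag". A language-aware tokenizer can raise the score in the same way.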

[Edited at 2018-11-16 18:34 GMT]


 

Endre Both  Identity Verified
Germany
Local time: 14:40
Member (2002)
English to German
Matches only start getting useful at 70-80% Nov 16, 2018

As Thomas and Nadia have mentioned, it’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.

It would be interesting to compare the development of match ratings in CAT tools over time. Unfortunately the calculation of match rates is a race to the bottom. For obvious reasons, agencies are interested in pushing match rates upwards until they are just this side of indefensible - higher matches are a windfall to them.

Even translators who have the occasion and the inclination to compare CAT tools tend to assume that higher match rates equate to better TM leveraging, when in fact there is scant correlation between the two, the differences mostly boiling down to the audacity of the CAT tool’s marketing team. As displayed by the examples in this thread, they are getting pretty audacious.


 

DZiW
Ukraine
English to Russian
+ ...
culture-dependents: half-full is half-empty Nov 16, 2018

Without dwelling too much on such "secret vendor know-how" as hashes, checksums, shingles, clusters, vectors, Levenshtein distances, encoders, Soundex, and other weird stuff: it's all just an attempt to obscure the fact that the very idea of "similar sentences", let alone across different languages, is but an expensive miscalculation.

Little by little, the trend is moving towards aggregating per-language structural parts [subj-pred-obj], taking synonyms into account and weighting antonyms while sacrificing functional parts. A couple of years ago I was pleasantly surprised to watch a demonstration where an app analyzed simple, complex, and compound sentences and could assess the similarity of the context, noting the antecedents (the meaning).

However, even in a new/small TM I have never used a 50% fuzzy match, because I also doubt that so many 'false positives' are of any use in speeding the process up.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 14:40
Member (2006)
English to Afrikaans
+ ...
@Endre Nov 17, 2018

Endre Both wrote:
It’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.


I've had the opposite experience. Especially with lengthier segments, a low match would save me time if it concerns a repeated phrase. I can recall several instances when my CAT tool yielded no match but the first result in a concordance search was something that I would very much have liked to see suggested as a fuzzy match. This is particularly true for matches consisting of consecutive words.

Here's a hypothetical example of such a no match that would have saved time and sanity:

Segment 1: Thinking of your experience with Company X over the past 7 days, please rate the following on a scale of 1 to 10:
Segment 2: Thinking of your experience with Company X over the past 7 days, and considering how the company's Y compares with that of other companies mentioned in question Z, please tell in your own words how satisfied you were with the following:


 

Chris S  Identity Verified
United Kingdom
Swedish to English
+ ...
TOPIC STARTER
The answer Nov 19, 2018

The helpdesk tells me that this shows as a 50% match because otherwise, as a 30% match, it wouldn't show up as a match at all. In other words, they're trying to help me, not rip me off. This seems reasonable.

The construction/instruction thing made me laugh/cry though.


 

Lincoln Hui  Identity Verified
Hong Kong
Local time: 21:40
Member
Chinese to English
+ ...
Fragments Nov 19, 2018

Samuel Murray wrote:

Endre Both wrote:
It’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.


I've had the opposite experience. Especially with lengthier segments, a low match would save me time if it concerns a repeated phrase. I can recall several instances when my CAT tool yielded no match but the first result in a concordance search was something that I would very much have liked to see suggested as a fuzzy match. This is particularly true for matches consisting of consecutive words.

Here's a hypothetical example of such a no match that would have saved time and sanity:

Segment 1: Thinking of your experience with Company X over the past 7 days, please rate the following on a scale of 1 to 10:
Segment 2: Thinking of your experience with Company X over the past 7 days, and considering how the company's Y compares with that of other companies mentioned in question Z, please tell in your own words how satisfied you were with the following:

I think we call them fragments rather than matches. Most CAT tools definitely use them, and I often wish they were more robust at detecting them.


 