When a 50% match isn't a 50% match? (CAT Tools Technical Help)


Thread poster: Christopher Schröder

Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...

Nov 16, 2018

I did a rare CAT job today and noticed this:

Segment in TM:
Hvis der udtages biologisk materiale til en forskningsbiobank:

Segment to be translated:
Samtykke til at udtage og opbevare biologisk materiale i forbindelse med forsøget (en forskningsbiobank)

This came up as a 50% match.

Since when do four words out of 14 make a 50% match?

On the other hand, those four words do make up 50% of the...
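For anyone who wants to check the arithmetic, here is a rough word-overlap count for the two segments quoted above. It is only a sketch: the function names and the simple `\w+` tokenization are assumptions made for illustration, and no CAT tool is claimed to compute its match rate this way.

```python
import re

# The two segments quoted in the post above.
TM_SEGMENT = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
NEW_SEGMENT = ("Samtykke til at udtage og opbevare biologisk materiale "
               "i forbindelse med forsøget (en forskningsbiobank)")


def words(segment):
    """Lower-case the segment and split it into word tokens (punctuation dropped)."""
    return re.findall(r"\w+", segment.lower())


def shared_word_ratios(tm_segment, new_segment):
    """Return (shared word count, share of TM words, share of new-segment words)."""
    tm_words = set(words(tm_segment))
    new_words = set(words(new_segment))
    shared = tm_words & new_words
    return len(shared), len(shared) / len(tm_words), len(shared) / len(new_words)


count, tm_ratio, new_ratio = shared_word_ratios(TM_SEGMENT, NEW_SEGMENT)
print(f"{count} shared words: {tm_ratio:.1%} of TM segment, {new_ratio:.1%} of new segment")
```

Counting "til", five words are shared: 62.5% of the TM segment's 8 words and about 36% of the new segment's 14. Leaving "til" out gives the four words mentioned above, which are 50% of the TM segment but under 30% of the new one. The percentage depends heavily on which segment is used as the denominator.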

Thomas T. Frost

Portugal
Local time: 11:03
Danish to English
+ ...

Report it to support

Nov 16, 2018

I would report it to the CAT tool provider's support.

I don't give reductions below 75%, though, as fuzzy matches below that threshold are too unreliable (at least in MemoQ) and don't generally justify any reduction.


Lincoln Hui

Hong Kong
Local time: 18:03
Member
Chinese to English
+ ...

50% match

Nov 16, 2018

Most CAT tools don't even display matches below 60% by default. I sometimes adjust them so that they actually have a better chance of showing certain things.


Roman Karabaev

Russian Federation
Local time: 14:03
English to Russian
+ ...

Well, it's normal nowadays

Nov 16, 2018

A screenshot from MemoQ. Zero words out of two make... a 68% match.


megane_wang

Spain
Local time: 12:03
Member (2007)
English to Spanish
+ ...

I don't think I ever trusted even an 80% match....

Nov 16, 2018

.... can you imagine a "50%"? I don't care about them at all.

CAT tools are a help, not THE ultimate tool (fortunately for us translators)


Jean Dimitriadis

English to French
+ ...

Which CAT tool?

Nov 16, 2018

Out of curiosity, which CAT tool did you use?

Mine gives a 33% match, also counting the word "til".

Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...

TOPIC STARTER

A program that clearly can't count

Nov 16, 2018

It is a large agency's own system.

I got paid 100% for this, so that's not the issue.

What bothers me is how a computer can possibly add up wrong...

And how much else it might get wrong...


Nadia Silva Castro
United States
English to German
+ ...

almost funny

Nov 16, 2018

Roman Karabaev wrote:

MemoQ

A screenshot from MemoQ. Zero words out of two make... a 68% match.

(I)nstruction -> (co)nstruction....it's weird how your MemoQ "thinks"!
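One way to see how two different words can still score highly is to compare them character by character. The sketch below uses plain Levenshtein edit distance, normalised by the longer word's length. This is an assumption chosen for illustration, not memoQ's actual formula (which is not given in this thread), and the function names are made up.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def char_similarity(a, b):
    """Edit distance turned into a 0..1 similarity (hypothetical normalisation)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("instruction", "construction"))               # 2
print(round(char_similarity("instruction", "construction"), 3)) # 0.833
```

At the word level the two are a 0% match; at the character level they differ by only two edits. A tool that mixes the two views can land anywhere in between.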

Nadia Silva Castro
United States
English to German
+ ...

Configuration

Nov 16, 2018

Jean Dimitriadis wrote:

Out of curiosity, which CAT tool did you use?

Mine gives a 33% match, also counting the word "til".

Most CAT tools allow you to set your preferred minimum match rate. I personally set it to 75%; below that, it feels (at least in most cases) easier just to translate from scratch.

Samuel Murray

Netherlands
Local time: 12:03
Member (2006)
English to Afrikaans
+ ...

It's a bit of science and a bit of magic

Nov 16, 2018

Chris S wrote:
I did a rare CAT job today and noticed this:
...
This came up as a 50% match.

By character, 65% of the segment in the TM matches 40% of the segment in the source text.
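A character-level comparison of this kind can be approximated with Python's standard `difflib`. The sketch below is an assumption about how such figures might be obtained, not the method actually used for the numbers above: `difflib` does not guarantee a maximal alignment, so its percentages may differ somewhat from 65% and 40%.

```python
from difflib import SequenceMatcher

# The two segments from the opening post.
TM_SEGMENT = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
NEW_SEGMENT = ("Samtykke til at udtage og opbevare biologisk materiale "
               "i forbindelse med forsøget (en forskningsbiobank)")


def matched_character_shares(tm_segment, new_segment):
    """Return the share of each segment's characters that fall in matching blocks."""
    a, b = tm_segment.lower(), new_segment.lower()
    if not a or not b:
        return 0.0, 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # The same matched characters are counted against both lengths.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(a), matched / len(b)


tm_share, new_share = matched_character_shares(TM_SEGMENT, NEW_SEGMENT)
print(f"{tm_share:.0%} of the TM segment, {new_share:.0%} of the new segment")
```

Because the same matched characters are divided by two different lengths, the shorter TM segment always shows the higher percentage.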

I believe CAT tools that can't do proper morphological stemming/tokenization may try to strike a balance between word matching and character matching. My own CAT tool, WFC, favours character matching when the segment is short, and word matching when the segment is long. This leads to things similar to Roman's construction/instruction.

Since when do four words out of 14 make a 50% match?

Is the proposed translation 50% useful to you? If yes, then it is a true 50% match. If not, then they didn't get the magic quite right, but magic isn't precise anyway.

Jean Dimitriadis wrote:
Out of curiosity, which CAT tool did you use?
Mine gives a 33% match, also counting the word "til".

Without any morphological analysis (i.e. default tokenizer, "language unknown"), OmegaT says it's below the match threshold (i.e. below 30%). With an English tokenizer, OmegaT says it's a 38% match. But with the Danish tokenizer, OmegaT says it's a 50% match.
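The effect of the tokenizer can be illustrated with a toy example. The `toy_stem` rule below is a deliberately crude, hypothetical stand-in for real morphological analysis (it just drops a final "s" from longer words). It is not OmegaT's Danish tokenizer and does not reproduce the 38% or 50% figures; it only shows why stemming changes what counts as a shared word.

```python
import re

# The two segments from the opening post.
TM_SEGMENT = "Hvis der udtages biologisk materiale til en forskningsbiobank:"
NEW_SEGMENT = ("Samtykke til at udtage og opbevare biologisk materiale "
               "i forbindelse med forsøget (en forskningsbiobank)")


def tokens(segment):
    """Lower-case the segment and split it into word tokens."""
    return re.findall(r"\w+", segment.lower())


def toy_stem(word):
    """Hypothetical stemming rule: strip a trailing 's' from words longer than 4 letters."""
    return word[:-1] if len(word) > 4 and word.endswith("s") else word


def shared_words(tm_segment, new_segment, normalise=lambda word: word):
    """Words common to both segments after applying the given normalisation."""
    tm_words = {normalise(word) for word in tokens(tm_segment)}
    new_words = {normalise(word) for word in tokens(new_segment)}
    return tm_words & new_words


plain = shared_words(TM_SEGMENT, NEW_SEGMENT)
stemmed = shared_words(TM_SEGMENT, NEW_SEGMENT, normalise=toy_stem)
print(len(plain), sorted(plain))
print(len(stemmed), sorted(stemmed))
```

Without stemming, "udtages" and "udtage" are two unrelated words and five words are shared; with the toy rule they collapse to the same stem and the count rises to six.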

[Edited at 2018-11-16 18:34 GMT]

Endre Both

Germany
Local time: 12:03
English to German

Matches only start getting useful at 70-80%

Nov 16, 2018

As Thomas and Nadia have mentioned, it’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.

It would be interesting to compare the development of match ratings in CAT tools over time. Unfortunately the calculation of match rates is a race to the bottom. For obvious reasons, agencies are interested in pushing match rates upwards until they are just this side of indefensible - higher matches are a windfall to them.

Even translators who have the occasion and the inclination to compare CAT tools tend to assume that higher match rates equate to better TM leveraging, when in fact there is scant correlation between the two, the differences mostly boiling down to the audacity of the CAT tool’s marketing team. As displayed by the examples in this thread, they are getting pretty audacious.

DZiW (X)
Ukraine
English to Russian
+ ...

culture-dependents: half-full is half-empty

Nov 16, 2018

Not dwelling too much on such "secret vendors' know-hows" as hashes, checksums, shingles, clusters, vectors, Levenshtein distances, encoders, Soundex, and other weird stuff, it's just an attempt to obfuscate the fact that the very idea of "similar sentences"--let alone in different languages--is but an expensive miscalculation.

Little by little, modern trends are steadily moving towards per-language aggregation of structural [subj-pred-obj] parts, considering synonyms and weighting antonyms while sacrificing functional parts. A couple of years ago I was pleasantly surprised to watch a demonstration where some app analyzed simple, complex, and compound sentences and could assess the similarity of the context--noting the antecedents (the meaning).

However, even in a new/small TM I never used a 50% fuzzy match, because I also doubt that many 'false positives' are of any use for speeding the process up

Samuel Murray

Netherlands
Local time: 12:03
Member (2006)
English to Afrikaans
+ ...

@Endre

Nov 17, 2018

Endre Both wrote:
It’s usually only somewhere above 70% that matches actually have a chance to be useful in the sense of saving any time at all compared to a fresh translation. Lower matches can help with terminology consistency (although there are better tools for that), but they don't make you quicker.

I've had the opposite experience. Especially with regard to lengthier segments, a low match would save me time if it concerns a repeated phrase. I can recall several instances when my CAT tool yielded no match but the first result in a concordance search was something that I very much would have liked to see suggested as a fuzzy match. This is particularly true for matches consisting of consecutive words.

Here's a hypothetical example of such a no match that would have saved time and sanity:

Segment 1: Thinking of your experience with Company X over the past 7 days, please rate the following on a scale of 1 to 10:
Segment 2: Thinking of your experience with Company X over the past 7 days, and considering how the company's Y compares with that of other companies mentioned in question Z, please tell in your own words how satisfied you were with the following:
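The shared opening of these two segments is easy to find mechanically. The sketch below looks for the longest run of consecutive words common to both, using `difflib` from the standard library. It is only an illustration of the idea; the function name is made up and no CAT tool is claimed to detect such runs this way.

```python
import re
from difflib import SequenceMatcher

SEGMENT_1 = ("Thinking of your experience with Company X over the past 7 days, "
             "please rate the following on a scale of 1 to 10:")
SEGMENT_2 = ("Thinking of your experience with Company X over the past 7 days, "
             "and considering how the company's Y compares with that of other "
             "companies mentioned in question Z, please tell in your own words "
             "how satisfied you were with the following:")


def longest_shared_run(first, second):
    """Return the longest run of consecutive words that appears in both segments."""
    a = re.findall(r"\w+", first.lower())
    b = re.findall(r"\w+", second.lower())
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


run = longest_shared_run(SEGMENT_1, SEGMENT_2)
print(len(run), "words:", " ".join(run))
```

Here the shared run is the 12-word opening ("thinking of your experience ... past 7 days"), which is a small fraction of Segment 2 overall but exactly the part a translator would want to reuse.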

Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...

TOPIC STARTER

The answer

Nov 19, 2018

The helpdesk tells me this is shown as a 50% match because it is really a 30% match, which otherwise wouldn't show up as a match at all. In other words, they're trying to help me, not rip me off. This seems reasonable.

The construction/instruction thing made me laugh/cry though

Lincoln Hui

Hong Kong
Local time: 18:03
Member
Chinese to English
+ ...

Fragments

Nov 19, 2018

Samuel Murray wrote:

I think we call them fragments rather than matches. Most CAT tools definitely use them, and I often wish that they were more robust in detecting them.

