Internal fuzzy matching: WFP 3.4 x Trados 2011 x MemoQ 6.2
Thread poster: Samuel Murray
..... (X)
..... (X)
Local time: 19:50
Sounds like you want more transparency as well Sep 17, 2015

Hi Bernhard,

I don't work for an agency, I'm not starting an agency, I don't calculate rates or discounts, and I really have no skin in the game for fuzzy matches apart from some intellectual curiosity.


Bernhard Sulzer wrote:
What's the definition of a 85% match?


This was my one and only point. There are many potential ways you can do a fuzzy match calculation. All might come out to about the same percentage, but it would be nice to know how the tool you are using is calculating that number. If you reread my first post in this thread you will see I was arguing that CAT tool makers should have more transparency in how they calculate fuzzy matches (and IMO that calculation should be standardized across CAT tools to make it easier for translators to understand where the number is coming from).

I think you misinterpreted my post to be about discount tables (it was not). Discount rate tables should not be standardized. As Samuel mentioned there is a "degree of arbitrariness in the setting of discount percentages". So as translator you should decide what yours should be (and maybe it is 0% discount across the board). If the client doesn't agree and you can't find a happy medium then pass on the job, simple as that.


Samuel Murray wrote:
Each CAT tool will calculate the matches according to its own proprietary algorithm. That is why the tool whose match algorithm was used should always be mentioned, unless the translator is happy to bank on averages.

But note this: each tool that performs a matching will always yield the same match percentage for the same match, so therefore it can't be arbitrary (unless you're applying a special, private definition of "arbitrary"?).



Bernhard Sulzer wrote:
Why would there always be the same results if the algorithm is different?


As Samuel pointed out. Fuzzy match calculations are deterministic - given source segment A and source segment B, a CAT tool will calculate the same fuzzy match percentage every time. The point is that each CAT tool uses a slightly different algorithm. So for a given source segment A and source segment B, CAT tool #1 will always produce X% and CAT tool #2 will always produce Y% no matter how many times you run the calculation. X% and Y% might be the same, but might be different as CAT tool #1's algorithm is different from CAT tool #2's algorithm.
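
A minimal sketch of this point, assuming two toy scoring methods built on Python's standard difflib module. Neither is any CAT tool's real algorithm (those are proprietary); `char_score` and `word_score` are hypothetical names standing in for "CAT tool #1" and "CAT tool #2".

```python
from difflib import SequenceMatcher

def char_score(a: str, b: str) -> int:
    """Toy 'tool #1': character-level similarity as a rounded percentage."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def word_score(a: str, b: str) -> int:
    """Toy 'tool #2': word-level similarity as a rounded percentage."""
    return round(100 * SequenceMatcher(None, a.split(), b.split()).ratio())

segment_a = "John went to the store."
segment_b = "Bob went to the store."

# Each method returns the same number on every run (deterministic),
# but the two methods need not agree with each other.
for _ in range(3):
    print(char_score(segment_a, segment_b), word_score(segment_a, segment_b))
```

Run it as often as you like: each column stays constant, yet the two columns differ, which is the X% versus Y% situation described above.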

My point is: let's decide on one standardized fuzzy match calculation algorithm so that all CAT tools produce the same result given the same source segment A and source segment B.

If you want more details on the factors that may influence a fuzzy match calculation (and cause CAT #1 to produce a different result from CAT tool #2) please read this.

Kevin


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 11:50
Member (2009)
Dutch to English
+ ...
Unscrupulous agencies are leaving homogeneity switch ON when counting these days Sep 17, 2015

John Fossey wrote:

MemoQ has a feature "Calculate homogeneity". This has something to do with internal matches and when it is checked on a document with internal matches you will get a lower score than if it is unchecked. I'm not too clear as to how it works, but try it with the feature checked and unchecked. I don't know how it compares with other CAT tools.

[Edited at 2015-09-17 12:28 GMT]


Funny you should mention memoQ's infamous homogeneity switch. I gave a Dutch agency a ‘1’ on their Blue Board just the other day for switching it on when counting. It can make a hell of a difference, and should be switched OFF at all times. In an ideal, robotic world (= project manager heaven), it might say something about how fast a certain ‘vendor’ (yes, that's what they actually called me. I'm a translator, not a bloomin' vendor!) will do a job, but in the real world (the one I live and work in) it is just more idiotic 1984 BS.

My advice is: if they switch it ON, dump 'em. More than enough fish in the LSPsea.

Michael


 
Bernhard Sulzer
Bernhard Sulzer  Identity Verified
United States
Local time: 06:50
English to German
+ ...
What's in a match? Sep 18, 2015

Kevin Dias wrote:

Hi Bernhard,

I don't work for an agency, I'm not starting an agency, I don't calculate rates or discounts, and I really have no skin in the game for fuzzy matches apart from some intellectual curiosity.


But you do talk about it on your website. See bottom of my post.


Bernhard Sulzer wrote:
What's the definition of a 85% match?


This was my one and only point. There are many potential ways you can do a fuzzy match calculation. All might come out to about the same percentage, but it would be nice to know how the tool you are using is calculating that number. If you reread my first post in this thread you will see I was arguing that CAT tool makers should have more transparency in how they calculate fuzzy matches (and IMO that calculation should be standardized across CAT tools to make it easier for translators to understand where the number is coming from).


As always, I am trying to get to the bottom of things and when you say that "all might come out to about the same percentage," I will still need to ask: what exactly is that 85% match? And: what's the definition of a 85% match?

I understand you might not have a definition for that match. It's not enough to point to a general definition of fuzzy matching. I know how to find the definition but it doesn't say anything about what that really means and what algorithmic decisions are behind it and how a certain percentage holds a particular and never-changing value for translation.

See wikipedia:
https://en.wikipedia.org/wiki/Fuzzy_matching_(computer-assisted_translation)
"Fuzzy matching is a technique used in computer-assisted translation as a special case of record linkage. It works with matches that may be less than 100% perfect when finding correspondences between segments of a text and entries in a database of previous translations. It usually operates at sentence-level segments, but some translation technology allows matching at a phrasal level. It is used when the translator is working with translation memory (TM)." (Unquote)
------------------------------

85% match in all kinds of texts (technical, religious, marketing, law) - what does that percentage match mean? What does the 85% stand for exactly and why would it be a measure for translation discounts? It makes a difference also if we're talking about with or without previous TM.


Kevin Dias wrote:
Here's a definition of "fuzzy match":
I think you misinterpreted my post to be about discount tables (it was not). Discount rate tables should not be standardized. As Samuel mentioned there is a "degree of arbitrariness in the setting of discount percentages". So as translator you should decide what yours should be (and maybe it is 0% discount across the board). If the client doesn't agree and you can't find a happy medium then pass on the job, simple as that.

I don't see a definition here. This is about discounting for matches.
....


Bernhard Sulzer wrote:
Why would there always be the same results if the algorithm is different?


Kevin Dias wrote:
As Samuel pointed out. Fuzzy match calculations are deterministic - given source segment A and source segment B, a CAT tool will calculate the same fuzzy match percentage every time. The point is that each CAT tool uses a slightly different algorithm. So for a given source segment A and source segment B, CAT tool #1 will always produce X% and CAT tool #2 will always produce Y% no matter how many times you run the calculation. X% and Y% might be the same, but might be different as CAT tool #1's algorithm is different from CAT tool #2's algorithm.


I already understand that. The problem is that the various different results are indeed used to demand discounts. And even if we standardized the algorithm, what does that 85% or whatever percentage really mean?

Second problem is that certain agencies act as if it's the most honorable and logical thing to expect. It certainly is the opposite. The translator, as you rightly pointed out, is the one who decides about the price.

Even if fuzzy match percentages were (and they're NOT) a valuable tool to gauge the amount of work involved in using these matches in the new target text, it isn't logical to ask for an automatic or standardized discount per fuzzy word, no matter how many people are doing it in the industry.

Tools that are used to improve a translator's work (and holding a fuzzy match analysis in my face is not a tool by itself) shouldn't be used to make him/her less money. Faster delivery is an added bonus that should also be paid for.

Negative possibilities abound as well when working with fuzzies: people accepting fuzzy matches and not editing them enough, simply to get done faster; people ignoring the actual context of the current text and creating a patchwork of sentences rather than a homogeneous translation; agencies pressing translators to hurry up and get done more quickly with much more work than ever before for far less money! No, thank you. I know you're not proposing that.

But back to basics: what defines a 85% match? What does the 85% signify?
If we are honest, we must say that whatever matches are found and indeed usable and easily editable, you can't just discount away based on fuzzy match percentage suggestions. You need to look at the whole text in context.

Kevin Dias wrote:
My point is: let's decide on one standardized fuzzy match calculation algorithm so that all CAT tools produce the same result given the same source segment A and source segment B.


I suggest let's first determine what is really meant by a certain match percentage before a one-and-only algorithm becomes the new golden cow on which discounts are hung.

You might find that you can't pin that definition down: the meaning of 85% will change from text to text, from field to field and from context to context. I am trying to show what a certain high-percentage match can actually mean for the translation. As you will see, the "leverage" is very relative.

Here's an example from your own website that you linked to above and my discussion:
this

(Quote:)

A little background

With your favorite CAT or TEnT tool you can utilize fuzzy matches to get more leverage from your translation memory. If your tool only showed you exact matches (aka 100% matches) you would be missing out.
Take for example the following sentences:

John went to the store. (source segment)
Bob went to the store. (translation memory segment)
As you can see, except for the name highlighted in orange, the sentences are the same. By leveraging fuzzy matching in this example, a translator would only have to translate one word instead of the whole sentence. (Unquote)
------------------

Let's give this some context: The matches and their translations appear in boldface.

>>> Context 1: I don't see John around this afternoon. Where is he? John went to the store.
In German: John sehe ich heute nirgendwo. Wo ist er? John ist einkaufen gegangen.


>>>Context 2: Sorry I'm late. But here's the wrench Bob wanted. Oh, he's not here? Oh, he couldn't wait? What did he do? What? Bob went to the store. I see.

Translation: Entschuldigung, ich bin spät dran. Aber hier ist der Schraubenschlüssel, den Bob wollte. Oh, er ist nicht hier? Oh, er konnte nicht länger warten? Was hat er gemacht? Wie bitte? Bob kauft sich gerade einen. Ich verstehe.

Point is that a match is not always the same match in the new target segment, and 85% can be very misleading, or arbitrary. There is no guarantee to arrive at the same 85% match in the target language. In my example translations, I had to translate more than just the different name. I'm not saying that's always the case, but just because I get an 85% match of something doesn't necessarily mean my output is 85% easier, in some way faster, more appropriate, ..... or lends itself to being automatically discounted.

If a tool helps me with great suggestions for a new text segment, that's fine. It might make my job easier. It might. It might also be interesting to get a real value for what an 85% match really is. Yes, most likely more than a 45% match. But it's very relative. May I mention Heisenberg? Approximations, if done with the right goal in mind, are okay.
Automatic discounts for fuzzy (= unclear) data are not.

Thanks for talking with me, Kevin!

[Edited at 2015-09-18 00:56 GMT] edited for typo

[Edited at 2015-09-18 02:53 GMT]


 
..... (X)
..... (X)
Local time: 19:50
Knowledge is power Sep 18, 2015


Bernhard Sulzer wrote:
But you do talk about it on your website.


Yes, I do. I think it is very important for translators to understand about fuzzy matches and how they are calculated. The more educated translators are on the subject, the better they can understand when they are being taken advantage of. The better a translator understands fuzzy matches, the better they can evaluate a job where a discount table is proposed.

The problem is that CAT tools all calculate fuzzy matches a little differently and none of them release their algorithms. That is why I think there should be more transparency. So if you ask me what an 85% match is, I don't know. I have a general idea of how that number is arrived at. However, as I mention in the post, there are different factors that influence the calculation, and each CAT tool might take a slightly different approach to them:
- Word order
- Punctuation
- Stop words
- Partial substring matches
- Formatting and tags
- Matches longer than the source segment
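
As a rough illustration of how a few of these factors can move the number, here is a minimal sketch. Everything in it is an assumption for demonstration: the `STOP_WORDS` list, the option names and the word-level difflib comparison are hypothetical and do not describe how any particular CAT tool works.

```python
import string
from difflib import SequenceMatcher

STOP_WORDS = {"the", "a", "to"}  # hypothetical stop-word list, illustration only

def tokens(segment, strip_punct=False, drop_stop=False, sort_words=False):
    """Split a segment into lowercase words, applying the chosen options."""
    words = segment.lower().split()
    if strip_punct:
        words = [w.strip(string.punctuation) for w in words]
    if drop_stop:
        words = [w for w in words if w not in STOP_WORDS]
    if sort_words:  # crude way of ignoring word order
        words = sorted(words)
    return words

def score(a, b, **options):
    """Word-level similarity (rounded percentage) under the given options."""
    matcher = SequenceMatcher(None, tokens(a, **options), tokens(b, **options))
    return round(100 * matcher.ratio())

a = "John went to the store."
b = "To the store went John!"

print(score(a, b))                                     # punctuation and order count
print(score(a, b, strip_punct=True))                   # punctuation ignored
print(score(a, b, strip_punct=True, sort_words=True))  # order ignored as well
```

The same two segments score differently under each setting, so two tools that make different choices on these points will report different percentages for the same pair.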


Bernhard Sulzer wrote:
Point is that a match is not always the same match in the new target segment, and 85% can be very misleading, or arbitrary. There is no guarantee to arrive at the same 85% match in the target language.


This is a great point. Fuzzy match calculations are taking a segment from the source document you are translating and comparing it to a source segment in a translation memory. That's it. It says nothing about the target. It basically is saying, this source segment is 85% similar to this source segment from your TM. It is not saying that your job translating that segment will be 85% easier, or 85% faster. This is why I think education is important. If translators know what it means, what it is actually calculating, then they can also understand what it doesn't mean and what it is not calculating.
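
A minimal sketch of that source-to-source comparison, assuming a hypothetical one-entry translation memory and a toy word-level score from Python's difflib (not a real CAT tool's formula). The sentences are the ones from the example quoted earlier in this thread.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: a list of (source, target) pairs.
tm = [("Bob went to the store.", "Bob kauft sich gerade einen.")]

def best_match(new_source, memory):
    """Return (percent, tm_source, tm_target) for the closest TM source."""
    scored = []
    for tm_source, tm_target in memory:
        # Only the two SOURCE strings are compared; tm_target is never read.
        ratio = SequenceMatcher(None, new_source.split(), tm_source.split()).ratio()
        scored.append((round(100 * ratio), tm_source, tm_target))
    return max(scored)

percent, tm_source, tm_target = best_match("John went to the store.", tm)
print(percent, "->", tm_target)
```

Swapping in a completely different target text leaves the percentage unchanged, because the target never enters the calculation.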

Each CAT tool having slightly different names and different calculations just makes things more difficult for translators to understand. I think this ambiguity gives the advantage to the agencies (and the unscrupulous agencies). If there were a transparent, standardized way to calculate a fuzzy match I think it would benefit translators. Just look at Michael's post about "homogeneity". Why have so many translators been taken advantage of by this? Well because it is not standard, not defined, and if translators can't understand it, they don't know that they should stand up and say it should be turned off.


Bernhard Sulzer wrote:
Thanks for talking with me, Kevin!


Likewise! It seems like you are saying that agencies should never ask for discounts. My point is that, pragmatically speaking, agencies are still going to ask - so my goal is to educate translators as much as possible so they can understand and make more intelligent choices when considering potential translation jobs. I think better transparency in fuzzy match calculations would benefit translators.

Kevin


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 12:50
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
@Bernhard Sep 18, 2015

Bernhard Sulzer wrote:
Samuel Murray wrote:
If it was arbitrary, then the tool would say that a given segment's match against a given translation memory unit is e.g. 85% on one day, and then say it's 90% on another day. No, these tools do not decide that something is an 85% by whim, but by calculation.

Why would there always be the same results if the algorithm is different?


The CAT tool won't change its algorithm from day to day. Yes, obviously if the algorithm is different, the match results will be different. If you use two CAT tools to analyse a text, you'll get two slightly different results. No-one is disputing that.

You seem to be wanting a universal 85%, but believe me, you're the only one trying to find it. Do you refuse to accept that fuzzy matching should be evaluated on a CAT by CAT basis, and not universally across tools as if all tools use the same definition of an "85% match"?

Surely, if the translator and the agency agree on a rate based on match statistics, then they should also agree on which tool is used to generate those statistics (unless either or both of them are satisfied that the calculations from different tools will average out over time).

Who decides about the algorithm and with what in mind?


The developer, obviously. He wants his product to be most useful to its users, so he'll make the algorithm as accurate as he can, based on his interpretation of the weights of the variables (and the degree to which fine-tuning has a sizeable effect).

That a 85% match (of what? I ask again) equates to 15% less work writing this text in the target language?


No, of course not. Only new translators who have never used CAT tools will think that (and even they will stop thinking that after they've used their CAT tools for a while).

Or that an 85% match of whatever in the source language equates to the same 85% match in the target language?


No, of course not. Anyone who is a language professional would know that.

An 85% match as in both source and target language?


No, of course not. An 85% match in the two segments' source texts will rarely be an 85% match in the two target texts. Anyone who thinks that it would (or should) doesn't understand how language works.

Define "match" - match for what?


Erm, err... well, since you insist: in distance-based systems (e.g. Levenshtein matching), that would be an adjusted number that is based on the number of character overlaps in the two source text segments. Note that this is an adjusted number, i.e. the CAT tool developer acknowledges that simple mathematical distance scores do not apply to complex systems such as human languages, and therefore adjusts the score based on a variety of variables that are selected by the CAT tool developer in an attempt to make the score more relevant for CAT tool users.
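
For the curious, a minimal sketch of what "distance-based, then adjusted" could look like. The edit distance itself is the standard textbook algorithm; the `adjusted_percent` function and its flat `punctuation_penalty` are purely hypothetical, since no CAT tool publishes its actual adjustments.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def raw_percent(a: str, b: str) -> int:
    """Unadjusted score: 100% minus distance relative to the longer segment."""
    longest = max(len(a), len(b)) or 1
    return round(100 * (1 - levenshtein(a, b) / longest))

def adjusted_percent(a: str, b: str, punctuation_penalty: int = 2) -> int:
    """Hypothetical adjustment: flat penalty if the final characters differ."""
    score = raw_percent(a, b)
    if a[-1:] != b[-1:]:
        score -= punctuation_penalty
    return max(score, 0)

print(raw_percent("John went to the store.", "Bob went to the store."))
print(adjusted_percent("John went to the store.", "John went to the store!"))
```

The raw number comes from counting character edits; everything after that is the developer's judgement about which differences should weigh more or less.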

If 7 words of 10 in a sentence match another sentence, we get 70% fuzzy.


Only in the most simple, underdeveloped CAT tools.

Does [a 70% match] mean that the work we have to execute on such sentence is 30% of the work we would have when translating from scratch?


No, of course not. Who on earth would think that?


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 12:50
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
Perhaps it is the % sign that confuses you, Bernhard Sep 18, 2015

Bernhard Sulzer wrote:
As always, I am trying to get to the bottom of things and when you say that "all might come out to about the same percentage," I will still need to ask: what exactly is that 85% match? And: what's the definition of a 85% match?


Having read that post, I'm beginning to suspect that you are being misled by that percentage sign.

In many areas of life where things are described by a number (particularly if followed by a percentage sign), the number is generally regarded as an accurate measurement of something. But in fuzzy matching, it is important to remember that it is fuzzy matching. Adding a percentage sign after the "85" is merely a convention, probably left over from the days when fuzzy matching was more mathematical and less practical.

The "85" (or 85%) is not an accurate indication of something. It is a fuzzy indication. Translators learn over time what it means to them in their subject fields and in their types of text and for their productivity if they get something called an "85% match".

By analogy, imagine you have to evaluate whether a given colour is more green or more blue. The paint store will analyse the colour and say "it's 85% blue". Every time they analyse the colour, their computer will say it's 85% blue. But for professional interior decorators, that percentage is simply a fuzzy indicator of the actual colour, and of whether the colour will be suitable for a given room, or will be experienced by inhabitants of a house as "more blue" or "more green". A skilled decorator will eventually learn the relation between the paint store computer's percentages and the amount of blue or green colourfulness that is added to the buildings that they typically decorate.

That is also why discount categories are not tied to individual percentages but to broad ranges of percentages. It is simply for practical reasons that match discount categories do not have soft boundaries (e.g. "roughly 85% to roughly 95%" instead of "85-95%").
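
A minimal sketch of such hard-edged categories, assuming hypothetical band boundaries (real grids differ per agency and per tool, and no rates are implied here):

```python
import bisect

# Hypothetical band edges for illustration; real discount grids vary.
LOWER_BOUNDS = [0, 75, 85, 95, 100]
LABELS = ["no match", "75-84%", "85-94%", "95-99%", "100%"]

def band(percent: int) -> str:
    """Return the label of the band whose lower bound is the highest one <= percent."""
    return LABELS[bisect.bisect_right(LOWER_BOUNDS, percent) - 1]

for p in (84, 85, 94, 95, 100):
    print(p, "->", band(p))
```

Note how 84 and 85 land in different categories even though the two segments are practically equally similar; that is the price of hard boundaries.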

It is a mistake (a beginner's mistake) to think of fuzzy match percentages as "numbers". Rather, imagine matches like a hot-to-cold colour spectrum, and instead of seeing numbers next to the matches, seeing a colour ranging from blue (cold, little match) to red (hot, big match). The reason we use numbers is simply because it's easier to do calculations with numbers. We all know that language is not numbers.

==

Kevin Dias wrote:
IMO that calculation should be standardized across CAT tools to make it easier for translators to understand where the number is coming from.
...
My point is: let's decide on one standardized fuzzy match calculation algorithm so that all CAT tools produce the same result given the same source segment A and source segment B.


This would be a bad idea, because then all translators would have to relearn how to interpret the numbers given by their CAT tools. This ties in with what I wrote above in this post -- the match percentage is not really a number: it is a hazy indicator that the translator learns to "feel". And it is based on those feelings that translators decide whether discount categories are fair or unfair.

I recall a time in WFC's update history when there was a dramatic change in WFC's fuzzy match algorithm. It was touted as "more accurate" or "more useful", but many long-time users found it less useful... because long-time users had learnt how to interpret those percentages. After the update, they could no longer rely on the fuzzy match statistics as they used to (because the numbers now meant different things), and they had to relearn what those numbers mean (and learning takes time, because it is based on experience, not mathematics).

In addition, the different algorithms have different strengths and weaknesses.

==

Kevin Dias wrote:
Each CAT tool having slightly different names and different calculations just makes things more difficult for translators to understand. I think this ambiguity gives the advantage to the agencies (and the unscrupulous agencies).


You do have a point there. Agencies use a variety of CAT tools, and translators often don't have access to all the materials that are used to calculate the fuzzy match statistics. It becomes more difficult, therefore, for a translator to know how similar is one agency's 85% match to another agency's 85% match.

Still, I think translators who work for agencies over a longer period of time eventually learn how to interpret it, and whether the rate needs to be adjusted upwards.

If there were a transparent, standardized way to calculate a fuzzy match I think it would benefit translators.


I don't think it would. Sure, it would be interesting to see what factors are taken into account, but knowing what those factors are and what weights are assigned to them will not help the translator know what effect those variables will have on the amount of work that is required for e.g. an 85% match.

If I knew that tool A uses a list of 20 stop words and applies a penalty for sentence length worth 5% for each 10% of difference, and I know that tool B uses two category-based lists of 10 stop words each and applies a penalty for sentence length worth 6% for each 8% of difference, it would not help me to know whether tool A or tool B will result in more favourable matches for me, and it will not help me know how much more productive I'd be.
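
To put numbers on that hypothetical, here is a minimal sketch of the two length penalties. The figures (5 per 10, 6 per 8) are the made-up ones from the paragraph above; counting only full steps and measuring the difference against the longer segment are further assumptions.

```python
def length_penalty(len_a: int, len_b: int, points: int, per_step: int) -> int:
    """Penalty points for each full `per_step` percent of length difference."""
    # Length difference as a whole-number percentage of the longer segment.
    diff_percent = 100 * abs(len_a - len_b) // max(len_a, len_b)
    return points * (diff_percent // per_step)

# A 10-word segment compared with a 6-word TM segment: 40% length difference.
tool_a = length_penalty(10, 6, points=5, per_step=10)  # 4 full steps -> 20
tool_b = length_penalty(10, 6, points=6, per_step=8)   # 5 full steps -> 30

print(tool_a, tool_b)
```

So the arithmetic is easy enough to follow, but knowing that one tool deducts 20 points and the other 30 still says nothing about how long the segment will take to translate.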

I'm a translator, not a post-graduate mathematician.


[Edited at 2015-09-18 08:30 GMT]


 
..... (X)
..... (X)
Local time: 19:50
@Samuel Sep 18, 2015


Samuel Murray wrote:
This would be a bad idea, because then all translators would have to relearn how to interpret the numbers given by their CAT tools. This ties in with what I wrote above in this post -- the match percentage is not really a number: it is a hazy indicator that the translator learns to "feel". And it is based on those feelings that translators decide whether discount categories are fair or unfair.

I recall a time in WFC's update history when there was a dramatic change in WFC's fuzzy match algorithm. It was touted as "more accurate" or "more useful", but many long-time users found it less useful... because long-time users had learnt how to interpret those percentages. After the update, they could no longer rely on the fuzzy match statistics as they used to (because the numbers now meant different things), and they had to relearn what those numbers mean (and learning takes time, because it is based on experience, not mathematics).

In addition, the different algorithms have different strengths and weaknesses.


I agree, I think the number itself has no real meaning, but becomes a "feel" with experience. However, what about new translators? Having to learn the feel of 85% in 2, 3, or 4 different CAT tools is a tough ask and leads to a lot of unnecessary confusion. This is why I think a standardized formula would benefit translators. Then you can learn the "feel" of 85% once and it doesn't matter what CAT tool you are using.

Of course for seasoned translators and long-time users such as yourself who have already learned the feel of 85% across different CAT tools it would be a temporary negative, but I think overall it would be a long term benefit for translators.

You are correct though that actually knowing how a fuzzy match is calculated will have no meaning to 99% of translators, but I think transparency into the current calculations CAT tools use would be the first step toward a standardized formula across the industry.

Kevin


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 12:50
Member (2006)
English to Afrikaans
+ ...
TOPIC STARTER
New translators, and old translators using new tools Sep 18, 2015

Kevin Dias wrote:
I agree, I think the number itself has no real meaning, but becomes a "feel" with experience. However, what about new translators? Having to learn the feel of 85% in 2, 3, or 4 different CAT tools is a tough ask...


Yes, new translators are at the mercy of agencies (or clients). They must learn that some things that seem obvious are obvious, and some things that seem obvious are not.

Another point to keep in mind is that the fuzzy match percentage is but one variable in how productive you can be in a CAT tool. Even if all CAT tools do matching in the same way, translators will have different efficiency in different tools. For example: my efficiency in Trados 2015 is much lower than my efficiency in WFC, so an 85% match in a job that absolutely requires Trados would be of a lower "efficiency ranking" than an identically calculated 85% match in WFC, and it would take me longer to translate it, despite having an identical match that was calculated in the identical way.


 
Bernhard Sulzer
Bernhard Sulzer  Identity Verified
United States
Local time: 06:50
English to German
+ ...
Not confused Sep 18, 2015

Samuel Murray wrote:

Having read that post, I'm beginning to suspect that you are being misled by that percentage sign.

In many areas of life where things are described by a number (particularly if followed by a percentage sign), the number is generally regarded as an accurate measurement of something. But in fuzzy matching, it is important to remember that it is fuzzy matching. Adding a percentage sign after the "85" is merely a convention, probably left over from the days when fuzzy matching was more mathematical and less practical.

The "85" (or 85%) is not an accurate indication of something. It is a fuzzy indication. Translators learn over time what it means to them in their subject fields and in their types of text and for their productivity if they get something called an "85% match".


It doesn't take a math whizz to realize that if something isn't exact, you shouldn't treat it as exact.
However, for certain agencies and even translators that % sign is there to justify the discount percentages - this whole scheme of duping people into discounting depends on making these word analyses look like an exact method of pinpointing exact discount percentages. Newcomers aren't asked what they think about it, they are presented with the analysis and the discounted word percentages already ready to be accepted by them - as if it's the most logical and honest thing to do.

Especially with "fuzzy" matching, the word means "unclear", and I can tell you from experience that any kind of match is not an exact and often not even an approximate measure for how this helps or doesn't help with translation. It all depends on many other variables in the translation process. As you know. Now you're going to repeat that 85% can mean different things in different texts and fields and that you have to develop a feel for it.

Justifying set discount percentages on feelings that can be vastly different between us is simply wrong.

Samuel Murray wrote:
By analogy, imagine you have to evaluate whether a given colour is more green or more blue. The paint store will analyse the colour and say "it's 85% blue". Every time they analyse the colour, their computer will say it's 85% blue. But for professional interior decorators, that percentage is simply a fuzzy indicator of the actual colour, and of whether the colour will be suitable for a given room, or will be experienced by inhabitants of a house as "more blue" or "more green". A skilled decorator will eventually learn the relation between the paint store computer's percentages and the amount of blue or green colourfulness that is added to the buildings that they typically decorate.


Unfortunately, that isn't a good analogy for the word analysis. It's not an absolute 100% blue/template text that we could measure that 85% segment against. And the 85% itself refers to how a particular string appeared before and now seems to match the new string. But it doesn't mean that the new context would even justify the number 85%. The surrounding context can be vastly different and so an 85% is often not at all an 85% taking into account context of very different fields (granted, if you're in the same field, it's more likely there is an approximation). But you can't "feel" that from the results of a machine's word analysis. Do you really believe that?

Samuel Murray wrote:
That is also why discount categories are not tied to individual percentages but to broad ranges of percentages. It is simply for practical reasons that match discount categories do not have soft boundaries (e.g. "roughly 85% to roughly 95%" instead of "85-95%").


So we're replacing a number that isn't really exact (85%) with a range (85-95%) and then assign to it one specific discounted rate.

Samuel Murray wrote:
It is a mistake (a beginner's mistake) to think of fuzzy match percentages as "numbers". Rather, imagine matches like a hot-to-cold colour spectrum, and instead of seeing numbers next to the matches, seeing a colour ranging from blue (cold, little match) to red (hot, big match). The reason we use numbers is simply because it's easier to do calculations with numbers. We all know that language is not numbers.


Numbers - that's what fuzzy matches are treated as, numbers, percentages. And we deal in words, not color hues. This is getting really hazy, I'm sorry.

Samuel Murray wrote:
Still, I think translators who work for agencies over a longer period of time eventually learn how to interpret it, and whether the rate needs to be adjusted upwards.


You're doing it again. You're justifying giving/adjusting discounts for certain unclear segment matches. Would you at least consider that what you use to achieve an excellent translation is your business, and that being able to use a CAT tool in certain situations doesn't justify automatic discounts? The same goes for analyses. Maybe the analyses help you get an idea that there are possibly similar-"looking" segments or segment parts - but they are not necessarily similar in meaning or appearance (verb endings, case endings, even tenses, ...) at all because of varying context and language rules?!

Samuel Murray wrote:
If I knew that tool A uses a list of 20 stop words and applies a penalty for sentence length worth 5% for each 10% of difference, and I know that tool B uses two category-based lists of 10 stop words each and applies a penalty for sentence length worth 6% for each 8% of difference, it would not help me to know whether tool A or tool B will result in more favourable matches for me, and it will not help me know how much more productive I'd be.

I'm a translator, not a post-graduate mathematician.


Excellent point. We're translators. We know about the real value of words and repetitions.

[Edited at 2015-09-18 14:31 GMT]

[Edited at 2015-09-18 15:33 GMT]


 
Bernhard Sulzer
Bernhard Sulzer  Identity Verified
United States
Local time: 06:50
English to German
+ ...
Not feeling it Sep 18, 2015

Kevin Dias wrote:
IMO that calculation should be standardized across CAT tools to make it easier for translators to understand where the number is coming from.
...
My point is: let's decide on one standardized fuzzy match calculation algorithm so that all CAT tools produce the same result given the same source segment A and source segment B.



Samuel Murray wrote:
This would be a bad idea, because then all translators would have to relearn how to interpret the numbers given by their CAT tools. This ties in with what I wrote above in this post -- the match percentage is not really a number: it is a hazy indicator that the translator learns to "feel". And it is based on those feelings that translators decide whether discount categories are fair or unfair.

I'm not feeling it.


 