'[MT] is most often used alongside [TM] as an adjunct to human translation'. Are you using it?
Thread poster: Henry Dotterer
Jeff Allen (Identity Verified)
France
Multiple languages
terminology is a key for using most MT systems Oct 17, 2009

Russell Jones wrote:
I am currently trying it for the first time (for many years) - only Google, nothing fancy.
...
My only gripe so far is that there is no consistency of terminology; an excellent idea in one segment is replaced with something literal or banal in the next.




Russell:

This is exactly why I don't use Google Translate for any of the languages that I know, work with, and have MT software packages for.
Being able to adjust and customize terminology (and thus improve the consistency of that terminology in the translation output) is one of the key factors in how I have used MT successfully in all my projects so far. These include my own personal work, but also deploying various MT software/systems and training teams to use them, in team sizes ranging from 1 person up to dozens of in-house translators and external translation agencies, and in all kinds of hybrid environments.
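
To make the consistency point concrete, here is a minimal sketch of a terminology check over bilingual segments. It assumes a simple client glossary (source term to required target term) and plain-text segment pairs; the glossary, the segments and the function name are hypothetical and do not come from any particular MT or TM product.

```python
import re

# Hypothetical client glossary: source term -> required target term (EN -> FR).
GLOSSARY = {"machine translation": "traduction automatique"}

# Hypothetical (source, target) segment pairs, e.g. exported from a CAT tool.
SEGMENTS = [
    ("Machine translation is useful.", "La traduction automatique est utile."),
    ("We improved the machine translation output.",
     "Nous avons amélioré la sortie de la traduction machine."),
]

def find_term_inconsistencies(segments, glossary):
    """Return (segment index, source term, expected target term) for every
    segment whose source contains a glossary term but whose target does not
    contain the required rendering."""
    problems = []
    for i, (src, tgt) in enumerate(segments):
        for src_term, tgt_term in glossary.items():
            # Whole-word, case-insensitive match on the source side
            pattern = r"\b" + re.escape(src_term) + r"\b"
            if re.search(pattern, src, re.IGNORECASE):
                # Plain case-insensitive substring check on the target side
                if tgt_term.lower() not in tgt.lower():
                    problems.append((i, src_term, tgt_term))
    return problems

for index, term, expected in find_term_inconsistencies(SEGMENTS, GLOSSARY):
    print(f"segment {index}: '{term}' should be rendered as '{expected}'")
```

On the two sample segments it flags segment 1, where "traduction machine" slipped in instead of the glossary term. An MT system with a user dictionary avoids the problem upstream; a check like this only catches it afterwards.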

see:
MT preparation steps
http://www.proz.com/post/1123567#1123567

Earlier in this thread, I referred to another of my posts that explains why Google has terminology problems and will continue to have them. It's the way that type of MT system functions. See the post at:
http://www.proz.com/post/998639#998639

Jeff

[Edited at 2009-10-17 16:00 GMT]


 
Jeff Allen (Identity Verified)
France
Multiple languages
be careful about putting MT vs TM (or CAT) in customer contracts Oct 17, 2009

B D Finch wrote:
I am concerned about the ethics of using MT when presenting oneself to the Client as a human translator. I have recently put in an application to be on a panel of translators for a Regional Chambre de métiers and their tender document specified that use of MT was not acceptable. As they asked for detail of translation methodology, I thought it best to explain the difference between CAT and MT. Unfortunately, it sounds as though this may be becoming more blurred.

BDF



BDF:

A big word of caution here.....

Any attempt in a legal contract with a customer to claim TM (or CAT in general) while ruling out MT is a potential source of legal problems later on.

Translation Memory is just a commercially derived catchy name for one of the several already existing MT types/methods.
Be careful not to mix up MT types and who is actually offering the software/systems that correspond to those types.
I've explained the different MT types (approaches) in several other posts here on ProZ (listed below). Any customer who is aware of this relationship could later come after you for using TM, since it is an MT-based method.

My posts below aim at clarifying this, and I am certainly willing to answer any further questions on this topic:

combining MT and TM
http://www.proz.com/post/213153#213153

Merging of rule, example and statistics based MT
http://www.proz.com/post/453752#453752

the mixing of TM and MT technologies
http://www.proz.com/post/1000970#1000970

clarification about why TM is a daughter of Example-based MT (EBMT)
http://www.proz.com/post/492418#492418

posts describing combining MT and TM
http://www.proz.com/post/275366#275366

It is even possible to argue that if you use the search and replace functions (a basic concept of TM) in MS Word, then you could be liable for utilizing an MT method.
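
As a rough illustration of how thin that line is: a glossary-driven search-and-replace, done in a single pass, already behaves like a very crude direct (word-for-word) MT system. A minimal sketch, assuming a made-up English-to-French glossary; it is not how any commercial TM or MT product works internally.

```python
import re

def glossary_replace(text, glossary):
    """Crude 'translation' by glossary search-and-replace.

    All source terms go into one regex alternation, longest first, so a
    multiword entry wins over its component words, and text that has
    already been replaced is never matched again (single pass).
    """
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(0)], text)

# Hypothetical EN -> FR glossaries
WORD_LEVEL = {"the": "la", "translation": "traduction", "memory": "mémoire"}
WITH_TERM = {**WORD_LEVEL, "translation memory": "mémoire de traduction"}

print(glossary_replace("the translation memory", WORD_LEVEL))  # la traduction mémoire
print(glossary_replace("the translation memory", WITH_TERM))   # la mémoire de traduction
```

Adding the single multiword entry is what turns the literal word-for-word output into the correct term, which is the same reason terminology entries matter so much in real MT dictionaries.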

The point to highlight with customers should be the quality of the delivered "product" (the translation) and the "processes" and methods used to ensure that quality.
It should not be the tools (computer, dictaphone, typewriter, brain, voice, fingers, software X, hardware Y, etc) used as the instruments to generate, collect, match, filter, etc the intermediate version and final versions of the translated/localized product.
You need to deliver what was requested and is required, in terms of the expressed/agreed-upon level of quality, format, required terminology, style guidelines, etc.

If you promise a draft translation and use MT + very minor postediting, and that corresponds to the request, then it should not hinder you from meeting that need.
If you promise a high-quality translation and can achieve the expected deliverable, then how you do it is your concern, whether you use a standard Translate/Edit/Proof cycle; a sequence of MT dictionary pre-processing, MT processing, interactive dictionary updating, postediting and a second-level postediting phase; or even crowdsourcing methods. Just make sure you can demonstrate your process and how the resulting deliverable corresponds to the agreed-upon expectations.
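
The "MT dictionary pre-processing" step usually starts with finding the source words the MT dictionary does not know yet, so they can be coded into a user dictionary before the MT run. A minimal sketch, assuming an English source text and a dictionary that can be exported as a plain set of known lowercase words; real rule-based MT dictionaries store coded entries rather than bare words, and the function name is hypothetical.

```python
import re
from collections import Counter

def unknown_words(source_text, known_words):
    """Return (word, count) for words not in the MT dictionary, most frequent
    first (ties keep the order in which the words first appear)."""
    # Simplified ASCII-only tokenizer: letters, optionally with ' or - inside
    words = re.findall(r"[a-z][a-z'-]*", source_text.lower())
    counts = Counter(w for w in words if w not in known_words)
    return counts.most_common()

# Hypothetical source text and dictionary word list
SOURCE = "The flange bolts secure the flange. Torque the bolts."
KNOWN = {"the", "secure", "torque"}

for word, count in unknown_words(SOURCE, KNOWN):
    print(f"{word}\t{count}")
```

Sorting by frequency lets the most common unknown terms be coded first, where a user dictionary entry pays off most.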

Jeff


 
Jeff Allen (Identity Verified)
France
Multiple languages
MT usefulness needs overall view of variety of MT products Oct 17, 2009

gianfranco wrote:
From my experience using a commercial product, a PRO version with customizable glossaries, MT technology can be used probably with some long term benefit only if the following conditions are met:


Jeff Allen wrote:
From what you indicate, your experience is limited to 1 specific MT system, and only in large enterprise context.


gianfranco wrote:
for some reason, MT makes me think ....


Gianfranco,

It is good to have some discussion from someone like yourself who has also had experience in working with MT. Hands-on use does make the discussion more concrete.

I am just concerned about a blanket generalization about MT when you have stated that your experience is with "a PRO version" of "a commercial product", which I interpret to mean that this use is limited to a single model of one MT brand (possibly different versions of that model). Your one-sentence description indicates that it is likely a rule-based MT system. (And yet that is just one of several different MT system types.)

Maybe I am wrong, and please do correct me if I have misinterpreted it. Your Proz profile does seem to give an indication of which MT product it is.

My own experience over the past 15 years includes closely working with, thoroughly testing, releasing, deploying, and actually using the following:
- all MT system types (rule-based, knowledge-based, example-based/TM, stat-based, hybrid, etc)
- including several different product brands (Reverso, PROMT, SYSTRAN, LEC MagellanPro, Globalink, CMU KANT, PANGLOSS/DIPLOMAT)
- a range of models per brand totalling 30+ different MT product models which cover commercial off-the-shelf, Government off-the-shelf, corporate customized, industry research, etc
- implementing them in small-medium business up through large corporate deployment contexts
- published MT software product reviews that aim to be objective and as thorough as possible. Each takes 40-80 hours, including filing official bug reports, discussing issues with the MT vendor, and writing up the review, again in discussion with the MT vendor. Some conclusions state where and how the product is limited, and where stated features do not work as claimed.
- using several different MT products on many specialized subject areas and general domains

And there are several MT products I still have not tested or used, but have heard are good to consider.

Experience on 1 specific system of 1 MT type is not a problem, as this is the case for the majority of MT users who usually focus on a given product implementation. They get to know that product very well. Yet, I would hesitate to make an evaluation statement about MT in general based on only taking the perspective of 1 specific product model.

Not all MT products can meet all needs (content gisting and translation publication) for all languages in off-the-shelf products. It is often necessary to adapt one's workflow more or less depending on the selection of a given product.

Some factors in choosing one or more MT products:
- pros and cons of each MT system
- each product range varies in usability functionality (cross-product functionality as well as intra-product range functionality -- from low to high end products)
- technology integration with other CAT and desktop publishing products
- some MT vendors focus on 1 language pair, others on just 2-3, others on maybe 10 languages total, and some on hundreds.
- a trend between language pair coverage and usability; those MT vendors who focus closely on a very small set of languages also tend to have better usability and functionality for non-gisting production contexts, sometimes more tuned for handling the linguistic issues of their specific source language and target language audiences.
- budget: most customers (corporate and freelancers) do not want to spend the money to purchase multiple MT software/systems. So, the users want 1 product to meet all their needs for all language pairs (not quite possible)
- translation output quality approach: some MT vendors focus on the content-gisting audience, and other vendors focus on the translation publication audience. And each newly created language pair must first achieve content-gisting levels and progressively improve for translation publication needs.
- language direction maturity: Languages with 15+ years of development will show better content gisting and translation publication quality than languages with 1-2 years of on-the-market history. Nearly all MT vendors have some language pairs with 10+ year history and other languages developed more recently.
- my content gisting vs translation publication pairs are different from yours, and these are different from all other MT users.
- need for training (and multiple training in the cases of multiple systems)
- marketing/sales of MT vendors: market need influences linguistic investment. MT vendors invest in market needs where there are potential sales, and this forces them to focus their language pair coverage accordingly.

The MT users who are most successful are those who have carefully identified their needs, their workflow, their expectations, and take the time to evaluate the options and what can correspond best to their needs, and where they can also adapt to the MT systems and make compromises.

The ideal implementation would be multiple MT systems, choosing the best one per language direction, but that would require different types of user training for each product. The technical and human-factors ramifications are complex, and I've only seen one corporate user do that. Everybody else wants a simple, single solution, but they need to evaluate any or all of the above depending on their specific context and needs.
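
At its simplest, the "best system per language direction" idea is a routing table keyed by (source, target) pair. A minimal sketch, assuming each MT product can be wrapped as a function from source text to target text; `engine_a` and `engine_b` are hypothetical stand-ins, not real product APIs.

```python
from typing import Callable, Dict, Tuple

Engine = Callable[[str], str]

# Hypothetical wrappers standing in for two different MT products.
def engine_a(text: str) -> str:
    return f"[engine A] {text}"

def engine_b(text: str) -> str:
    return f"[engine B] {text}"

# Keyed by language *direction*: (en, fr) and (fr, en) are separate routes,
# because an engine strong in one direction may be weak in the other.
ROUTES: Dict[Tuple[str, str], Engine] = {
    ("en", "fr"): engine_a,
    ("fr", "en"): engine_b,
    ("en", "ru"): engine_b,
}

def translate(text: str, src: str, tgt: str, default: Engine = engine_a) -> str:
    """Send the text to the engine chosen for this direction, or to the default."""
    engine = ROUTES.get((src, tgt), default)
    return engine(text)

print(translate("hello", "en", "fr"))
print(translate("bonjour", "fr", "en"))
print(translate("hallo", "de", "en"))  # no route defined: falls back to the default
```

The routing code is the easy part; each extra engine also means separate dictionaries to maintain and separate user training, which is where the complexity described above comes from.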

Jeff


 
Vito Smolej
Germany
Member (2004)
English to Slovenian
SITE LOCALIZER
Is it MT? Or is it TM? Oct 17, 2009

Here's a point along the line of Jeff's statement "Translation Memory is just a commercially derived catchy name for one of the several already existing MT types/methods." Actually - my opinion - it is (also / at least partly / somehow?...) the other way around.

Here are two cases involving Google Translate.

Eurolex is a nice multilingual collection of European legislation. It is in TMX format and contains close to a million segments in pretty much all the languages spoken in the Union. Now and again people ask on ProZ where to download it, how to unpack it and add it to a TM, whether there is a newer version, etc. It is good quality, official stuff.
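
For readers who have not looked inside one: a TMX file is XML, with each translation unit (`tu`) holding one variant (`tuv`) per language, and the simplest TM operation is an exact-match lookup over those units. A minimal sketch using Python's standard library, assuming a well-formed file small enough to load into memory (the real EuroLex files are far larger and would call for incremental parsing); the sample unit reuses the English/Slovenian segment pair quoted in this post, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

EN = ("laying down principles and detailed guidelines for good clinical practice "
      "as regards investigational medicinal products for human use, as well as the "
      "requirements for authorisation of the manufacturing or importation of such products")
SL = ("o načelih in podrobnih smernicah za dobro klinično prakso v zvezi z zdravili "
      "za uporabo v humani medicini, kakor tudi zahteve za pridobitev dovoljenja "
      "za proizvodnjo ali uvoz takšnih izdelkov")

# Tiny hypothetical TMX document containing one translation unit
SAMPLE_TMX = f"""<tmx version="1.4">
  <header creationtool="sketch" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>{EN}</seg></tuv>
      <tuv xml:lang="sl"><seg>{SL}</seg></tuv>
    </tu>
  </body>
</tmx>"""

def load_tmx(tmx_text, src_lang, tgt_lang):
    """Build an exact-match memory {source segment: target segment}."""
    root = ET.fromstring(tmx_text)
    memory = {}
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Some files use a plain 'lang' attribute instead of xml:lang
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # Compare on the primary subtag only, so "EN-GB" matches "en"
                segs[lang.split("-")[0].lower()] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            memory[segs[src_lang]] = segs[tgt_lang]
    return memory

memory = load_tmx(SAMPLE_TMX, "en", "sl")
print(memory.get(EN, "<no exact match>"))
```

An exact-match hit returns the stored official wording verbatim, which mirrors what GT returned for this segment.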

Enter MT - because, as far as the general opinion goes, Google Translate IS an MT tool. If I enter some segment from EuroLex into GT, for instance

laying down principles and detailed guidelines for good clinical practice as regards investigational medicinal products for human use, as well as the requirements for authorisation of the manufacturing or importation of such products

GT will provide the following suggestion for Slovenian - identical to the TMX contents and the official/legal wording:

o načelih in podrobnih smernicah za dobro klinično prakso v zvezi z zdravili za uporabo v humani medicini, kakor tudi zahteve za pridobitev dovoljenja za proizvodnjo ali uvoz takšnih izdelkov

(you may wish to try your own language pair).

Let's try something different:

...Some years ago -- never mind how long precisely -- having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

in Spanish(*):

... Hace algunos años - no importa cuánto tiempo exactamente - tiene poco o ningún dinero en mi bolsillo, y nada de particular que me interesara en tierra, pensé que iba a navegar sobre un poco y ver la parte acuosa del mundo.

No need to climb the grammar trees, avoid false friends ... Just associate (visit for instance http://www.mgar.net/docs/melville.htm - Google has probably scanned the whole Moby Dick anyhow, in English, Spanish etc etc).

Is this MT? Some may feel like quoting John Lennon: "I think, er, no, I mean, er, yes, but it's all wrong / That is, I think I disagree".

Fact is though, Google has done exactly what we have been doing since our first CAT tool with its TM memorizing our translations: it has been compiling material into something that from a certain perspective can be seen and used as one single, humungous TM. Of course GT is more than that (see their FAQ). But at some pre-translational level, it is as useful as any other TM. We all know now, for instance, that we will not need our EuroLex TMs anymore.

regards

Vito
*: Somehow GT managed to get rid of "Call me Ishmael" / "Llamadme Ismael". Nothing is perfect (for the time being).


 
Amy Duncan (X) (Identity Verified)
Brazil
Portuguese to English
Oddly, I have sometimes found MT useful Oct 18, 2009

I never used MT until quite recently. I was working on a literary project and decided to throw a paragraph into Google Translate for fun. I was surprised at the results. Even though it needed corrections, it was useful to me because it came up with some ways of stating things and some word choices I wouldn't have thought of that were actually better than the ones I did think of.

I tried GT again on another project (also literary), and the results were awful, so I guess it really depends on the text and it's impossible to make generalizations about what kind of text it handles best.


 
Jeff Allen (Identity Verified)
France
Multiple languages
simply different MT types which now create hybrid/multi-engine MT Oct 18, 2009

VitoSmolej wrote:

Here's a point along the line of Jeff's statement "Translation Memory is just a commercially derived catchy name for one of the several already existing MT types/methods." Actually - my opinion - it is (also / at least partly / somehow?...) the other way around.

Here are two cases involving Google Translate ... Eurolex ... Enter MT - because, as far as the general opinion goes, Google Translate IS an MT tool. If I enter some segment from EuroLex into GT, for instance ... GT will provide the following suggestion for Slovenian - identical to the TMX contents and the official/legal wording: ...

Fact is though, Google has done exactly what we have been doing since our first CAT tool with its TM memorizing our translations: it has been compiling material into something that from a certain perspective can be seen and used as one single, humungous TM. Of course GT is more than that (see their FAQ). But at some pre-translational level, it is as useful as any other TM. We all know now, for instance, that we will not need our EuroLex TMs anymore.


Thanks Vito for the comments.

TM has always been a variant of MT, based on what is called Example based MT (Nagao, 1984). See posts:

MT is the parent of TM
http://www.proz.com/post/440750#440750

clarification about why TM is a daughter of Example-based MT (EBMT)
http://www.proz.com/post/492418#492418


The term CAT was already being used in the early 1990s (Hutchins, 1992) to cover both:

Machine Aided/Assisted Human Translation (MAHT) & Human Aided/Assisted Machine Translation (HAMT). See posts:

Definition of CAT
http://www.proz.com/post/184880#184880

the merging of CAT, MT, MAHT, HAMT, etc
http://www.proz.com/post/328685#328685

CAT, MT, TM: lesquels sont des outils de traduction (CAT, MT, TM: which of these are translation tools?)
http://www.proz.com/post/192653#192653

Translation memory product vendors started appearing in the 1990s. See:

http://www.mt-archive.info/LIM-1992-11-3.pdf

http://www.jostrans.org/issue04/art_garcia.pdf


Hybrid/Multi-engine MT (MEMT) is not a new concept. The mixing of different MT approaches is explained in the following posts, which include a description of how Google and other statistical MT systems, as well as rule-based MT systems, can specifically use example-based MT (i.e., TMs).

Merging of rule, example and statistics based MT
http://www.proz.com/post/453752#453752

the mixing of TM and MT technologies
http://www.proz.com/post/1000970#1000970

statistical MT approach + TMs
http://www.proz.com/post/998639#998639

TM simply became the buzzword in the late 1990s, and TM tools were considered by the professional translation community to be different from MT because TM vendors were separate from rule-based MT vendors. Yet the industrial research community had already invented and used the TM concept almost a decade earlier. Only during the last few years have the exact terms (RBMT, EBMT, SBMT, KBMT) come into common use in professional translation-related forum discussions, as a way to clarify which method is being used.

Jeff


 
Samuel Murray (Identity Verified)
Netherlands
Member (2006)
English to Afrikaans
Whether TM is MT Oct 18, 2009

Jeff Allen wrote:
TM has always been a variant of MT, based on what is called Example based MT.


I understand where you're coming from, Jeff, and I have no doubt that many MT systems use TM, and I also have no doubt that TM was developed by people from the MT industry, and that the reason they developed it was to produce better MT. Many things have had their origins in other things, but that doesn't mean the one is always a variant of the other.

A computer programmer who wants to create a TM system doesn't have to know anything about MT (except the parts that also apply to TM). Users of TM don't need to know anything about MT either. Someone who is highly skilled in MT will not necessarily make a good TM user. The fact that TM is a variant of MT is purely academic. Things have moved ahead since Nagao wrote what he wrote in 1984.

As you said yourself, simple find/replace operations could (at a stretch) be seen as a type of MT. I'm sure one can classify things in various ways, and although it would be perfectly valid to classify TM and MT in terms of their historical development, I ask whether it is the most useful way of classifying them.

Historically speaking, translation studies is a subsection of semantics which is a subsection of general linguistics, but these days you don't need to study general linguistics to become a good translator. And in fact, it would be a misconception to say that knowledge of general linguistics will make you a good translator. From an academic point of view one can classify these two things as one being a variant of the other, but that classification has no practical benefit. So too with TM and MT.

Historically speaking, astronomy is a variant or subfield of astrology. The parent technology is currently regarded as hocus pocus. Historically, electricity is a subfield of magnetism, but these days you won't learn about electricity from the magnetism professor. I wonder if the same can't be said of MT and TM. After all, a candidate who studied TM would be lying on his résumé if he said that he had studied MT (even if he tried to point out that MT is the genus of which TM is a species).


 
Jeff Allen (Identity Verified)
France
Multiple languages
context of warning about TM and MT Oct 19, 2009

Jeff Allen wrote:
TM has always been a variant of MT, based on what is called Example based MT.


Samuel Murray wrote:
... I have no doubt that many MT systems use TM, and I also have no doubt that TM was developed by people from the MT industry, and that the reason they developed it was to produce better MT. Many things have had their origins in other things, but that doesn't mean the one is always a variant of the other.

A computer programmer who wants to create a TM system doesn't have to know anything about MT (except the parts that also apply to TM). Users of TM don't need to know anything about MT either. Someone who is highly skilled in MT will not necessarily make a good TM user. The fact that TM is a variant of MT is purely academic. Things have moved ahead since Nagao wrote what he wrote in 1984 ...
After all, a candidate who studied TM would be lying on his résumé if he said that he had studied MT (even if he tried to point out that MT is the genus of which TM is a species).


Thanks Samuel for the comments.

The context of the statements in the few posts above in this thread on TM & MT was not about a CV for someone to get a job, nor about translation software marketing techniques. It concerned the potential legal ramifications of putting a simple MT-vs-CAT statement in writing in a legal contract with a customer.

Now, in re-reading BDF's post, I see that her post was more about application materials to be part of a panel:
B D Finch wrote:
I am concerned about the ethics of using MT when presenting oneself to the Client as a human translator. I have recently put in an application to be on a panel of translators for a Regional Chambre de métiers and their tender document specified that use of MT was not acceptable. As they asked for detail of translation methodology, I thought it best to explain the difference between CAT and MT. Unfortunately, it sounds as though this may be becoming more blurred.
BDF


Sorry about that, BDF. Yet I know that "I don't do MT" statements can also go as far as legal contracts.

The significant body of evidence in copyrighted articles in peer-reviewed, internationally recognized publications within the translation industry would probably carry more weight than any individual claims about software categorization, or even those of a local or regional association.

There is an increasing hybridization of translation technologies which muddy the waters: MT vendors creating internal MT modules, creating TM plug-ins, allowing for TMs to be attached to the system like dictionaries, the more recent Stat-based MT systems which can be trained on TMs, TM vendors adding functions and links to online MT portals and MT software, TM vendors adding MT-like features to generate translations for the under-threshold fuzzies, etc.
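
To illustrate the last item, "MT-like features for the under-threshold fuzzies": segments whose best TM match falls below a threshold get an MT draft instead. A minimal sketch, assuming difflib's similarity ratio as the match score (commercial TM tools use their own scoring), an arbitrary 75% threshold, and a hypothetical `mt_stub` in place of a real MT system.

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Similarity in percent, using difflib's ratio (not any TM vendor's formula)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def pretranslate(segment, memory, mt_engine, threshold=75):
    """Return (origin, score, text): the best TM match if it reaches the
    threshold, otherwise an MT draft for the segment."""
    best_src, best_score = None, 0
    for src in memory:
        score = match_score(segment, src)
        if score > best_score:
            best_src, best_score = src, score
    if best_src is not None and best_score >= threshold:
        return ("TM", best_score, memory[best_src])
    return ("MT", best_score, mt_engine(segment))

# Hypothetical one-segment memory and MT stub
MEMORY = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}

def mt_stub(text):
    return f"[MT draft] {text}"

for seg in ["Click the Save button.", "Click the Cancel button.", "Restart the computer now."]:
    print(pretranslate(seg, MEMORY, mt_stub))
```

A fuzzy hit still returns the stored target unchanged, to be postedited; only the segments below the threshold go to the MT engine, which is exactly where the TM and MT labels stop being separable.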

If someone really wants to make a statement, then it might be wise to be quite specific (e.g., not using rule-based, knowledge-based or primarily statistics-based MT systems).

And if BDF can explain the history of these blurring technologies to the Chambre de Métiers jury, showing that she knows the topic, where the field came from, where it is now, and where it is moving, she may have an even better chance of getting on the panel.

Differentiating TM from MT in translation technology marketing materials is another story.

Jeff


[Edited at 2009-10-19 05:23 GMT]


 
Susan Welsh (Identity Verified)
United States
Russian to English
@Jeff - can we take this back a few steps? Nov 13, 2009

Jeff Allen wrote:

@Susan, unfortunately your statement below is not quite true.

Jeff


Well, Jeff, I have read and learned a lot (mostly from your cited articles) since you responded to this ill-informed post of mine some months back, but I must confess that the more I learn about some aspects, the more baffled I become about others.

You say in one of your posts that people need to consider their specific needs, and which software will work best for them. Aye, there's the rub.

I've looked at the PROMT and SYSTRAN sites, Wikipedia pages, etc., but can't find much for an MT beginner and non-techie to go by. (Systran, as far as I can ascertain from their website, does not even offer a free trial on their products.)

It would be great to find a product-by-product comparison of the most popular systems that allow you to edit the dictionary (i.e., not Google Translate), but I haven't found anything like that. For example:
1) Are there multiplatform systems? I mainly use Linux, although I have Windows in a Virtual Box.
2) Moses seems to be free and open source, but when I go to the Moses site, I can hardly understand a word that's written there.
3) Promt has the Russian I need (of course, since it's Russian in origin), but it says the dictionaries provided are just for Internet and Tourism, and other dictionaries have to be purchased separately. Is this for real? What use is that? Most translations are in the financial and legal fields, so far as I know. I need those, plus scientific terms (e.g., psychology).
4) From both Promt and Systran, I get the idea that "only the best (most expensive) will do." But maybe that's overkill (it would certainly kill my bank account, esp. since I have two language pairs). For the top-of-the-line products of both companies, do you get a lot of capabilities that the ordinary user is going to have no use for? As I said, "all" I want to do is to be able to edit the dictionary, so the machine doesn't make the same mistakes over and over again.

Thanks for your seemingly endless willingness to write long and informative posts on these matters!

Susan


 
Jeff Allen (Identity Verified)
France
Multiple languages
1 step back and 2 steps forward to explain MT dictionaries Nov 21, 2009

Hi Susan,



I've looked at the PROMT and SYSTRAN sites, Wikipedia pages, etc., but can't find much for an MT beginner and non-techie to go by. (Systran, as far as I can ascertain from their website, does not even offer a free trial on their products.)


Forget Wikipedia for this kind of thing. You won't find this there, not yet at least.

And as for the 2 vendor websites you've mentioned, it's likely not there either. That's why I've written a number of articles/papers and many forum posts on the topic in order to fill the void.



It would be great to find a product-by-product comparison of the most popular systems that allow you to edit the dictionary (i.e., not Google Translate), but I haven't found anything like that.


- there is the Translation Software Compendium which is an ongoing inventory of all translation-related software/systems
- however, you are looking for a Consumer Reports of translation systems. It simply doesn't exist.

The serious software reviews take significant effort to conduct. Each one I do requires about 40+ hours, because they all follow the same approach, are tested on multiple platforms, use the same data sets, and test various technical, functional and other aspects. The basic reviews by technology columnists in general magazines usually only cover basic functionality and focus on the home/office versions of such products. They never dive into the powerful functionality of term extraction or linguistic part-of-speech categorization (called dictionary entry coding), which are must-have features for multilingual terminology/lexicology experts and professional translators.

There were attempts to create organizations to perform such activities but either they did not last, or they were not funded. So now just individual reviews are conducted and published in various magazines.
This is why I created the Language Technology Evaluation (langtecheval) website in 2003 on Geocities, which lists all known software reviews per product (some dating back even 10-15 years). Geocities closed down 2 weeks ago, so I've just uploaded the last version to my LinkedIn profile:

LinkedIn profile
http://www.linkedin.com/in/jeffallen
in Files>Software Reviews

I haven't had time to do a major update of the content, fix the broken links to those on www.multilingual.com (they are all still there, and can easily be found by doing a search on the site on the author name), and republish it elsewhere, so all I've done for now is make the last version of that website html page (v28 from Aug 2005) available on my linkedin profile under files/software reviews/langtecheval

So it is up to the users to read any and all existing reviews, along with posts like this, and make their decision.

Also, there is much less activity on the MT topic in the online discussion forums (including several MT user groups on Yahoo), but it is increasing over time.



For example: 1) are there multiplatform systems? I mainly use Linux, although I have Windows in a Virtual Box.


This depends on the level of the system.

mobile applications:
* at least 2 or 3 brands have Mobile translators that can be installed on Pocket PC, and other Mobile hardware platforms. I published a review of PocketPROMT v4 (http://www.multilingual.com/articleDetail.php?id=702) and have also conducted extensive testing of Systran mobile v5.

desktop/laptop applications:
* Everything is pretty much Windows-based
* Reverso Pro 4 (PROMT engine and interface) did have a Macintosh version. It was not continued to v5 and PROMT does not seem to have continued this platform support for v6 and beyond.

Enterprise-level client-server systems:
* Linux is available, yes, but only for server-based systems. Both Systran and PROMT have this, but it is way out of your budget and beyond your need.


2) Moses seems to be free and open source, but when I go to the Moses site, I can hardly understand a word that's written there.


This doesn't surprise me. Statistics-based MT systems have always been created by computational linguistics experts, and since processing the content has required technical experts, there hasn't been much of a software interface. That is the audience Moses is addressing: the IT technicians. Only now are some of the commercial stat-based systems starting to try and change this. And open source is even more prone to being purely technical and talking to IT guys.
You would have the same problem with OpenLogos. I'm subscribed to the list, and each time there is a request, it's focused on the incompatibility of the install with 64-bit operating systems, and the like.



3) Promt has the Russian I need (of course, since it's Russian in origin), but it says the dictionaries provided are just for Internet and Tourism, and other dictionaries have to be purchased separately. Is this for real? What use is that? Most translations are in the financial and legal fields, so far as I know. I need those, plus scientific terms (e.g., psychology).


For several types of rule-based MT software (including SYSTRAN and PROMT), there are often 3 different types of dictionaries:
1) internal dictionary (always created and delivered with any rule-based MT system). These are the default dictionaries that the system requires to be able to generate a translation. Systems come with such an internal dictionary (general and multi-domain) as a standard deliverable. Internal dictionaries always differ from vendor to vendor and from language to language. They are often in the range of 100,000 words/entries for a language pair. They are meant to provide as much coverage as possible without being too focused on one sector/domain or another, and they are regularly updated with samples of technical terms from specific domains.
The specific way that an MT vendor creates attributes within the dictionary entries can optimize the number of separate entries that are needed in the General internal dictionary.

2) topical / specialized dictionary (optional purchasable add-ons): different vendors refer to these by different names. These are usually "industry-specific" topical dictionaries, created and sold separately by the MT vendor, which cover a large set of common technical terms in an industry. A topical dictionary attempts to provide the 2,000, 5,000 or 10,000 most common technical terms in a specific sector as a way to override the terms available in the internal dictionary.
These are add-on dictionaries, which attach to the MT engine in the same way as the user dictionaries (described below). However, these topical/specialized dictionaries are locked and cannot be modified.
In some tools, such as PROMT Expert, you can also take a user dictionary (described below), add copyright info and comments, and lock it to make it a topical dictionary that can be distributed to others.

3) user dictionary: these are also override dictionaries, ones that users can build themselves.
They are intended to override the internal dictionary, and can even be used to override the topical dictionaries, which might provide a good general industry technical term but not the exact one needed for a specific client.
I have created both topical/specialized dictionaries (when working for MT companies) and many user dictionaries. However, I have never needed to purchase topical dictionaries, because I can create my own user dictionaries in highly optimized ways that replace that optional add-on.

In user dictionary modules, there is sometimes a novice/beginner mode with basic functionality, and then an advanced mode in the Pro and Expert versions of the products.
The beginner mode is good for any general user to quickly create a multi-word term and get the system to recognize and use it, but such basic-level entries create problems for a more serious approach to consistent translation with glossaries. The advanced mode usually provides language-specific settings that tell the system how to analyze the component words of the multi-word term in the necessary ways. Yes, I can get the basic mode to work, but as a language professional, I feel handicapped with it.
Analogy: a 5-speed bike is certainly better for climbing hills than a 1-speed bike. But if you ask a professional cyclist to ride the Tour de France on a 5-speed, they will laugh at you, because they need a much more capable set of front and rear cogs, providing 18, 21, 27 or whatever combination of speeds their various riding conditions require.
So, a basic-mode dictionary module is better than a free online MT tool with no dictionary customization, but advanced-level dictionary management is tailored to users who need that functionality.

So the comments about online MT systems translating word for word are not accurate. It's simply that when the system encounters technical (or marketing) multi-word terms, it tries to combine the words with linguistic rules, and when it can't, it falls back to translating word by word. All that is needed is to override the system by adding those technical/marketing terms to the dictionary.
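A deliberately simplified Python sketch of that fallback, assuming a dictionary keyed by tuples of words and a greedy longest-match lookup. This is a hypothetical toy for illustration, not how SYSTRAN, PROMT or any other engine is actually implemented; real rule-based systems also apply morphological and syntactic rules, which this sketch skips entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: entries map a tuple of lowercase source words to a target string.
Dictionary = Dict[Tuple[str, ...], str]


def translate_tokens(tokens: List[str], dictionary: Dictionary, max_len: int = 4) -> List[str]:
    """Greedy longest match: try the longest multi-word entry first, then shorter ones."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try spans from max_len words down to a single word, starting at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in dictionary:
                out.append(dictionary[key])
                i += n
                break
        else:
            # No entry at all for this word: pass it through untranslated.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    general = {("memory",): "mémoire", ("stick",): "bâton"}
    print(translate_tokens(["memory", "stick"], general))  # ['mémoire', 'bâton'] (word by word)

    # A user entry for the multi-word term overrides the word-by-word result.
    custom = {**general, ("memory", "stick"): "clé USB"}
    print(translate_tokens(["memory", "stick"], custom))  # ['clé USB']
```

The point is the one made above: the "word-for-word" output comes from a missing multi-word entry, and a single dictionary entry fixes it. Word order and agreement, which real engines handle with grammar rules, are ignored here.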

How you name and categorize such dictionaries is your own choice as a user. The various MT vendor products offer different ways to attach one or more of these dictionaries (topical and user) to a project and to set the order of priority in which they override each other.
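The override order can be pictured as a simple priority stack. The Python sketch below is a hypothetical model, assuming exact-match lookups on whole terms; the names `TermDictionary` and `lookup` are invented for illustration and are not any vendor's API. It shows a user dictionary overriding a locked topical dictionary, which in turn overrides the internal dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermDictionary:
    """Hypothetical model of one attached dictionary (internal, topical or user)."""
    name: str
    entries: Dict[str, str] = field(default_factory=dict)
    locked: bool = False  # e.g. a vendor's topical dictionary

    def add(self, source: str, target: str) -> None:
        if self.locked:
            raise PermissionError(f"{self.name} is locked and cannot be modified")
        self.entries[source] = target


def lookup(term: str, stack: List[TermDictionary]) -> Optional[str]:
    """Return the target from the first (highest-priority) dictionary that has the term."""
    for dictionary in stack:
        if term in dictionary.entries:
            return dictionary.entries[term]
    return None  # not covered by any attached dictionary


if __name__ == "__main__":
    internal = TermDictionary("internal", {"bank": "banque", "statement": "déclaration"})
    topical = TermDictionary("finance (topical)", {"statement": "relevé"}, locked=True)
    user = TermDictionary("client X (user)")
    user.add("statement", "relevé de compte")

    # Priority order: user overrides topical, topical overrides internal.
    stack = [user, topical, internal]
    print(lookup("statement", stack))  # relevé de compte
    print(lookup("bank", stack))       # banque (falls through to the internal dictionary)
```

Reordering the `stack` list is the analogue of changing dictionary priorities in a project, and calling `topical.add(...)` raises `PermissionError`, mirroring the locked add-on dictionaries described above.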

The one caveat is that all such entries are also subject to the system's linguistic grammar rules, so entries must be created with care.

There is simply no publicly available information on how to do this. The few paragraphs above are a quick summary of what usually costs thousands of dollars in private training courses or is provided to customers as an expert technical service.

I have explained some differences in levels of dictionaries at:
http://www.proz.com/post/189704#189704
http://www.proz.com/post/189718#189718
http://www.proz.com/post/1205999#1205999

Some MT vendors sell topical dictionaries, some do not. And they are not always available for all sectors/domains, nor for all languages. Creating them is a lot of manual work, and whether they exist is usually determined by market need.


4) From both Promt and Systran, I get the idea that "only the best (most expensive) will do." But maybe that's overkill (it would certainly kill my bank account, esp. since I have two language pairs). For the top-of-the-line products of both companies, do you get a lot of capabilities that the ordinary user is going to have no use for? As I said, "all" I want to do is to be able to edit the dictionary, so the machine doesn't make the same mistakes over and over again.


See my description above about using the basic level vs advanced level features and the analogy with the cyclist.
You can buy the $50-100 tool, but it will have very limited functionality, especially for dictionary entry management.



Thanks for your seemingly endless willingness to write long and informative posts on these matters!


Hope it helps. I've been thinking for a long time about writing a handbook on MT dictionary building, but it cannot be done in just some free time here and there, the way all of these articles and posts have been.

Jeff


 
david young  Identity Verified
France
Local time: 10:57
French to English
MT better than TM :) Nov 21, 2009

I started using Power Translator many years ago for my biomedical French-English translation work. The results were awful, but it was still quicker to edit the "franglais" than to type the entire translation with two fingers.
To improve the results, I developed a few macros (mini-programs) in Word with which I could copy, with a single keystroke (Ctrl + 1, 2, 3, 4 or 5), frequently recurring mistranslations (1, 2, 3, 4 or 5 words) into a separate Word file consisting of a two-column table. Once a month or so, I would add the correct translations to the right-hand column and use another macro to convert the table into a list of "search-replace" commands.
Over the years I've built up a database of about 2500 mistranslations and their corrections.
When I start a new translation I run it through Power Translator (same old version...) then execute the database-macro that corrects most of the frequent mistranslations. The whole process takes between 5 and 10 minutes, and the result is pretty good - better than the latest Systran release, anyway.
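The Word macros themselves are not shown in the post, so the following is only a hypothetical Python re-creation of the idea: a two-column table of (mistranslation, correction) pairs applied as search-replace commands to raw MT output. Two choices here are assumptions of this sketch, not details from the post: longer phrases are applied first, and matches must fall on word boundaries (so each phrase is assumed to begin and end with a letter or digit). Matching is case-sensitive.

```python
import re
from typing import Iterable, Tuple


def apply_corrections(text: str, table: Iterable[Tuple[str, str]]) -> str:
    """Apply (mistranslation, correction) pairs to MT output as search-replace commands."""
    # Longest phrases first, so a multi-word fix is not pre-empted by a one-word fix inside it.
    for wrong, right in sorted(table, key=lambda pair: len(pair[0]), reverse=True):
        # re.escape treats the phrase literally; \b stops matches inside longer words.
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement inserts `right` verbatim (no backslash processing).
        text = re.sub(pattern, lambda _match: right, text)
    return text


if __name__ == "__main__":
    table = [
        ("frame", "framework"),
        ("In the frame of", "As part of"),
        ("presented a fever", "had a fever"),
    ]
    raw = "In the frame of the study, the patients presented a fever."
    print(apply_corrections(raw, table))
    # As part of the study, the patients had a fever.
```

Applying "frame" first would have produced "In the framework of", which the longer entry would then no longer match; sorting by length avoids that kind of clash between overlapping entries.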
I'm extremely impressed by Google Translate, although the mistranslations can be dangerous.
In my view, customized MT can be very useful for translators working in a narrow subject area, but less so for "generalists". Fiction is clearly a no-go for MT.
Having followed this subject closely for some 20 years, I'm pretty sure that things are poised to accelerate quickly in the field of MT, especially with corpus-based approaches, and the Web is a pretty massive digital corpus.

Why is MT better than TM in certain fields, such as medicine? Most scientists, in my experience, can't write succinct, syntactically correct prose. My clients rely on me to both edit and translate. This often implies leaving some sentences untranslated ("AIDS is a major public health problem"), merging several sentences into one, or switching the order of ideas. Even fuzzy-matching TM systems can't handle that, as far as I know.
Second, scientific prose is among the simplest, and thus the most amenable to MT.


 
Susan Welsh  Identity Verified
United States
Local time: 05:57
Russian to English
Thanks, Jeff Nov 21, 2009

Jeff Allen wrote:

Hope it helps. I've been thinking for a long time about writing a handbook on MT dictionary building, but it cannot be done in just some free time here and there, the way all of these articles and posts have been.

Jeff


Extremely useful. I find it hard to believe that you couldn't find funding for a handbook, esp. since the companies that make these things would stand to make $$$ if more people knew how to use their products.

Thanks,
Susan


 
Jeff Allen  Identity Verified
France
Local time: 10:57
Multiplelanguages
there are many specialized dictionaries for PROMT Nov 21, 2009


Susan wrote:
3) Promt has the Russian I need (of course, since it's Russian in origin), but it says the dictionaries provided are just for Internet and Tourism, and other dictionaries have to be purchased separately. Is this for real? What use is that? Most translations are in the financial and legal fields, so far as I know. I need those, plus scientific terms (e.g., psychology).


Jeff Allen wrote:
2) topical / specialized dictionary (optional purchasable add-ons): different vendors refer to these by different names. These are usually "industry-specific" topical dictionaries, created and sold separately by the MT vendor, which cover a large set of common technical terms in an industry. A topical dictionary attempts to provide the 2,000, 5,000 or 10,000 most common technical terms in a specific sector as a way to override the terms available in the internal dictionary.
These are add-on dictionaries, which attach to the MT engine in the same way as the user dictionaries (described below). However, these topical/specialized dictionaries are locked and cannot be modified.
In some tools, such as PROMT Expert, you can also take a user dictionary (described below), add copyright info and comments, and lock it to make it a topical dictionary that can be distributed to others.


Hi Susan,
I just checked out the PROMT site, and they seem to have beefed up their range of specialized dictionaries. See the Dictionary Collection English and Dictionary Collection Multilingual.
Both of your language directions are there.
And they are all compatible with v8, which is the version you would go for.

Now, those dictionaries will work with the whole range of PROMT models (@promt Expert 8.0, @promt Professional 8.0, @promt Office 8.0, @promt Personal 8.0, @promt Professional 8.0 NET), so you could even buy a low-range or mid-range product like PROMT Office or PROMT Personal and attach the add-on dictionaries to it.

But you should really check out the product comparison chart on their site (Compare v8 products), which compares all of their product models in a matrix.

If I can extend my cycling analogy a little further, the free online PROMT would be the 1-speed bike, PROMT Personal the 5-speed, PROMT Office a 10-speed, PROMT Professional the 18-speed, PROMT Expert the 21-speed, etc. If you get the add-on specialized dictionaries, it's like adding 3 extra gears to any of those. It's not necessary to add those specialized dictionaries, but they are a good springboard because the vendor has already tested and validated those entries in-house. You would then only need to add your own customized user dictionary entries on top, to handle terminology that is missing or that needs to be tweaked to work with the existing entries in the general internal dictionary or the specialized dictionaries.

Pay more for the bike and you get more speeds to climb hills more easily (especially important for long-distance travelling with pannier bags front and back carrying a tent, sleeping bag, clothes and food). That's similar to being a freelance translator who needs to be in full production mode: you are cycling with all the baggage, fighting against the elements (rain, hot sun, high wind) and other dangers (ditches at the side of the road, glass, gravel, and big trucks whose wind drafts can make you lose stability).

So you could buy the 5-speed or 10-speed and buy an extra 3-speed add-on, but that will only let you handle some of the obstacles mentioned above. The top-of-the-line model, with all the speeds, a lightweight frame and all the racks installed, makes the trip easier.

I didn't test the terminology extraction module in v6 in my software review of it (the review was already quite time-consuming, so I had to set a time limit and prioritize what to test, alongside managing all the bug reports I sent in and regression testing against bugs I had submitted during reviews of previous versions).
But I have the v6 Expert version for several languages, and that was the first version in which they introduced it. Maybe I'll try it out and give some feedback. Bear in mind, however, that this terminology extractor dates back to 2004, so it may have been improved significantly since then.

As for specialized dictionaries, when I obtained certification as a PROMT dictionary developer back in 2005, there were few specialized dictionaries available for PROMT v6 and above. And as I had participated in creating and editing the PROMT/Reverso v4 industry-specific specialized dictionaries several years before, it was simpler for me to just create my user dictionaries from scratch. As you can see from my case study projects, I can do this quickly and with high quality. And now I'm putting together a project to increase content coverage 100-fold, and I plan to significantly optimize the process with automated terminology extraction and other special techniques.
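To give a rough idea of what automatic terminology extraction can involve, here is a minimal Python sketch of one simple, generic approach. It is not PROMT's method, which isn't described here: it counts multi-word sequences that recur in a source text and skips those that begin or end with a function word. The stopword list and thresholds are arbitrary illustrative values, and the output is only a list of candidates for a translator to review before adding them to a user dictionary.

```python
import re
from collections import Counter
from typing import List, Tuple

# Illustrative stopword list (an assumption); real extractors use much richer linguistic filters.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for", "with", "is", "are"}


def candidate_terms(text: str, n_min: int = 2, n_max: int = 3, min_freq: int = 2) -> List[Tuple[str, int]]:
    """Return recurring multi-word sequences as (candidate term, frequency), most frequent first."""
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Candidate terms rarely start or end with a function word ("of the", "rate of").
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


if __name__ == "__main__":
    sample = "The interest rate rose. The interest rate fell. A fixed interest rate applies."
    print(candidate_terms(sample))  # [('interest rate', 3)]
```

This toy ignores sentence boundaries and morphology (plural and singular forms are counted separately), which is exactly where commercial extractors and an expert's manual review add value.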

Jeff


 
Susan Welsh  Identity Verified
United States
Local time: 05:57
Russian to English
MT and bicycles Nov 21, 2009

Jeff Allen wrote:

But you should really check out the product comparison chart on their site (Compare v8 products), which compares all of their product models in a matrix.


Actually, I did check this out a few weeks ago, but the chart doesn't mean as much to me as it does to you.
Looking at just the Professional and Expert in Promt 8, they are identical up to the last three items. Professional does not have, but Expert does have:

"Advanced tools for vocabulary management" - Now what the heck does that mean? Do I need advanced tools? Would I know what to do with them if I had them? If I wanted to learn, how long and steep would the learning curve be?

"Automatic terminology extraction for fast subject dictionary creation" - Now, who could not want that? But what does it mean by "automatic"? Is it worth almost $200 more, for just one of my two language pairs? (I also need a new pair of shoes....)

Plug-in for TRADOS TMs - I don't have TRADOS and have no interest in getting it. If I did, I would certainly not be able to afford an MT package. I use OmegaT, sometimes Across, and may start using Wordfast.

Now, back to bicycles:

Jeff Allen wrote:

If I can extend my analogy of cycling a little more, you can see that the free online PROMT would be the 1-speed bike, PROMT personal would be the 5-speed, PROMT Office would be a 10-speed, PROMT Professional would be the 18-speed, PROMT Expert would be the 21-speed, etc.


First of all, I am not intending to participate in the Tour de France. To be literal about it, I just passed my 60th birthday, I have a lot of neck and shoulder pain, and the last time I rode a bike was when my 20-year-old son was about 3. It was a 3-speed bike. I have never ridden a 5-speed bike. On my 3-speeder, the gears kept slipping, and I would always end up back in third gear. I didn't know how to fix it, and neither did my husband, who learned to ride a bike from me, after we got married. (He grew up in Manhattan, where kids didn't ride bikes.) If I had a 21-speed bike, what's your guess about whether I would be able to ride it without running into a tree?

I hope you get my point.

But I certainly would want an MT package that allowed me to modify the dictionaries. Buying a cheap MT package plus add-on specialized dictionaries that are locked by the manufacturer would not seem like a sensible way to go.

Nevertheless, you have answered enough of my questions that I will probably just download a trial of one of these things and see if I can make it work. Even if I have to use Windows instead of Linux, which will make my son mad when he comes home for Thanksgiving.

Thanks,
Susan


 
Jeff Allen  Identity Verified
France
Local time: 10:57
Multiplelanguages
funding for the handbook for MT Nov 22, 2009

Jeff Allen wrote:
Hope it helps. I've been thinking for a long time about writing a handbook on MT dictionary building, but it cannot be done in just some free time here and there, the way all of these articles and posts have been.


Susan Welsh wrote:
Extremely useful. I find it hard to believe that you couldn't find funding for a handbook, esp. since the companies that make these things would stand to make $$$ if more people knew how to use their products.


If only it were that simple. There are some general, system-independent concepts that can be taught and learned, and which are important to understand in order to master any of the tools in a really productive way.
And it doesn't matter which tool is used, because most of the different brands can do it. However, each vendor has a range of products, and the low-end models have a different translation environment interface than the high-end models. And sometimes major changes take place from one major product version to the next.
And each vendor implements its dictionary module(s) differently. So I can explain general principles and how to put them into practice with one MT product, but with another product I have to explain the practical steps differently, because the tasks are a bit different and depend on the specific software interface.


So, it's not quite comparing apples and oranges, but rather oranges, nectarines and clementines, or, from the French perspective, different varieties of grapes.

Back in 2004, I presented to at least one MT vendor the idea of creating screen-recorded webinars with Camtasia, and recorded a 2-minute real-time example of how it could be done.
It's not really a complicated idea, nor that expensive (for a company) to do. I don't know why, 5 years later, this kind of idea still hasn't been used for MT, when it is used for many other types of software products on a daily basis. Heck, in my current job (not MT-related), I create such webinars, videos and PPT presentations every 1-2 weeks to show specific teams how to use tools for specific needs in their specific contexts.
Maybe it's a chicken and egg problem. If the MT vendors need all the screencorder vendors to agree on creating a cross-vendor/cross-product chart of all screencorder products before the MT vendors can make a decision on which brand to buy and which specific model of the product, in order to create a webinar, then users will be sitting around for decades waiting for a simple recording of what the features look like.

Maybe I should just do it, but it would be much better to do this within the framework of an independent organization that is recognized for comparing products and of which each vendor is a member, to avoid the hassle of having to get agreement from each vendor on each video of each of their products.

Jeff


 