Pages in topic:   < [1 2]
Understanding Google Translate?
Thread poster: Jeff Whittaker
Jeff Allen  Identity Verified
France
Local time: 10:13
Multiple languages
English as a pivot language for MT Nov 22, 2009

Mathilde Verbaas wrote:

I used google a lot to get the gist of Czech websites. I found out that Czech-English translations produce a quite good result (I can understand the basic meaning of the text) but Czech-Dutch translations are crap, sometimes the translation even has the opposite meaning of the original text! After playing around a bit, it seems that google translates Czech texts first to English and then translates the English texts to Dutch.

Do other people have the same experience with other language pairs?


Hi Mathilde,

About 10 years ago, the only languages that were both source and target were English (for most MT systems), French (for Systran and Reverso) and Russian (for PROMT), and possibly German for the MT systems developed in Germany.
Few language directions existed that did not have one of these languages as source or target.
A few exceptions were French-Spanish and German-French, which were direct translation directions. For example, Softissimo sold those direct language directions.

At that time, requests for MT in other language pairs, especially Asian languages, would lead either to the creation of a new direct language direction (at an AMTA 1998 panel of MT vendors on making MT available for minority languages, a figure of USD 500,000 was mentioned) or sometimes to the proposal of a pivot language (such as English) between the source and final target languages, but the latter option always carries many risks.
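The risk of a pivot language, and Mathilde's Czech-Dutch observation, can be illustrated with a toy Python sketch. The word lists below are hypothetical and the lookup is deliberately naive (always take the first candidate); real engines translate whole sentences with statistical models, not word by word. The point is only that once a Czech word meaning "correct" has become the ambiguous English word "right", the English-to-Dutch step no longer knows which sense was meant:

```python
from __future__ import annotations

# Hypothetical toy lexicons, for illustration only (not real MT data).
# Each entry lists candidate translations, most frequent sense first.
CS_EN = {"správný": ["right", "correct"]}
EN_NL = {
    "right": ["rechts", "juist"],  # the direction sense happens to come first
    "correct": ["juist", "correct"],
}
CS_NL = {"správný": ["juist"]}  # hypothetical direct Czech-Dutch lexicon


def translate_word(word: str, lexicon: dict[str, list[str]]) -> str:
    """Naively pick the first (most frequent) candidate; pass unknown words through."""
    candidates = lexicon.get(word)
    return candidates[0] if candidates else word


def pivot_translate(word: str, src_to_pivot: dict[str, list[str]],
                    pivot_to_tgt: dict[str, list[str]]) -> str:
    """Source -> pivot -> target: the second step only sees the pivot word,
    so any sense the pivot word leaves ambiguous has to be guessed again."""
    return translate_word(translate_word(word, src_to_pivot), pivot_to_tgt)


print(pivot_translate("správný", CS_EN, EN_NL))  # rechts ("to the right")
print(translate_word("správný", CS_NL))          # juist ("correct")
```

A direct Czech-Dutch lexicon never passes through the ambiguous English word, which is the advantage of a direct language direction over a pivot.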

In 2006-2007, Systran made available a new matrix with many new language directions that had neither EN nor FR as source or target. This raised questions among user group audiences about whether these were all direct or used internal pivot languages.

Google Translate then switched over from Systran to its own statistical MT engine about a year later. It is easier to build statistical MT engines as language-independent systems that process language-specific data (similar to how Translation Memory tools are developed). However, these statistical systems need language data to work with.
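The idea that a statistical engine is language-independent but data-hungry can be shown with a toy Python sketch. It assumes a parallel corpus that is already word-aligned (a hypothetical input; real SMT systems must also learn the alignments and use far richer models) and estimates translation probabilities as plain relative frequencies. The same function serves any language pair; a word that never occurs in the data simply gets no entry:

```python
from collections import Counter, defaultdict


def estimate_translation_table(aligned_pairs):
    """Estimate p(target | source) as relative frequencies of aligned word pairs.

    aligned_pairs: iterable of (source_word, target_word) tuples, assumed to
    come from an already word-aligned corpus (hypothetical input here).
    Nothing in this function depends on which languages the words belong to.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


# Same code, two different (toy, hypothetical) language pairs.
fr_en = estimate_translation_table(
    [("maison", "house"), ("maison", "house"), ("maison", "home"), ("chat", "cat")]
)
de_en = estimate_translation_table([("Haus", "house"), ("Katze", "cat")])

print(fr_en["maison"])     # {'house': 0.6666666666666666, 'home': 0.3333333333333333}
print(de_en["Katze"])      # {'cat': 1.0}
print(fr_en.get("chien"))  # None: a word absent from the data gets no translation
```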

Since the MT discussion group on LinkedIn has a number of people from the commercial MT vendors community who answer questions about their systems compared to how the general online free tools do things, I'll ask this question about pivot languages there and when a reply is posted to that thread, I'll provide a link to it here.

How about that?

Jeff


 
philgoddard
United States
German to English
. Nov 22, 2009

Jeff Whittaker wrote:

It seems that Google may just skip over words it does not know and when it encounters a grammatical structure it does not understand, it seems to make something up or pull some kind of fuzzy match from cyberspace.


Isn't that what we all do?


 
John Fossey  Identity Verified
Canada
Local time: 04:13
Member (2008)
French to English
Termium's advice on MT Dec 31, 2009

From the Termium FAQ:

"Some suppliers of machine translation software say that, using computer logic tools, rather than a language professional, they are able to provide instant translations of documents in the main languages spoken. Is this realistic?

It is hard to say whether they will succeed and, if they do, what the quality of the translations will be. At present, they cannot, and some people have learned this the hard way. For example, in November 2007, a group of Israeli journalists who were going to be attending a seminar in Amsterdam created a diplomatic incident. At the request of the Dutch consulate, they submitted five questions to the Dutch Ministry of Foreign Affairs, questions that they translated using one of the most advanced machine translation systems. The questions thus translated were incomprehensible: "Hello Bud, the mother your visit in Israel is a sleep to the favour or to the bed your mind on the conflict are Israeli Palestinian."

The Translation Bureau advises against use of machine translation systems other than for purposes of simple personal information."


 
juvera  Identity Verified
Local time: 09:13
English to Hungarian
Parallel? Jan 2, 2010

Daniel Grau wrote:
I think this is a Photoshop hoax. If you zoom in on the picture, you'll notice that the bottom of the characters are actually parallel to the bottom of the picture, instead of being parallel to the frame of the sign.


The top of the text is approx. 11.5 degrees off horizontal, the bottom of the text is about 10 degrees off, and the bottom frame is about 8 degrees. Taking the effect of perspective into consideration, this corresponds well to the expected arrangements of lines on a flat surface.


 
Daniel Grau  Identity Verified
Argentina
Member (2008)
English to Spanish
Not parallel! Jan 2, 2010

@juvera:

I was referring to "the bottom of the characters" (i.e., the ends of the vertical stems, where the serifs would be), while you are considering "the bottom of the text" (i.e., the baseline). Just zoom in to 300% and check the bottom ends of the lower case Rs.

Daniel


 
juvera  Identity Verified
Local time: 09:13
English to Hungarian
@Daniel Jan 5, 2010

I am baffled.
You can check the bottom ends of any letter, or connect them up as a line; it doesn't make much difference, they follow the same perspective as the board, even at 425%. In other words, if you connect the top or bottom ends and the top and bottom lines of the board, they would end up in a point somewhere in the distance.
Are we looking at two different pictures?


 
Neil Coffey  Identity Verified
United Kingdom
Local time: 09:13
French to English
Go-between language almost inevitable Jan 5, 2010

Mathilde Verbaas wrote:

I used google a lot to get the gist of Czech websites. I found out that Czech-English translations produce a quite good result (I can understand the basic meaning of the text) but Czech-Dutch translations are crap, sometimes the translation even has the opposite meaning of the original text! After playing around a bit, it seems that google translates Czech texts first to English and then translates the English texts to Dutch.

Do other people have the same experience with other language pairs?


It's almost inevitable that for less common language pairs, a go-between language will be used as you suggest:

(1) As the number of languages offered grows, the number of possible combinations grows much faster, roughly with the square of the number of languages (n languages give n × (n − 1) directions): e.g. for 20 languages there are 380 combinations, 30 languages give 870 combinations, 40 give 1560 combinations... Beyond some small limit, it's not practical to train the system individually for every possible combination...
(2) And even if it were, the performance of the system between a given pair of languages is highly dependent on the volume and quality of training data (i.e. existing examples of human translations available in electronic form) in the pair in question; the rarer either or both of the languages, the smaller this dataset will be...
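The figures in point (1) count ordered pairs: each of n languages can be translated into each of the other n − 1, giving n(n − 1) directions. A small Python sketch reproduces them and, assuming a hub-and-spoke setup where a single pivot language is linked to every other language in both directions, compares them with the 2(n − 1) engines such a pivot setup would need:

```python
def direct_directions(n: int) -> int:
    """Ordered language pairs (directions) among n languages: n * (n - 1)."""
    return n * (n - 1)


def pivot_directions(n: int) -> int:
    """Directions needed if every language is linked only to one pivot
    language, in both directions (assumed hub-and-spoke setup): 2 * (n - 1)."""
    return 2 * (n - 1)


for n in (20, 30, 40):
    print(n, direct_directions(n), pivot_directions(n))
# 20 380 38
# 30 870 58
# 40 1560 78
```

The pivot setup needs far fewer engines to build and train, at the cost of errors compounding through the intermediate language, as discussed earlier in the thread.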


 
Daniel Grau  Identity Verified
Argentina
Member (2008)
English to Spanish
@juvera Jan 5, 2010

if you connect the top or bottom ends and the top and bottom lines of the board, they would end up in a point somewhere in the distance.


Granted, but I was not referring to lines joining different characters, juvera.

Since the horizontal perspective lines run from the bottom left of the picture to the upper right, no horizontal element of a character should be parallel to the bottom of the picture. Yet, at a large zoom, it seems to me that the short horizontal bottoms of the vertical stems in the Rs—and even the horizontal strokes of the Es and their curved bottoms—are much too horizontal.

In addition, the error message does not sound legit. Since the message is in English, the software was presumably written by English speakers, who would most likely have referred to a "translation server error." However, it does read funnier as it is.

Regards,

Daniel


 
Marcin Rey  Identity Verified
Poland
Local time: 10:13
Polish to French
friartikkel Jan 5, 2010

Look at that:
http://fr.friartikkel.com/
http://pl.friartikkel.com/
It seems the English version is the original.
Impressive.


 
juvera  Identity Verified
Local time: 09:13
English to Hungarian
@Daniel Jan 5, 2010

Indeed, even their software translation is dodgy.

But when you examine the enlarged picture, the top of the capital T seems to be behind the steel beam structure and the whitish surround on the top of the next "t" also seems to be visible above the steel bar. Also, the size of the letters is increasing at a proportionate rate, including the thickness of the vertical elements of the r-s.

Of course, it is possible to make it up, but it seems an awful lot of work to me.

Jeff, I hope you excuse us for this little sideline discussion.

Regards,
Judith

[Edited at 2010-01-05 11:40 GMT]


 
Grayson Morr (X)  Identity Verified
Netherlands
Local time: 10:13
Dutch to English
Only the machines... Jan 5, 2010

philgoddard wrote:

Jeff Whittaker wrote:

It seems that Google may just skip over words it does not know and when it encounters a grammatical structure it does not understand, it seems to make something up or pull some kind of fuzzy match from cyberspace.


Isn't that what we all do?

Phil, I suspect you actually do what I do: spend an hour or more researching the source term in the source language, then searching for appropriate target-language terms, then making sure these are actually used in the target language within the context of the source document.

Unless you've recently tossed your translation technique in favor of a "1000 words an hour" approach. Or turned into a machine.


 
Kirti Vashee  Identity Verified
United States
Local time: 01:13
How SMT works Feb 4, 2010

There is a detailed but not overly technical overview of how SMT works in a video at this link: http://languagestudio.com/Webinars.aspx

The graphics on that page may also be useful. There is also a good description of how general-purpose systems like Google Translate differ from customized professional-quality engines.


 