Term Recognition for Non-Space languages
Thread poster: Peter Ross
Peter Ross
Australia
Thai to English
Apr 29, 2011

Hi

Does anyone know for sure which CAT programs support Term Recognition for non-space languages (languages that put spaces only between phrases or sentences, not between words), such as Thai or Lao?

(Programs would need a special algorithm or lookup to find and match terms within phrases/sentences.)
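
To make the "lookup" idea concrete, here is a minimal sketch in Python. It is not how any particular CAT tool works; the function name `find_terms` and the two-entry glossary are made up for illustration. It simply scans the unsegmented text and, at each position, takes the longest glossary term that starts there.

```python
def find_terms(text, glossary):
    """Return (start_index, term, translation) for glossary terms found in text.

    Plain substring lookup: no word boundaries are needed, so it works on
    text without spaces, but it can also match inside a longer word.
    """
    # Try longer terms first so "ภาษาไทย" wins over "ภาษา"; skip empty keys.
    terms = sorted((t for t in glossary if t), key=len, reverse=True)
    hits = []
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                hits.append((i, term, glossary[term]))
                i += len(term)  # continue after the matched term
                break
        else:
            i += 1  # no term starts here, move one character on
    return hits


# Hypothetical glossary, for illustration only.
glossary = {"ภาษา": "language", "ภาษาไทย": "Thai language"}
print(find_terms("ฉันพูดภาษาไทย", glossary))
```

The weakness of this approach is that it knows nothing about word boundaries, so a short term can be "found" in the middle of an unrelated word.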

Thanks

Peter


 
Didier Briel
France
English to French
OmegaT, for some languages Apr 29, 2011

Peter Ross wrote:
Does anyone know for sure which CAT programs support Term Recognition for non-space languages (languages that put spaces only between phrases or sentences, not between words), such as Thai or Lao?

(Programs would need a special algorithm or lookup to find and match terms within phrases/sentences.)

OmegaT has a tokenizer plugin, which improves term recognition for such languages.

However, the specific algorithms depend on those provided by Lucene, and so only specific languages are covered.
I know Chinese and Japanese work.
A Thai tokenizer is available, too, but I have no feedback, so I don't know whether it is efficient or not.
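
As a rough illustration of why a tokenizer helps, the sketch below matches glossary entries against a list of tokens instead of raw characters. It does not use OmegaT's or Lucene's API: the token list is simply assumed to be what a language-specific tokenizer would return, and `match_terms_by_tokens` is a made-up name.

```python
def match_terms_by_tokens(tokens, glossary_tokens):
    """Return (token_index, term_tokens, translation) for each match.

    glossary_tokens maps a tuple of tokens to a translation. At each start
    position the longest matching token sequence is reported.
    """
    max_len = max((len(key) for key in glossary_tokens), default=0)
    hits = []
    for start in range(len(tokens)):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(max_len, len(tokens) - start), 0, -1):
            key = tuple(tokens[start:start + n])
            if key in glossary_tokens:
                hits.append((start, key, glossary_tokens[key]))
                break
    return hits


# Assumed tokenizer output for "ตากลม" read as "ตา" + "กลม" (hypothetical).
tokens = ["ตา", "กลม"]
# Hypothetical glossary: "ตาก" is a substring of "ตากลม" but not a token here.
glossary_tokens = {("ตาก",): "Tak (province)", ("ตา",): "eye"}
print(match_terms_by_tokens(tokens, glossary_tokens))
```

Because matching respects token boundaries, "ตาก" is not reported even though those characters appear in the raw string. The result is only as good as the tokenizer, which is why language coverage matters.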

Didier


 
Peter Ross
Australia
Thai to English
TOPIC STARTER
Term Recognition for Non-Space languages - presegmentation May 10, 2011

Thanks Didier

Hats off to OmegaT and other free programs. The interface is not for the faint-hearted, so I'm sorry to say I didn't succeed in testing the Thai tokenizer.

I understand that when programs like Microsoft Word operate in a localized fashion, they carry out a kind of background segmentation process which allows non-space language words to be recognized as such (for example, in Thai, clicking on "unsegmented" text will highlight a Thai word). However, it seems that programs like Trados and Wordfast have yet to implement this kind of feature, so term recognition cannot work.

The alternative is to presegment text before translation. For Thai, there's a free program at http://pioneer.chula.ac.th/~awirote/index.html.
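
For anyone wondering what presegmentation amounts to, here is a minimal sketch using greedy longest matching against a word list. This is not the algorithm used by the program linked above, and the function name `presegment` and the tiny word list are assumptions for illustration. Real segmenters use much larger dictionaries and resolve ambiguities that this greedy approach gets wrong.

```python
def presegment(text, words, sep=" "):
    """Insert sep between words found by greedy longest matching.

    words is a set of known words. Runs of characters that match no word
    are kept together as a single unknown segment.
    """
    max_len = max((len(w) for w in words), default=0)
    segments, unknown, i = [], "", 0
    while i < len(text):
        # Try the longest possible word starting at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in words:
                if unknown:  # flush any pending unknown run first
                    segments.append(unknown)
                    unknown = ""
                segments.append(text[i:i + n])
                i += n
                break
        else:
            unknown += text[i]  # no known word starts here
            i += 1
    if unknown:
        segments.append(unknown)
    return sep.join(segments)


# Hypothetical word list, for illustration only.
words = {"ฉัน", "พูด", "ภาษา", "ไทย", "ภาษาไทย"}
print(presegment("ฉันพูดภาษาไทย", words))
```

Once the text has separators, a CAT tool that only recognizes space-delimited words can look up terms normally. Passing `sep="\u200b"` (zero-width space) instead would keep the text looking unsegmented, though whether a given tool treats that character as a word boundary is something to test.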

Peter


 
Didier Briel
France
English to French
It's easier with support May 10, 2011

Peter Ross wrote:
Hats off to OmegaT and other free programs. The interface is not for the faint-hearted, so I'm sorry to say I didn't succeed in testing the Thai tokenizer.


With the help of the Yahoo support group, it would be easier.

Didier


 
Selcuk Akyuz
Türkiye
English to Turkish
Deja Vu X May 11, 2011

I ran a test with DVX: I entered some terms into the termbase, and DVX recognized them. You can download a 30-day trial version of DVX and test it yourself. See also the DVX group at http://tech.groups.yahoo.com/group/dejavu-l/

Selcuk


 