Adding suffix rule in Hunspell for Greek
Thread poster: Spiros Doikas
Spiros Doikas
Member (2002), English to Greek
Dec 2, 2014

Hunspell appears to be missing some rules for Greek, so it flags words ending in -ευτείτε as errors (e.g. ερωτευτείτε, εκμεταλευτείτε).

I tried adding rules to the el_GR.aff file, such as:

SFX R εύομαι ευτείτε
or
SFX Z ομαι ευτείτε εύομαι

but had no luck. I also tried to contact the Hunspell people (http://elspell.math.upatras.gr/?section=oofficespell&subsection=feedback), but the e-mail bounces.


 
esperantisto
Member (2006), English to Russian, Site localizer
More info
Dec 2, 2014

1. Why are you asking in the Trados forum?
2. Greek is not among the world's most spoken languages. Please provide more details on the words in question, in particular their dictionary forms.
3. The first rule looks wrong.
4. As for the second, did you assign the respective flag to any word? Did you unmunch the dictionary? Does the required word form appear in the unmunched list?
5. Share the files rather than copying and pasting text here; something important may be lost.
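A quick way to answer the flag question in point 4 is to scan the .dic file for entries of the form word/FLAGS. The following is only a minimal sketch: the function name `flagged_entries` and the sample list are made up for illustration, and it assumes the lines have already been decoded, that words contain no spaces, and that no entry uses an escaped slash.

```python
def flagged_entries(dic_lines):
    """Return (word, flags) pairs for .dic entries that carry affix flags."""
    found = []
    # The first line of a .dic file holds the approximate entry count, so skip it.
    for line in dic_lines[1:]:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Anything after the first "/" in the entry is the flag string.
        word, slash, flags = fields[0].partition("/")
        if slash and flags:
            found.append((word, flags))
    return found


# Hypothetical sample: two bare entries and one entry flagged with R.
sample = ["3", "ερωτεύομαι", "ερωτευτεί", "ερωτεύομαι/R"]
print(flagged_entries(sample))
```

If this returns an empty list for the real dictionary, none of the rules in the .aff file are ever applied to its words.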


 
Spiros Doikas
Member (2002), English to Greek, Topic starter
Because I use it through Trados
Dec 2, 2014

1. Why are you asking in the Trados forum?
Because I use it through Trados.

2. Greek is not among the world's most spoken languages. Please provide more details on the words in question, in particular their dictionary forms.

Forms listed in the full dictionary:

ερωτεύομαι
ερωτευόμασταν
ερωτευόμαστε
ερωτευόμουν
ερωτεύονται
ερωτεύονταν
ερωτευόντουσαν
ερωτευόσασταν
ερωτευόσαστε
ερωτευόσουν
ερωτευόταν
ερωτευτεί
ερωτεύτηκαν
ερωτεύτηκε


4. As for the second, did you assign the respective flag to any word? Did you unmunch the dictionary? Does the required word form appear in the unmunched list?
The full word with this suffix does not appear in the dictionary's full list. Other forms of the word do appear, as shown above. Words with similar endings appear in the full list in their complete form, e.g. εκμεταλλευτείτε. No flags are used in the word list.

5. Share the files rather than copying and pasting text here; something important may be lost.
http://paxos.tk/huns.rar


 
esperantisto
Member (2006), English to Russian, Site localizer
Now, it’s clear
Dec 2, 2014

OK, the problem is that the dictionary you provided does not use affixes for inflected forms. To take advantage of affixes, you should:

1. Determine the base (dictionary) form. Unfortunately, it’s not clear from your example which one that is. Let it be:
Code:
ερωτεύομαι


2. Determine the part that remains unchanged in every word form. I guess it’s:
Code:
ερωτε


3. Determine the part to drop. It’s:
Code:
ύομαι


4. Determine the part to add. It’s, as you say:
Code:
υτείτε


5. Now, we’re ready to make a rule. It’s:
Code:
SFX R Y 1
SFX R ύομαι υτείτε ύομαι


In this rule, the first line (the header) means the following:
SFX = suffix (i.e., a part at the end of the word);
R = its identifier (the flag);
Y = this suffix may be combined with prefixes (I don’t know if that applies here, but it’s generally safe to put Y unless you’re sure otherwise);
1 = the number of rule lines that follow the header.
The second line:
SFX R = the same keyword and flag as in the header;
the first ύομαι = the part to drop;
υτείτε = the part to add;
the second ύομαι = the condition: this line (a rule may have several such lines) applies only to words ending in ύομαι.
6. Now, add the above rule to the aff file.
7. In the dic file, change
Code:
ερωτεύομαι


to
Code:
ερωτεύομαι/R



I’ve created a testcase with the above rule and the above word. Unmunching produces:
Code:
ερωτεύομαι
ερωτευτείτε



Looks right, doesn’t it?

[Edited at 2014-12-02 17:27 GMT]
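To make the mechanics of the rule concrete, here is a rough Python imitation of what unmunching does with a single SFX line. It is a sketch only, not Hunspell’s actual algorithm: the names `apply_suffix_rule`, `expand` and `RULES` are invented for this example, flags are assumed to be single characters, and the condition is treated as a plain regular expression anchored at the end of the word.

```python
import re


def apply_suffix_rule(word, strip, add, condition):
    """Apply one SFX-style line to word; return None if the line does not apply."""
    # The condition is tested against the end of the word; "." matches any word.
    if condition != "." and not re.search(condition + "$", word):
        return None
    if strip != "0":  # "0" means "strip nothing"
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + ("" if add == "0" else add)


# Illustrative rule table holding the single line "SFX R ύομαι υτείτε ύομαι".
RULES = {"R": [("ύομαι", "υτείτε", "ύομαι")]}


def expand(entry):
    """List the base form plus every form its flags generate."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:  # assumption: one character per flag
        for strip, add, condition in RULES.get(flag, []):
            form = apply_suffix_rule(word, strip, add, condition)
            if form is not None:
                forms.append(form)
    return forms


print(expand("ερωτεύομαι/R"))  # flagged entry: base form plus generated form
print(expand("ερωτεύομαι"))    # unflagged entry: base form only
```

The second call shows why an entry without a flag gains nothing from the rule: with no /R on the word, the rule is never consulted.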


 
Spiros Doikas
Member (2002), English to Greek, Topic starter
Thanks
Dec 2, 2014

It is interesting that, although the dictionary does not use affixes for inflected forms, there is an .aff file containing affix rules... I wonder how these interact with the dictionary, since the actual dictionary entries are not flagged in any way.

 
esperantisto
Member (2006), English to Russian, Site localizer
A must
Dec 2, 2014

An aff file is a must. It may be practically empty, containing only an encoding declaration, but if it is absent, spell checking will fail (at least in the apps I know; I’m not sure about Trados).

[Edited at 2014-12-02 19:31 GMT]
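For illustration, such a near-empty pair of files could be produced as follows. This is a sketch under assumptions: the helper `write_minimal_pair` and the file names test.aff and test.dic are made up, and UTF-8 is assumed as the encoding (the real Greek dictionary may declare a different one in its SET line).

```python
import tempfile
from pathlib import Path


def write_minimal_pair(directory, words, encoding="UTF-8"):
    """Write test.aff (encoding declaration only) and test.dic (count + words)."""
    directory = Path(directory)
    aff = directory / "test.aff"
    dic = directory / "test.dic"
    # The .aff file holds nothing but the encoding declaration.
    aff.write_text(f"SET {encoding}\n", encoding=encoding)
    # The .dic file starts with the entry count, then one word per line.
    dic.write_text("\n".join([str(len(words)), *words]) + "\n", encoding=encoding)
    return aff, dic


with tempfile.TemporaryDirectory() as tmp:
    aff, dic = write_minimal_pair(tmp, ["ερωτεύομαι", "ερωτευτεί"])
    print(aff.read_text(encoding="UTF-8"))
    print(dic.read_text(encoding="UTF-8"))
```

With a pair like this, only the words listed literally in the .dic file would be accepted, which is how the dictionary discussed here effectively behaves.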


 
Spiros Doikas
Member (2002), English to Greek, Topic starter
I see
Dec 3, 2014

So in this case the file, although it has entries, serves no practical purpose?

By the way, which tool do you use to unmunch the dictionary?


 
esperantisto
Member (2006), English to Russian, Site localizer
Indeed
Dec 4, 2014

Yes, in this particular case, the affix file seems to be virtually useless. I vaguely remember that some ten years ago MySpell (Hunspell’s predecessor) was reported to have severe problems with the Greek script, so the dictionary was built as a plain list of all inflected forms. Most probably the problem no longer exists, but nobody has taken the trouble to rework the dictionary.

To unmunch the dictionary, I use the unmunch command from the Hunspell package:
Generate all word forms using Lucene & Hunspell.
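The unmunch tool takes the dictionary file first and the affix file second. Below is a small sketch of calling it from Python; the wrapper names `unmunch_command` and `run_unmunch` are invented here, the file name el_GR.dic is assumed by analogy with el_GR.aff, and the tool has to be installed and on PATH for `run_unmunch` to work.

```python
import shutil
import subprocess


def unmunch_command(dic_path, aff_path):
    """Build the argument list for: unmunch <dic_file> <aff_file>."""
    return ["unmunch", str(dic_path), str(aff_path)]


def run_unmunch(dic_path, aff_path):
    """Run unmunch and return its raw output as a list of byte strings."""
    if shutil.which("unmunch") is None:
        raise FileNotFoundError("unmunch was not found on PATH")
    result = subprocess.run(
        unmunch_command(dic_path, aff_path),
        capture_output=True,
        check=True,
    )
    # Output stays as bytes: its encoding follows the dictionary's SET line.
    return result.stdout.splitlines()


# Only prints the command line; nothing is executed here.
print(" ".join(unmunch_command("el_GR.dic", "el_GR.aff")))
```

To check whether a form is generated, the returned lines can be searched for the word encoded in the dictionary’s own encoding.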


 

