Converting Freedict.org dictionaries for OmegaT
Thread poster: Samuel Murray

Samuel Murray  Identity Verified
Netherlands
Jun 5

Hello everyone

OmegaT can read two dictionary formats: StarDict (only version 2.4.2, with sametypesequence "m" or "g") and DSL. DSL dictionaries can be complex or simple. The dictionaries on Freedict.org are available in three formats: the .slob format (for mobile apps), the DICT.org format (an .index file), and a format named .dict.dz. I managed to convert some .index files to .dsl files that OmegaT can read. I use Windows 7.

You need:
* Python 3 installed
* the PyGlossary converter
* a text editor that can open a UTF8N file (UTF-8 without a BOM) and save it as UTF16LE
* a text editor that can find and replace tabs and line breaks

Not all files will convert. This may be due to problems with the PyGlossary converter or with the dictionary files on Freedict.org. Also, only some of the conversions that PyGlossary appears to offer actually work, but converting .index to .txt (tab-delimited) works fine in a number of cases.

For the conversion, I used the program PyGlossary. It requires Python 3 and must be started from a command window. I recommend installing Python 3 in an easy-to-type location, e.g. C:\Python3\ instead of the usual "Program Files\Python37-32\" location. To download PyGlossary, visit its web site, click the green "Clone/download" button, and select "Download ZIP". Unzip it into a separate folder anywhere (using e.g. 7-zip). Then open a command window in that folder (on my computer, I navigate one folder upwards and use Shift + right-click on the folder to reveal an option called "Open command window here"). In the command window, type C:\Python3\python.exe pyglossary.pyw --ui=tk. This should open the PyGlossary converter.

To download a dictionary from FreeDict.org, visit this URL, scroll down to "Dictionary downloads", and select one of your languages. It will show a list of available dictionaries. For demonstration purposes, let’s choose French-Portuguese. Unzip the file twice (using e.g. 7-zip) until you get an .index file (in this case, named fra-por.index).
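If you would rather script the unpacking than click through 7-zip, here is a minimal sketch using Python's standard tarfile module. It assumes the download is a compressed tar archive (which would explain why it has to be unzipped twice). The function name extract_dict_files and the archive name in the usage comment are made up for illustration. It also copies any .dict or .dict.dz file it finds, on the assumption that the converter may want the definitions file next to the .index file.

```python
import tarfile
from pathlib import Path


def extract_dict_files(archive_path, dest_dir,
                       suffixes=(".index", ".dict.dz", ".dict")):
    """Copy the dictionary files out of a (compressed) tar archive.

    Assumption: the download is a tar archive compressed with xz, gzip
    or bzip2. "r:*" lets tarfile detect the compression by itself.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            # Keep only the bare file name, so nothing is written
            # outside dest_dir whatever paths the archive contains.
            name = Path(member.name).name
            if member.isfile() and name.endswith(suffixes):
                target = dest / name
                target.write_bytes(tar.extractfile(member).read())
                extracted.append(target)
    return extracted


# Hypothetical usage (the archive name is an example, not a real download):
# extract_dict_files("fra-por.tar.xz", "fra-por")
```

The function returns the paths it wrote, so you can check straight away whether an .index file was actually found.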

In PyGlossary, click the "Read from format" button and select "DICT.org file format (.index)", then click the Browse button and navigate to the downloaded .index file (and select it, obviously). Click the "Write to format" button and select "Tabfile (txt, dic)". PyGlossary should automatically fill in the path. Click the Convert button to start the conversion.

[Image 1: read format]

[Image 2: converted]

We now have a file called fra-por.txt, which is a tab-delimited file in UTF8N format. You have to open this file in a text editor that can read it correctly, and save it as a TXT file in UTF16LE format. I use Akelpad, for example.
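If you don't have a suitable text editor, the re-encoding can also be done with a few lines of Python. This is only a sketch: the function name utf8_to_utf16le and the file names in the usage comment are made up, and writing a byte order mark (BOM) at the start is an assumption on my part (it is what most editors do when saving as UTF16LE).

```python
from pathlib import Path


def utf8_to_utf16le(src_path, dst_path):
    """Re-encode a UTF-8 text file as UTF-16LE with a leading BOM."""
    # "utf-8-sig" reads UTF-8 with or without a BOM (UTF8N has none).
    text = Path(src_path).read_bytes().decode("utf-8-sig")
    # Working on bytes keeps the line endings exactly as they were.
    # b"\xff\xfe" is the UTF-16LE byte order mark (an assumption that
    # the reading program expects one).
    Path(dst_path).write_bytes(b"\xff\xfe" + text.encode("utf-16-le"))


# Hypothetical usage:
# utf8_to_utf16le("fra-por.txt", "fra-por-utf16.txt")
```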

We then need to do some find/replace-ing to convert this into a DSL file that OmegaT will accept. Replace each tab with a line break, a space, a pipe (or any character you prefer), and a space. Also replace the literal text "\n" (a backslash followed by the letter n) with a line break, a space, a pipe (or any character you prefer), and a space. Replace "<" and "[" with something else (because OmegaT ignores text between "<>" and "[]").
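The same replacements can be sketched in Python. Treat this as an illustration and not a tested converter: the function name tabfile_to_dsl_text is made up, the pipe marker is just the character suggested above, and using "(" as the "something else" for "<" and "[" is an arbitrary choice of mine.

```python
def tabfile_to_dsl_text(text, marker="|", opener_replacement="("):
    """Apply the find/replace steps to tab-delimited dictionary text.

    Each tab and each literal backslash-n becomes: line break, space,
    marker, space. The "<" and "[" characters are swapped for
    opener_replacement (an arbitrary choice).
    """
    body_break = "\n " + marker + " "
    # Replace the characters that would make OmegaT hide text.
    text = text.replace("<", opener_replacement)
    text = text.replace("[", opener_replacement)
    # The tab separates the headword from its definition.
    text = text.replace("\t", body_break)
    # "\\n" in Python source is the two-character text backslash + n,
    # which is how line breaks inside a definition appear in the file.
    text = text.replace("\\n", body_break)
    return text


# Hypothetical usage with one made-up entry:
# tabfile_to_dsl_text("chat\tn. <m>\\ngato\n")
```

The text still has to be saved as UTF16LE and given the .dsl extension afterwards.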

In Akelpad, these replacements work:

[Image 3: replacements]

Finally, rename the file so that it has the file extension ".dsl".

This is what it looks like in OmegaT:

[Image 4: in OmegaT]

[You'll notice that the headwords are repeated in lowercase. If you want to avoid this, you have to copy/paste the text from the TXT file into e.g. Excel, then copy/paste the second column back into the TXT file, and then add line breaks between the headwords and the part-of-speech indicators. However, it will then match fewer words, for some reason that I can't determine.]

Your comments? Do let us know which files did convert, and which didn't.


[Edited at 2019-06-05 09:26 GMT]