How to run terminology check in large TMs? Thread poster: Lais Lewicki
Hello! I have been tasked with "cleaning up" our (very large) TMs. Our goals are to: remove duplicates/inconsistent translations; run a number check; run a spellcheck; check terminology using our termbase for that specific TM.
I've successfully used Heartsome Editor to remove duplicates and inconsistent translations, but I'm stuck on how I could best carry out the remaining tasks. Usually, we use Verifika to run quality checks on translation projects. But when I tried to run it for this particular TM (over 300MB in size), the process ran for over 12 hours and it still did not finish. That seems unfeasible to me. Can you give me any pointers on what I could do or software I could use for this task? Thanks in advance! | | | Samuel Murray Netherlands Local time: 09:57 Member (2006) English to Afrikaans + ... Deselect checks | Mar 21, 2022 |
Lais Lewicki wrote: "Usually, we use Verifika to run quality checks on translation projects. But when I tried to run it for this particular TM (over 300MB in size), the process ran for over 12 hours and it still did not finish. That seems unfeasible to me."
I'm not familiar with Verifika, but could it be that Verifika ran so slowly because it was checking too many types of errors? Try selecting *just one* type of error at a time. | | | Charles Peng China Local time: 15:57 Member (2022) English to Chinese Try Xbench 3.0 | Mar 21, 2022 |
You can try Xbench 3.0, which can quickly check the issues you mentioned: export the TM in *.tmx format, then load it into Xbench. And as @Samuel Murray suggested, you can check one error type at a time.
[Edited at 2022-03-21 16:01 GMT] | | | Stepan Konev Russian Federation Local time: 10:57 English to Russian QA Distiller | Mar 21, 2022 |
You can also try QA Distiller (free software). It checks all the items in your list plus many others. Also, QA Distiller supports regex, and you can use it to clean number-only segments, for example:
^\P{L}*\d\P{L}*$
Examples of content to be cleaned: 1-22, [23], (3), 4+, !2, 3-3/3, ~2, 6.6.3, ^3*5…, 4:5, etc.
^\P{L}*\d\P{L}* .+$
Examples of content to be cleaned: 4 ÷ 12mA, 245 rpm, 0 ÷ 100 % C.C.W. – passline, etc.
I have processed a 221320 KB tmx just now. It took 15 minutes for QA Distiller to complete the task with all of your checks.
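For anyone who wants to try the two patterns above outside QA Distiller, here is a minimal Python sketch. Python's built-in `re` module does not support `\P{L}`, so this version checks Unicode categories directly instead of using a regex. The function names are made up for illustration, and the behaviour is only a rough equivalent of the patterns (trailing newlines, for instance, are handled slightly differently).

```python
import unicodedata


def _is_letter(ch):
    # Unicode general categories Lu, Ll, Lt, Lm and Lo all start with "L",
    # which is the set that \p{L} covers.
    return unicodedata.category(ch).startswith("L")


def is_number_only(segment):
    # Mirrors ^\P{L}*\d\P{L}*$ : no letters anywhere in the segment,
    # and at least one decimal digit.
    return (any(ch.isdecimal() for ch in segment)
            and not any(_is_letter(ch) for ch in segment))


def is_number_then_text(segment):
    # Mirrors ^\P{L}*\d\P{L}* .+$ : a letter-free prefix containing a digit,
    # then a space, then at least one more character on the same line.
    for i, ch in enumerate(segment):
        if ch == " ":
            rest = segment[i + 1:]
            if rest and "\n" not in rest and is_number_only(segment[:i]):
                return True
    return False
```

With the examples from the post, `is_number_only("3-3/3")` and `is_number_then_text("245 rpm")` both return `True`, while a segment such as "Hello 5" is rejected by both functions.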
[Edited at 2022-03-21 16:35 GMT] | | |
Crosscheck online | Mar 22, 2022 |
This is by far the best quality control tool I know. It was free for a long time, but it is now pay-per-use: https://www.idioma.com/crosscheck | | | CafeTran on Mac Studio M1 Ultra | Mar 22, 2022 |
You can speed the QA process up with CafeTran on a Mac Studio M1 Ultra. Just open the TMX as a project. | | | Samuel Murray Netherlands Local time: 09:57 Member (2006) English to Afrikaans + ... Translate Toolkit | Mar 22, 2022 |
Lais Lewicki wrote: "Remove duplicates/inconsistent translations; run a number check; run a spellcheck; check terminology using our termbase for that specific TM"
I haven't used the Translate Toolkit in a number of years, but their most recent update is from this year, so they appear to be alive still. The Translate Toolkit works by exporting matching segments to a separate file; the user then corrects whatever segments they want to correct in that file, and the corrected export is imported back into the original file. The advantage is that you're always working on a file that contains only the smaller subset of segments that match the particular check or search string. However, it's quite basic and requires a bit of command-line skill, and the entire process is done with PO files (so you need to convert to PO and you need to edit the files in a PO editor or a text editor).
http://docs.translatehouse.org/projects/translate-toolkit/en/latest/installation.html
Step 1 is to convert your TMX file to CSV (using a tool of your choosing). Then use the csv2po.py script to convert it to a PO file. At the very end, use po2tmx.py to generate a TMX file again. Unfortunately there is no tmx2po.py script.
pofilter: this script exports segments based on quality check filters, e.g. number check, punctuation check, etc.
pomerge: this script imports the export file back into the original file.
pogrep: this script exports segments based on a search string (multiple search strings in a single query are possible).
So, if I remember correctly, an example of a command would be:
pofilter.py -t startpunc "bigmomma.tmx.po" "export.po"
which would create a file named export.po with all segments where the start punctuation of the source is different from the target. Before you use it on Windows, you must install some extra stuff (see installation guide), and create an environment where the EXE files will end up, e.g. C:\Users\YourName\Envs\myenvironment\Scripts.
In a quick test, it refused to work on my CSV file that I exported directly from Excel. Also, it exports UTF8 without BOM (and doesn't tolerate a file if it contains a UTF8 BOM), so there is that too.
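Since step 1 (TMX to CSV) is left to "a tool of your choosing", here is one possible sketch using only the Python standard library. It is not part of the Translate Toolkit. It assumes a plain TMX file with `<tu>`/`<tuv>`/`<seg>` elements and writes three columns (location, source, target), which is the layout I understand csv2po to expect; please check that against the csv2po documentation. The function names and the "tu1", "tu2", ... location values are invented for illustration, and inline tags inside `<seg>` are simply flattened to their text content.

```python
import csv
import xml.etree.ElementTree as ET

# xml:lang as ElementTree exposes it; older TMX files use a plain "lang" attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_tmx_pairs(tmx_path, src_lang, tgt_lang):
    """Yield (source, target) text pairs, reading the TMX one <tu> at a time."""
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tu":
            continue
        texts = {}
        for tuv in elem.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Flatten inline markup (bpt, ept, ph, ...) into plain text.
                texts[lang] = "".join(seg.itertext())
        if src_lang in texts and tgt_lang in texts:
            yield texts[src_lang], texts[tgt_lang]
        elem.clear()  # drop the processed unit's children to limit memory use


def tmx_to_csv(tmx_path, csv_path, src_lang, tgt_lang):
    """Write location/source/target rows; return the number of rows written."""
    count = 0
    # encoding="utf-8" (not "utf-8-sig") writes the file without a BOM.
    with open(csv_path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, quoting=csv.QUOTE_ALL)
        pairs = iter_tmx_pairs(tmx_path, src_lang, tgt_lang)
        for count, (src, tgt) in enumerate(pairs, 1):
            writer.writerow([f"tu{count}", src, tgt])
    return count


# Hypothetical usage (file names and language codes are placeholders):
# tmx_to_csv("bigmomma.tmx", "bigmomma.csv", "en-US", "af-ZA")
```

Language codes are compared exactly (ignoring case), so "en" will not match "en-US". Writing the CSV from a script like this also sidesteps the Excel export and the UTF-8 BOM problems mentioned above.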
[Edited at 2022-03-22 10:09 GMT] | | | How to run terminology check in large TMs? | Apr 14, 2022 |
If you have a big TM, it is better to distribute the terms over several TMs. Usually, you can run the terminology check on separate elements within one TM, for example on a separate website. If you have split your TM into several parts, there are two ways to perform terminology checking in large TMs: