CAT-Tool FEATURE REQUEST
Thread poster: mikhailo

mikhailo
Local time: 17:55
English to Russian
+ ...
Feb 7

For cases where a CAT tool's traditional TM repair doesn't work or produces an unacceptable TM, I propose a new TM-RESTORE function (F).

All translators have a lot of projects.

F must scan projects in user-defined folders and archives (with filters: by date, by language pair, by text string, by customer, etc.) and produce a list of projects with checkboxes.
Once the user has checked all the needed projects (or removed the unnecessary ones), F creates the TM(s) from these projects: it automatically opens each project in the background, sends all (or project-defined) segments to the TM with some options (do not allow duplicates, overwrite old with new, other filters such as a different TM for each language or for each customer, etc.), closes the project and goes on to the next.

What do the community and software developers think about this?
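
The scanning step described above can be sketched roughly as follows. This is only an illustration, not a feature of any existing CAT tool: the function name `find_project_files`, the filter parameters and the list of file extensions are all assumptions, and bilingual files are recognised by extension alone.

```python
import zipfile
from datetime import datetime
from pathlib import Path

# Assumption: bilingual project files are recognised by extension only.
BILINGUAL_EXTENSIONS = {".sdlxliff", ".xlf", ".xliff", ".txml"}


def find_project_files(roots, modified_after=None, name_contains=None):
    """Return (container, member, modified) tuples for candidate project files.

    `container` is the file on disk; `member` is the path inside a zip
    archive, or None when the file is not inside an archive.
    """
    found = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in BILINGUAL_EXTENSIONS:
                modified = datetime.fromtimestamp(path.stat().st_mtime)
                found.append((path, None, modified))
            elif zipfile.is_zipfile(path):
                # Look inside archives (including zip-based packages).
                with zipfile.ZipFile(path) as archive:
                    for info in archive.infolist():
                        if Path(info.filename).suffix.lower() in BILINGUAL_EXTENSIONS:
                            found.append((path, info.filename, datetime(*info.date_time)))

    def keep(entry):
        container, member, modified = entry
        if modified_after and modified < modified_after:
            return False
        label = f"{container}/{member or ''}".lower()
        if name_contains and name_contains.lower() not in label:
            return False
        return True

    return [entry for entry in found if keep(entry)]
```

The returned list is what a checkbox dialog would display; filtering by language pair or by text string would need the files to be opened and parsed, which this sketch does not do.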


 

Samuel Murray  Identity Verified
Netherlands
Local time: 15:55
Member (2006)
English to Afrikaans
+ ...
@Mikhailo Feb 7

mikhailo wrote:
For cases when traditional TM-repair doesn't work or create unacceptable TMs, a new TM-RESTORE function is proposed.


If I understand correctly, you're asking for a utility that will create a TM by gathering segments from translation projects selected by the user. You're calling this hypothetical utility "TM-RESTORE". Despite its name, this proposed utility will not *restore* any TMs but will re-create TMs from old projects. Am I also correct in thinking that you're using the term "TM-repair" to mean TM editing?

All translators have a lot of projects.


That is true, but translators save their projects in a variety of ways. Were you aiming at translators using a specific project setup? Were you thinking of translators using a specific CAT tool?

I think you're saying:
The proposed TM-RESTORE utility would scan projects in folders and zip files specified by the user, and create a list of projects. The user would then select which projects to be processed, and TM-RESTORE would create a TM or TMs from those projects.
Users should be able to define which segments are sent to the TM, and specify other options, e.g. do not allow duplicates, overwrite old with new, create different TMs for different languages, different TMs for different customers, etc.
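
To make those options concrete, here is a minimal sketch of how the duplicate handling and the per-language or per-customer split might work. The `Segment` fields, the `build_tms` function and the policy names are hypothetical and are not taken from any CAT tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

POLICIES = {"keep_newest", "keep_first", "keep_all"}


@dataclass(frozen=True)
class Segment:
    source: str
    target: str
    src_lang: str
    tgt_lang: str
    customer: str
    changed: datetime


def build_tms(segments, on_duplicate="keep_newest", split_by=("src_lang", "tgt_lang")):
    """Group segments into TMs and resolve duplicate source texts.

    on_duplicate: "keep_newest" overwrites an older translation with a newer one,
                  "keep_first" ignores later duplicates,
                  "keep_all" stores every distinct translation.
    split_by:     Segment attributes whose values select the target TM.
    """
    if on_duplicate not in POLICIES:
        raise ValueError(f"unknown policy: {on_duplicate}")
    tms = defaultdict(dict)  # TM key -> {source text -> list of Segments}
    for seg in segments:
        tm_key = tuple(getattr(seg, name) for name in split_by)
        entries = tms[tm_key].setdefault(seg.source, [])
        if not entries:
            entries.append(seg)
        elif on_duplicate == "keep_newest":
            if seg.changed > entries[0].changed:
                entries[0] = seg
        elif on_duplicate == "keep_all":
            if all(seg.target != existing.target for existing in entries):
                entries.append(seg)
        # "keep_first": nothing to do, the first entry stays.
    return {key: [s for group in tm.values() for s in group] for key, tm in tms.items()}
```

Calling `build_tms(segments, split_by=("customer",))` would give one TM per customer instead of one per language pair.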

What do the community and software developers think about this idea?


It would appear that you're trying to automate something that would involve many manual tasks anyway. Why not simply create the TM or TMs using your CAT tool's usual way of creating TMs?



[Edited at 2019-02-07 14:21 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 14:55
Member (2009)
Dutch to English
+ ...
If I understood you correctly, I think this can be achieved with CafeTran's Total Recall system... Feb 7

mikhailo wrote:

For cases where a CAT tool's traditional TM repair doesn't work or produces an unacceptable TM, I propose a new TM-RESTORE function (F).

All translators have a lot of projects.

F must scan projects in user-defined folders and archives (with filters: by date, by language pair, by text string, by customer, etc.) and produce a list of projects with checkboxes.
Once the user has checked all the needed projects (or removed the unnecessary ones), F creates the TM(s) from these projects: it automatically opens each project in the background, sends all (or project-defined) segments to the TM with some options (do not allow duplicates, overwrite old with new, other filters such as a different TM for each language or for each customer, etc.), closes the project and goes on to the next.

What do the community and software developers think about this?


As follows:

1. at the end of every project, save project to Total Recall database (SQLite), remembering to first add project metadata (client, subject, etc.)
2. when you have a new project, retrieve from Total Recall, either:
a. only those segments CafeTran deems useful for your project (determined statistically by CT), i.e. based on how many words are similar
b. use filter (specific clients, subjects, etc.) to extract portion of Total Recall database to temporary project TM

Michael

some info: https://cafetran.freshdesk.com/support/solutions/folders/6000058183
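
The general idea of steps 1 and 2b (file every finished project in one SQLite database with its metadata, then pull out a filtered subset as a temporary TM) can be sketched as below. This is not CafeTran's actual schema or API: the table layout and the function names `open_store`, `save_project` and `extract_tm` are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema: one row per translated segment plus project metadata.
SCHEMA = """
CREATE TABLE IF NOT EXISTS segments (
    source  TEXT NOT NULL,
    target  TEXT NOT NULL,
    client  TEXT,
    subject TEXT,
    UNIQUE (source, target, client, subject)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the segment database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_project(conn, pairs, client, subject):
    """Step 1: file a finished project's segments with its metadata."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO segments VALUES (?, ?, ?, ?)",
            [(src, tgt, client, subject) for src, tgt in pairs],
        )


def extract_tm(conn, client=None, subject=None):
    """Step 2b: pull the segments that match a metadata filter."""
    query = "SELECT source, target FROM segments WHERE 1=1"
    params = []
    if client is not None:
        query += " AND client = ?"
        params.append(client)
    if subject is not None:
        query += " AND subject = ?"
        params.append(subject)
    return conn.execute(query + " ORDER BY rowid", params).fetchall()
```

Step 2a (letting the tool pick statistically similar segments) is not covered here.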


 

mikhailo
Local time: 17:55
English to Russian
+ ...
TOPIC STARTER
re Feb 7

Michael Beijer wrote:
As follows:

1. at the end of every project, save project to Total Recall database (SQLite), remembering to first add project metadata (client, subject, etc.)
2. when you have a new project, retrieve from Total Recall, either:
a. only those segments CafeTran deems useful for your project (determined statistically by CT), i.e. based on how many words are similar
b. use filter (specific clients, subjects, etc.) to extract portion of Total Recall database to temporary project TM

Michael

some info: https://cafetran.freshdesk.com/support/solutions/folders/6000058183


It seems this is more like extracting a set of segments from a big mother TM... That can be done in any CAT tool. But what if the Total Recall DB fails?
It's just an idea for future CAT tool development.
And what about projects larger than 2 GB? Does CafeTran work with multi-gigabyte files without any problem?

Another idea: save project segments directly to TMX (for users of other CAT tools, when the tool used for a job differs from the translator's favorite one).

Samuel Murray wrote:
If I understand correctly, you're asking for a utility that will create a TM by gathering segments from translation projects selected by the user. You're calling this hypothetical utility "TM-RESTORE". Despite its name, this proposed utility will not *restore* any TMs but will re-create TMs from old projects. Am I also correct in thinking that you're using the term "TM-repair" to mean TM editing?

Re-create: correct.
Repair: what CAT tools do to restore damaged TMs. Each CAT tool has its own TM format and repairs it in its own way.

Samuel Murray wrote:
That is true, but translators save their projects in a variety of ways. Were you thinking of translators using a specific CAT tool?

I think most translators use TS20xx. And every translator works in a favorite CAT tool. Am I wrong?

Samuel Murray wrote:
Why not simply create the TM or TMs using your CAT tool's usual way of creating TMs?

Any simple thing that has to be done many times becomes difficult or even impossible, doesn't it? How much time do you spend on 100 projects, on 300 or even more... Open, select all segments, send to TM, close, open... Dumb job...


 

DZiW
Ukraine
English to Russian
+ ...
not F Feb 7

With what exactly in mind: Background TMs? Multi-direction pre-translation? Multi-concordance? Post-post edit? Too multi-vague.

Right now I see no use in it for a freelance translator.


 

mikhailo
Local time: 17:55
English to Russian
+ ...
TOPIC STARTER
re Feb 7

DZiW wrote:

With what exactly in mind: Background TMs? Multi-direction pre-translation? Multi-concordance? Post-post edit? Too multi-vague.

Right now I see no use in it for a freelance translator.



1. Recreating TMs in background.
???
???
???
Your TM is a big mother TM and it is badly damaged. You need to re-create it from scratch.


 

Natalie  Identity Verified
Poland
Local time: 15:55
Member (2002)
English to Russian
+ ...

Moderator of this forum
Isn't it better to keep backup copies of your TMs... Feb 7

mikhailo wrote:
For cases where a CAT tool's traditional TM repair doesn't work or produces an unacceptable TM...


...instead of re-creating them from scratch?


 

Rodolfo Raya  Identity Verified
Local time: 11:55
English to Spanish
Backup in TMX format Feb 8

and use the TMX files to feed any translation tool if the one you use gets damaged.
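
For illustration, a plain-text TMX backup can be written with a few lines of Python. This is a minimal sketch, not a complete exporter: the function name `write_tmx` and the `creationtool` value are made up, and inline tags are not handled at all (segments are stored as plain text).

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def write_tmx(pairs, src_lang, tgt_lang, path):
    """Write plain-text (source, target) pairs to a minimal TMX 1.4 file."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tm-backup-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    ET.ElementTree(tmx).write(path, encoding="utf-8", xml_declaration=True)
```

For example, `write_tmx([("Hello", "Hola")], "en", "es", "backup.tmx")` produces a file with one translation unit.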

 

Samuel Murray  Identity Verified
Netherlands
Local time: 15:55
Member (2006)
English to Afrikaans
+ ...
SDLXLIFF 2 TMX converter, then Feb 8

mikhailo wrote:
Samuel Murray wrote:
Were you thinking of translators using a specific CAT tool?

I think most translators use TS20xx. ... Am I wrong?


At first I thought you meant Cypresoft's TS2000, which almost no-one uses anymore, but while writing my reply it dawned on me that you might be referring to Trados Studio. Well, yes, some translators do use Trados Studio.

So... you're looking for a utility that can extract segments from SDLXLIFF files (and possibly project packages as well) and add them to a TM (would TMX be okay?). And the reason you're looking for such a utility is that it is very cumbersome to do that inside Trados itself, right?

There are two tools in the Trados App Store that may be of use to you:
https://appstore.sdl.com/language/app/sdlxliff2tmx/125/
https://appstore.sdl.com/language/app/sdltm-repair/298/


[Edited at 2019-02-08 09:15 GMT]


 

DZiW
Ukraine
English to Russian
+ ...
Backup + shared/background TMs Feb 8

Considering
How much time do you spend on 100 projects, on 300 or even more... Open, select all segments, send to TM, close, open... Dumb job...
I believe that the OP is talking about a process automation or a specialized utility.

While I occasionally work with Trados Freelance too and use a free GlossaryConverter tool, I still don't see how such an F may be useful. However, some more experienced users might find it useful.


 

mikhailo
Local time: 17:55
English to Russian
+ ...
TOPIC STARTER
re Feb 8

Natalie wrote:
...instead of re-creating them from scratch?


The same can be said of the TM repair function of CAT tools.

Rodolfo Raya wrote:
and use the TMX files to feed any translation tool if the one you use gets damaged.


With TMX there is the problem of tags, which different CAT tools interpret differently.

Samuel Murray wrote:
And the reason you're looking for such a utility is that it is very cumbersome to do that inside Trados itself, right?


This can be said of any CAT tool that stores its TM in a standalone file or DB... (except Transit)


 

Samuel Murray  Identity Verified
Netherlands
Local time: 15:55
Member (2006)
English to Afrikaans
+ ...
@Mikhailo Feb 8

mikhailo wrote:
Samuel Murray wrote:
And the reason you're looking for such a utility is that it is very cumbersome to do that inside Trados itself, right?

This can be said of any CAT tool that stores its TM in a standalone file or DB...


Well, no, we can't say that for certain unless we examine each tool.

Trados is somewhat cumbersome in this respect due to the fact that you can't add files to its SDLXLIFF import dialog by drag and drop. This means that all SDLXLIFF files must first be copied into a single folder. However, once you've copied the SDLXLIFF files to a single folder, you can add that folder to the TM import dialog, and it will run through all of the SDLXLIFF files. It's not super fast, though.

I ran a test with 1000 SDLXLIFF files totalling 2.3 GB in size, and it took Trados about 45 minutes to import 53 000 segments. Final TM size: 78 MB. 53 000 seems very little, so I'm not confident that Trados had imported all the segments (or perhaps it did not import duplicate segments). Anyway, each segment was accompanied by the user ID and date/time of segment creation, although the file names were not retained.

In this Trados test, the import speed did not decline over the course of the import -- the last files to be imported took just as long to import as the first files. One downside is that you can't interrupt the import process halfway: if you "Cancel" midway, the entire import operation fails, so it may be an idea to import only 100 files at a time.

I tried the same 1000 SDLXLIFF files with the SDLXliff2TMX utility that I linked to in a previous post. This utility does support drag and drop, so the SDLXLIFF files don't need to be all in the same folder. It also offers more options w.r.t. what you want to filter. It took 10 minutes and extracted 460 000 segments (presumably duplicate segments are retained). As with the Trados process, this utility retained user ID and date/time for each segment, but not the file name.
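
One way to check whether a gap like 53 000 versus 460 000 is explained by duplicates is to count total and unique source/target pairs directly. The sketch below assumes plain XLIFF 1.2 files with one segment per `trans-unit`; SDLXLIFF is based on XLIFF 1.2 but, as far as I know, nests its segments in additional markup, so the counts from this sketch would not match either tool exactly. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def iter_pairs(xliff_text):
    """Yield (source, target) plain-text pairs from one XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        if source is None or target is None:
            continue  # untranslated unit
        # itertext() drops inline tags and keeps only the text.
        src = "".join(source.itertext()).strip()
        tgt = "".join(target.itertext()).strip()
        if src and tgt:
            yield src, tgt


def count_segments(documents):
    """Return (total, unique) counts of translated pairs across documents."""
    total = 0
    unique = set()
    for text in documents:
        for pair in iter_pairs(text):
            total += 1
            unique.add(pair)
    return total, len(unique)
```

If the unique count is close to the smaller figure, duplicate removal is the likely explanation; if not, segments were probably skipped for some other reason.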

I would recommend that you try the SDLXliff2TMX utility. It outputs TMX, but it appears to be a TMX variant that retains the tags in such a way that Trados will retain them as well when you convert the file to SDLTM format.

By the way, I tried a similar thing in Wordfast Pro 3, with 1200 TXML files totalling 300 MB (since TXML files do not contain the output files inside them). It took 10 minutes, and got me 370 000 segments (duplicate segments retained). Final TM size: 175 MB. Wordfast added the file name to each segment, but it wrote the same user ID for all segments and wrote the same date for all segments.


[Edited at 2019-02-08 12:56 GMT]


 

