TMLookup
Thread poster: FarkasAndras

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
question re "Edit > Remove duplicates from database" function Jun 3, 2015

Hi András (you really need a little forum of your own here on Proz, for TMLookup and LF Aligner), anyway, I have a question:

If I do: Edit > Remove duplicates from database

It says: "Only currently displayed columns are taken into account when determining duplicates."

Let's say I have 4 identical TUs, where all 4 times the src + trgt is exactly the same, but they differ by the third column (which I use to display the source of my TMXs)

(My columns in TMLookup are usually "nl", "en" and "source".)

What I am wondering is: if I run Edit > Remove duplicates from database while displaying only two columns, which of the 4 identical TUs will TMLookup leave untouched? Does it base its decision on some kind of timestamp? And if so, is that timestamp derived from the TU in the original TMX file, or from the time I imported said TMX into my .db?


 

FarkasAndras  Identity Verified
Local time: 01:48
English to Hungarian
+ ...
TOPIC STARTER
Time Jun 3, 2015

Michael Beijer wrote:

Hi András (you really need a little forum of your own here on Proz, for TMLookup and LF Aligner), anyway, I have a question:

If I do: Edit > Remove duplicates from database

It says: "Only currently displayed columns are taken into account when determining duplicates."

Let's say I have 4 identical TUs, where all 4 times the src + trgt is exactly the same, but they differ by the third column (which I use to display the source of my TMXs)

(My columns in TMLookup are usually "nl", "en" and "source".)

What I am wondering is: if I run Edit > Remove duplicates from database while displaying only two columns, which of the 4 identical TUs will TMLookup leave untouched? Does it base its decision on some kind of timestamp? And if so, is that timestamp derived from the TU in the original TMX file, or from the time I imported said TMX into my .db?



It is designed to leave the first of the duplicates it finds (the TU that was imported first).
If this is important to you, you might want to prepare a small sample DB and run a test. Also test my assertion that dupes are identified based on the columns shown in TMLookup. That's how it's supposed to work but I'm not sure I did much testing to make sure it doesn't misbehave.


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
whaddayasay? Jun 3, 2015

If I test your assertion that dupes are identified based on the columns shown in TMLookup (the moment I have some, yes, there's that word again, time), would you consider:

(1) Adding a function whereby all TUs imported from TMXs include the creationdate, and
(2) Changing the dupe deleter to (delete all oldest and) leave the latest instance?

This would allow us to keep our TMLookup db up to date.


[Edited at 2015-06-04 09:56 GMT]


 

lprd027
Local time: 13:48
German to English
+ ...
Avast AV problems Jun 4, 2015

Michael Beijer wrote:
If Avast don't get back to me soon, I might experiment with a different AV (Kaspersky?).

[Edited at 2015-01-27 17:58 GMT]


If you want to try another AV I recommend ESET NOD32, which you can try out for free.
Also, I endorse your comment that TMLookup should have its own forum.

Lars Peter


 

FarkasAndras  Identity Verified
Local time: 01:48
English to Hungarian
+ ...
TOPIC STARTER
Forum Jun 4, 2015

Well, LF Aligner has its own forum on sourceforge. I try and steer people to it if they have LFA questions so that everything is in one place. It doesn't see much traffic at all.
TMLookup gets fewer questions and comments than LFA, some here, some via email. This thread is probably enough for it.

Re: Michael, nice try for a bargain, but I don't particularly need the dedupe feature to be tested.
(1) would be a bit of a pain to implement and it would increase the db size and complexity a bit. Very unlikely. I guess you could do it by converting your tmx to a tabbed file that includes the timestamp and importing it as a separate language.
(2) requires me to change exactly two characters in the code (MIN to MAX). I can do it and it will be in the next version, which might ship next week or next year. In most cases it won't matter because, well, they are dupes anyway i.e. the relevant content is the same.
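
The MIN-to-MAX change described in (2) can be illustrated with a minimal sketch. It assumes the database is SQLite and uses a hypothetical table `tus` with columns `nl`, `en` and `source`; TMLookup's actual schema and query are not shown in this thread, so treat every name below as an assumption.

```python
import sqlite3

def remove_duplicates(conn, columns, keep="first"):
    """Delete rows that duplicate another row on `columns`.

    keep="first" keeps the earliest-inserted row (MIN(rowid)),
    keep="last" keeps the most recently inserted one (MAX(rowid)).
    """
    agg = "MIN" if keep == "first" else "MAX"
    cols = ", ".join(columns)  # column names are trusted (hypothetical schema)
    conn.execute(
        f"DELETE FROM tus WHERE rowid NOT IN "
        f"(SELECT {agg}(rowid) FROM tus GROUP BY {cols})"
    )
    conn.commit()

def demo(keep):
    """Build a tiny sample DB, dedupe on nl + en only, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tus (nl TEXT, en TEXT, source TEXT)")
    conn.executemany(
        "INSERT INTO tus VALUES (?, ?, ?)",
        [
            ("hallo kat", "hello cat", "a.tmx"),
            ("hallo kat", "hello cat", "b.tmx"),
            ("hallo kat", "hello cat", "c.tmx"),
            ("dag", "bye", "a.tmx"),
        ],
    )
    remove_duplicates(conn, ["nl", "en"], keep=keep)
    return conn.execute("SELECT nl, en, source FROM tus ORDER BY rowid").fetchall()
```

With `keep="first"` the surviving "hallo kat" row is the one from `a.tmx`; with `keep="last"` it is the one from `c.tmx`. The text in the compared columns is identical either way, which is why the choice rarely matters unless a hidden column differs.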


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
sorry, I got confused (we have a 1-month old baby in the house!) Jun 5, 2015


Michael Beijer wrote:
If I test your assertion that dupes are identified based on the columns shown in TMLookup (the moment I have some, yes, there's that word again, time), would you consider:

(1) Adding a function whereby all TUs imported from TMXs include the creationdate, and
(2) Changing the dupe deleter to (delete all oldest and) leave the latest instance?



FarkasAndras wrote:

Re: Michael, nice try for a bargain, but I don't particularly need the dedupe feature to be tested.

It was worth a try

(1) would be a bit of a pain to implement and it would increase the db size and complexity a bit. Very unlikely. I guess you could do it by converting your tmx to a tabbed file that includes the timestamp and importing it as a separate language.

OK!

(2) requires me to change exactly two characters in the code (MIN to MAX). I can do it and it will be in the next version, which might ship next week or next year. In most cases it won't matter because, well, they are dupes anyway i.e. the relevant content is the same.


Hmm, I seem to have confused a few things.

Let me explain what I am trying to achieve by way of an example:

a TU ("hallo kat = ×") changes over time:

time: nl = en
Mon.: hallo kat = hello cat
Tue.: hallo kat = hello pussycat
Wed.: hallo kat = hello feline friend


I want the dupe deleter in TMLookup to always remove all but the last version, so delete Mon. and Tue. in my example and leave only Wed.

Can it do this?

[Edited at 2015-06-05 14:03 GMT]


 

FarkasAndras  Identity Verified
Local time: 01:48
English to Hungarian
+ ...
TOPIC STARTER
Maybe Jun 5, 2015

Well, it's not designed to do that. It's designed to remove dupes where the text is the same in both (or all three, four etc.) languages selected.
If you have a metadata column in your db that is the same in all the TUs, you can probably do it by selecting NL and the metadata column for display and running a dedupe that way. Again, test on a small sample db first.

If you want to do advanced filtering/duplicate removal, you might be better off exporting your db, running the dedupe with some other software and then reimporting. I might be able to provide that "some other software" if needed.
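
As a rough sketch of what such an export, dedupe and reimport step could look like, the following keeps only the last version of each source segment in a tab-separated export. It assumes the export lists TUs in import order with the source text in the first column; the function names are hypothetical and this is not the tool offered above.

```python
def keep_latest(rows, key_columns=(0,)):
    """Return rows with only the last occurrence of each key kept.

    `rows` is a list of tuples in import order; `key_columns` holds the
    indexes of the columns that define a duplicate (default: source only).
    """
    latest = {}
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        # Drop any earlier entry so the key moves to the end of the dict
        latest.pop(key, None)
        latest[key] = row
    return list(latest.values())

def dedupe_tsv(text, key_columns=(0,)):
    """Apply keep_latest to tab-separated text and return tab-separated text."""
    rows = [tuple(line.split("\t")) for line in text.splitlines() if line.strip()]
    return "\n".join("\t".join(r) for r in keep_latest(rows, key_columns))

rows = [
    ("hallo kat", "hello cat"),
    ("dag", "bye"),
    ("hallo kat", "hello pussycat"),
    ("hallo kat", "hello feline friend"),
]
```

Here `keep_latest(rows)` leaves "dag = bye" and "hallo kat = hello feline friend". Because the key is the source column alone, differing translations of the same source count as versions of one TU, which is the behaviour asked for in the Mon./Tue./Wed. example.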


 

FarkasAndras  Identity Verified
Local time: 01:48
English to Hungarian
+ ...
TOPIC STARTER
What to do Dec 14, 2015

I haven't done much coding in recent months, apart from some minor tinkering (with TMLookup and LF Aligner). Now I'm in the mood for some more serious work, if a worthy and doable objective comes up. So, I'm open to input. Are people using TMLookup? Is it worth developing? I don't get download statistics from the website but I suspect the userbase is quite small. Still, if there is interest, I might fiddle with it. What features do users want?
There are two big outstanding issues: searching multiple dbs in parallel, and ensuring Win8/Win10 compatibility. The former is an ominous job that is so unappetizing that I've been putting it off since the start of the project. The second is something I will have to look into if the project is to keep going long term. If you want to help with Win10 testing, raise your hand.


 

Milan Condak  Identity Verified
Local time: 01:48
English to Czech
Off-topic Virtaal Dec 15, 2015

FarkasAndras wrote:

What features do users want?


I am using TMLookup, see:

http://www.condak.cz/nove/2015-10/02/cs/04.html

but I am using Virtaal, too. My Virtaal .db contains some incorrect translations, and I would like to be able to edit those items in the Virtaal .db.

Thank you for all your excellent tools.

Milan


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
I'm using it! Dec 15, 2015

FarkasAndras wrote:

I haven't done much coding in recent months, apart from some minor tinkering (with TMLookup and LF Aligner). Now I'm in the mood for some more serious work, if a worthy and doable objective comes up. So, I'm open to input. Are people using TMLookup? Is it worth developing? I don't get download statistics from the website but I suspect the userbase is quite small. Still, if there is interest, I might fiddle with it. What features do users want?
There are two big outstanding issues: searching multiple dbs in parallel, and ensuring Win8/Win10 compatibility. The former is an ominous job that is so unappetizing that I've been putting it off since the start of the project. The second is something I will have to look into if the project is to keep going long term. If you want to help with Win10 testing, raise your hand.


(1) I would love to be able to just point TMLookup at a specific folder of TMXs, and have it automatically index all TMXs contained therein. If I were then to delete certain TMXs from the folder, and re-run TMLookup, it would automatically remove the deleted TMXs from its index. Basically, what LogiTerm (which costs $795.00) can do. That would be amazing.

(2) My second big feature request relates to pretranslation. I would love to be able to use TMLookup to Pretranslate files, producing a TMX. Another of LogiTerm's tricks.

Michael


 

FarkasAndras  Identity Verified
Local time: 01:48
English to Hungarian
+ ...
TOPIC STARTER
Maybe Dec 15, 2015

Michael Beijer wrote:

(1) I would love to be able to just point TMLookup at a specific folder of TMXs, and have it automatically index all TMXs contained therein. If I were then to delete certain TMXs from the folder, and re-run TMLookup, it would automatically remove the deleted TMXs from its index. Basically, what LogiTerm (which costs $795.00) can do. That would be amazing.

That shouldn't be all that hard. I have already considered some sort of auto-import feature, although that was for one file (current project DB). I guess I could implement it as file/folder (whichever is specified in the setup file) that is auto-imported on startup. It would either be reimported on every startup or when the user deletes the auto-generated db file. Probably the latter. Making sure that languages don't get mixed up is a bit of an issue, esp. if there are different TMXes and TXTs/XLSes mixed together.
One reason why it got put off before is that it would make most sense after implementing multi-db support. However, with one-button db switching, which is already implemented, it could work reasonably well even without multi-db support. Currently, F1-F4 switches between user-defined DBs. The current project DB could just be assigned to F5.
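
The folder auto-import idea can be sketched as a comparison between what is on disk and what was imported last time. This is only an illustration under assumptions: `plan_folder_sync` and the `indexed` mapping (file name to modification time recorded at import) are hypothetical, and nothing here reflects how TMLookup would actually store that information.

```python
import os

def plan_folder_sync(folder, indexed):
    """Compare the TMX files in `folder` with what is already indexed.

    `indexed` maps file name -> modification time recorded at import.
    Returns (to_import, to_remove): names that are new or changed, and
    names whose file has disappeared from the folder.
    """
    on_disk = {
        name: os.path.getmtime(os.path.join(folder, name))
        for name in os.listdir(folder)
        if name.lower().endswith(".tmx")
    }
    to_import = sorted(n for n, m in on_disk.items() if indexed.get(n) != m)
    to_remove = sorted(n for n in indexed if n not in on_disk)
    return to_import, to_remove
```

Running this on startup would give the list of files to (re)import and the list whose TUs should be dropped from the index. It sidesteps the language mix-up issue mentioned above, which would still need handling per file.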

Michael Beijer wrote:
(2) My second big feature request relates to pretranslation. I would love to be able to use TMLookup to Pretranslate files, producing a TMX. Another of LogiTerm's tricks.

Michael

I don't think that's likely. What's the idea? Read a translatable file sentence by sentence and export matching segments from the db into a "project TM" tmx? In any case, this would require a fuzzy matching algorithm to be useful, and TMLookup doesn't have one. It's not likely that it will get one in the future, either.
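
To show what "a fuzzy matching algorithm" means in this context, here is a minimal sketch that uses the similarity ratio from Python's standard `difflib` module as a stand-in for a CAT-tool fuzzy score. The function name, the 0.75 threshold and the in-memory list of pairs are all assumptions for illustration; TMLookup has no such feature.

```python
from difflib import SequenceMatcher

def best_match(sentence, tm, threshold=0.75):
    """Return (source, target, score) for the closest TM entry, or None.

    `tm` is a list of (source, target) pairs. The score is difflib's
    similarity ratio between 0 and 1, not a real CAT-tool match percentage.
    """
    best = None
    for source, target in tm:
        score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best
```

A linear scan like this compares the sentence with every TU, so it would be far too slow for a large database without some kind of index to narrow down candidates first. That is part of why fuzzy matching is a substantial feature rather than a quick addition.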


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
:) Dec 22, 2015

FarkasAndras wrote:

Michael Beijer wrote:

(1) I would love to be able to just point TMLookup at a specific folder of TMXs, and have it automatically index all TMXs contained therein. If I were then to delete certain TMXs from the folder, and re-run TMLookup, it would automatically remove the deleted TMXs from its index. Basically, what LogiTerm (which costs $795.00) can do. That would be amazing.

That shouldn't be all that hard. I have already considered some sort of auto-import feature, although that was for one file (current project DB). I guess I could implement it as file/folder (whichever is specified in the setup file) that is auto-imported on startup. It would either be reimported on every startup or when the user deletes the auto-generated db file. Probably the latter. Making sure that languages don't get mixed up is a bit of an issue, esp. if there are different TMXes and TXTs/XLSes mixed together.
One reason why it got put off before is that it would make most sense after implementing multi-db support. However, with one-button db switching, which is already implemented, it could work reasonably well even without multi-db support. Currently, F1-F4 switches between user-defined DBs. The current project DB could just be assigned to F5.


That would be very cool!

Michael Beijer wrote:
(2) My second big feature request relates to pretranslation. I would love to be able to use TMLookup to Pretranslate files, producing a TMX. Another of LogiTerm's tricks.

Michael

I don't think that's likely. What's the idea? Read a translatable file sentence by sentence and export matching segments from the db into a "project TM" tmx? In any case, this would require a fuzzy matching algorithm to be useful, and TMLookup doesn't have one. It's not likely that it will get one in the future, either.


Yes, pretty much that.
However, no biggie if you don't add it, because I can already achieve this in CafeTran, using my TMLookup .db, which is already great. Basically, I maintain a very big TMLookup .db of ALL my TMXs, and can pretranslate documents in CafeTran by telling CafeTran to use my TMLookup .db as its database for pretranslation. To do so I use CafeTran's special Total Recall system, which produces a TMX (of my pretranslated doc) as you described.

Michael

PS: I don't want to jinx things, but I think there would be sufficient interest to warrant a paid version of TMLookup. Throw in a few TMX editing features, and you would have quite an amazing and unique tool. Hell, while you're at it (ho ho ho), you could of course also integrate LFAligner, to create ... the Ultimate TMX Tool for Translators: LFAligner + TMLookup + TMX Editor/Cleaner/Maintainer

[Edited at 2015-12-22 23:27 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 00:48
Member (2009)
Dutch to English
+ ...
Removing duplicates + deleting entries with older timestamps Dec 23, 2015

And another one I have been wondering about: Removing duplicates. I know we already have:

Edit > Remove duplicates from database, but no matter what I do, I still seem to be left with tons of duplicates after running it. Even if I run it with only two columns displayed.

And on a related note: it would be great if I could clean the database of updates over time, by deleting entries with older timestamps.
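
One workaround suggested earlier in this thread is to convert the TMX to a tabbed file that includes the timestamp and import the timestamp as an extra column. A minimal sketch of that conversion follows. It relies on the `creationdate` and `changedate` attributes that TMX allows on `<tu>` elements; the function names and the default `nl`/`en` language pair are assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_rows(tmx_text, langs=("nl", "en")):
    """Yield (segment per language..., timestamp) for every TU in a TMX string."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() flattens inline tags and keeps only the text
                segs[lang[:2]] = "".join(seg.itertext())
        # Prefer changedate, fall back to creationdate, else leave empty
        stamp = tu.get("changedate") or tu.get("creationdate") or ""
        yield tuple(segs.get(l, "") for l in langs) + (stamp,)

def to_tabbed(tmx_text, langs=("nl", "en")):
    """Return the TUs as tab-separated lines with the timestamp last."""
    return "\n".join("\t".join(row) for row in tmx_to_rows(tmx_text, langs))
```

TMX timestamps have the form `20150603T120000Z`, so plain string comparison orders them chronologically. With the timestamp available as a column, older versions of a TU could be identified outside TMLookup before reimporting.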


 

2nl (X)  Identity Verified
Netherlands
Local time: 01:48
Maybe next year Dec 23, 2015

Michael Beijer wrote:

And on a related note: it would be great if I could clean the database of updates over time, by deleting entries with older timestamps.


I think that Igor has written that this is an option that he might consider to add. If I'm not mistaken. Anyway, it's a useful feature.


 

Erwin van Wouw  Identity Verified
Netherlands
Local time: 01:48
Member (2010)
English to Dutch
Me too Dec 23, 2015

I certainly use this great tool. I would like to second the request for an auto folder index function. Thanks.

Best regards,

Erwin

FarkasAndras wrote:

I have already considered some sort of auto-import feature, although that was for one file (current project DB). I guess I could implement it as file/folder (whichever is specified in the setup file) that is auto-imported on startup.


 