TMXMerger - any way to see source filenames for merged TMs? Thread poster: Mercer
Hi, when merging a large number of small .tmx files into a larger one using TMXMerger, is there any way afterwards to see which file a segment originally came from? I merged TMX files that were originally created from individual texts using LF Aligner. They were merged because OmegaT was running out of memory trying to load them all individually, and it seems to have an easier time when they are grouped into a few large files. I do not see how that would be possible, but is there a way to do it?

Didier Briel (France, English to French)
Not without modifying the source code | Nov 5, 2013 |
Mercer wrote: "Hi, when merging a large number of small .tmx files into a larger one using TMXMerger, is there any way afterwards to see which file a segment originally came from?"

Without modifying the source code, I don't think so.

Mercer wrote: "I merged TMX files that were originally created from individual texts using LF Aligner. They were merged because OmegaT was running out of memory trying to load them all individually, and it seems to have an easier time when they are grouped into a few large files."

Have you tried increasing the memory allocated to OmegaT?

Didier

Samuel Murray (Netherlands, Member (2006), English to Afrikaans)
Only if you can pollute the original segments | Nov 5, 2013 |
Mercer wrote: "Hi, when merging a large number of small TMX files into a larger one using TMXMerger, is there any way afterwards to see which file a segment originally came from?"

No, not with TMXMerger. But if you can edit the original TMX files themselves, then you can add short codes to either the source or the target text of each segment, which will help you identify the origin when you see it in OmegaT's fuzzy match pane. If your TMX files are not simple to edit, then it may be possible to edit the segments before or during the alignment (I have no idea whether LF Aligner allows you to edit segments before it creates the final TMX file).

I don't use OmegaT often, but in my own CAT tool I often do this: I add e.g. [ENG] to the start of each segment's source text, if e.g. that segment came from an engineering text. This reduces the match percentage, though. If you don't want to reduce match percentages, you can add e.g. [ENG] to the start of each segment's target text, but then you run the risk that some of these tags will end up in your translation without you noticing.

If you do add the tag to the target text, then I recommend that you add a custom "tag" in your tag validation settings. In OmegaT, go to Options > Tag Validation, and in the "Regular expression for custom tags" field, type this: \[.+?\]

When you do tag validation, segments with such left-over tags will be reported.

Didier Briel (France, English to French)
You could edit changeID or creationID instead | Nov 5, 2013 |
Samuel Murray wrote: "But if you can edit the original TMX files themselves, then you can add short codes to either the source or the target text of each segment, which will help you identify the origin when you see it in OmegaT's fuzzy match pane. If your TMX files are not simple to edit, then it may be possible to edit the segments before or during the alignment (I have no idea whether LF Aligner allows you to edit segments before it creates the final TMX file). I don't use OmegaT often, but in my own CAT tool I often do this: I add e.g. [ENG] to the start of each segment's source text, if e.g. that segment came from an engineering text. This reduces the match percentage, though. If you don't want to reduce match percentages, you can add e.g. [ENG] to the start of each segment's target text, but then you run the risk that some of these tags will end up in your translation without you noticing."

You could edit changeID or creationID instead. That way, you do not change the segment itself, and you can display the origin in the Fuzzy Matches pane.

Didier
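For anyone who does go with the target-text markers quoted above, the custom-tag expression suggested earlier in the thread (\[.+?\]) can be tried out outside OmegaT first. This is only a rough sketch: the function name and the sample sentences are made up for illustration, and it checks plain strings rather than a real project.

```python
import re

# The expression from the thread, written without the spaces.
CUSTOM_TAG = re.compile(r"\[.+?\]")


def leftover_tags(target_text: str) -> list[str]:
    """Return every bracketed marker, such as [ENG], found in a translation."""
    return CUSTOM_TAG.findall(target_text)


# Hypothetical target segments: one still carries the origin marker.
print(leftover_tags("[ENG] Serrez la vis."))
print(leftover_tags("Serrez la vis."))
```

OmegaT itself uses Java regular expressions, but this particular pattern behaves the same way in Python. Note that it matches any bracketed text, so legitimate brackets in a translation (for example "[1]") would be reported as well.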
FarkasAndras
add separate field | Nov 5, 2013 |
Samuel Murray wrote: "No, not with TMXMerger. But if you can edit the original TMX files themselves, then you can add short codes to either the source or the target text of each segment, which will help you identify the origin when you see it in OmegaT's fuzzy match pane. [...]"

Why not add a text field? That will show up in your CAT tool and will not affect the matches you get. LF Aligner can do this (you can type in the 'Note' when you generate the TMX and it adds it to every TU).

If the files are not generated by alignment, you need to add this between the opening tu tag and the opening tuv tag:

<prop type="Txt::Source">this is where you put the source ID</prop>

Maybe it's possible to just add this to the header instead of adding it to every TU, I don't know.
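Adding that prop by hand to every TU gets tedious, so here is a minimal sketch of doing it with Python's standard xml.etree.ElementTree. The function name, the sample TMX (with an abbreviated header) and the ID "manual_en-fr.tmx" are placeholders for illustration, not anything LF Aligner or OmegaT provides.

```python
import xml.etree.ElementTree as ET


def add_source_prop(root: ET.Element, source_id: str) -> int:
    """Insert <prop type="Txt::Source"> before the first <tuv> of every <tu>."""
    count = 0
    # list() so the tree is not being walked while it is modified.
    for tu in list(root.iter("tu")):
        prop = ET.Element("prop", {"type": "Txt::Source"})
        prop.text = source_id
        children = list(tu)
        # Index of the first <tuv>; fall back to the end if a <tu> has none.
        index = next(
            (i for i, child in enumerate(children) if child.tag == "tuv"),
            len(children),
        )
        tu.insert(index, prop)
        count += 1
    return count


# Abbreviated sample; a real TMX header carries more attributes.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
</body></tmx>"""

root = ET.fromstring(SAMPLE)
add_source_prop(root, "manual_en-fr.tmx")
print(ET.tostring(root, encoding="unicode"))
```

For a real file you would load it with ET.parse(path), pass tree.getroot() to the function, and save with tree.write(out_path, encoding="utf-8", xml_declaration=True). ElementTree does not write the DOCTYPE line back, so check the result in a text editor before importing it.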
[Edited at 2013-11-05 09:26 GMT]

Samuel Murray (Netherlands, Member (2006), English to Afrikaans)
Didier Briel wrote: "You could edit changeID or creationID instead. That way, you do not change the segment itself, and you can display the origin in the Fuzzy Matches pane."

That's a good idea. If you know how to do that, that would be a good place to put the marker.

FarkasAndras wrote: "Why not add a text field? That will show up in your CAT and will not affect the matches you get. LF Aligner can do this (you can type in the 'Note' when you generate the TMX and it adds it to every TU)."

I'm glad to know that LF Aligner has the ability to do this.

Mercer
Thanks for the answers | Nov 5, 2013 |
Thank you for the answers, I will try these options and give an update. One of the computers this has to work on has very little RAM, so giving more memory to OmegaT was not an option.

Didier Briel wrote: "You could edit changeID or creationID instead. That way, you do not change the segment itself, and you can display the origin in the Fuzzy Matches pane."

Thanks, it is a good idea. I am new to this and was not aware that the information shown in the OmegaT fuzzy match pane could be easily modified. I have tried it now and it looks like this could work, but TMXMerger seems to get rid of all ID tags and notes, so I will try to see if there are other ways to merge the files.

FarkasAndras wrote: "Why not add a text field? That will show up in your CAT and will not affect the matches you get. LF Aligner can do this (you can type in the 'Note' when you generate the TMX and it adds it to every TU). [...]"

Does the LF Aligner batch mode fill the note field automatically? After it is filled, how would I get it to show in OmegaT?

Didier Briel (France, English to French)
Configuration of the fuzzy match pane | Nov 5, 2013 |
Mercer wrote: "Thanks, it is a good idea. I am new to this and was not aware that the information shown in the OmegaT fuzzy match pane could be easily modified. I have tried it now and it looks like this could work, but TMXMerger seems to get rid of all ID tags and notes, so I will try to see if there are other ways to merge the files."

As far as I can see in the source code, changeID is preserved, as well as changeDate.

Mercer wrote: "Does the LF Aligner batch mode fill the note field automatically? After it is filled, how would I get it to show in OmegaT?"

See https://sourceforge.net/p/omegat/feature-requests/598/

Didier
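If changeID does survive the merge, as suggested above, one option is to stamp each small TMX with its own file name before merging. Below is a rough sketch using Python's standard library. TMX spells the attributes changeid and creationid in lower case. Which element OmegaT actually reads them from is an assumption here, so the sketch writes the value on both <tu> and <tuv>. The function name, the sample and the file name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def stamp_origin(root: ET.Element, origin: str, attribute: str = "changeid") -> int:
    """Write `origin` into the given attribute of every <tu> and its <tuv>s."""
    if attribute not in ("changeid", "creationid"):
        raise ValueError("expected 'changeid' or 'creationid'")
    stamped = 0
    for tu in root.iter("tu"):
        tu.set(attribute, origin)
        # Assumption: also set it on each <tuv>, in case the tool reads it there.
        for tuv in tu.iter("tuv"):
            tuv.set(attribute, origin)
        stamped += 1
    return stamped


# Abbreviated sample; a real TMX header carries more attributes.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
</body></tmx>"""

root = ET.fromstring(SAMPLE)
stamp_origin(root, "Englishfile.doc_Frenchfile.doc")
print(ET.tostring(root, encoding="unicode"))
```

This overwrites whatever ID was there before, which is probably fine for aligned files but would lose real author information in a TM produced by translators.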
FarkasAndras
play around with it | Nov 5, 2013 |
Mercer wrote: "Does the LF Aligner batch mode fill the note field automatically? After it is filled, how would I get it to show in OmegaT?"

If you created the files with LF Aligner with default settings, then they should all have a Note field containing the names of the input files (such as Englishfile.doc_Frenchfile.doc). Open one of the tmx files with a text editor and see if it has a prop type="Txt::Note" field. Then open the merged tmx and see if the note field is still there (TMXMerger may have removed it). If it's there in the merged tmx, you should be able to see it in OmegaT.

If you're aligning a bunch of files from scratch using the LF Aligner batch mode in V 4.04, specify an output file with --outfile="path\file.txt". Add that to every command and you'll get a single tab-delimited file with all the texts in it and the source file names in the third column. You can use search and replace in a text editor to change the text if you want to. Then run the TMX maker with default settings on that file and you should get a single TMX file with all your stuff in it and correct 'Note' fields added to each TU. Check the tmx in a text editor before importing to make sure.

Editing the CreationID is also a reasonable option, but if you already have TMX files with the Note fields from LF Aligner, it should be easier to just use them. This is what the Note field is for (so that you can see which source file the TM hit came from).
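Instead of eyeballing the files in a text editor, the check described above can be roughly automated. This sketch counts how many TUs carry a prop of type "Txt::Note"; the function name and the two-TU sample are invented for illustration.

```python
import xml.etree.ElementTree as ET


def note_coverage(root: ET.Element, prop_type: str = "Txt::Note") -> tuple[int, int]:
    """Return (number of <tu> elements, number of those with the given prop)."""
    total = with_note = 0
    for tu in root.iter("tu"):
        total += 1
        if any(p.get("type") == prop_type for p in tu.findall("prop")):
            with_note += 1
    return total, with_note


# Abbreviated sample: the first TU has a note, the second has lost it.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><prop type="Txt::Note">Englishfile.doc_Frenchfile.doc</prop>
<tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Bye</seg></tuv><tuv xml:lang="fr"><seg>Au revoir</seg></tuv></tu>
</body></tmx>"""

total, with_note = note_coverage(ET.fromstring(SAMPLE))
print(f"{with_note} of {total} TUs carry a Txt::Note prop")
```

Running it on one of the original files (via ET.parse(path).getroot()) and then on the merged file should show whether the merge step dropped the notes.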
Mercer

Thanks for the link, very useful. Is there also a way to configure the search window to show notes, or only the match pane?

FarkasAndras wrote: "If you created the files with LF Aligner with default settings, then they should all have a Note field containing the names of the input files (such as Englishfile.doc_Frenchfile.doc). [...] This is what the Note field is for (so that you can see which source file the TM hit came from)."

Thanks, I ended up using the note field. I am not sure why I was losing it and the other metadata fields when using TMXMerger, but that explains why the merged files ended up significantly smaller than the combined size of the individual files. I merged the existing TMX files using a text editor to keep the note field, and it now shows in the OmegaT match pane after following the instructions Didier posted. I am happy this works, thanks!

Didier Briel (France, English to French)
Only the match pane | Nov 6, 2013 |
Mercer wrote: "Thanks for the link, very useful. Is there also a way to configure the search window to show notes, or only the match pane?"

Only the match pane.

Didier
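For reference, the text-editor merge that ended up working can be approximated with a short script. This is a minimal sketch, not a replacement for TMXMerger: it copies every <tu> unchanged (props, notes and attributes included) into the body of the first file, does no de-duplication, and assumes all inputs share the same language pair and a compatible header. The function name and sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET


def merge_tmx(roots: list[ET.Element]) -> ET.Element:
    """Append the <tu> elements of every later TMX to the body of the first."""
    if not roots:
        raise ValueError("nothing to merge")
    merged = roots[0]
    body = merged.find("body")
    if body is None:
        raise ValueError("first TMX has no <body>")
    for other in roots[1:]:
        for tu in other.iter("tu"):
            body.append(tu)  # the whole <tu> is kept, metadata included
    return merged


# Abbreviated samples; real TMX headers carry more attributes.
FIRST = """<tmx version="1.4"><header/><body>
<tu><prop type="Txt::Note">a_en.doc_a_fr.doc</prop>
<tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="fr"><seg>Bonjour</seg></tuv></tu>
</body></tmx>"""
SECOND = """<tmx version="1.4"><header/><body>
<tu><prop type="Txt::Note">b_en.doc_b_fr.doc</prop>
<tuv xml:lang="en"><seg>Bye</seg></tuv><tuv xml:lang="fr"><seg>Au revoir</seg></tuv></tu>
</body></tmx>"""

merged = merge_tmx([ET.fromstring(FIRST), ET.fromstring(SECOND)])
for tu in merged.iter("tu"):
    print(tu.find("prop").text, "|", tu.find("tuv/seg").text)
```

With real files, parse each one with ET.parse(path).getroot() and write the result with ET.ElementTree(merged).write(out_path, encoding="utf-8", xml_declaration=True). Everything is held in memory at once, so on a machine with very little RAM it may be better to run the merge elsewhere and only copy the result over.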