Tags spanning several segments / put everything between two strings in a single segment
Thread poster: CAK

CAK  Identity Verified
Germany
English to German
Aug 20, 2018

I'm trying to get parts of a text that are enclosed in {} and don't need translation out of the way. These can be words, sentences or several paragraphs. I'd prefer not to change the original document.

Custom tags defined with a regex work fine for words and sentences, as long as the content is part of a single segment. However, the algorithm doesn't seem to consider any content spanning more than one segment. At least I couldn't get it to work using the multiline switch/mode. Is my assumption correct that this is not possible? I'm not very good at using regular expressions, I'm afraid.

Alternatively, I tried to create segmentation exceptions, with very little success. I managed to ignore only the first period between two sentences after the opening bracket (or the last one before the closing bracket, depending on whether I used greedy or lazy matching; conditions don't seem to be supported) and had no success with line breaks whatsoever. Is this possible to do with segmentation rules at all?
Would it be possible by adjusting file filters?

[Edited at 2018-08-20 10:11 GMT]

[Edited at 2018-08-20 12:43 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
@CAK Aug 20, 2018

CAK wrote:
I'm trying to get parts of a text that are enclosed in {} and don't need translation out of the way. These can be words, sentences or several paragraphs. I'd prefer not to change the original document.


Do you mind changing the original document if the change is 100% reversible?

For example, if you were to replace all spaces that were between curly brackets with e.g. " ###" (space plus ###) (assuming that "###" does not occur anywhere else in your file), then at least you would be able to identify that text within OmegaT even though you'd still see it. For example, {The rain in Spain.} would become {The ###rain ###in ###Spain.}. Then afterwards you can just delete all ### from the target file.
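A minimal sketch of that reversible replacement, assuming the text is available as a plain string, the curly brackets are not nested, and "###" really does not occur anywhere else. The function names are made up for illustration and are not part of OmegaT:

```python
import re

MARKER = "###"  # assumption: this string occurs nowhere else in the file


def mark_braced_spaces(text: str) -> str:
    """Put MARKER after every space that lies between { and }."""
    if MARKER in text:
        raise ValueError("marker already occurs in the text; pick another one")

    def mark(match: re.Match) -> str:
        return match.group(0).replace(" ", " " + MARKER)

    # [^{}]* also matches line breaks, so a bracketed passage may span
    # several sentences or paragraphs. Nested brackets are not handled.
    return re.sub(r"\{[^{}]*\}", mark, text)


def unmark(text: str) -> str:
    """Reverse the change by deleting every MARKER (run this on the target file)."""
    return text.replace(MARKER, "")


source = "Translate this. {The rain in Spain.} And this."
marked = mark_braced_spaces(source)
print(marked)          # Translate this. {The ###rain ###in ###Spain.} And this.
print(unmark(marked))  # Translate this. {The rain in Spain.} And this.
```

Only spaces are touched, so line breaks and other formatting inside the brackets stay as they were.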

What kind of a file is it -- is it a plain text file, or an MS Word file, or what? Do you have access to MS Word, by the way? Or, what kind of a text editor do you have?


 

CAK  Identity Verified
Germany
English to German
TOPIC STARTER
Title Aug 21, 2018

@ Samuel Murray

Thanks for your reply!
I do have MS Word, and the files are of various origins. I'd prefer not to save in a text editor at all, since there are all kinds of Word clones out there with slight incompatibilities. As I understand it, OmegaT does resave the file, but touches the metadata and formatting only minimally, if at all.
But then again, if there is no other way, it would at least be helpful to know how to do it. Even having just a visual marker isn't something I had thought about, so thanks for that.

I wonder if I could replace line breaks etc. in Word and get the old formatting back without problems.


 

Didier Briel  Identity Verified
France
English to French
Custom tags only work one segment at a time Aug 27, 2018

CAK wrote:

I'm trying to get parts of a text that are enclosed in {} and don't need translation out of the way. These can be words, sentences or several paragraphs. I'd prefer not to change the original document.

Custom tags defined with a regex work fine for words and sentences, as long as the content is part of a single segment. However, the algorithm doesn't seem to consider any content spanning more than one segment. At least I couldn't get it to work using the multiline switch/mode. Is my assumption correct that this is not possible?

This is correct. Custom tags only work one segment at a time.
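To illustrate why the multiline mode makes no difference, here is a rough sketch. It is not OmegaT's actual code: the pattern, the naive segmentation and the function names are all assumptions. The point is only that a pattern applied to each segment separately never sees an opening and a closing bracket that ended up in different segments:

```python
import re

# Hypothetical custom-tag pattern: anything between { and }, line breaks included.
CUSTOM_TAG = re.compile(r"\{.*?\}", re.DOTALL)


def split_segments(text):
    """Naive stand-in for segmentation: split after a period followed by
    whitespace, and at line breaks."""
    return [seg for seg in re.split(r"(?<=\.)\s+|\n+", text) if seg]


def tags_per_segment(text):
    """Apply the pattern to each segment on its own."""
    return [CUSTOM_TAG.findall(seg) for seg in split_segments(text)]


text = "Keep {this} here. Skip {one sentence. And another.} Done."

# On the whole text, both bracketed passages are found.
print(CUSTOM_TAG.findall(text))

# Segment by segment, only the passage inside a single segment is found.
for seg, tags in zip(split_segments(text), tags_per_segment(text)):
    print(repr(seg), tags)
```

In the second loop, the segments "Skip {one sentence." and "And another.} Done." each contain only half of the bracket pair, so the pattern matches nothing in them, whatever flags it uses.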

Alternatively, I tried to create segmentation exceptions, with very little success. I managed to ignore only the first period between two sentences after the opening bracket (or the last one before the closing bracket, depending on whether I used greedy or lazy matching; conditions don't seem to be supported) and had no success with line breaks whatsoever. Is this possible to do with segmentation rules at all?
Would it be possible by adjusting file filters?

What is considered a paragraph depends on the file filter. This cannot be changed by segmentation rules.
For some of the filters (e.g., the Text filter, the HTML filter or the OpenXML filter), you can use some options to change what starts a new paragraph.

Didier


 

CAK  Identity Verified
Germany
English to German
TOPIC STARTER
Title Aug 27, 2018

Thanks for the clarification, Didier!
I'll try to look into file filters, then.

[Edited at 2018-08-27 14:05 GMT]


 

