Segmentation rule to deal with quote marks
Thread poster: Thijs Vissia

Thijs Vissia
Netherlands
Jan 8

I'm translating from English to Dutch, and I notice that when a sentence ends on a quote mark (either curly quotes or straight), with the period sitting between quotes right before the end quote (i.e. "This is a sentence." or “This is a sentence.”), the segmentation script considers it a single segment, even though the quote ends. (Instead of a situation where multiple sentences are quoted, where I might want the quote to remain in a single segment.)

I was wondering if anyone could help me to figure out the break or exception rule for this, so that a segment separates after the quote, instead of after a full stop.

I've tried it with \.\" (in the Before field) and \s (in the After field), as well as the same with curly quotes, but both don't seem to have the desired effect.

If there's anywhere that offers more guidance about how to construct the rules, I'd also be interested in that; I thought the manual was a bit sparse.

I'm also not sure how to distinguish between a break rule and an exception (no break) rule, there's a checkmark in the Segmentation dialog but I'm not sure what the checkmark means - if it doesn't mean "select this rule for further operations". How does the dialog allow me to set a break or a no break rule?

Thanks for any help!


 

Samuel Murray  Identity Verified
Netherlands
Local time: 19:56
Member (2006)
English to Afrikaans
@Thijs Jan 8

I suggest you re-ask your question here:
https://sourceforge.net/projects/omegat/lists/omegat-users
...since segmentation rules are perhaps more geeky than most other issues.


 

esperantisto  Identity Verified
Local time: 21:56
Member (2006)
English to Russian
Show it! Jan 8

Thijs Vissia wrote:


I've tried it with \.\" (in the Before field) and \s (in the After field), as well as the same with curly quotes, but both don't seem to have the desired effect.


For starters, ", “ and ” are three different symbols, thus \.\" won’t work for ”.

In order to understand why your attempts failed, share:


  • a short sample file/sample project;

  • your segmentation rules (i. e. your segmentation.conf).



Thijs Vissia wrote:
there's a checkmark in the Segmentation dialog but I'm not sure what the checkmark means - if it doesn't mean "select this rule for further operations". How does the dialog allow me to set a break or a no break rule?


Every rule set in the dialog is applied. The checkmark when ticked means that the rule makes a break. If not ticked, the rule is a joiner.

[Edited at 2020-01-08 14:37 GMT]


 

tcordonniery
France
Local time: 19:56
Break rules and exceptions Jan 8

Thijs Vissia wrote:
I've tried it with \.\" (in the Before field) and \s (in the After field), as well as the same with curly quotes, but both don't seem to have the desired effect.


That is correct, except that, as others said, \" does not match the character ”.
Try \.[\"”] instead.
Also make sure this rule appears before the rule with before = \. and after = \s. Rule order matters because the segmenter applies the rules in the order they appear, and once a rule has decided a location in your text, no later rule can affect that location.
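
A rough way to try such patterns outside OmegaT is the Python sketch below. It assumes a rule applies at a position when the Before pattern matches the text ending there and the After pattern matches the text starting there. The helper name break_positions is made up for illustration and is not part of OmegaT, and Python's re module only approximates the Java regex engine OmegaT uses.

```python
import re

def break_positions(text, before, after):
    """Return every index where `before` matches the text ending at that
    index and `after` matches the text starting at that index."""
    before_re = re.compile("(?:" + before + r")\Z")  # must end exactly at the index
    after_re = re.compile(after)
    positions = []
    for i in range(1, len(text)):
        if before_re.search(text[:i]) and after_re.match(text, i):
            positions.append(i)
    return positions

straight = 'He said "This is a sentence." Then he left.'
curly = 'He said “This is a sentence.” Then he left.'

# The original attempt only knows the straight quote.
print(break_positions(straight, r'\.\"', r'\s'))
print(break_positions(curly, r'\.\"', r'\s'))

# The character class covers the straight and the closing curly quote.
print(break_positions(curly, r'\.[\"”]', r'\s'))

# A plain "dot, then whitespace" rule never fires here,
# because the dot is followed by the quote, not by whitespace.
print(break_positions(curly, r'\.', r'\s'))
```

Under these assumptions the straight-quote pattern finds a break position in the first sample but none in the curly-quote sample, while \.[\"”] finds one right after the closing curly quote.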

Another option I would test: before = \. and after = [\"”], but then it is an exception, not a break rule. See below for why.

Thijs Vissia wrote:
I'm also not sure how to distinguish between a break rule and an exception (no break) rule,


An exception means that we do not want to cut at a location, even if the rules that follow it say that we should.
Example:
1. Normally, we want to cut after a dot. This is a break rule, which is usually at the end of the rule set.
2. However, you should not cut after "Mr." because this is an abbreviation.
(Example: Mr. Smith said that... ==> if we only apply rule 1, the segmenter will cut after the dot; the exception prevents that, but only if the exception is declared before the break rule.)
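
The effect of the order can be sketched in Python. This is only a toy model, assuming that the rules are tried in order at every position and that the first rule that matches decides it. The Rule tuple and the segment function are hypothetical names for illustration, not OmegaT's actual code.

```python
import re
from typing import List, NamedTuple

class Rule(NamedTuple):
    is_break: bool  # True = break rule, False = exception (no break)
    before: str     # regex that must match just before the position
    after: str      # regex that must match just after the position

def segment(text: str, rules: List[Rule]) -> List[str]:
    """Cut `text` where the first matching rule at a position is a break rule."""
    cuts = []
    for i in range(1, len(text)):
        for rule in rules:
            before_ok = re.search("(?:" + rule.before + r")\Z", text[:i])
            after_ok = re.compile(rule.after).match(text, i)
            if before_ok and after_ok:
                if rule.is_break:
                    cuts.append(i)
                break  # first matching rule wins; later rules are ignored here
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(text[start:cut])
        start = cut
    pieces.append(text[start:])
    return pieces

text = "Mr. Smith said hello. Then he left."
exception = Rule(False, r"Mr\.", r"\s")
dot_break = Rule(True, r"\.", r"\s")

# Exception declared first: no cut after "Mr."
print(segment(text, [exception, dot_break]))

# Exception declared last: the break rule has already claimed the spot after "Mr."
print(segment(text, [dot_break, exception]))
```

In this model the pieces keep the whitespace that follows each cut; how the real segmenter treats that whitespace is not covered here.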

esperantisto wrote:
Every rule set in the dialog is applied. The checkmark when ticked means that the rule makes a break. If not ticked, the rule is a joiner.


Really? The opposite of a break rule is an exception (a location where you do not want to break). I don't see what a joiner would be, since segmentation rules are only used to cut segments (when segments are joined, it seems to be done simply by putting spaces between them).


 

Thijs Vissia
Netherlands
TOPIC STARTER
Thank you Jan 8

Thanks for your responses.

I later gathered that the checkmark in front of each rule would mean an exception, or a "no-break rule". But I'll look into that further and look at the manual again, I'm not as focussed right now.

tcordonniery,
Thanks for your rewritten rule, that does appear to do the job. In my current (in fact, the default) English language segmentation rules, there doesn't seem to be a rule that this one should appear before, as you write ("the rule with before = \. and after = \s" isn't to be found, though this seems a little odd.)

In any case, the segment breaks now appear where I want them, hoping it doesn't cause any adverse results elsewhere.

Many thanks for the help, all!

Oh, another related question I just thought of:

I supposed that these segmentation rules are set for OmegaT as a whole, and not per project. How do I prevent such changes from messing up other translation projects, since these would presumably be re-segmented upon opening them? Does that mean I lose the target segments (when I open the project, or when I save it)? Or not?


 

esperantisto  Identity Verified
Local time: 21:56
Member (2006)
English to Russian
Project-specific rules Jan 9

Thijs Vissia wrote:

("the rule with before = \. and after = \s" isn't to be found, though this seems a little odd.)


It is a rule for very many languages. Thus, it is not language-specific and can be found under the Default rule set.

Thijs Vissia wrote:

How do I prevent such changes from messing up other translation projects, since these would presumably be re-segmented upon opening them?


Two options basically:

1. (Simple) Create a project-specific rule set: Project properties → Segmentation → Make segmentation rules project-specific.
2. (Not so simple) Create a separate user profile and start OmegaT with it when you want to work on specific projects; see "7. Starting OmegaT from the command line" in the manual.


 

Thijs Vissia
Netherlands
TOPIC STARTER
Cheers Jan 9

esperantisto wrote:

Thijs Vissia wrote:

("the rule with before = \. and after = \s" isn't to be found, though this seems a little odd.)


It is a rule for very many languages. Thus, it is not language-specific and can be found under the Default rule set.

Thijs Vissia wrote:

How do I prevent such changes from messing up other translation projects, since these would presumably be re-segmented upon opening them?


Two options basically:

1. (Simple) Create a project-specific rule set: Project properties → Segmentation → Make segmentation rules project-specific.
2. (Not so simple) Create a separate user profile and start OmegaT with it when you want to work on specific projects; see "7. Starting OmegaT from the command line" in the manual.


Great, thank you very much.

(The first part of your answer I figured out after posting, it's there as before:"\.\?\!" and after:"\s" in the Default rules, if I'm not mistaken.)


 

