OmegaT segmentation rules - splitting or merging segments in OmegaT
Thread poster: Souni
Souni
Local time: 14:27
German to English
+ ...
Jan 9, 2010

Hi,

I use OmegaT for Mac, and I would very much like to know if there is a way to split or merge segments in a source text once it has been imported/dragged and dropped into the source folder.

If there's a simple way to explain, I would also like to know what these segmentation rules refer to. I am using mostly German to English, and I activate the sentence segmentation. But every time I try to take a more active approach to my segmentation requirements, I retreat in baffled impotence! What are these exceptions? Do they mean that they will exceptionally segment, or that they will exceptionally not segment? Can anyone explain?

Thanks!

Souni


 
Dragomir Kovacevic  Identity Verified
Italy
Local time: 14:27
Italian to Serbian
+ ...
seg. rules refer to... Jan 10, 2010

... splitting a paragraph into sentences.

An exception means that you don't want a break in the middle of a sentence just because it contains an abbreviation such as Mr. or Prof. You naturally want the sentence to stay in one piece, not be cut at the abbreviation.

The rules for the most elementary punctuation marks and abbreviations work on the principle: mark + space. If you have a sentence like this: "Today the sun is shining.I'll practice some jogging" - it won't be split, since there is no space between the two.

The most elementary punctuation marks used for splitting a paragraph into segments are: . | ! | ? | : | ; and you can even add a comma. You can examine them in the default set of segmentation rules in the Options menu.

For German, you will find many abbreviations already present. When the Break/Exception box is not ticked, the rule is an exception, so no break is made there. Example: Abb\. as the pattern before, and \s (a space) as the pattern after.
If you tick the rule, the sentence will be broken after that word, which is what you don't want.
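
To make the mark + space principle concrete, here is a minimal sketch in Python. It is not OmegaT's own code; the function name split_after_mark is made up for illustration, and it only models the three marks . ! ? followed by whitespace.

```python
import re

def split_after_mark(text):
    """Split after . ! or ? only when a whitespace character follows."""
    segments, start = [], 0
    # The lookahead (?=\s) requires whitespace after the mark without consuming it.
    for m in re.finditer(r"[.!?](?=\s)", text):
        segments.append(text[start:m.end()])
        start = m.end()
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]

no_space = "Today the sun is shining.I'll practice some jogging"
with_space = "Today the sun is shining. I'll practice some jogging"

print(split_after_mark(no_space))    # one segment: no space after the period
print(split_after_mark(with_space))  # two segments
```

The first sentence stays whole because the period is followed directly by a letter; the second is split because a space follows the period.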

Dragomir

Souni wrote:

Hi,

I use OmegaT for Mac, and I would very much like to know if there is a way to split or merge segments in a source text once it has been imported/dragged and dropped into the source folder.

If there's a simple way to explain, I would also like to know what these segmentation rules refer to. I am using mostly German to English, and I activate the sentence segmentation. But every time I try to take a more active approach to my segmentation requirements, I retreat in baffled impotence! What are these exceptions? Do they mean that they will exceptionally segment, or that they will exceptionally not segment? Can anyone explain?

Thanks!

Souni


[Edited at 2010-01-10 08:14 GMT]


 
Vito Smolej
Germany
Local time: 14:27
Member (2004)
English to Slovenian
+ ...
SITE LOCALIZER
about segmentation rules in OmegaT Jan 10, 2010

Souni wrote:
If there's a simple way to explain, I would also like to know what these segmentation rules refer to.

As Dragomir already indicated, you have to deal with
a) a given situation found in the stream of characters forming the text, and
b) what to do at that point: split the text there, or make an exception to a more general rule (i.e. NOT split the text).

Example: in the case of "?????Dr.???? " for a), we would normally NOT want to split after this period, i.e. we need an exception to the more general rule of "split after a period and before a whitespace character" (blank, tab, etc.).

The Break/Exception check box in the segmentation rules window determines whether it is a break rule (check box set) or an exception rule (check box unset).
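
As a rough illustration of break rules versus exception rules, the following Python sketch models each rule as a (is_break, pattern before, pattern after) triple. This layout and the function name segment are assumptions made for the example, not OmegaT's internal format; the only behaviour taken from this thread is that a checked box means "break" and an unchecked box means "exception".

```python
import re

# Hypothetical rule layout: (is_break, pattern_before, pattern_after).
RULES = [
    (False, r"Dr\.", r"\s"),   # exception rule (box unchecked): do not split after "Dr."
    (True,  r"[.?!]", r"\s"),  # break rule (box checked): split after . ? ! plus whitespace
]

def segment(text, rules):
    """Split text at every position where the first matching rule is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            before_ok = re.search(r"(?:%s)\Z" % before, text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides; later rules are ignored
    segments.append(text[start:].strip())
    return segments

sample = "I saw Dr. Smith today. He was well."
print(segment(sample, RULES))      # "Dr." does not cause a split
print(segment(sample, RULES[1:]))  # without the exception, it does
```

With the exception in place the sample yields two segments; with only the general break rule it yields three, the first ending at "Dr.".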

See more in the documentation (chapter Source segmentation *).

Please note that the rules apply to all segments: you cannot make a special rule that is valid for just one specific case in the source and not for the rest. Changing or expanding the rules thus changes the whole ballgame: after a change the input text may be segmented quite differently, and the segments you have already collected in the translation memory may no longer fit. This is one of the reasons for orphan segments, for instance: they are in the TM, but nowhere to be found in the source text under the new rules.

The default rules (language-specific, as you may have noticed) are an evolutionary product. This means that there's always room for improvement. If you are missing a specific case for German, tell us about it, either here or in the Yahoo OmegaT group.

HTH

Regards

smo

* I would appreciate hearing from you about the indicated chapter, as regards its contents, legibility etc. See the PDF file in the Documentation section of Files in the OmegaT Yahoo group:

http://tech.groups.yahoo.com/group/OmegaT/files


 
traductorchile  Identity Verified
Chile
Local time: 08:27
English to Spanish
+ ...
Hope you can help Vito Feb 15, 2012

I have a text with lists of short sentences, i.e.:

Jack went up the hill
Jill didn't follow him,
Jack got in a fuss
Jill had a laugh
Jack came tumbling down.

Default segmentation rules consider these five lines as one segment. I tried to create a segmentation exception as: /end of line...........[A-Z] and then disallow it (I understand end of line = \n or \r). But it doesn't work, probably because I have activated segmentation by sentence.
What options do I have to get each line as a separate sentence? Putting a sentence breaker at the end of each line throughout the text, or is there some easier way?


 
traductorchile  Identity Verified
Chile
Local time: 08:27
English to Spanish
+ ...
Sorry to bother Feb 15, 2012

Sorry, I had saved the PDF as a text document, so all the (line end) formatting had disappeared.

I copied the PDF into a .docx and now the sentences get segmented correctly.

Sorry.


 
Didier Briel  Identity Verified
France
Local time: 14:27
English to French
+ ...
There are specific options for text files Feb 15, 2012

traductorchile wrote:

I have a text with lists of short sentences, i.e.:

Jack went up the hill
Jill didn't follow him,
Jack got in a fuss
Jill had a laugh
Jack came tumbling down.

Default segmentation rules consider these five lines as one segment. I tried to create a segmentation exception as: /end of line...........[A-Z] and then disallow it (I understand end of line = \n or \r). But it doesn't work, probably because I have activated segmentation by sentence.
What options do I have to get each line as a separate sentence? Putting a sentence breaker at the end of each line throughout the text, or is there some easier way?

In Options > File Filters > Text Files > Options, you can set how end of lines will be processed for text files.
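
To show why the end-of-line setting matters for a list like the one quoted above, here is a small Python sketch. The mode names "line_breaks", "empty_lines" and "never" are hypothetical labels chosen for this example, not necessarily the wording of the OmegaT dialog, and the function paragraphs is not part of OmegaT.

```python
import re

def paragraphs(text, mode):
    """Cut a plain-text file into paragraphs according to a hypothetical mode."""
    if mode == "line_breaks":
        parts = text.splitlines()           # every line is its own paragraph
    elif mode == "empty_lines":
        parts = re.split(r"\n\s*\n", text)  # only blank lines separate paragraphs
    else:                                   # "never": the whole file is one paragraph
        parts = [text]
    return [p.strip() for p in parts if p.strip()]

rhyme = (
    "Jack went up the hill\n"
    "Jill didn't follow him,\n"
    "Jack got in a fuss\n"
    "Jill had a laugh\n"
    "Jack came tumbling down.\n"
)

print(len(paragraphs(rhyme, "line_breaks")))  # each line separate
print(len(paragraphs(rhyme, "empty_lines")))  # no blank lines, so one block
```

Because the lines carry no sentence-ending punctuation, sentence rules alone cannot separate them; only the way the file is cut into paragraphs can.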

Didier


 
Paul Klassen
Canada
Local time: 09:27
French to English
Cannot get OmegaT customized segmentation to work Dec 14, 2013

I have a sentence that contains:
… Malthus’, 23 fev. 1816 …
OmegaT insists on segmenting this after fev. I tried creating an exception with:
fev\. in the "Pattern before"
\s in the "Pattern after"
Break/Exception checked
in FR-CA (which is the language I am using).

I have tried a large number of variations on this, but nothing suppresses the segmentation. Any thoughts?

I am running version 3.0.4 on Windows 7.

Thank you,

Paul


 
Didier Briel  Identity Verified
France
Local time: 14:27
English to French
+ ...
Uncheck Dec 15, 2013

Paul Klassen wrote:

I have a sentence that contains:
… Malthus’, 23 fev. 1816 …
OmegaT insists on segmenting this after fev. I tried creating an exception with:
fev\. in the "Pattern before"
\s in the "Pattern after"
Break/Exception checked


If checked, it means you want to segment.

in FR-CA (which is the language I am using).

I have tried a large number of variations on this, but nothing suppresses the segmentation. Any thoughts?

Try unchecking.

I am running version 3.0.4 on Windows 7.

You should upgrade to 3.0.7.

Didier


 
Paul Klassen
Canada
Local time: 09:27
French to English
No luck Dec 16, 2013

I've upgraded to 3.0.7. I had tried both checked and unchecked, but didn't realize that the one I mentioned was not the right one.

I tried retyping the phrase
Ricardo to Malthus, 23 fev. 1816
into a document (as the only text), which I then saved and loaded. OmegaT still inserts a segment split. This was all using MS Word and .docx format.
Then I did the same thing using LibreOffice, same result.

Any more ideas?


 
Didier Briel  Identity Verified
France
Local time: 14:27
English to French
+ ...
Rule position Dec 17, 2013

Paul Klassen wrote:

I've upgraded to 3.0.7. I had tried both checked and unchecked, but didn't realize that the one I mentioned was not the right one.

I tried retyping the phrase
Ricardo to Malthus, 23 fev. 1816
into a document (as the only text), which I then saved and loaded. OmegaT still inserts a segment split. This was all using MS Word and .docx format.
Then I did the same thing using LibreOffice, same result.
Any more ideas?

I just tried
Before: fev\.
After: \s
Unchecked, and it worked first time, all the other rules being the default rules.

When you created your FR-CA set of rules, did you leave it at the bottom?

Rules are executed in sequential order. So your exceptions must come before the default rules if you want them to do anything.
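
The effect of rule order can be sketched in a few lines of Python. This is only a model of "the first matching rule wins"; the tuple layout, the rule names and the function first_match are assumptions for illustration and do not reflect OmegaT's actual implementation.

```python
import re

def first_match(text, pos, rules):
    """Return the name of the first rule that matches at position pos, or None."""
    for name, before, after in rules:
        before_ok = re.search(r"(?:%s)\Z" % before, text[:pos])
        after_ok = re.match(after, text[pos:])
        if before_ok and after_ok:
            return name
    return None

# Hypothetical rules: (name, pattern before, pattern after)
default_break = ("break: default", r"\.", r"\s")
fev_exception = ("exception: fev.", r"fev\.", r"\s")

text = "Ricardo to Malthus, 23 fev. 1816"
pos = text.index("fev.") + len("fev.")  # candidate split point right after "fev."

# Exception placed after the default break rule: the break rule is reached first.
print(first_match(text, pos, [default_break, fev_exception]))
# Exception placed before the default break rule: the exception is reached first.
print(first_match(text, pos, [fev_exception, default_break]))
```

In the first ordering the general break rule decides and the text is split after "fev."; in the second the exception decides and the sentence stays whole.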

Didier


 

