Any way of extracting _source_ text from a bilingual document?
Thread poster: OTMed (X)
OTMed (X)
Poland
Local time: 03:44
English to Polish
+ ...
Jun 3, 2004

We need to re-create the source (English) text from a bilingual version (English>Polish) for alignment (the original English text is no longer available). Is there a way of doing a 'reverse clean-up', i.e. removing the translation segments and leaving the source text? The only way I can think of is to select and delete all styles but the source text. The problem is that this approach takes quite a lot of manual deleting.
Have you perhaps heard of a tool that would do this trick automatically? TIA


 
Pablo Roufogalis (X)
Colombia
Local time: 20:44
English to Spanish
Yahoo group Jun 3, 2004

Hello.

There's a tool in the Yahoo Trados group that claims to convert a bilingual Trados doc into a two-column doc. It should then be easy to select the source text and copy/paste it into another document.

I've never used it, but you may try it and report back.


 
Harry Bornemann
Mexico
Local time: 19:44
English to German
+ ...
Another CAT tool Jun 3, 2004

You could export the TM in Trados text format and import it into Déjà Vu or another tool that can export its TM in tab-separated text format.
Then it will be easy to split the columns in Excel or Access.
BR
Harry
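
Once the TM is in tab-separated form, the split does not have to happen in Excel or Access. A minimal Python sketch, assuming (this is not confirmed anywhere in the thread) one translation unit per line with the source in the first column and the target in the second; the function name and the sample data are hypothetical:

```python
import csv
import io

def source_column(tsv_text, column=0):
    """Return one column of tab-separated TM text, skipping blank rows."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[column] for row in reader if len(row) > column and row[column].strip()]

# Hypothetical two-unit export: English source, Polish target.
sample = "Open the file.\tOtwórz plik.\nSave your work.\tZapisz pracę.\n"
print("\n".join(source_column(sample)))
```

Passing `column=1` would pull out the target side instead. A real export may carry a header line or extra columns, which this sketch does not handle.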


 
Lixus (X)
Local time: 02:44
French to English
+ ...
Can you send a sample? Jun 3, 2004

Can you send me a sample of the text by email, about 100 words?
I'm sure I can find (or hand-make) a tool for it.


 
Gerard de Noord
France
Local time: 03:44
Member (2003)
English to Dutch
+ ...
Try this macro Jun 3, 2004

You could try running this Word macro on a copy of the document if the file has been segmented by Wordfast or Trados:

Sorry, the macro didn't survive posting.

Regards,
Gerard

[Edited at 2004-06-03 14:07]


 
Alison Schwitzgebel
France
Local time: 03:44
German to English
+ ...
Bear with me on this one, but it ought to work... Jun 3, 2004

1. Save an extra copy of your bilingual document (just in case this all goes horribly wrong)

2. Clean this up into a new memory.

3. Export this memory.

4. Create a new memory WITH THE LANGUAGE PAIR REVERSED.

5. Import the memory.

6. Run the pre-translate option over your cleaned document.

7. Clean up the pre-translated document.

This is how you can generally get memories you have in the wrong language direction turned around.

Theoretically it ought to work. Let me know how you get on.

HTH

Alison


 
Jaroslaw Michalak
Poland
Local time: 03:44
Member (2004)
English to Polish
SITE LOCALIZER
Yet another way... Jun 3, 2004

It depends a little on the source text and its format...

That's what I would do:
1. Create a new TM.
2. Analyse the file in question.
3. Export unknown sequences to Word format.

This is assuming you're using Trados. With other tools it might be possible, too.

Several things to note:
The format of the file might not be 100% accurate. The formatting in TMs for Trados is still somewhat buggy.

The repeated phrases will show up only once - you can export repetitions separately and then insert them, if needed, but it's tedious. You can also export the repetitions and, using them as a guide, insert some markers in the bilingual source so that they are all different (e.g. ##01##, ##02##, etc.).
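
The marker trick can be sketched in a few lines of Python. This is only an illustration on a plain list of segment strings, not something that touches the Word file; the function names are hypothetical and the `##NN##` format simply follows the example given above:

```python
import re
from collections import Counter

def mark_repetitions(segments):
    """Append a unique ##NN## marker to every occurrence of a repeated segment."""
    totals = Counter(segments)
    counter = 0
    marked = []
    for seg in segments:
        if totals[seg] > 1:
            counter += 1
            marked.append(f"{seg} ##{counter:02d}##")
        else:
            marked.append(seg)
    return marked

def strip_markers(segments):
    """Remove the trailing ##NN## markers again after the export."""
    return [re.sub(r" ##\d+##$", "", seg) for seg in segments]

segments = ["Click OK.", "Save.", "Click OK."]
marked = mark_repetitions(segments)
print(marked)
print(strip_markers(marked))
```

With the markers in place every segment is unique, so none of them is collapsed as a repetition; stripping the markers afterwards gives back the original sequence.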


Note that this way (and most of the others mentioned above) allows you to recreate only the TEXT of the original, not the document itself. If you want that (with formatting, pictures, etc.), you need to work within Word itself (or whatever software you're using - it might be useful if you specified that).

[Edited at 2004-06-03 16:44]


 
OTMed (X)
Poland
Local time: 03:44
English to Polish
+ ...
TOPIC STARTER
Thank you all for your input Jun 3, 2004

I do appreciate your assistance. What we were initially considering was aligning and switching source and target segments in some way. Having received all your suggestions, we have adopted the following strategy:
The text we have is a regular bilingual Trados-segmented Eng>Pol file (Trados 5.5 Freelance).
We have used the following strategy to 'reverse' the TM:
a. Translated the bilingual file with an empty Trados TM (en>pl)
b. Exported this TM as a .txt (en>pl)
c. Opened the txt format as a Wordfast 3.3 TM (en>pl)
d. 'Reversed' Wordfast TM to pl>eng using built-in Wordfast functionality

At this stage all seems to be working OK, apart from the fact that Wordfast loses all Polish characters. Or, better said, replaces them with a rather dramatic '?'.
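
One plausible explanation for the '?' symptom (an assumption; the thread never confirms the cause) is that the TM text passes through a code page that has no Polish letters. The Python sketch below only illustrates that effect with standard codecs; it says nothing about how Wordfast or Trados actually store their files:

```python
polish = "Zażółć gęślą jaźń"

# Western European code page: most Polish letters are missing,
# so each unrepresentable one is replaced by '?'.
western = polish.encode("cp1252", errors="replace").decode("cp1252")

# Central European code page: all Polish letters are representable.
central = polish.encode("cp1250").decode("cp1250")

print(western)
print(central)
```

If this is indeed what happens, the damage is not reversible from the '?' version, so the conversion would have to be redone from the original export with a suitable encoding.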

Next step is to try one of the options you have proposed. I will keep you all posted on the fascinating 'TM reversal' story.


 
Jaroslaw Michalak
Poland
Local time: 03:44
Member (2004)
English to Polish
SITE LOCALIZER
It is, indeed, fascinating... especially for a stubborn guy like me :) Jun 3, 2004

I have found a quick and quite elegant way to restore the original document from a bilingual Trados Word document.

The trick is, essentially, to run the Open Source and Restore Source sequence over and over.

Three ways of doing it for now:

Method one:
1. Alt+Home
2. Alt+Del
Repeat ad nauseam.

Actually, with short documents it is pretty fast...

Method two:
Create a new macro, with the lines:

Application.Run "tw4winOpen.Main"
Application.Run "tw4winRestoreSource.Main"

Assign the macro to a button. Press repeatedly.

Method three (most effective, but not recommended):
Create a new macro, with the lines:

Do
    ' Open the next segment, then restore its source text
    Application.Run "tw4winOpen.Main"
    Application.Run "tw4winRestoreSource.Main"
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)

This works ONLY if the segments are not separated by pictures or anything else that would prevent Trados from opening the next segment (empty paragraphs are OK). Also, there must not be empty paragraphs at the end of the doc (it must end with the last segment). Otherwise the macro does not stop and Winword has to be killed with Ctrl+Alt+Del. However, it is quick and worked quite nicely on several documents I've tried.


[Edited at 2004-06-03 21:06]

[Edited at 2004-06-03 21:11]


 
Jaroslaw Michalak
Poland
Local time: 03:44
Member (2004)
English to Polish
SITE LOCALIZER
A better version of the macro... Jun 4, 2004

...is here:

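Do
    ' Expand the selection so the style of its first character can be checked
    Selection.Expand
    If Selection.Characters(1).Style.NameLocal = "tw4winMark" Then
        ' At a Trados segment marker: open the segment and restore its source
        Application.Run "tw4winOpen.Main"
        Application.Run "tw4winRestoreSource.Main"
    Else
        ' Not a segment marker: move past the current selection
        Selection.Collapse (wdCollapseEnd)
    End If
Loop Until Selection.InRange(ActiveDocument.Paragraphs.Last.Range)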

It works with most bilingual Trados Word documents without the limitation specified above.
It does not include text boxes, frames or headers/footers - it would need to be much more complicated than that.
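
For plain text rather than a Word document, the same 'restore source' idea can be sketched outside Trados. The Python snippet below assumes (this is not stated in the thread) that each segment is wrapped in delimiters of the form `{0>source<}NN{>target<0}`, where NN is the match value; the function name and sample are hypothetical. It ignores Word styles, hidden text and formatting entirely, so it is no substitute for the macro:

```python
import re

# Assumed delimiter layout: {0>source<}match-value{>target<0}
SEGMENT = re.compile(r"\{0>(.*?)<\}\d+\{>.*?<0\}", re.DOTALL)

def restore_source(bilingual_text):
    """Replace every bilingual segment with its source half only."""
    return SEGMENT.sub(lambda m: m.group(1), bilingual_text)

sample = (
    "{0>Open the file.<}100{>Otwórz plik.<0} "
    "{0>Save your work.<}0{>Zapisz pracę.<0}"
)
print(restore_source(sample))
```

Text outside the delimiters is left untouched, which mirrors the macro's behaviour of skipping anything that is not a segment.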

BTW, is there a repository for macros in ProZ? I've got a few that might be useful...

[Edited at 2004-06-04 01:18]


 
Aleksandr Okunev (X)
Local time: 04:44
English to Russian
There's such a tool Jun 4, 2004

Is there a way of a 'reverse clean-up', i.e. removing translation segments and leave source text?


This is done by the Plustoys(2) utility, which is available in the 'files' section of the Wordfast Yahoo group. Join the group and download it. The utility does other useful things too.

Happy cleanup!


 
OTMed (X)
Poland
Local time: 03:44
English to Polish
+ ...
TOPIC STARTER
Thank you - lessons learned Jun 12, 2004

Our ambitiously labeled 'how on earth do I reverse Trados TM' project was paused for a while, so please forgive the slightly late reply.
The task of restoring source segments from a bilingual text has developed into creating a 'reverse' TM from a bilingual doc.
Both tools posted above (thanks Jabberwock and Aleksandr) did extract the source text.
As to creating the reverse TM, following some experiments and tests we concluded, with great help from the Fusion Team (thank you Alain!), that creating a reverse TM is as simple as exporting a TM for a language pair (x-y), creating a new TM for the reverse language pair (y-x) and importing the previously exported x-y TM!

The above method may perhaps be obvious to some (most) of you, but as it was a groundbreaking discovery for us, I do hope someone may find our lessons learned helpful.
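
Conceptually, the reversal amounts to swapping the roles of source and target in every translation unit. A toy Python illustration on hypothetical (source, target) pairs, which says nothing about how Trados itself handles the import:

```python
def reverse_units(units):
    """Swap source and target in each (source, target) pair."""
    return [(target, source) for source, target in units]

# Hypothetical en>pl units.
en_pl = [("Open the file.", "Otwórz plik."), ("Save your work.", "Zapisz pracę.")]
pl_en = reverse_units(en_pl)

for source, target in pl_en:
    print(f"{source}\t{target}")
```

Reversing twice gives back the original pairs, which makes for an easy sanity check on a small sample.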

Nevertheless, let me thank you all for your valuable and helpful input. I am, as always, awed by the support I have received through ProZ.

Best regards, Greg


 

