A HTML-file with a separate "translation goes here" column
Thread poster: Harklas
Harklas
Local time: 07:35
Oct 27, 2010

The source is an html file (well, tons of them) and it's organized as a table with a separate empty column where the customer expects the translation to be when they get it.

If the source were Excel files, I'd just export XLS files from OmegaT, copy-paste the translations into the original files and voila, I'd be done.

But these are HTML files, and when I edit them in Excel and save them again as .html, the design of the page looks different when viewed in a web browser. And editing the files in Notepad isn't really an appealing idea.

So I have three options:

1) Do the copy-pasting with some more appropriate html editor than Excel. (Dreamweaver?)

2) Deliver xls files or slightly Excel-corrupted html files and just apologize for not having all the tech in place here.

3) Make OmegaT understand that it's supposed to put the translation in the next column with some help from you super-smart OmegaT people.

What do you think?


 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:35
Member (2006)
English to Afrikaans
@Harklas Oct 27, 2010

Harklas wrote:
The source is an html file (well, tons of them) and it's organized as a table with a separate empty column where the customer expects the translation to be when they get it.


The problem is that OmegaT (and all other CAT tools) don't actually understand HTML. These tools only know how to extract the translatable portions of it and replace them (in the same place) with the translation.

If you want to have the translation in the second column, you'd first have to put the source text into the second column, and then figure out how to tell the CAT tool not to translate the first column.

Because of OmegaT's simple design, it translates duplicate strings the same way: if the same sentence appears in two columns, OmegaT will translate both of them identically. What I would do is a find/replace in the first column that replaces all spaces with something like " # " (space, hash, space), so that it is very visible in OmegaT (you know that you should skip it) and so that OmegaT no longer sees it as a duplicate of the second column and doesn't try to translate it.
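A minimal Python sketch of that find/replace, applied only to the first cell of each table row. It assumes (hypothetically) rows of the form `<tr><td>TEXT</td>...`, with no attributes on the tags and no markup inside the first cell; real files may need a looser pattern. The name `mark_first_column` is illustrative, not part of OmegaT.

```python
import re

# Assumed row shape: <tr>, optional whitespace, then a plain <td>...</td>
# whose content contains no tags ([^<]* keeps the match inside one cell).
FIRST_CELL = re.compile(r"(<tr>\s*<td>)([^<]*)(</td>)")


def mark_first_column(html: str) -> str:
    """Replace every space in the first cell of each row with ' # '.

    The marked text no longer matches the copy in the second column,
    so the CAT tool will not treat the two cells as duplicate segments.
    """
    def mark(m: re.Match) -> str:
        return m.group(1) + m.group(2).replace(" ", " # ") + m.group(3)

    return FIRST_CELL.sub(mark, html)


row = "<tr><td>Hello world</td><td>Hello world</td></tr>"
print(mark_first_column(row))
# <tr><td>Hello # world</td><td>Hello world</td></tr>
```

Only the first cell is touched, so the second column still holds the clean source text for translation.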

What bugs me is how the client thought you would do the translation in the first place. If these are HTML files, did the client think that you'd translate them in an HTML editor? Remember that you can open HTML files in MS Word and in OpenOffice.org in WYSIWYG mode, but that would also corrupt the source code a bit. The question is whether the client minds if you mess up the source code (as long as the visible text looks the same).

1) Do the copy-pasting with some more appropriate html editor than Excel. (Dreamweaver?)


Unless you can write a very fancy HTML source code find/replace macro thingy that duplicates the table column content in the source code itself, at some stage you're going to have to open and save the files in an HTML program anyway. Because you have to copy that column somehow.

How simple is the client's HTML?


 
Harklas
Local time: 07:35
TOPIC STARTER
I'll just give them the consequences. Oct 27, 2010

The html looks pretty simple when viewed in a browser (I haven't looked at the source code); it's just a bunch of tables, so from the surface of it, it could just as well have been a simple xls file.

I think the end client intends to flesh out the html files with all the content they need on there once they have it back with localised text, and that they want to keep English in there throughout the process to know what sentences they're putting where in the final layout. I guess they're techy people that just started to localize their product, so they haven't really thought about how things are supposed to work on my end, and the mediator just forwarded their files. In hindsight, I should have asked for a spreadsheet instead of being Mr. No-Problems-I-Fix.

But for now I guess I'll just give them my Excel-messed-up html, and I guess there'll be no problem for them to fix it, or to extract the text and import it again the way they want it.

I've asked them already and will probably get an answer when they're BIO tomorrow, but I was just looking to see if I could fix it on my own tonight as I like to have every pixel right in my delivery with minimal trouble for the client (like we all do I suppose).

Your work-around with # in all source text would make the Swedish text alright, but if I don't want to give them a funny#source#text#back I would have to edit the html anyway.
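Restoring the first column after translation does not have to be done by hand, though. A minimal Python sketch, assuming the same hypothetical row shape (`<tr><td>TEXT</td>...`, plain text in the first cell); note that it would also change any " # " that was genuinely part of the source text.

```python
import re

# Assumed row shape: <tr>, optional whitespace, then a plain <td>...</td>
# whose content contains no tags.
FIRST_CELL = re.compile(r"(<tr>\s*<td>)([^<]*)(</td>)")


def unmark_first_column(html: str) -> str:
    """Turn ' # ' back into a single space in the first cell of each row."""
    def unmark(m: re.Match) -> str:
        return m.group(1) + m.group(2).replace(" # ", " ") + m.group(3)

    return FIRST_CELL.sub(unmark, html)


row = "<tr><td>Hello # world</td><td>Hej världen</td></tr>"
print(unmark_first_column(row))
# <tr><td>Hello world</td><td>Hej världen</td></tr>
```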

But thanks for your answer that CAT tools and HTML simply don't go together; it makes me feel less amateurish here.


 
Jaroslaw Michalak  Identity Verified
Poland
Local time: 07:35
Member (2004)
English to Polish
SITE LOCALIZER
Regex-fu? Oct 27, 2010

The way I would go about it is to use a regex on the source files. For example:

<table>
<tr>
<td>First source segment</td><td></td>
</tr>
<tr>
<td>Second source segment</td><td></td>
</tr>
</table>

would be converted to:

<table>
<tr>
<td><!--First source segment--></td><td>First source segment</td>
</tr>
<tr>
<td><!--Second source segment--></td><td>Second source segment</td>
</tr>
</table>

Please note that I have used an HTML comment (<!-- ... -->) to hide the actual source column from the CAT tool. It might not always work, but you can enclose the segment in another tag that would not be extracted for translation.
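The conversion above can be done with a single regex substitution. A minimal Python sketch, assuming each source cell holds plain text (no nested tags) and is followed, possibly after whitespace, by an empty `<td></td>`; `prepare_for_cat` is a hypothetical name. One caveat: text containing `--` cannot safely go inside an HTML comment.

```python
import re

# Assumed shape: <td>TEXT</td> followed by an empty <td></td>.
# [^<]* keeps the match inside a single cell (no nested tags allowed).
SOURCE_THEN_EMPTY = re.compile(r"<td>([^<]*)</td>(\s*)<td></td>")


def prepare_for_cat(html: str) -> str:
    """Copy each source cell into the empty cell beside it and hide the original in a comment."""
    return SOURCE_THEN_EMPTY.sub(r"<td><!--\1--></td>\2<td>\1</td>", html)


source = """<table>
<tr>
<td>First source segment</td><td></td>
</tr>
<tr>
<td>Second source segment</td><td></td>
</tr>
</table>"""

print(prepare_for_cat(source))
```

Using `[^<]*` instead of a greedy `.*` stops a match from running across several cells when a row's second cell is not empty.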

[Edited at 2010-10-27 17:20 GMT]


 
Harklas
Local time: 07:35
TOPIC STARTER
Just out of curiosity Oct 27, 2010

Jabberwock wrote: stuff


I think I will concentrate on the actual language and leave the tech to the techy, but just out of curiosity, in what program would I open the files to do those changes? And would I do it manually line by line?

Sorry if I just made you go *sigh* with my computer illiteracy now


 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:35
Member (2006)
English to Afrikaans
Regex Oct 27, 2010

Harklas wrote:
Jabberwock wrote: stuff

...just out of curiosity, in what program would I open the files to do those changes?


jEdit does regex.
MS Word has limited regex, too (the "Use wildcards" option in Find and Replace).

Just be aware that different programs use slightly different regular expression syntaxes, so a search string that works in one program won't necessarily work in another.


 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:35
Member (2006)
English to Afrikaans
Using HTML isn't bad Oct 27, 2010

Harklas wrote:
I guess they're techy people that just started to localize their product, so they haven't really thought about how things are supposed to work on my end...


Using HTML tables isn't a bad idea, but the translator must be told what he is and isn't allowed to do. It depends on how smart the client's HTML-to-original-format converter is. Can the client's converter handle HTML tables that MS Word has barfed on? If yes, then it is good.


 
Jaroslaw Michalak  Identity Verified
Poland
Local time: 07:35
Member (2004)
English to Polish
SITE LOCALIZER
Text editing Oct 27, 2010

Harklas wrote:
I think I will concentrate on the actual language and leave the tech to the techy, but just out of curiosity, in what program would I open the files to do those changes? And would I do it manually line by line?

Sorry if I just made you go *sigh* with my computer illiteracy now


No need to apologize - you know those people complaining that translators' jobs have become too techy? They're right

Most advanced text editors support regular expressions. These are very useful, as they let you search for quite complex text structures and replace them with more appropriate stuff... However, the learning curve is rather steep - actually, I don't use them often enough to get them right the first time

Perl (programming language) might be even more appropriate for this, but this is strictly advanced stuff (even though quite rewarding in the long run...).
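A script also answers the "line by line?" question: it can process every file in a folder in one go. A minimal sketch in Python rather than Perl, assuming the source column was hidden as `<td><!--TEXT--></td>` (as in the regex example earlier in the thread) and that the files are UTF-8. It puts the English back on display after translation, since commented-out text does not show in a browser. The function names and folder names are hypothetical.

```python
import re
from pathlib import Path

# Assumed format of a hidden source cell: <td><!--TEXT--></td>
HIDDEN_CELL = re.compile(r"<td><!--(.*?)--></td>", re.DOTALL)


def restore_source_column(html: str) -> str:
    """Turn <td><!--TEXT--></td> back into <td>TEXT</td> so the source is visible again."""
    return HIDDEN_CELL.sub(r"<td>\1</td>", html)


def process_folder(src_dir: str, out_dir: str) -> int:
    """Apply restore_source_column to every .html file in src_dir.

    Results are written to out_dir (the originals are left untouched).
    Returns the number of files processed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8")
        (out / path.name).write_text(restore_source_column(text), encoding="utf-8")
        count += 1
    return count
```

For example, `process_folder("translated", "delivery")` would read every .html file in a folder called translated and write the restored copies to delivery. Writing to a separate folder means a wrong pattern never destroys the only copy of the translated files.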

Naturally, starting with this stuff on a live project is not the best idea, as you cannot even assess how much effort might be required. I am sure that your client will have a solution for you that will make your life easier - extracting the text to a format more suitable for your needs.


 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:35
Member (2006)
English to Afrikaans
MS Word Oct 27, 2010

Jabberwock wrote:
However, the learning curve is rather steep - actually, I don't use them often enough to get them right the first time


Yep, it takes some trial and error to get one to work.

If the table in question had only two columns, then this would work in MS Word:

To duplicate the second-to-last column into the last column:

FIND: (\<tr*\>*)(\<td*\>)(*)(\<\/td\>)(\<td*\>)(*)(\<\/td\>)(*\<\/tr\>)
REPLACE: \1\2\3\4\5\3\7\8
WILDCARDS: ON

To duplicate the second-to-last column into the last column *and* comment out the second-to-last column:

FIND: (\<tr*\>*)(\<td*\>)(*)(\<\/td\>)(\<td*\>)(*)(\<\/td\>)(*\<\/tr\>)
REPLACE: \1<!-- \2\3\4 -->\5\3\7\8
WILDCARDS: ON


 
FarkasAndras  Identity Verified
Local time: 07:35
English to Hungarian
What in the... Oct 27, 2010

Harklas wrote:

The source is an html file (well, tons of them) and it's organized as a table with a separate empty column where the customer expects the translation to be when they get it.

...

So I have three options:

1) Do the copy-pasting with some more appropriate html editor than Excel. (Dreamweaver?)

2) Deliver xls files or slightly Excel-corrupted html files and just apologize for not having all the tech in place here.

3) Make OmegaT understand that it's supposed to put the translation in the next column with some help from you super-smart OmegaT people.

What do you think?


I think you're not the one who has to do the apologizing here. The client just made your life very difficult by failing to think and failing to communicate.
These two-column tables are pretty inelegant hacks that help clients get by with using translators who have no idea how to handle the real file format in question. They are a rudimentary solution, but they work... until the client does something incredibly stupid like create a table in a non-editable format such as HTML.

I would ask the client what the original format is and what format they actually need. It appears that they are making totally unreasonable assumptions about what you can and can't do. (I.e. they think you can't handle the original format when you and your CAT probably can, and they think you can conveniently type into an HTML file, which obviously nobody can.)

Of course the HTML table may be the real end format (e.g. they want to put a bilingual table/glossary on their website) but that seems pretty unlikely. Even if it is the case, it's not really fair to expect a translator to do this stuff without some prior discussion.


 

