Trouble with DocBook
Thread poster: Jan Cerny
Jan Cerny
Austria
Feb 11, 2015

Hi,

For the past few hours I have been trying, without success, to import various DocBook XML documents into OmegaT 3.1.8.

In some cases OmegaT tells me that no files in a supported format are included in my project.
Other XML files lead to an error message where OmegaT informs me that a path in the file I want to import contains "invalid characters".
Yet another type of file produces the error message "Illegal character in path at index 9".

All in all, I am not making much progress here. I have tried files from my own documentation environment (which open fine in XMetaL and are based on DocBook 4.2). I have also tried various DocBook XML files that ship with other tools or that I found on websites.

I even downloaded a test project from an OmegaT bug report:
http://sourceforge.net/p/omegat/bugs/636/
Importing the project attached to that bug report also leads to the "contains invalid characters" error, even though it must have worked for another user at some point.

I fear I am missing something pretty basic here. Any suggestions would be greatly appreciated.

Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.


 
Didier Briel
France
English to French
The OmegaT documentation translation kit is in DocBook
Feb 11, 2015

Jan Cerny wrote:
> In some cases OmegaT tells me that no files in a supported format are included in my project.

That could be the case if the DocBook header is not what is expected.
The pattern we use for DocBook 4 is -//OASIS//DTD DocBook.*
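
To check whether a file's header would match that pattern, something like the following Python sketch may help. It is only an illustration: the helper name `looks_like_docbook4` and the DOCTYPE-extraction regex are assumptions made for this example, and how OmegaT itself applies the pattern is not stated in this thread.

```python
import re

# Pattern quoted above for DocBook 4 public identifiers.
DOCBOOK4_PUBLIC_ID = re.compile(r"-//OASIS//DTD DocBook.*")

# Assumption: a simple regex that pulls the public identifier out of a
# <!DOCTYPE root PUBLIC "..." "..."> declaration.
DOCTYPE_PUBLIC = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+(["\'])(.*?)\1', re.DOTALL)


def looks_like_docbook4(xml_text):
    """Return True if the DOCTYPE public identifier matches the pattern."""
    found = DOCTYPE_PUBLIC.search(xml_text)
    if found is None:
        # No DOCTYPE with a PUBLIC identifier at all.
        return False
    return DOCBOOK4_PUBLIC_ID.match(found.group(2)) is not None


if __name__ == "__main__":
    header = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" '
        '"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">\n'
        "<book/>"
    )
    print(looks_like_docbook4(header))
```

A file without a DOCTYPE declaration, or with a different public identifier, would return False here, which would be consistent with the "no files in a supported format" message.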

> Other XML files lead to an error message where OmegaT informs me that a path in the file I want to import contains "invalid characters".
> Another type of files leads to the error message "Illegal character in path at index 9".

Just to be sure: try a very short path (e.g., c:\test\) with no "exotic" characters in the file name. This is not specific to OmegaT; it comes from a combination of Java and operating system limitations.
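
A quick way to spot such characters before importing could look like the Python sketch below. The allow-list is an assumption chosen to be deliberately conservative (it also flags spaces); it is not the exact set of characters that Java or OmegaT rejects, and the function name `suspicious_path_chars` is made up for this example.

```python
import re

# Assumption: a conservative allow-list of ASCII letters, digits and a few
# separators. Anything else is reported as potentially "exotic".
SAFE_CHAR = re.compile(r"[A-Za-z0-9_.\-/\\:]")


def suspicious_path_chars(path):
    """Return (index, character) pairs for characters outside the allow-list."""
    return [(i, ch) for i, ch in enumerate(path) if not SAFE_CHAR.fullmatch(ch)]


if __name__ == "__main__":
    for candidate in (r"c:\test\manual.xml", r"c:\docs[v2]\manual.xml"):
        print(candidate, suspicious_path_chars(candidate))
```

An empty list means the path only uses characters from the allow-list; any reported character is worth renaming away before trying the import again.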

> All in all I am not making much progress here. I have tried using files belonging to my own documentation environment (which I can open fine in XMetaL & which are based on DocBook 4.2).

The OmegaT documentation is based on DocBook 4.5, but that shouldn't make much of a difference.

> Or can someone perhaps provide me with a working DocBook example which I can import? I could have a look at the syntax to see why my own files do not work.

You are lucky: the OmegaT documentation translation kit contains precisely that:
https://sourceforge.net/projects/omegat/files/Other%20-%20Localization%20projects/OmegaT%203.1.8/
"Minimal" contains just one DocBook document, "Full" contains the complete documentation.

Didier


 
Jan Cerny
Austria
TOPIC STARTER
Entity problems
Feb 12, 2015

Hi,

Thank you very much for your answer. The path really was the cause of the problem: it contained a [ and a ]. Once I removed the square brackets, it started working.

I have now started loading more and more files from my documentation environment into OmegaT successfully.

One new problem has arisen, concerning entities:

Documentation Environment including the DTD
When I import the documentation environment including the DTD, the value of a text entity is displayed in the editor as a normal text string (e.g., "Coca Cola") and is translatable. Is there a way to display only the entity (e.g., "&ProductName;") instead and make it write-protected?

When I generate the translated documents, they contain the content of the DTD file (the same problem as in the bug report I linked in my first post). The output files also contain the "Coca Cola" value instead of the entity.

Documentation Environment without DTD
If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Is there a way around these problems?


 
Didier Briel
France
English to French
Entities
Feb 13, 2015

Jan Cerny wrote:
> One new problem has arisen, concerning entities:
>
> Documentation Environment including the DTD
> When I import the documentation environment including the DTD, the value of a text entity is displayed within the editor as a normal text string (e.g "Coca Cola") and translatable. Is there a way to only display the entity (e.g "&ProductName;") instead and make it write-protected?

No (speaking from a user's point of view).

> When I generate the translated documents, they contain the DTD file content in the translated document (same problem as in the bug report I linked in my first post).

If the bug report is still open, that's because it is still valid.

> Also, the output files again contain the "Coca Cola" value instead of the entity.

What you see in the Editor is what you will get in the target document.

> Documentation Environment without DTD
> If I only import XML files, entities are represented as blanks. This is also not ideal, because you miss them while translating. The output files also contain blanks where the entities should be.

Yes, that's not a very satisfying solution.

> Is there a way around these problems?

I cannot think of any that doesn't involve pre- and post-processing the documents.
For instance, replace &my-entity; with #my-entity; in source documents, and do the reverse operation in target documents.

By doing so, you can have your "entities" identified as tags in OmegaT. In Options > Tag Validation, enter #.*?; as the regular expression for custom tags.
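
The pre- and post-processing round trip described above could be sketched in Python as follows. This is only one possible approach: the function names `protect_entities` and `restore_entities` are hypothetical, and the decision to leave the five predefined XML entities and numeric character references untouched is an assumption added for this example.

```python
import re

# The five entities predefined by XML; assumption: these should stay as-is.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named entity references such as &ProductName;. Numeric references like
# &#169; have '#' right after the ampersand, so they do not match.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")

# Placeholders such as #ProductName;. The lookbehind skips the '#' that
# belongs to a hexadecimal character reference such as &#x41;.
PLACEHOLDER = re.compile(r"(?<!&)#([A-Za-z_][\w.\-]*);")


def protect_entities(source_text):
    """Pre-processing: turn &name; into #name; in the source documents."""
    def swap(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)  # keep &amp;, &lt; and friends unchanged
        return "#" + name + ";"
    return ENTITY_REF.sub(swap, source_text)


def restore_entities(target_text):
    """Post-processing: turn #name; back into &name; in the target documents."""
    return PLACEHOLDER.sub(r"&\1;", target_text)


if __name__ == "__main__":
    original = "Install &ProductName; &amp; restart."
    protected = protect_entities(original)
    print(protected)
    print(restore_entities(protected))
```

The resulting placeholders are matched by the `#.*?;` custom tag expression. One caveat: any text that already contains a literal sequence such as `#word;` would also be turned into an entity reference on the way back, so it is worth searching the source files for that pattern first.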

Didier


 

