Current state of TBX (TermBase eXchange) support in TEnTs.
Thread poster: Michael Beijer
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 16:38
Member (2009)
Dutch to English
+ ...
Jun 24, 2011

Just curious, does anyone have any information on TBX being implemented in any of the current TEnTs? What are the chances that in these next few years we will finally see a unified standard emerge for terminology exchange?

I use memoQ, and I know that qTerm now supports TBX, but they have for some reason not included TBX support in memoQ.

Many Competing Standards

There are of course also a number of other terminology exchange format standards currently knocking about...

- The developers of Swordfish have created "GlossML".
- OLIF (Open Lexicon Interchange Format).
- UTX (Universal Terminology eXchange).
- And the various flavours of TBX, such as TBX-Default, TBX-Basic, and TBX-Glossary....

Choose One

Basically, it's about time the community gets together and implements a single, standard format for easily exchanging term bases, so that we can finally put an end to all of this annoying converting and editing in text editors and/or Excel.

What do you think?


 
Selcuk Akyuz
Selcuk Akyuz  Identity Verified
Türkiye
Local time: 18:38
English to Turkish
+ ...
Same with TMX and XLIFF Jun 24, 2011

Unfortunately there is no full compatibility between TMX and XLIFF files created by different CAT tools, and it will possibly be the same for TBX files. AFAIK the best tool for importing TBX files is Across. Some colleagues reported failure with MultiTerm, although in theory it should do better as additional fields can be created in MultiTerm.

MemoQ? You cannot add new fields, so there will be data loss. Kilgray's qTerm is for LSPs and enterprises, not for the freelancer. Kilgray wants a bigger share of the pie. But any improvements in the TM editor?

Selcuk


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 12:38
English to Spanish
GlossML is used with glossaries Jun 25, 2011

Michael J.W. Beijer wrote:

- The developers of Swordfish have created "GlossML".



GlossML is a format for holding glossaries; TBX is a format for holding terminology databases. Two different formats for two very different uses.

GlossML is simple, TBX isn't. TBX is overkill for holding a simple glossary.

Regards,
Rodolfo


 
mediamatrix (X)
mediamatrix (X)
Local time: 11:38
Spanish to English
+ ...
Standard? Jun 25, 2011

Michael J.W. Beijer wrote:

What are the chances that in these next few years we will finally see a unified standard emerge for terminology exchange?
...
Basically, it's about time the community gets together and implements a single, standard format for easily exchanging term bases, so that we can finally put an end to all of this annoying converting and editing in text editors and/or Excel.


I fear you have misunderstood the purpose of standards.

They are not intended, designed or otherwise conceived to ensure we, or our computers, can all understand each other. Heaven forbid!

Standards are mere hooks onto which manufacturers can hang proprietary, and preferably mutually incompatible, variants on a theme. Nothing more, nothing less.

MediaMatrix


 
Didier Briel
Didier Briel  Identity Verified
France
Local time: 17:38
English to French
+ ...
OmegaT loads TBX Jun 25, 2011

Michael J.W. Beijer wrote:
Just curious, does anyone have any information on TBX being implemented in any of the current TEnTs?

OmegaT loads glossaries in TBX format.

Didier


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 17:38
Member (2006)
English to Afrikaans
+ ...
A universal format would also be a complex format Jun 25, 2011

Michael J.W. Beijer wrote:
Just curious, does anyone have any information on TBX being implemented in any of the current TEnTs? What are the chances that in these next few years we will finally see a unified standard emerge for terminology exchange?


I think different formats have different strengths, so there is room for multiple formats.

Basically, it's about time the community gets together and implements a single, standard format for easily exchanging term bases...


What is the difference (in your opinion) between a glossary and a term base? And why would CSV not be a suitable format for your idea of what a term base would be?


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 17:38
English to Hungarian
+ ...
How so? Jun 25, 2011

Rodolfo Raya wrote:

Michael J.W. Beijer wrote:

- The developers of Swordfish have created "GlossML".



GlossML is a format for holding glossaries; TBX is a format for holding terminology databases. Two different formats for two very different uses.

GlossML is simple, TBX isn't. TBX is overkill for holding a simple glossary.

Regards,
Rodolfo


In what way is TBX overkill? Are you so short on hard drive space that you can't afford to store 0.1 kB of extra data? Do you think CAT tools will struggle to process the "complex" TBX files? These formats aren't intended for human reading anyway, so it's just a matter of writing a parser, which all the CAT makers have already done...
In actual reality, the benefits of having a single unified format that all tools can handle outweigh any consideration of perceived simplicity by a factor of 10,000 to one. I don't know what GlossML is like - my guess is that it's some proprietary format that's XML-based just like TBX. All I know is that there is no need for it. There is nothing wrong with using TBX for simple two-language glossaries with no metadata. If you're hell bent on using a simpler format than TBX for your simple glossaries, then just use CSV, tab delimited txt or xls, which do actually offer some benefit over TBX: they are human readable.


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 17:38
English to Hungarian
+ ...
Different formats Jun 25, 2011

Samuel Murray wrote:

Michael J.W. Beijer wrote:
Just curious, does anyone have any information on TBX being implemented in any of the current TEnTs? What are the chances that in these next few years we will finally see a unified standard emerge for terminology exchange?


I think different formats have different strengths, so there is room for multiple formats.

I profoundly disagree. The ideal solution is to create a single unified format that can provide all the features required by the different tools, environments and usage scenarios.
People don't create their own markup language for their websites, they just use HTML, which meets everyone's needs and ensures universal compatibility. When new needs arise, a council of wise old men draws up the HTML5 standard to cover these needs and maintain compatibility. It's not that hard...
I'm not sure if TBX meets all needs now, but if it doesn't, it's clearly better to refine and expand the TBX format than to introduce myriads of mutually incompatible formats.
The only exception I can think of is a human-readable format for those of us who don't use a CAT, and xls works fine for that purpose.


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 12:38
English to Spanish
TBX is overkill for glossaries Jun 25, 2011

FarkasAndras wrote:
In what way is TBX overkill?


TBX is a markup framework. Two files are required for having a TBX document (a .tbx file plus a .xcs file that describes the data stored in the .tbx part).

To ship a termbase in real TBX format, you must ship 2 files.

Are you so short on hard drive space that you can't afford to store 0.1 kB of extra data? Do you think CAT tools will struggle to process the "complex" TBX files? These formats aren't intended for human reading anyway, so it's just a matter of writing a parser, which all the CAT makers have already done...


Swordfish already supports TBX format for importing terms. No problem of space or programming skills.


In actual reality, the benefits of having a single unified format that all tools can handle outweigh any consideration of perceived simplicity by a factor of 10,000 to one.


TBX has a huge problem: it can't be properly validated. It can't be considered the candidate for unified exchange.

I don't know what GlossML is like - my guess is that it's some proprietary format that's XML-based just like TBX.


Don't guess. Inform yourself before expressing an opinion. GlossML is an open format, not proprietary.

All I know is that there is no need for it.


Once again, please be informed before expressing an opinion like that.

LISA understood that there is a real need for it. It is specifically designed for storing glossaries, something TBX could do but was not designed for. LISA considered adopting it as a standard format for storing glossaries.

There is nothing wrong with using TBX for simple two-language glossaries with no metadata. If you're hell bent on using a simpler format than TBX for your simple glossaries, then just use CSV, tab delimited txt or xls, which do actually offer some benefit over TBX: they are human readable.


CSV is not portable. It has encoding problems. Also, there is no way to properly define what to use as delimiters and no official way to escape delimiters.

Excel files are also not portable.

GlossML was invented to replace CSV and Excel sheets in cross-platform exchange of glossaries, not to replace TBX.

Another problem with TBX is that it cannot be embedded in an XLIFF file. GlossML was designed to allow embedding glossaries in XLIFF or other XML vocabularies.

Regards,
Rodolfo


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 17:38
English to Hungarian
+ ...
Not really Jun 26, 2011

Most of what you say is interesting but ultimately irrelevant for practical purposes for most users.

For instance, your explanation of why TBX is overkill doesn't hold water:
Rodolfo Raya wrote:

TBX is a markup framework. Two files are required for having a TBX document (a .tbx file plus a .xcs file that describes the data stored in the .tbx part).

To ship a termbase in real TBX format, you must ship 2 files.

That's just not true. As you probably know, XCS (eXtensible Constraint Specification) is an optional extra file that you may ship with your TBX if you want to limit the picklist items that a user can choose etc. Most of the time, there is no need to create or send one. It's just there for the sort of extensibility I was talking about above.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 17:38
Member (2006)
English to Afrikaans
+ ...
Ahem, a simple google should take out the guess Jun 26, 2011

FarkasAndras wrote:
I don't know what GlossML is like - my guess is that it's some proprietary format that's XML-based just like TBX.


GlossML is XML and it is completely open. A simple google would confirm this, and I think we who participate in these forums owe it to our readers to do such basic research before making sweeping statements.

All I know is that there is no need for it. ... If you're hell bent on using a simpler format than TBX for your simple glossaries, then just use CSV, tab delimited txt or...


My guess (and this time I'm the one who's guessing) is that Rodolfo wanted a simple glossary format to go with his suite of translation tools, that uses the same type of format, namely XML. GlossML is just that -- it is very simple, has no scalability (none needed, after all), and does exactly what it was designed for, namely store glossary data with no risk of data loss through misencoding or misparsing.

The reasons listed on the GlossML page for not using CSV include the fact that different dialects of CSV use different delimiters (and use them differently), have different ways of escaping characters, and have different ways of dealing with character encoding. All of this amounts to a recipe for data loss if two translators share their "CSV" files and assume that the other translator's CSV application uses the same rules.
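
To make the delimiter and encoding point concrete, here is a minimal Python sketch. The sample rows, the helper name read_rows and the choice of cp1252 as the "wrong" encoding are made up for illustration; none of this is taken from the GlossML page.

```python
import csv
import io

# A two-column glossary exported by a hypothetical tool that uses
# semicolons as the field delimiter.
exported = "source;target\nbank;banco\n"

def read_rows(text, delimiter):
    """Parse CSV text with the given delimiter and return a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# The receiving tool guesses the delimiter; a wrong guess collapses
# each row into a single field, so source and target are no longer separate.
rows_semicolon = read_rows(exported, ";")
rows_comma = read_rows(exported, ",")

# Encoding is the second trap: bytes written as UTF-8 but read with a
# different code page come back as different characters.
original = "Türkçe"
misread = original.encode("utf-8").decode("cp1252")

print(rows_semicolon)
print(rows_comma)
print(original, "->", misread)
```

Nothing in the file itself says which delimiter or encoding was intended, which is why both sides have to agree on the rules beforehand.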


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 17:38
Member (2006)
English to Afrikaans
+ ...
@Farkas and Rodolfo Jun 26, 2011

FarkasAndras wrote:
Rodolfo Raya wrote:
To ship a termbase in real TBX format, you must ship 2 files.

That's just not true. As you probably know, XCS (eXtensible Constraint Specification) is an optional extra file that you may ship with your TBX if you want to limit the picklist items that a user can choose etc.


A quick question to you two gentlemen who seem to know more about TBX than I do... is the XCS file necessary to tell the TBX reader (in a two-language TBX file) which language is the source language and which language is the target language? If not, how can a TBX reader know which is which (except for making the *sweeping* assumption that the first one must be the source text)?


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 17:38
English to Hungarian
+ ...
Source text? Jun 26, 2011

I haven't used TBX much, but I don't expect that there is such a thing as a source text. Each language is identified by a language code just like in TMX so your CAT knows which is which and serves up hits accordingly.
The TMX header does have a "source language" (srclang) field, as well as an "adminlang" field, but I'm not really sure what these do. If you import a FR-EN TMX to an EN-FR TM, then your CAT "switches" the languages, i.e. FR, which is marked as the srclang, will be the target language in that TM as it should be. So srclang doesn't fundamentally determine the "source" language. I'm not sure if it does anything, and I have rummaged around the bowels of TMX a fair bit. Maybe it sets a default source language that gets overridden if you import into a reversed TM, but that's just a blind guess. At a glance, TBX doesn't even seem to have anything like srclang and adminlang.
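
As a rough illustration of languages being identified by code rather than by position, here is a Python sketch that groups the terms of each entry by the xml:lang value of its langSet. The fragment is hand-written and heavily simplified (a real TBX file also has a header, among other things), and the helper name terms_by_language is an assumption for this example only.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written, simplified TBX-style fragment (illustrative only).
SAMPLE = """<martif type="TBX" xml:lang="en">
  <text><body>
    <termEntry id="c1">
      <langSet xml:lang="fr"><tig><term>chat</term></tig></langSet>
      <langSet xml:lang="en"><tig><term>cat</term></tig></langSet>
    </termEntry>
  </body></text>
</martif>"""

def terms_by_language(tbx_text):
    """Return one {language code: [terms]} dict per termEntry."""
    root = ET.fromstring(tbx_text)
    entries = []
    for entry in root.iter("termEntry"):
        by_lang = {}
        for lang_set in entry.iter("langSet"):
            code = lang_set.get(XML_LANG)
            by_lang.setdefault(code, []).extend(
                term.text for term in lang_set.iter("term"))
        entries.append(by_lang)
    return entries

entries = terms_by_language(SAMPLE)
print(entries)
```

The French langSet comes first in the sample on purpose: the lookup goes by language code, so the order of the langSet elements plays no role in this sketch.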


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 12:38
English to Spanish
XCS is required. Jun 26, 2011

Samuel Murray wrote:
A quick question to you two gentlemen who seem to know more about TBX than I do... is the XCS file necessary to tell the TBX reader (in a two-language TBX file) which language is the source language and which language is the target language? If not, how can a TBX reader know which is which (except for making the *sweeping* assumption that the first one must be the source text)?



The source language is declared in the root of the .tbx file. You can't switch it as some people do with TMX. All administrative descriptions must be entered in the declared main language.

The XCS file declares the languages used in the .tbx file. If you want to enter terms in Afrikaans, you have to declare Afrikaans as one of the languages contained in the .tbx part.

The "default" XCS file that is assumed when you don't ship an XCS file with your .tbx file is quite basic and declares only a few languages: en, de, es, hu, fr, it, da, nl, fi, pl, no, pt, sv, el and cs.
If you want to use a language not declared in the default .xcs file, you must create your own .xcs. If you want to use a variant of those languages, like "en-US", you also have to attach an XCS file.
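
A small Python sketch of that check, for what it's worth. The set of codes is simply copied from the list given in this post, not verified against the specification, and the function name codes_outside_default is hypothetical.

```python
# Language codes that this post attributes to the default XCS file
# (copied from the list above; not checked against the spec).
DEFAULT_XCS_LANGS = {
    "en", "de", "es", "hu", "fr", "it", "da", "nl",
    "fi", "pl", "no", "pt", "sv", "el", "cs",
}

def codes_outside_default(langs_used):
    """Return the sorted language codes that are not in the default list."""
    return sorted(set(langs_used) - DEFAULT_XCS_LANGS)

# Afrikaans and a regional variant such as en-US both fall outside the
# list, so under the reading above they would call for a custom XCS file.
missing = codes_outside_default(["en", "af", "en-US"])
print(missing)
```

An empty result would mean that every language in the file is covered by the default list as quoted here.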

One big failure of the TBX format is that the location of the XCS file is not properly specified in the .tbx part. It can be anything, like "in the left drawer of my desk". This is one of the details that make TBX impossible to validate.

Regards,
Rodolfo


 
Rodolfo Raya
Rodolfo Raya  Identity Verified
Local time: 12:38
English to Spanish
XCS is required. Jun 26, 2011

FarkasAndras wrote:

Most of what you say is interesting but ultimately irrelevant for practical purposes for most users.

For instance, your explanation of why TBX is overkill doesn't hold water:
Rodolfo Raya wrote:

TBX is a markup framework. Two files are required for having a TBX document (a .tbx file plus a .xcs file that describes the data stored in the .tbx part).

To ship a termbase in real TBX format, you must ship 2 files.

That's just not true. As you probably know, XCS (eXtensible Constraint Specification) is an optional extra file that you may ship with your TBX if you want to limit the picklist items that a user can choose etc. Most of the time, there is no need to create or send one. It's just there for the sort of extensibility I was talking about above.


Please read the specification document for TBX. Item 2 of section 7.1 states that each TML must have a corresponding XCS file that describes all data categories used in the framework.

An example XCS file is included in the specification document and you can point your .tbx files to it if, and only if, your data categories and language constraints match the provided sample set.

If you don't include an XCS file in your TML and if your data doesn't validate against that XCS file, your TML does not comply with the TBX standard.

Regards,
Rodolfo


 