Please recommend CAT tool for translating patent PDF files from Chinese to English
Thread poster: PatentTrans
PatentTrans
PatentTrans
United States
Local time: 17:32
Chinese to English
Oct 27, 2013

Hi all, I need to make a decision fast and would appreciate your thoughts. I just received a contract to translate a number of patents from Simplified Chinese to English, and they are in PDF format. I can run OCR on them. I have never used a CAT tool before, but given the number of deliverables I'm going to have to use one. What, in your opinion, is the best TM software for my situation? The only language pair I need is Chinese/English, and I'm only translating patents; if the CAT tool can handle scanned PDFs, even better. Thanks so much in advance.

[Edited at 2013-10-27 19:55 GMT]


 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 01:32
Finnish to French
Consider PDF and CAT separately Oct 27, 2013

Those CAT tools that include support for PDF have licensed the technology from third parties: for instance, SDL uses technology from Solid, Wordfast from BCL (Wordfast Pro) and ABBYY (Wordfast Anywhere), Déjà Vu from BCL (like Wordfast Pro), while memoQ uses a freebie converter.

You would be better off selecting a CAT tool on its own merits as a CAT tool, and picking a separate tool (or, better, separate tools) for PDF conversion. There are many different types of PDF and many different converters as well. No single converter is better than all others with all types of PDF: converter A may be the best for PDF X, while converter B will be better for PDF Y. This is why having several converters at your disposal could be a good idea.

Regarding patents and Chinese, most CAT tools should be able to handle both. Choosing the most suitable tool is a matter of personal preferences, unless you're so eager to please your clients you will pick up the tool they "require"...


 
PatentTrans
PatentTrans
United States
Local time: 17:32
Chinese to English
TOPIC STARTER
Any CAT tools better suited for Chinese? Oct 27, 2013

Thanks. My client is not requiring me to use a CAT tool at all so this is for my own benefit. The output will be simple text. At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work. Also I've heard some CATs are not very good at segmenting Asian languages.

 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:32
Member (2009)
Dutch to English
+ ...
agree with Dominique Oct 27, 2013

You said you are in a hurry, so here is my quick answer. I suggest that you get
(1) ABBYY FineReader to convert the PDFs to .docx, and
(2) CafeTran or memoQ to translate them with.

Michael


 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 01:32
Finnish to French
Five inexpensive CAT tools Oct 27, 2013

PatentTrans wrote:
At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work.

About one year ago, I made a series of short videos about five free or inexpensive CAT tools:
OmegaT
MemSource
Wordfast Anywhere
CafeTran
MetaTexis

I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies.

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.

Just take a short sample Chinese text and see how each of them fares with it.

You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market.


 
jyuan_us
jyuan_us  Identity Verified
United States
Local time: 18:32
Member (2005)
English to Chinese
+ ...
What did you mean by that? Oct 27, 2013

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.


Segments too long? Too short? Segment breaks in the wrong place within a sentence?

I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it.


 
Phil Hand
Phil Hand  Identity Verified
China
Local time: 06:32
Chinese to English
You can change segmenting rules Oct 28, 2013

I guess in any CAT tool you can edit segmenting rules to suit your needs. I use SDL, and I use the function quite regularly.
For patents, you need to make sure you've got a terminology manager working with your CAT, because maintaining consistent terminology will be useful. SDL is having compatibility issues with its terminology software MultiTerm right now, so it's probably not your best choice.
Don't know what you've already got in terms of OCR, but I use a little freebie called Hanwang, and it's OK.
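
On the terminology point above, here is a minimal sketch of the kind of consistency check a terminology manager performs. It is plain Python, not the behaviour of MultiTerm or any particular CAT tool, and the function name `missing_terms`, the glossary entries, and the sample sentences are all made up for illustration.

```python
def missing_terms(source, target, glossary):
    """List glossary entries whose Chinese term occurs in `source`
    but whose approved English term is absent from `target`."""
    target_lower = target.lower()
    return [
        (zh, en)
        for zh, en in glossary.items()
        if zh in source and en.lower() not in target_lower
    ]


# Hypothetical glossary: Chinese term -> approved English term.
glossary = {"壳体": "housing", "散热孔": "vent hole"}

# Hypothetical segment pair: the translator wrote "casing" instead of "housing".
issues = missing_terms(
    "所述壳体上设置有散热孔。",
    "The casing is provided with a vent hole.",
    glossary,
)
print(issues)
```

A simple substring test like this will miss inflected forms (plurals, for instance), so treat it as a rough first pass only.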


 
PatentTrans
PatentTrans
United States
Local time: 17:32
Chinese to English
TOPIC STARTER
I'm giving OmegaT a try Oct 28, 2013

Dominique Pivard wrote:

PatentTrans wrote:
At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work.

About one year ago, I made a series of short videos about five free or inexpensive CAT tools:
OmegaT
MemSource
Wordfast Anywhere
CafeTran
MetaTexis

I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies.

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.

Just take a short sample Chinese text and see how each of them fares with it.

You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market.



Thanks. I watched your video and installed OmegaT and did a short trial run. Seems good enough for what I need, which is basically text to text translation. Your video really helped to get me started.

Also thanks a lot to everyone else who replied to my request. I'm downloading Hanwang OCR right now and will give it a try in a little bit.


 
Dominique Pivard
Dominique Pivard  Identity Verified
Local time: 01:32
Finnish to French
OmegaT resources Oct 28, 2013

PatentTrans wrote:
I watched your video and installed OmegaT and did a short trial run. Seems good enough for what I need, which is basically text to text translation. Your video really helped to get me started.

Glad to hear you found the video useful! The OmegaT community has a very active mailing list on Yahoo with lots of helpful people, much more active than the corresponding ProZ forum. I strongly recommend you subscribe to the list, should you need to ask further questions about the tool.


 
Grzegorz Gryc
Grzegorz Gryc  Identity Verified
Local time: 00:32
French to Polish
+ ...
Splitting segments in the patent stuff... Oct 28, 2013

jyuan_us wrote:

PatentTrans wrote:
Also I've heard some CATs are not very good at segmenting Asian languages.


Segments too long? Too short? Segment breaks in the wrong place within a sentence?

I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it.


In fact, automatic segmenting is not enough here: patent jobs need a LOT of manual segmenting.
That is, if you start to split segments according to their meaning and some repetitive patterns, you'll be able to leverage more chunks of text.
I doubt this can be easily automated in a sound way.

Cheers
GG


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 23:32
Member (2009)
Dutch to English
+ ...
following from what Grzegorz said... Oct 28, 2013

If you are going to use the CAT tool for patents, you might want to choose a tool in which you can easily split and join segments. Patents often contain very long sentences, and as Grzegorz said, you might want to split these into smaller chunks, either to increase leverage (get more hits from your translation memories) or simply to make them easier to handle with your poor human brain.
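
To illustrate the leverage point, here is a rough sketch that uses Python's standard `difflib` as a stand-in for a CAT tool's fuzzy matching (real tools use their own scoring). The helper `best_match`, the memory entries, and the sample sentence are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(segment, memory):
    """Return the highest similarity ratio (0..1) between `segment`
    and any entry in the translation memory."""
    return max(
        (SequenceMatcher(None, segment, entry).ratio() for entry in memory),
        default=0.0,
    )


# Hypothetical TM entries, stored at clause level from an earlier patent.
memory = ["所述壳体包括上盖和下盖，", "所述上盖与所述下盖通过螺栓连接。"]

# New sentence: the first clause repeats an old one, the second is new.
whole = "所述壳体包括上盖和下盖，所述下盖上设置有散热孔。"
parts = ["所述壳体包括上盖和下盖，", "所述下盖上设置有散热孔。"]

print("unsplit sentence:", round(best_match(whole, memory), 2))
for part in parts:
    print("split chunk:", round(best_match(part, memory), 2))
```

Left unsplit, the sentence can only be a partial match. Once split, the repeated clause is an exact match and only the new clause needs translating from scratch.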

Michael


 
Heartsome Support
Heartsome Support
Local time: 06:32
A clean file will make your cat work Oct 29, 2013

As suggested, you have lots of choices in TM software. But when you translate a file produced by OCR, you may find lots of tags in your CAT tool, and these will slow your work down severely. At that point, using a CAT tool makes little difference. So I recommend cleaning your files first with TransTools or CodeZapper. This will make your files cleaner, so you can reuse the TM effectively.

 
PatentTrans
PatentTrans
United States
Local time: 17:32
Chinese to English
TOPIC STARTER
Tags and segmenting Oct 29, 2013

I spent more time playing with OmegaT. It has a tag removal function which worked for my test document.

Regarding segmentation, I am having a hard time coming up with good rules for Chinese that can be expressed as regular expressions, so I copied the Japanese rules and added a few. Basically, I'm segmenting on punctuation: comma, period, semicolon, colon, and question mark. It does not look too bad, and I'll test a few more documents before committing to it. So far the process has been smooth.
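
For anyone who wants to try the punctuation approach outside a CAT tool first, here is a minimal sketch in plain Python. It is not OmegaT's own rule format, and it assumes the text uses the full-width forms of the five marks listed above; the names `BREAK_CHARS` and `segment` are made up for illustration.

```python
import re

# Full-width comma, period, semicolon, colon and question mark.
# Assumption: the OCR output uses these rather than ASCII punctuation.
BREAK_CHARS = "，。；：？"

# Zero-width split point right after any break character, so the
# punctuation stays attached to the segment it ends (Python 3.7+).
_SPLIT_AFTER = re.compile(f"(?<=[{BREAK_CHARS}])")


def segment(text):
    """Return the non-empty segments of `text`, each ending at a break character."""
    return [s.strip() for s in _SPLIT_AFTER.split(text) if s.strip()]


sample = "一种装置，包括：壳体；以及盖体。"
for seg in segment(sample):
    print(seg)
```

Splitting on every comma gives short segments, which suits repetitive claim language but can cut a clause in an awkward place, so it is worth checking the result on a few real documents.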

[Edited at 2013-10-29 21:57 GMT]


 