Please recommend CAT tool for translating patent PDF files from Chinese to English Thread poster: PatentTrans
| PatentTrans United States Local time: 17:32 Chinese to English
Hi all, I need to make a decision fast and would appreciate your thoughts. I just received a contract to translate a number of patents from Simplified Chinese to English, and they are in PDF format. I am able to run OCR on them. I have never used a CAT tool before, but given the number of deliverables I'm going to have to use one. What, in your opinion, is the best TM software for my situation? The only language pair I need is Chinese/English, and I'm only translating patents; if the CAT tool can handle scanned PDFs, even better. Thanks so much in advance.
[Edited at 2013-10-27 19:55 GMT] | | | Consider PDF and CAT separately | Oct 27, 2013 |
Those CAT tools that include support for PDF have licensed the technology from third parties: for instance, SDL uses technology from Solid, Wordfast from BCL (Wordfast Pro) and ABBYY (Wordfast Anywhere), Déjà Vu from BCL (like Wordfast Pro), while memoQ uses a freebie converter. You would be better off selecting a CAT tool on its own merits as a CAT tool, and picking up a separate tool (or, better, separate tools) for PDF conversion. There are many different types of PDF and many different converters as well. No single converter is better than all others with all types of PDF: converter A may be the best for PDF X, while converter B will be better for PDF Y. This is why having several converters at your disposal could be a good idea. Regarding patents and Chinese, most CAT tools should be able to handle both. Choosing the most suitable tool is a matter of personal preference, unless you're so eager to please your clients that you will pick up the tool they "require"... | | | PatentTrans United States Local time: 17:32 Chinese to English TOPIC STARTER Any CAT tools better suited for Chinese? | Oct 27, 2013 |
Thanks. My client is not requiring me to use a CAT tool at all so this is for my own benefit. The output will be simple text. At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work. Also I've heard some CATs are not very good at segmenting Asian languages. | | | Michael Beijer United Kingdom Local time: 23:32 Member (2009) Dutch to English + ... agree with Dominique | Oct 27, 2013 |
You said you are in a hurry, so here is my quick answer. I suggest that you get (1) ABBYY FineReader to convert the PDFs to .docx, and (2) CafeTran or memoQ to translate them with. Michael | |
Five inexpensive CAT tools | Oct 27, 2013 |
PatentTrans wrote: At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work. About one year ago, I made a series of short videos about five free or inexpensive CAT tools: OmegaT, MemSource, Wordfast Anywhere, CafeTran and MetaTexis. I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies. PatentTrans wrote: Also I've heard some CATs are not very good at segmenting Asian languages. Just take a short sample Chinese text and see how each of them fares with it. You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market. | | | jyuan_us United States Local time: 18:32 Member (2005) English to Chinese + ... What did you mean by that? | Oct 27, 2013 |
PatentTrans wrote: Also I've heard some CATs are not very good at segmenting Asian languages. Segments too long? too short? Improper location of a sentence to segment with? I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it. | | | Phil Hand China Local time: 06:32 Chinese to English You can change segmenting rules | Oct 28, 2013 |
I guess in any CAT tool you can edit segmenting rules to suit your needs. I use SDL, and I use the function quite regularly. For patents, you need to make sure you've got a terminology manager working with your CAT tool, because maintaining consistent terminology will be useful. SDL is having compatibility issues with its terminology software MultiTerm right now, so it's probably not your best choice. I don't know what you've already got in terms of OCR, but I use a little freebie called Hanwang, and it's OK. | | | PatentTrans United States Local time: 17:32 Chinese to English TOPIC STARTER I'm giving OmegaT a try | Oct 28, 2013 |
Dominique Pivard wrote: PatentTrans wrote: At this point I don't need anything fancy but just a reasonably priced tool that allows me to avoid doing repetitive work. About one year ago, I made a series of short videos about five free or inexpensive CAT tools: OmegaT, MemSource, Wordfast Anywhere, CafeTran and MetaTexis. I also made one about Across Personal Edition, which is seemingly free, but which I would only recommend to my worst enemies. PatentTrans wrote: Also I've heard some CATs are not very good at segmenting Asian languages. Just take a short sample Chinese text and see how each of them fares with it. You may also want to have a look at Heartsome Translation Studio. It's made by a company based in Hong Kong (so you would think they know a thing or two about segmentation with Chinese) and their entry-level edition isn't among the most expensive on the market. Thanks. I watched your video, installed OmegaT and did a short trial run. It seems good enough for what I need, which is basically text-to-text translation. Your video really helped to get me started. Also, thanks a lot to everyone else who replied to my request. I'm downloading Hanwang OCR right now and will give it a try in a little bit. | |
OmegaT resources | Oct 28, 2013 |
PatentTrans wrote: I watched your video and installed OmegaT and did a short trial run. Seems good enough for what I need, which is basically text to text translation. Your video really helped to get me started. Glad to hear you found the video useful! The OmegaT community has a very active mailing list on Yahoo with lots of helpful people, much more active than the corresponding ProZ forum. I strongly recommend you subscribe to the list, should you need to ask further questions about the tool. | | | Splitting segments in the patent stuff... | Oct 28, 2013 |
jyuan_us wrote: PatentTrans wrote: Also I've heard some CATs are not very good at segmenting Asian languages. Segments too long? too short? Improper location of a sentence to segment with? I guess no CAT tool can segment perfectly, or always segment a text the way you want. You will just have to bear with it. In fact, automatic segmenting is not enough here: patent jobs need a LOT of manual segmenting. That is, if you start to split segments according to their meaning and some repetitive patterns, you'll be able to leverage more chunks of text. I doubt it can be easily automated in a sound way. Cheers GG | | | Michael Beijer United Kingdom Local time: 23:32 Member (2009) Dutch to English + ... following from what Grzegorz said... | Oct 28, 2013 |
If you are going to use the CAT tool for patents, you might want to choose a tool in which you can easily split and join segments. Patents often contain very long sentences, and as Grzegorz said, you might want to split these into smaller chunks, either to increase leverage (get more hits from your translation memories) or simply to make them easier to handle with your poor human brain. Michael | | | A clean file will make your cat work | Oct 29, 2013 |
As suggested, you have lots of choices of TM software. But when you translate a file converted by OCR, you may find lots of tags in your CAT tool, and this will slow your work down severely. In that case, using a CAT tool makes little difference, so I recommend cleaning your files first with TransTools or CodeZapper. This will make your files cleaner, so you can reuse the TM effectively. | |
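To illustrate why leftover tags hurt TM reuse, here is a minimal Python sketch. It is not how TransTools or CodeZapper work (they clean the Word file itself); it only compares a tagged and an untagged source segment against a TM entry. The `<f0>`-style tag pattern, the sample sentence and the helper names `strip_tags` and `similarity` are assumptions made up for this example, and `difflib` is only a rough stand-in for a real CAT tool's fuzzy matching.

```python
import re
from difflib import SequenceMatcher

# Assumed pattern for inline formatting tags such as <f0>, </f0> or <t1/>.
TAG = re.compile(r"</?[a-z]+\d+/?>")

def strip_tags(segment):
    """Return the segment with all inline tags removed."""
    return TAG.sub("", segment)

def similarity(a, b):
    """Rough character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical TM source entry and the same sentence after a noisy OCR conversion.
tm_entry = "该装置包括壳体和盖体。"
ocr_segment = "该<f0>装置</f0>包括<f1>壳体</f1>和盖体。"

print("with tags:   ", round(similarity(tm_entry, ocr_segment), 2))
print("without tags:", round(similarity(tm_entry, strip_tags(ocr_segment)), 2))
```

Once the tags are gone the two strings are identical, so the segment can count as an exact match instead of a low fuzzy match.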
PatentTrans United States Local time: 17:32 Chinese to English TOPIC STARTER Tags and segmenting | Oct 29, 2013 |
I spent more time playing with OmegaT. It has a tag removal function which worked for my test document. Regarding segmentation, for Chinese I am having a hard time coming up with good rules that can be expressed using regex, so I copied the Japanese rules and added a few. Basically, I'm segmenting by punctuation: comma, period, semicolon, colon, question mark. It does not look too bad, and I'll test a few more docs before committing to using it. So far the process has been smooth.
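For anyone who wants to try the same idea outside the CAT tool, punctuation-based segmentation can be sketched in a few lines of Python. This is not OmegaT's rule format, just a standalone illustration. It assumes the full-width forms of the five punctuation marks, and the two claim sentences are invented. It also counts which segments repeat, which is the kind of leverage Grzegorz and Michael mentioned.

```python
import re
from collections import Counter

# Assumption: break after the full-width comma, period, semicolon,
# colon and question mark.
BREAK_AFTER = re.compile(r"(?<=[，。；：？])")

def segment(text):
    """Split text after each listed mark, keeping the mark (needs Python 3.7+)."""
    return [piece for piece in BREAK_AFTER.split(text) if piece]

# Invented sample claims, for illustration only.
claims = [
    "一种装置，其特征在于，包括：壳体；以及盖体。",
    "根据权利要求1所述的装置，其特征在于，所述盖体为透明的。",
]

# Count how often each segment occurs across all claims.
counts = Counter(piece for claim in claims for piece in segment(claim))
repeated = [piece for piece, n in counts.items() if n > 1]

for claim in claims:
    print(segment(claim))
print("Repeated segments:", repeated)
```

Splitting at commas makes short boilerplate such as "其特征在于，" show up as a repeated segment, but it can also cut a clause in a place that is awkward to translate, so it is worth checking on real documents.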
[Edited at 2013-10-29 21:57 GMT]