https://www.proz.com/forum/post_editing_machine_translation/305806-corpus_analysis.html&phpv_redirected=1

Corpus Analysis
Thread poster: Juan Martín Fernández Rowda
Aug 23, 2016

As you probably know, Statistical Machine Translation (SMT) needs large amounts of text data to produce good translations. We are talking about millions of words. At the same time, SMT can translate millions of words relatively fast (VERY fast, compared to human translators). In this scenario, and speaking mainly from a linguist’s perspective, the challenge is: how can one make sense of all these millions of words? What do you do if you want to find out whether a corpus is good enough to be used in your MT system? How do you know what to improve if you realize a corpus is not good? How can you find out what the main topics covered in your corpus are?

It’s unrealistic to try to understand your corpus by reading every single line or word.

Corpus analysis can help you find answers to these questions. It can also help you understand how your MT system is performing and why. It can even help you understand how your post-editors are performing.
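
As a rough illustration of what such an analysis could look like, here is a minimal Python sketch that counts tokens and distinct words (types) and lists the most frequent content words, which gives a first hint at the main topics of a corpus. This is only an assumed example and not necessarily the approach taken in the linked article; the names `tokenize`, `corpus_stats`, `STOP_WORDS` and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis
# would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "with"}


def tokenize(line):
    """Lowercase a line and split it into word tokens."""
    return re.findall(r"\w+", line.lower())


def corpus_stats(lines, top_n=5):
    """Return token count, type count, type/token ratio and top content words."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    tokens = sum(counts.values())   # total running words
    types = len(counts)             # distinct words
    # Drop stop words so the frequency list reflects topics, not grammar.
    content = Counter({w: c for w, c in counts.items() if w not in STOP_WORDS})
    return {
        "tokens": tokens,
        "types": types,
        "type_token_ratio": types / tokens if tokens else 0.0,
        "top_words": content.most_common(top_n),
    }


# Made-up sample segments standing in for one side of a corpus.
sample = [
    "Add the item to the cart",
    "The item is in the cart",
    "Remove the item",
]
stats = corpus_stats(sample)
print(stats)
```

On a real corpus you would pass the lines of a file instead of `sample`. A very low type/token ratio may point to heavy repetition, and the top content words suggest which topics dominate.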

In this post, I cover some analysis techniques and tips that I believe are useful and effective for understanding your corpus better:

https://www.linkedin.com/pulse/corpus-analysis-part-i-juan-martín-fernández-rowda?trk=pulse_spock-articles