Looking for advice about corpus management tools
Thread poster: Emanuele Vacca

Emanuele Vacca, Italy, Member (2020), English to Italian
As part of my translation method, I am building my own parallel corpus, i.e. a giant collection of multilingual documents downloaded from reliable sources, which I regularly search to find the translation of specific terms or expressions and to read them in context. So far, I have been doing this manually: the documents are stored in different folders according to their subject field; I use my file manager’s search tool, open the relevant documents in both the source and the target language, and look for the translation in the target document. Unfortunately, this process takes a lot of time, especially when it is performed dozens of times a day.

Therefore, I am looking for software that can speed up this process a bit, ideally letting me type the term or expression, automatically opening every source document in which it is used at the relevant page, and opening the target documents at approximately the same page. I am aware that there are many tools, such as Sketch Engine and Wordsmith, which do approximately this job. Unfortunately, the former is a bit too expensive for me at the moment, and the problem with Wordsmith and similar software is that they can only process txt files, while most of my documents are in pdf format.
I have already tried some pdf-to-txt converters, but the output files are almost unreadable and thus unusable (probably because the pdf files I work with are quite complex). Manually creating txt files from the pdf files is not a viable option because it would take far too long. Moreover, for parallel concordancers to work, each source txt file needs to be perfectly aligned with its target counterpart (and again, aligning them manually would take an eternity).

As I said before, what I need is basically a tool that can open all the source pdf files in which a specific term or expression appears and open their target counterparts on the page where the source term or expression is located. Is there any software that can do this? What are your corpus management methods? All suggestions are welcome! Thank you in advance.

Rolf Keller, Germany, English to German | Manual searches can be speeded up a bit | Aug 26, 2018
Emanuele Vacca wrote: "documents are stored in different folders according to their subject field"

For your pdfs, you could use a pdf tool to combine several pdfs into a single pdf.

"I use my file manager’s search tool"

You have to use the indexing feature and set it up properly. Which operating system do you use?

"I have been doing this manually"

If you use Windows, you could speed up the process a bit. In Omni-Lookup I do the following: I place the cursor on a word, hit a hotkey, and get all my glossaries, previous translations and TMs searched, plus my offline dictionaries and certain websites, all simultaneously. There is no need to convert .pdf to .txt, because Windows' index function is able to read .pdf files.

"automatically opening all the source documents in which that term/expression is used"

What if that term is contained in 150 documents? Opening all of them would take a lot of resources and time. In Omni-Lookup you get a clickable list of the relevant documents. The filenames may help you decide which documents you want to open. If you restricted the search upfront to documents on a certain subject, you would miss some hits.

None of this will help you find the corresponding hits in the target language, though. But it might speed up manual searches.

Emanuele Vacca, Italy, Member (2020), English to Italian, TOPIC STARTER
Dear Rolf,

Thank you so much for your reply! Unfortunately, my computer is a Mac, so I am unable to install Omni-Lookup. Why don't you develop a Mac version? I could virtualize Windows, but my Mac would become quite slow. What do you mean by "indexing feature"?

Rolf Keller wrote: "In Omni-Lookup you get a clickable list of the relevant documents. The filenames may help you to decide which documents you want to open."

I can already do this with Finder's search tool (Finder is the Mac's file manager).
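The lookup described in the opening post (type a term, list the source PDFs and pages where it occurs, and name the matching target PDF) could be scripted roughly as follows. This is only a minimal sketch in Python, resting on assumptions that are not from the thread: that the `pdftotext` command-line tool (from Poppler or Xpdf) is installed, and that source and target files follow a hypothetical naming convention such as `contract_en.pdf` / `contract_it.pdf`. The function names are illustrative. Extracted text from complex PDFs may be messy to read, but it can still be good enough for locating the page on which a term occurs.

```python
import subprocess
from pathlib import Path


def split_pages(text):
    """Split extracted text into pages; pdftotext separates pages with form feeds."""
    pages = text.split("\f")
    # The last page also ends with a form feed, which leaves an empty tail.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def pdf_pages(pdf_path):
    """Return the text of each page of a PDF (assumes `pdftotext` is on the PATH)."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],  # "-" writes to stdout
        capture_output=True, encoding="utf-8", check=True,
    )
    return split_pages(result.stdout)


def pages_with_term(pages, term):
    """Return 1-based page numbers whose text contains the term (case-insensitive)."""
    needle = " ".join(term.lower().split())
    hits = []
    for number, page in enumerate(pages, start=1):
        # Collapse whitespace so a term broken across lines still matches.
        haystack = " ".join(page.lower().split())
        if needle in haystack:
            hits.append(number)
    return hits


def target_for(source_path, src_tag="_en", tgt_tag="_it"):
    """Map a source file to its translation (hypothetical naming convention)."""
    source_path = Path(source_path)
    stem = source_path.stem
    if not stem.endswith(src_tag):
        return None
    new_name = stem[: -len(src_tag)] + tgt_tag + source_path.suffix
    return source_path.with_name(new_name)


def search_corpus(folder, term):
    """Yield (source PDF, page number, target PDF) for every hit under folder."""
    for source in sorted(Path(folder).rglob("*_en.pdf")):
        for page in pages_with_term(pdf_pages(source), term):
            yield source, page, target_for(source)
```

Calling `search_corpus("Corpus/Legal", "force majeure")` (the folder name is made up) would yield one tuple per matching page. The page number is only a starting point for the target file, since a translation rarely paginates exactly like its source.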
Here is a demo: https://youtu.be/F3AaHHmjRhE

And here is the macro:

(When using Skim instead of Preview, it may be possible to automate finding the search term in the PDF. I'll look into that.)

Emanuele Vacca, Italy, Member (2020), English to Italian, TOPIC STARTER | Thank you so much! | Sep 3, 2018
Dear Hans,

What you have done is truly incredible! Thank you so much! I will try to understand and become familiar with this method as soon as possible (I have never used macros before, so it will probably take a while). I will let you know if I have any questions. Thank you again!