Looking for advice about corpus management tools
Thread poster: Emanuele Vacca
Emanuele Vacca  Identity Verified
Italy
Local time: 13:33
Member (2020)
English to Italian
Aug 25, 2018

As part of my translation method, I am building my own parallel corpus, i.e. a giant collection of multilingual documents downloaded from reliable sources that I regularly analyze in order to find the translation of specific terms/expressions and/or to read them in context. So far, I have been doing this manually: the documents are stored in different folders according to their subject field; I use my file manager’s search tool, I open the relevant documents both in the source and the target language and I look for the translation in the target document. Unfortunately, this process takes a lot of time, especially when you perform it dozens of times a day. Therefore, I am looking for software that can speed up this process a bit, ideally by letting me type the term/expression and then automatically opening all the source documents in which that term/expression is used, on the relevant page, together with the target documents opened at approximately the same page where the source term/expression is located.
I am aware that there are many tools, such as Sketch Engine and WordSmith, which do approximately this job. Unfortunately, the former is a bit too expensive for me at the moment, and the problem with WordSmith and other similar software is that they can only process txt files, while most of my documents are in pdf format. I have already tried to use some pdf to txt converters, but the output files are almost unreadable and thus unusable (probably because the pdf files I work with are quite complex). As you can understand, manually creating txt files from the pdf files is not a viable option because it would literally take ages. Moreover, in order for the parallel concordancers to work, each source txt file needs to be perfectly aligned with its target counterpart (and again, manually aligning them would take an eternity).
As I said before, what I need is basically a tool able to open all the source pdf files in which a specific term/expression appears and to open their target counterparts on the same page where the source term/expression is located.
Is there any software capable of satisfying this need? What are your corpus management methods? All suggestions are welcome! Thank you in advance.
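The core of the lookup described above can be sketched in a few lines of Python. This is a minimal sketch, not an existing tool, and the function names are hypothetical. It assumes the text of each PDF page has already been extracted (for example with the pypdf library's `PdfReader(path).pages` and `page.extract_text()`; pypdf is an assumption, nobody in the thread mentions it). The extracted text only needs to be searchable here, not readable, so a messy conversion may still be usable. The target page is estimated proportionally: a hit 40% of the way through the source is assumed to sit about 40% of the way through the target. Translations rarely paginate identically, so treat the estimate as a starting point.

```python
import re


def find_term_pages(page_texts, term):
    """Return the 1-based numbers of pages containing term as a whole word/phrase (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [i for i, text in enumerate(page_texts, start=1) if pattern.search(text)]


def estimate_target_page(source_page, source_page_count, target_page_count):
    """Map a source page to the proportionally equivalent target page (1-based).

    Assumes the translation follows the source's order, so relative position is preserved.
    """
    if source_page_count < 1 or target_page_count < 1:
        raise ValueError("page counts must be positive")
    if not 1 <= source_page <= source_page_count:
        raise ValueError("source page out of range")
    # Relative position of the hit in the source document, from 0.0 (first page) to 1.0 (last page).
    if source_page_count == 1:
        fraction = 0.0
    else:
        fraction = (source_page - 1) / (source_page_count - 1)
    return 1 + round(fraction * (target_page_count - 1))
```

For each page returned by `find_term_pages` on a source document, `estimate_target_page` gives the page at which to open its counterpart.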


 
Rolf Keller
Germany
Local time: 13:33
English to German
Manual searches can be sped up a bit Aug 26, 2018

Emanuele Vacca wrote:

documents are stored in different folders according to their subject field


For your pdfs you could use a pdf tool to combine several pdfs into a single pdf.

I use my file manager’s search tool


You have to use the indexing feature and set it up properly. Which operating system do you use?

I have been doing this manually


If you use Windows, you could speed up the process a bit. In Omni-Lookup I do the following: set the cursor on a word, hit a hotkey, and get all my glossaries, previous translations and TMs searched, plus my offline dictionaries and certain websites, all simultaneously. There is no need to convert .pdf to .txt because Windows' index function is able to read .pdf.

automatically opening all the source documents in which that term/expression is used


What if that term appears in 150 documents? Opening all of them would take a lot of resources and time. In Omni-Lookup you get a clickable list of the relevant documents, and the filenames may help you decide which ones to open. If you restricted the search upfront to documents of a certain subject, you'd miss some hits.
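To illustrate why an index makes lookups fast, and why it naturally produces a list of hits rather than a pile of opened files, here is a minimal sketch of an inverted index in Python. It is not how Windows Search or Spotlight work internally; the function names are hypothetical and the page texts are assumed to be already extracted. A multi-word query matches pages that contain all of its words, not necessarily as an exact phrase.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def build_index(documents):
    """documents: dict of file name -> list of page texts.

    Returns a dict mapping each lower-cased word to the set of (file name, 1-based page) where it occurs.
    """
    index = defaultdict(set)
    for name, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            for word in WORD.findall(text.lower()):
                index[word].add((name, page_no))
    return index


def lookup(index, query):
    """Return sorted (file name, page) pairs where every word of the query occurs on the same page."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    # .get() does not create new keys in a defaultdict, so lookups leave the index unchanged.
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The index is built once and reused for every lookup. Each query then touches only the entries for its own words instead of rereading every file, and it searches all subject folders at once.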

None of this will help you find the corresponding hits in the target language, though. But it might speed up manual searches.


 
Emanuele Vacca  Identity Verified
Italy
Local time: 13:33
Member (2020)
English to Italian
TOPIC STARTER
Dear Rolf, Aug 26, 2018

Dear Rolf,

Thank you so much for your reply! Unfortunately, my computer is a Mac, so I can't install Omni-Lookup. Why don't you develop a Mac version? I could virtualize Windows, but my Mac would become quite slow.
What do you mean by "indexing feature"?


In Omni-Lookup you get a clickable list of the relevant documents. The filenames may help you to decide which documents you want to open.

I can already do this with Finder's search tool (Finder is Mac's file manager).


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
HoudahSpot Aug 27, 2018

Have a look at HoudahSpot:

https://www.houdah.com/houdahSpot/

Enter a search term, browse the preview pane, open a relevant pdf.

Use this macro to get the path of the pdf:

https://forum.keyboardmaestro.com/t/finding-the-file-path-for-an-open-document-in-the-front-application/7095

Create another macro to open the parallel pdf, via the copied path. (Either store source and target in different folders or add language codes to the pdf names.)
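The path-swapping step of that second macro can be sketched in Python for both naming conventions. This is an illustration, not Hans's Keyboard Maestro macro; the folder layout (`.../en/...` vs `.../it/...`), the `_en`/`_it` file-name suffixes and the function names are all hypothetical.

```python
from pathlib import PurePosixPath


def parallel_by_folder(source_path, source_lang="en", target_lang="it"):
    """Swap the language folder: /corpus/en/banking/x.pdf -> /corpus/it/banking/x.pdf."""
    parts = list(PurePosixPath(source_path).parts)
    try:
        i = parts.index(source_lang)  # first path component named like the source language
    except ValueError:
        raise ValueError(f"no '{source_lang}' folder in {source_path}") from None
    parts[i] = target_lang
    return str(PurePosixPath(*parts))


def parallel_by_suffix(source_path, source_lang="en", target_lang="it"):
    """Swap a language code at the end of the file name: report_en.pdf -> report_it.pdf."""
    path = PurePosixPath(source_path)
    marker = "_" + source_lang
    if not path.stem.endswith(marker):
        raise ValueError(f"{path.name} does not end with {marker}")
    new_stem = path.stem[: -len(marker)] + "_" + target_lang
    return str(path.with_name(new_stem + path.suffix))
```

On a Mac, the resulting path could then be passed to the `open` command, e.g. `open -a Skim /path/to/file.pdf`, to show the parallel document in Skim.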

With Skim instead of Preview, you can perhaps open the parallel pdf at the same page.

You could ask here:

https://forum.keyboardmaestro.com/latest


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Demo Aug 28, 2018

Here is a demo:

https://youtu.be/F3AaHHmjRhE

And here is the macro:

Untitled

(When using Skim instead of Preview, perhaps the finding of the search term in the PDF can be automated. I'll look into that.)


 
Emanuele Vacca  Identity Verified
Italy
Local time: 13:33
Member (2020)
English to Italian
TOPIC STARTER
Thank you so much! Sep 3, 2018

Dear Hans,

What you have done is truly incredible! Thank you so much! I will try to understand and become familiar with this method as soon as possible (I have never used macros before, so it will probably take a while). I will let you know if I have any questions. Thank you again!


 

