Is it advisable to use DTP software to replicate a PDF's layout/look?
Thread poster: Alvaro Pavié
Alvaro Pavié
Chile
Local time: 14:48
English to Spanish
+ ...
Sep 11, 2019

Greetings,

On Monday I got a PDF from a client who wasn't able to convert it into Word format. I can only translate it manually, and replicating the layout myself is the only option, as I have no money to hire someone else. Is it viable to use DTP software to recreate the layout? Is there a good DTP program out there that's also free?

Thanks!

Update: Unable to extract text, translating manually.

[Edited at 2019-09-11 19:38 GMT]


 
Kevin Fulton  Identity Verified
United States
Local time: 14:48
German to English
DTP separate task and charged separately Sep 11, 2019

Word has a number of useful features that lend themselves to layout work. It's not unreasonable to try to replicate simple formatting such as columns, bullet points, etc. using Word (or another word processing program) after extracting text from a PDF file. Anything beyond that is a separate task and should be charged accordingly, by the hour. Publisher, which comes with some versions of MS Office, might be useful for this. Full-featured DTP programs tend to be expensive and difficult to learn, which is why DTP costs extra when provided as part of a translation job.


Dimmo Petrov
Local time: 21:48
English to Bulgarian
Only if the client is paying for this Sep 11, 2019

Converting a PDF file to an editable format should be done by the linguist only if it's a paid task. Recreating the exact formatting is time-consuming and annoying, especially in the common case of a scanned PDF file.
You should always ask the client whether they can find the original file used to create the PDF.
If the client agrees that you perform the conversion, ask them to specify whether they prefer plain text only or a full recreation of the formatting.
For me, the best software for handling PDF files is Abbyy FineReader; the second best is Adobe Acrobat Pro.


Philippe Etienne  Identity Verified
Spain
Local time: 20:48
Member
English to French
I experienced the situation once Sep 11, 2019

It was an end client, the translation consumer, who wanted the same layout as the PDF, but of course didn't have the underlying InDesign file. I didn't want to spend any time learning about DTP programs or struggling with the layout for ages in Word.
After informing the client about this, they required me to handle that DTP part too, so I assigned the task to a freelance DTP specialist, transferring the costs to the client.
Somebody who masters DTP is much quicker, has the right tools to get to optimal results, and can even extract the text for optimal use in your preferred CAT tool, then incorporate the translation back into the DTP file.

It's not cheap, but headaches are costlier.

If you're constrained in terms of costs, you may have a look at Infix from Iceni. I've never tried it, but it's supposed to do exactly what you ask.

Philippe


John Fossey  Identity Verified
Canada
Local time: 14:48
Member (2008)
French to English
+ ...
Infix Sep 11, 2019

It's not free after the third page, but I have sometimes successfully used Infix. It exports the text from the PDF in XML format, which can be translated in any CAT tool. The translated text is then reimported into Infix which then recreates the PDF with the translated text.

Potential pitfalls:
- If you are working in a language pair where the target text is more voluminous than the source, you will have problems with the target text not fitting the allotted space.
- There are often font issues, where some unusual font is embedded in the PDF. Only characters actually used in the document are embedded and if the target text contains characters that were not in the source they will be skipped or replaced with a different font. Sometimes you can find the missing font online and install it on your computer to resolve this problem.
- Text that is actually part of an image will not be exported.
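
The first two pitfalls can be roughly checked before reimporting the translation. Below is a minimal sketch in Python, assuming the source and target segments are available as plain strings. The function names and sample segments are made up for illustration, and comparing character sets across the whole text is only an approximation, since real font subsets are stored per font.

```python
def expansion_ratio(source, target):
    """Length of the target relative to the source (1.0 = same length)."""
    return len(target) / len(source) if source else 0.0


def missing_characters(source_text, target_text):
    """Characters used in the target that never occur in the source.

    These are candidates for glyphs absent from a subset font.
    Whitespace is ignored.
    """
    source_chars = set(source_text)
    return sorted(
        ch for ch in set(target_text)
        if ch not in source_chars and not ch.isspace()
    )


# Hypothetical sample segments (source, target).
pairs = [
    ("Safety instructions", "Instrucciones de seguridad"),
    ("Year", "Año"),
]

# Flag segments whose translation is noticeably longer than the source.
for source, target in pairs:
    ratio = expansion_ratio(source, target)
    flag = "check fit" if ratio > 1.2 else "ok"
    print(f"{source!r} -> {target!r}: x{ratio:.2f} ({flag})")

# Compare character sets over the whole text, not per segment.
all_source = "".join(s for s, _ in pairs)
all_target = "".join(t for _, t in pairs)
print("Characters not present in the source:",
      missing_characters(all_source, all_target))
```

The 1.2 threshold is arbitrary; how much expansion a text frame tolerates depends on the layout.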


Alvaro Pavié
Chile
Local time: 14:48
English to Spanish
+ ...
TOPIC STARTER
Can't use OCR software. Sep 11, 2019

I should have pointed out that I can't use OCR software as the PDF is protected and my client couldn't do the conversion herself because of that.

I just found out that the text extracted by Calibre is all messed up, so it wasn't a real solution after all. Don't want to use web-based solutions for extracting the text since I'm not sure if the material is confidential or not (my client never told me so.)

Also, the PDF doesn't have a simple layout: it contains different-colored headers and subheaders, columns divided by straight lines, and some large and small images and logos. It doesn't look like the type of document I could replicate using only Word.

Money is definitely a constraint: I'm just starting my professional career and barely make enough to afford basics such as transportation, food and clothing, so hiring someone else to do the job is out of the question. Besides, I'd like to take this opportunity to learn to use DTP software. I have already learned Inkscape (similar to Illustrator, but free), and my client has been pretty happy with the results. The company she works for will not pay extra for all this work, but I need the work nonetheless.

Lastly, I solely need advice on how to proceed with the limited means at my disposal, so please keep that in mind when replying.

Thanks.

[Edited at 2019-09-11 16:02 GMT]

[Edited at 2019-09-11 16:03 GMT]


 
Patricia Fierro, M. Sc.  Identity Verified
Ecuador
Local time: 13:48
English to Spanish
+ ...
Abbyy FineReader Sep 11, 2019

Hi,

I have Abbyy FineReader version 14 and it exports PDFs to Word files. The format usually matches the PDF format.

Maybe you can take screenshots and convert the files. This works with protected PDF files. Abbyy FineReader can work with image files, such as what you would get when storing the screenshots by using MS Paint or similar apps.

Good luck!
Patricia

[Edited at 2019-09-11 16:24 GMT]


Jorge Payan  Identity Verified
Colombia
Local time: 13:48
Member (2002)
German to Spanish
+ ...
Print and scan Sep 11, 2019

Patricia Fierro, M. Sc. wrote:

Maybe you can take screenshots and convert the files. This works with protected PDF files. Abbyy FineReader can work with image files, such as what you would get when storing the screenshots by using MS Paint or similar apps.



My approach would be to print the file and then scan it in color. It will remove the problem with password protection and you could then use OCR software.

Customarily, I convert the text in the image to plain text and not to Word. It saves a lot of time in the DTP process.

Saludos


Alvaro Pavié
Chile
Local time: 14:48
English to Spanish
+ ...
TOPIC STARTER
Already took screenshots. Don't know how to use Transtools properly. Sep 11, 2019

Jorge Payan wrote:

My approach would be to print the file and then scan it in color. It will remove the problem with password protection and you could then use OCR software.

Customarily, I convert the text in the image to plain text and not to Word. It saves a lot of time in the DTP process.

Saludos



I already took screenshots of the pages and saved them in 24-bit BMP format. The quality is not the same as that of the original, but it does seem to work somewhat if I export it as a Word file. I'm still having issues with Transtools, though. I can't seem to clean the tags properly: everything turns out even messier after cleaning than it was before.


 
Alvaro Pavié
Chile
Local time: 14:48
English to Spanish
+ ...
TOPIC STARTER
Update. Sep 11, 2019

Finally gave up on trying to extract the text from the PDF so I'm just translating on a blank .docx file and will attempt to replicate the layout once I finish translating. Is Scribus a good choice for DTP?

 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:48
Member (2006)
English to Afrikaans
+ ...
Free DTP Sep 13, 2019

Alvaro Pavié wrote:
Finally gave up on trying to extract the text from the PDF...


Even if you could use an OCR program to extract the text and save it as a Word file, the layout would not be translator-friendly, particularly with the type of document that you've been describing. When an OCR program tries to mimic the layout, it uses all kinds of tricks that make the text look good on screen but make the document a nightmare to edit. For example, it might put every single line of a paragraph in its own little floating text box. It looks great on the screen and on paper, but it is practically untranslatable. Or, you can set the OCR program to create an edit-friendly document, but that just means that the most difficult parts of the layout aren't done by the OCR program, but left for you to do.

...so I'm just translating on a blank .docx file and will attempt to replicate the layout once I finish translating.


What you should do is to type/get the source text in plain text, then create a formatted version of the file (with the source text), and then translate that file (e.g. in a CAT tool), and then afterwards fix minor layout inconsistencies that were introduced by the process.

Is Scribus a good choice for DTP?


Look, I'm sure Scribus, Canva, MS Publisher and OpenOffice Draw etc are fine to use, but learning to use DTP isn't quick either. In most cases, however, if you use a DTP program, you would do the translation in plain text first, then create the layout, and then copy/paste the content into the DTP program. It's a lot of work.

There may be some DTP programs that allow you to create the layout first, with the source text, and then translate the DTP file (either directly or by text export/import). OmegaT can translate OpenOffice Draw files directly. Scribus files are XML-like files (though not actual XML) with the translatable content as values of the CH attribute of the ITEXT tag, so you may be able to convince some CAT tool to translate it.
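
As a rough illustration of the Scribus point, here is a minimal Python sketch that pulls the CH values out of ITEXT elements and writes translations back. It assumes the file parses as well-formed XML with the standard library's xml.etree.ElementTree. The embedded sample is a hypothetical, heavily simplified stand-in for a real .sla file, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a Scribus .sla file.
SAMPLE_SLA = """<SCRIBUSUTF8NEW>
  <DOCUMENT>
    <PAGEOBJECT>
      <StoryText>
        <ITEXT CH="Annual report"/>
        <para/>
        <ITEXT CH="Sales grew in every region."/>
      </StoryText>
    </PAGEOBJECT>
  </DOCUMENT>
</SCRIBUSUTF8NEW>"""


def extract_text_runs(root):
    """Return the CH value of every ITEXT element, in document order."""
    return [el.get("CH", "") for el in root.iter("ITEXT")]


def apply_translations(root, translations):
    """Replace CH values in place, using a source -> target dictionary."""
    for el in root.iter("ITEXT"):
        source = el.get("CH", "")
        if source in translations:
            el.set("CH", translations[source])


root = ET.fromstring(SAMPLE_SLA)
runs = extract_text_runs(root)

# Runs without an entry in the dictionary are left untouched.
apply_translations(root, {"Annual report": "Informe anual"})
translated_runs = extract_text_runs(root)

print(runs)
print(translated_runs)
```

For a real file you would load it with ET.parse(path) and save it with tree.write(...), working on a copy. A single sentence may be split across several ITEXT runs wherever the formatting changes, so exact-match replacement like this is only a starting point.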

But don't forget that you'd still have to fix formatting and layout problems in the DTP program afterwards that are caused by e.g. the source text and target text being of different lengths, so you still need to be an expert at fixing formatting in the DTP program.

[Edited at 2019-09-13 08:20 GMT]


 
Alvaro Pavié
Chile
Local time: 14:48
English to Spanish
+ ...
TOPIC STARTER
Clarification, please. Sep 13, 2019

Samuel Murray wrote:

What you should do is to type/get the source text in plain text, then create a formatted version of the file (with the source text), and then translate that file (e.g. in a CAT tool), and then afterwards fix minor layout inconsistencies that were introduced by the process.


Could you elaborate more on this please? I don't quite get what you mean by saying "create a formatted version of the file (with the source text)". My translation is ready, so all I need to do now is recreate the layout.


 
VIP9N
Local time: 21:48
Russian to English
+ ...
Layout, format and protected pdfs Sep 16, 2019

Alvaro Pavié wrote:

Could you elaborate more on this please? I don't quite get what you mean by saying "create a formatted version of the file (with the source text)"...


It means that once the plain-text translation is done, you have to format it in order to get a translated version that looks about the same as the original.

Only one question arises: if you recreate the translation in a DTP system, you will end up with a file that is either in that system's native format or an exported PDF. Are your clients ready for that?

And, by the way, if you often face protected PDFs, you had better buy this small utility: https://www.pdfdecrypter.com/purchase.html
It will clear any useless "protections" and you will be able to OCR files of that kind.

Good luck


 
Samuel Murray  Identity Verified
Netherlands
Local time: 20:48
Member (2006)
English to Afrikaans
+ ...
@Alvaro Sep 18, 2019

Alvaro Pavié wrote:
Samuel Murray wrote:
What you should do is (1) to type/get the source text in plain text, then (2) create a formatted version of the file (with the source text), and then (3) translate that file (e.g. in a CAT tool), and then (4) afterwards fix minor layout inconsistencies that were introduced by the process.

Could you elaborate more on this please? I don't quite get what you mean by saying "create a formatted version of the file (with the source text)". My translation is ready, so all I need to do now is recreate the layout.


I think this may have been answered in another thread, but just to clarify: we can distinguish between two possible workflows for producing a translated, formatted version of a non-editable file using both CAT and DTP, namely either (A) first translate the source text, then create a formatted target file or (B) first create a formatted source file, then translate it. I prefer to use method B, but both methods have advantages and disadvantages.

Method A: CAT first, DTP second
If you use method A, i.e. take care of formatting/layout only after the translation step is done, then you don't have to be careful when using the DTP program to ensure that the file is "CAT friendly". For a file to be CAT friendly, there should be no untranslatable text, translatable text should not be converted to images, sentences must remain unbroken, and related content should preferably all stay together. Not all DTP formats support this well. For example, the current version of Scribus will always split a sentence into two separate block-level elements if you insert a line break, regardless of whether it's a hard or a soft line break. This means that Scribus can't produce CAT friendly files unless the DTP person (i.e. you) is very careful to use alternative methods to get the same effect, e.g. by setting margins on a per-line basis instead of using manual line breaks. So, if you use method A, you can finish the translation step and then focus all of your attention on the DTP step and have complete freedom to use all of the DTP program's facilities to create the exact layout that you require.

Method B: DTP first, CAT second
Conversely, if you use method B, i.e. take care of formatting/layout before the translation is started, then you have to be careful when creating the formatted file to ensure that the file is CAT friendly, because you are going to be translating it in the CAT tool (either directly or as an export/import format). The main advantage of this method is that it allows for better time management. Since you will spend very little time in the DTP program after completing the translation step (i.e. to make minor adjustments only), you can schedule your time based on how long the translation is expected to take. With method A, you don't know how long it is going to take to create a formatted version of the file in the DTP program, and therefore you are forced to complete the translation step long before the delivery deadline, in order to give yourself enough time to complete the DTP step. With method B, "translation" is essentially the last step, and you know from experience how long it's going to take to do the translation.


Multiverse Solutions s.r.o. (X)
Local time: 20:48
Polish to English
+ ...
Fancy or real need? Jan 13, 2020

The above is the basic question for the customer.

In most cases, the customer would like to see the translation as close to the original as possible. Understandable, as they sometimes need to make a quick 'alignment' in their minds to anchor the translation.

In these cases, replicating the original layout is not really needed. You may make a rough 'copy' of the layout in any text editor you use. Setting up simple tables, cleaning up styles, customising font and paragraph parameters are all part of the standard translation process. At least, in my work.

Borderline tasks include overwriting captions on images, charts and other graphical elements. It takes time and extra software plus some skills. For this, I request (usually) a small surcharge, eg equivalent to 2-3 pages of translation. No big deal for either party, and the work is done properly.

High-end layout work comes when the customer needs to print their documents: product spec sheets, SDS cards, manuals, etc. The final version has to be tidy, legible and practical, and should convey that the customer is a professional business. All this requires skills and software, which translates, literally, into higher prices.

Real layout and typesetting work may be more expensive than translation, and it should be far more expensive when unique design is welcome. Plus IP rights. This is the part where most customers back off. In theory, because the work is 'too expensive'. Compare the prices of their products with your DTP proposal and you will soon discover strange things about human nature.

False DTP work by translators is a fascinating subject in itself. Ultimately, if you agree to do things from another field, and for free, you will ruin both their market and yours. The customer will be happy to get for free what they convert into hard cash. However, if you are good at this false DTP work, you may be flooded with such 'orders', instead of focusing on pricey translations.