ASCII Text
Thread poster: Arabic & More
Arabic & More
Arabic & More  Identity Verified
Jordan
Arabic to English
+ ...
Mar 7, 2012

I will soon be working on a translation project that requires the text be rendered in "plain ASCII text using UTF-8 encoding." The guidelines specify that all files be delivered in plain text, and "not in some proprietary format (like MS Word)."

Just want to make sure I understand which program I should be using and how to use the correct encoding. Will I need to use Notepad, or is there actually a way to do this in MS Word? How do I make sure I am using UTF-8 encoding?

Thank you in advance for any pointers.


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 04:52
English to Hungarian
+ ...
Oh dear Mar 7, 2012

Amel Abdullah wrote:

I will soon be working on a translation project that requires the text be rendered in "plain ASCII text using UTF-8 encoding."

Whoever wrote that has no idea what they are talking about.
ASCII is a character encoding, and UTF-8 is another character encoding. In simple terms, ASCII only contains the letters of the English alphabet, numbers, and a couple of other characters. UTF-8 contains every character in every language on the planet (or close enough, anyway). UTF-8 is an extension of ASCII, i.e. the characters that are present in ASCII are encoded in the same way in UTF-8.
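The relationship between the two encodings can be checked directly. The following is a minimal Python sketch; the sample strings are made up for illustration and are not from the project in question.

```python
# Sample strings are illustrative only.
plain = "Hello, world 123"
accented = "árvíztűrő"  # Hungarian sample containing non-ASCII letters

# ASCII-only text produces identical bytes under both encodings.
same_bytes = plain.encode("ascii") == plain.encode("utf-8")
print(same_bytes)

# Non-ASCII text can be encoded as UTF-8 but not as ASCII.
utf8_bytes = accented.encode("utf-8")
try:
    accented.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Each accented letter here takes two bytes in UTF-8, so the byte count
# is larger than the character count.
print(ascii_ok, len(accented), len(utf8_bytes))
```

In other words, a file that contains only ASCII characters is already a valid UTF-8 file, which is probably the most charitable reading of "plain ASCII text using UTF-8 encoding".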

Anyway, you can generate UTF-8 txt files in both Notepad and MS Word. In Word, pick Save as..., then "Plain text" from the dropdown list and pick Unicode (UTF-8) from the "Other" encoding list.
In Notepad, you also need to use Save as... and pick UTF-8 from the encodings list. Neither of the two saves in UTF-8 by default, unfortunately.
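For anyone who prefers a script to a Save as... dialog, the same result can be sketched in Python. The file name and the sample text below are hypothetical; the point is only that the encoding is named explicitly instead of being left to the system default.

```python
import os
import tempfile

text = "Plain text with an accent: café\n"

# Hypothetical output location; a temp directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "translation.txt")

# Name the encoding explicitly; the platform default may be something else.
# newline="\n" stops Python from translating line endings on Windows.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write(text)

# Read the raw bytes back and confirm they decode as UTF-8.
with open(path, "rb") as f:
    raw = f.read()
round_trip = raw.decode("utf-8")
print(round_trip == text)
```

Note that Python's "utf-8" codec writes no BOM, unlike the Word and Notepad behaviour described in this thread.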

BTW there are two types of UTF-8 files: with a BOM or without BOM. Both Word and Notepad will generate UTF-8 files with a BOM. If you suspect that the files will be machine processed, let your client know your files have a BOM.
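Why the BOM matters for machine processing can be shown with a short Python sketch (the sample data is invented). The UTF-8 BOM is the three bytes EF BB BF at the very start of the file; a naive reader keeps it as an invisible extra character.

```python
# The same five letters, once with a UTF-8 BOM and once without.
data_with_bom = b"\xef\xbb\xbf" + "hello".encode("utf-8")
data_without_bom = "hello".encode("utf-8")

# Plain 'utf-8' keeps the BOM as an invisible U+FEFF character at the start.
naive = data_with_bom.decode("utf-8")
print(len(naive), naive.startswith("\ufeff"))

# 'utf-8-sig' strips a leading BOM if present and also accepts files without one.
clean_a = data_with_bom.decode("utf-8-sig")
clean_b = data_without_bom.decode("utf-8-sig")
print(clean_a == clean_b == "hello")
```

A script that compares the first word of a file against an expected string would fail on the naive version, which is the kind of problem a client's processing pipeline might run into.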


 
Natalie
Natalie  Identity Verified
Poland
Local time: 04:52
Member (2002)
English to Russian
+ ...

Moderator of this forum
SITE LOCALIZER
As far as I understand Mar 7, 2012

by "ASCII" they meant "plain text"... one of my customers had the same habit of naming things.

 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 04:52
English to Hungarian
+ ...
Not exactly Mar 7, 2012

Natalie wrote:

by "ASCII" they meant "plain text"... one of my customers had the same habit of naming things.


As I posted above, ASCII is one type (encoding) of plain text. It's not the default encoding in Windows, either (and the default depends on the localization settings of Windows). It's quite likely that the person who asked for an ASCII file didn't actually want an ASCII file. They likely wanted a txt file and had no idea about encodings and the mess they can end up with if they don't keep track of their encodings.
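The "mess" is easy to reproduce. This small Python sketch decodes one and the same byte with two Windows default code pages, cp1252 (Western European) and cp1250 (Central European); the choice of byte is mine, picked because it affects Hungarian text.

```python
import unicodedata

# One byte, as it might sit in a file saved with the Windows default encoding.
raw = bytes([0xF5])

western = raw.decode("cp1252")           # Western European default
central_european = raw.decode("cp1250")  # Central European default

# Print the official character names so the output is readable on any console.
print(unicodedata.name(western))
print(unicodedata.name(central_european))

# Pure ASCII bytes, by contrast, decode identically under both code pages.
ascii_same = b"abc".decode("cp1252") == b"abc".decode("cp1250")
print(ascii_same)
```

The same file therefore shows a different letter depending on which computer opens it, unless both sides agree on the encoding.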


 
Natalie
Natalie  Identity Verified
Poland
Local time: 04:52
Member (2002)
English to Russian
+ ...

Moderator of this forum
SITE LOCALIZER
I know... Mar 7, 2012

... perfectly well what ASCII is. I am just trying to find a reasonable explanation for what was meant by the requirement of delivering a "plain ASCII text using UTF-8 encoding".

 
Neil Coffey
Neil Coffey  Identity Verified
United Kingdom
Local time: 03:52
French to English
+ ...
Send them a sample file? Mar 7, 2012

As others have pointed out, your client may have got slightly confused with the terminology. However, I think if you paste your text into Wordpad and save the resulting document as a "Unicode text document" (it's one of the options) then the client will get what they want.

ASCII can be seen as the most basic way of representing text on a computer. It can essentially only encode very "basic" characters: letters without accents, numbers, non-sexed quotes and one or two other symbols.

For many languages and applications these days, this isn't sufficient, either because you want to represent accented characters or because you want other things such as sexed quotes, a wider range of punctuation symbols etc. UTF-8 (which is one of the "Unicode" encodings) is essentially a standard which does allow many of these other types of character, but builds upon ASCII: the first 128 characters of the Unicode standard map directly onto the ASCII characters.

Or put another way, you can see UTF-8 as an "extension" of ASCII if that makes things easier: if you have text in ASCII format, then it is automatically in UTF-8, but not vice versa.

So... by mentioning both, it's not clear whether the client means that you can only use the "basic" set of characters that are part of ASCII, or whether you can also use accented letters etc. If that's not obvious from the project/language pair, then I'd check with them.
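If the client does turn out to want only the basic ASCII set, a quick check can flag anything outside it. This is a rough Python sketch; the function name `non_ascii_chars` and the sample sentence are my own inventions.

```python
def non_ascii_chars(text):
    """Return (position, character) pairs for every character outside ASCII."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]

# Sample sentence with sexed (curly) quotes, written with escapes for clarity.
sample = "He said \u201chello\u201d"
hits = non_ascii_chars(sample)

# Report positions and code points rather than the characters themselves.
for position, ch in hits:
    print(position, f"U+{ord(ch):04X}")

# Text typed with straight quotes passes the check.
print(non_ascii_chars('He said "hello"') == [])
```

Curly quotes, dashes and ellipsis characters inserted automatically by a word processor are the usual offenders in an otherwise English text.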

What I would suggest is that you send them a sample paragraph/section saved in Wordpad as I mentioned and ask them if that is what they require.

[Edited at 2012-03-07 15:13 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 04:52
English to Hungarian
+ ...
"Unicode" Mar 7, 2012

Neil Coffey wrote:

As others have pointed out, your client may have got slightly confused with the terminology. However, I think if you paste your text into Wordpad and save the resulting document as a "Unicode text document" (it's one of the options) then the client will get what they want.


I never use wordpad, but it's more than likely that when you pick "Unicode", it will save in UTF-16 instead of UTF-8. I'm not even sure if it's UTF-16BE or LE, but it doesn't matter for our purposes.
In a hamfisted attempt at making things "simpler", Microsoft calls its favourite UTF-16 flavour "Unicode" and it calls the system's (localization-dependent) default "ANSI". The first is just sloppy and imprecise, the second is simply untrue and misleading.

So, if you need to save in UTF-8, look for either "UTF-8" or "Unicode (UTF-8)" in the encoding list. What Microsoft calls "Unicode" is not UTF-8.
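To see which of the two a saved file actually is, one can look at its first bytes. The Python sketch below guesses the encoding from the byte order mark only; the helper name `sniff_bom` is hypothetical, and it deliberately ignores UTF-32 and files that have no BOM at all.

```python
import codecs

def sniff_bom(data):
    """Guess an encoding from a leading BOM; return 'unknown' if there is none.

    UTF-32 is not handled here, so a UTF-32 LE file would be misreported.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"

text = "Hi"
# Build the byte sequences explicitly so the result does not depend on the machine.
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
as_utf8_bom = codecs.BOM_UTF8 + text.encode("utf-8")
as_utf8 = text.encode("utf-8")

print(sniff_bom(as_utf16), len(as_utf16))
print(sniff_bom(as_utf8_bom), len(as_utf8_bom))
print(sniff_bom(as_utf8), len(as_utf8))
```

The UTF-16 version spends two bytes on every letter, so the two kinds of file differ throughout and not just in the first few bytes.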


 
Neil Coffey
Neil Coffey  Identity Verified
United Kingdom
Local time: 03:52
French to English
+ ...
Wordpad Unicode option... Mar 8, 2012

FarkasAndras wrote:
I never use wordpad, but it's more than likely that when you pick "Unicode", it will save in UTF-16 instead of UTF-8.


Hmmm seems you're right. Trust Microsoft to opt for the stupidest option. OK, load into notepad, which has an explicit "UTF-8" option.


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 04:52
English to Hungarian
+ ...
MS and encoding Mar 8, 2012

Neil Coffey wrote:

FarkasAndras wrote:
I never use wordpad, but it's more than likely that when you pick "Unicode", it will save in UTF-16 instead of UTF-8.


Hmmm seems you're right. Trust Microsoft to opt for the stupidest option. OK, load into notepad, which has an explicit "UTF-8" option.

Yes, character encoding as a whole is already a disgusting, filthy, smelly mess, but MS managed to make the situation a lot worse with its bizarre naming conventions. They use the term "ANSI" for something that's not an ANSI standard and not even the same thing across different computers. And then they call an encoding "Unicode", which is like calling a language "Romance" without specifying which Romance language they mean.
It's all historic relics, the "ANSI" naming convention is actually a leftover from DOS. Everyone should forget these silly encodings left over from the 60s and 70s already and just use UTF-8. I don't understand why Notepad doesn't save in UTF-8 by default... maybe in Win8.


 

