Tales of the Parodyverse
Post By
Manga Shoggoth

Member Since: Fri Jan 02, 2004
Posts: 391
In Reply To
HH

Subj: Re: Nobody should question your life choices.
Posted: Tue Dec 15, 2015 at 06:28:11 pm EST (Viewed 696 times)
Reply Subj: Re: Nobody should question your life choices.
Posted: Tue Dec 15, 2015 at 03:24:24 pm EST



    Quote:
    Not following you there. When I look at the source in notepad I see the three-character string that the PVB software has translated the original characters into.


That's the point. Notepad is a Windows product and is doing at least some of the character conversion for you. At the very least it is decoding the file as Unicode. What you see as three characters is actually stored as 8 bytes.

If you are using basic ASCII you use a single byte per character, and the letters of the alphabet all map on to specific numbers. To use foreign characters you have to muck around with code pages, whereby the characters assigned to the upper half of the byte range (128 to 255) change depending on which code page you are using.
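To make the code page business concrete, here is a minimal Python sketch. The language and the three code pages are my own picks for illustration, not anything taken from the board software. It reads one plain ASCII byte and one byte above 127 under each code page:

```python
import unicodedata

# An ASCII letter decodes the same way everywhere, but a byte above 0x7F
# means something different depending on the code page in use.
ascii_byte = bytes([0x59])
high_byte = bytes([0xE9])

# Code pages chosen purely for illustration.
code_pages = ["cp1252", "cp1251", "cp437"]

ascii_readings = {cp: ascii_byte.decode(cp) for cp in code_pages}
high_readings = {cp: high_byte.decode(cp) for cp in code_pages}

# Print Unicode character names so the output is plain ASCII.
for cp in code_pages:
    print(cp,
          unicodedata.name(ascii_readings[cp]), "|",
          unicodedata.name(high_readings[cp]))
```

The same byte E9 comes out as an accented Latin letter, a Cyrillic letter or a Greek letter, which is exactly the sort of mess UTF-8 was meant to get us out of.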

In UTF-8 (which is what we are using here) the standard alphabetics are still represented by the same single bytes (to be backward compatible with pure ASCII), but the more esoteric characters are represented by multiple bytes - usually two or three, but it can go up to four. Thus:

York’s

is represented by the following bytes (in Hexadecimal, read straight out of FrHed):

Y - 59 (Same as Standard ASCII)
o - 6F (Same as Standard ASCII)
r - 72 (Same as Standard ASCII)
k - 6B (Same as Standard ASCII)
’ - C3 A2 E2 82 AC E2 84 A2 (Actually 8 bytes describing three characters)
s - 73 (Same as Standard ASCII)
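The byte listing above can be reproduced without a hex editor. This is only a sketch in Python (my choice of tool, not what was used for the listing), with the three odd characters written as escape codes so that the example itself cannot get mangled on the way:

```python
import unicodedata

# The text as it appears on screen: "York", then a-circumflex,
# euro sign and trademark sign, then "s".
text = "York\u00e2\u20ac\u2122s"

# Each character paired with its UTF-8 bytes in hexadecimal.
per_char = [(ch, ch.encode("utf-8").hex(" ").upper()) for ch in text]

# Print character names rather than the characters themselves,
# so the output is plain ASCII.
for ch, hex_bytes in per_char:
    print(f"{unicodedata.name(ch)} - {hex_bytes}")

total_bytes = len(text.encode("utf-8"))
print(len(text), "characters,", total_bytes, "bytes")
```

The five ordinary letters take one byte each and the three odd characters take 2 + 3 + 3 = 8 bytes between them.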

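What has happened is that the original character was almost certainly a curly apostrophe, which in UTF-8 is the three bytes E2 80 99. Somewhere along the line a character conversion goofed: those three bytes were read as three separate single-byte characters (â, € and ™ in the Windows code page), and those three characters were then saved back out as UTF-8, giving C3 A2 E2 82 AC E2 84 A2, which displays as ’.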
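Here is one plausible way to arrive at exactly those 8 bytes, sketched in Python. The assumptions are mine: that the original character was a curly apostrophe (U+2019) and that the misreading was done with the Windows-1252 code page. I have no way of knowing which piece of software actually did it.

```python
# Assumption: the original character was U+2019 (right single quotation
# mark) and the wrong decoder was Windows-1252.
original = "\u2019"
utf8_bytes = original.encode("utf-8")

# Step 1: the three UTF-8 bytes are misread as three cp1252 characters.
misread = utf8_bytes.decode("cp1252")

# Step 2: those three characters are saved again as UTF-8.
resaved = misread.encode("utf-8")

print(utf8_bytes.hex(" ").upper())
print(resaved.hex(" ").upper())
```

Under those assumptions three bytes go in and eight come out, matching what the hex editor shows.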

If you look using an editor such as PFE (or, indeed, FrHed, which explicitly operates at the byte level) you will see the underlying bytes.

(The above is why I stay well away from character set issues - what you are reading there is the very simple version! Jason may threaten to unload the complex version on you.)


    Quote:
    How can that help with the details? Is it something I could use?


It would allow you to identify the sequence(s) of bytes being used (as above), which you could then feed into the editing software. It is easy enough to use, and you would only be reading the files, not trying to edit them.
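If you would rather not hunt through a hex dump by eye, the same identification job can be roughly sketched in Python. The function name `non_ascii_runs` and the sample text are made up for illustration. The sketch only reads the bytes and changes nothing:

```python
from collections import Counter

def non_ascii_runs(data: bytes) -> Counter:
    """Count each maximal run of bytes >= 0x80 found in data."""
    runs = Counter()
    current = bytearray()
    for b in data:
        if b >= 0x80:
            current.append(b)
        elif current:
            runs[bytes(current)] += 1
            current.clear()
    if current:  # the data ended in the middle of a run
        runs[bytes(current)] += 1
    return runs

# Stand-in for the bytes of a saved page; for a real file you would
# read it in binary mode instead, e.g. open(path, "rb").read().
sample = "York\u00e2\u20ac\u2122s and Smith\u00e2\u20ac\u2122s".encode("utf-8")

found = non_ascii_runs(sample)
for run, count in found.items():
    print(run.hex(" ").upper(), "x", count)
```

Each distinct run it reports is a candidate byte string to feed to the editing software.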

For example, you would want to replace the byte string C3 A2 E2 82 AC E2 84 A2 with an apostrophe.
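As a minimal sketch of that replacement, assuming you were doing it in Python rather than in the editing software (the names `BAD`, `GOOD` and `fix_apostrophes` are mine):

```python
# The 8-byte sequence from the hex dump, and a plain ASCII apostrophe.
BAD = bytes.fromhex("C3 A2 E2 82 AC E2 84 A2")
GOOD = b"'"

def fix_apostrophes(data: bytes) -> bytes:
    """Replace every occurrence of the bad byte string with an apostrophe."""
    return data.replace(BAD, GOOD)

sample = b"York" + BAD + b"s"
fixed = fix_apostrophes(sample)
print(fixed)
```

Working on raw bytes rather than decoded text means no further character conversion gets a chance to interfere.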





As is always the case with my writing, please feel free to comment. I welcome both positive and negative criticism of my work, although I cannot promise to enjoy the negative.
