Post By
HH said it was an old summer house

In Reply To
Manga Shoggoth

Member Since: Fri Jan 02, 2004
Posts: 391
Subj: Nobody should question your life choices.
Posted: Tue Dec 15, 2015 at 10:41:53 am EST (Viewed 3 times)
Reply Subj: At the moment I smell largely of varnish. Low odour varnish.
Posted: Tue Dec 15, 2015 at 09:33:14 am EST (Viewed 708 times)



    Quote:
    It looks like these are things like smart quotes (generated by Word, amongst other things). The only thing you can really do is identify the exact binary representation of the character in the offending web page, and do a search and replace using that.


The way it is represented in text is:

Substitutes ’ for ' (that's a-circumflex, euro, trademark if the board auto-translates it to something else)

Substitutes “ for " (that's a-circumflex, euro, joined oe for opening speech marks)

Substitutes ” for " (that's a-circumflex, euro, box with a smiley face in it for closing speech marks)

Substitutes €¦? for ... (that's euro, vertical line broken in the middle, question mark for an ellipsis)
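For what it's worth, these triplets are what you get when the UTF-8 bytes of a curly quote or ellipsis are displayed as Windows-1252. Whether that is what happened on this site is an assumption, but a minimal Python sketch reproduces the same jumbles. The function name `misread` is made up for illustration.

```python
def misread(text):
    """Encode as UTF-8, then decode as Windows-1252, as a mislabelled page would be.

    Bytes that Windows-1252 leaves undefined (0x9D is one) come out as the
    replacement character U+FFFD, which many fonts draw as a box.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


samples = [
    ("apostrophe", "\u2019"),
    ("opening speech mark", "\u201c"),
    ("closing speech mark", "\u201d"),
    ("ellipsis", "\u2026"),
]

for name, char in samples:
    garbled = misread(char)
    # Print code points rather than the characters themselves, so the
    # output survives whatever encoding the terminal happens to use.
    print(name, [f"U+{ord(c):04X}" for c in garbled])
```

If this is the cause, the bytes actually stored in the page would be E2 80 99, E2 80 9C, E2 80 9D and E2 80 A6, and the third byte of the closing speech mark has no Windows-1252 character at all, which would explain the box.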

Unfortunately, when I try to fix this using a global find/replace it almost fixes the problem pages (leaving only one square box character instead of the whole jumble) but adds that same character to all the pages that were fine before. My problem is separating out which of the vast number of pages on the site need treatment, lifting them out of their existing file structure, treating them, then putting them back where they were.
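One way round the sorting problem is to let a script decide per file: read each page as raw bytes, and only write it back if one of the offending sequences was actually in it, so the fine pages are never touched and nothing leaves its folder. This is a rough sketch, not a tested fix for this site. The byte sequences in `REPLACEMENTS` are assumed (they are the UTF-8 forms of the curly quotes and ellipsis) and the names `fix_file` and `fix_tree` are invented here.

```python
from pathlib import Path

# Assumed byte sequences: the UTF-8 encodings of the curly quotes and the
# ellipsis, mapped to plain ASCII. Check a garbled page in a hex viewer first;
# if the file stores something else, these keys need changing.
REPLACEMENTS = {
    b"\xe2\x80\x99": b"'",    # right single quote
    b"\xe2\x80\x9c": b'"',    # left double quote
    b"\xe2\x80\x9d": b'"',    # right double quote
    b"\xe2\x80\xa6": b"...",  # ellipsis
}


def fix_file(path):
    """Rewrite the file only if it contains a listed sequence; return True if it changed."""
    original = path.read_bytes()
    fixed = original
    for bad, good in REPLACEMENTS.items():
        fixed = fixed.replace(bad, good)
    if fixed == original:
        return False  # nothing to do, so the file is left exactly as it was
    path.write_bytes(fixed)
    return True


def fix_tree(root, pattern="*.html"):
    """Walk every matching file under root and return the paths that were changed."""
    return [p for p in sorted(Path(root).rglob(pattern)) if p.is_file() and fix_file(p)]
```

Running `fix_tree` on a copy of the site first, and reading the returned list, would show which pages it considers affected before anything live is altered.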



    Quote:
    Also, you need to replace the longer strings before the shorter ones in case the shorter string appears within the longer one - that will really muck things up...



    Quote:
    It may be possible to fix this by switching the page character set but it hasn't worked for me so far in my tests.


When patience allows I'll try copying over some of the garbled body text to a non-garbled page body and see if it sorts itself.





    Quote:
    This one is easy, by the look of it...



    Quote:
    Each HTML page has a {body} tag in it - this needs to be changed to {body bgcolor="black"}.


I'll see what can be done about that. I'll need to check I'm not doubling up on older pages where that line is already there, or else identify exactly when the change in scripting happened and try and find a way of doing a date-sensitive find/replace; except again it seems to be intermittent, dating back to at least June 2008.
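The doubling-up worry can be handled without knowing dates at all: only add the attribute when the body tag doesn't already have one. A minimal sketch with a regular expression follows, assuming the pages use real angle brackets (the braces above are just how the board shows them). The function name `add_bgcolor` is made up, and a regex is a blunt tool for HTML, so this is worth trying on copies first.

```python
import re

# Matches an opening body tag with or without attributes, in any letter case.
BODY_TAG = re.compile(r"<(body)\b([^>]*)>", re.IGNORECASE)
HAS_BGCOLOR = re.compile(r"\bbgcolor\s*=", re.IGNORECASE)


def add_bgcolor(html, colour="black"):
    """Add bgcolor to the first body tag unless that tag already sets one."""

    def patch(match):
        tag_name, attrs = match.group(1), match.group(2)
        if HAS_BGCOLOR.search(attrs):
            return match.group(0)  # already has a bgcolor: leave the tag alone
        return f'<{tag_name}{attrs} bgcolor="{colour}">'

    return BODY_TAG.sub(patch, html, count=1)


print(add_bgcolor("<html><body><p>Hi</p></body></html>"))
print(add_bgcolor('<html><body bgcolor="white"><p>Hi</p></body></html>'))
```

Because a tag that already carries a bgcolor is returned unchanged, running it twice over the same page gives the same result as running it once.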


    Quote:
    When doing the search, make sure the strings are specified with both open and closing angle brackets just in case the body tag already has other attributes (which will usually include a bgcolor attribute)


Noted. This time round I didn't try and adapt the macro you sent me many years ago but turned to a bulk text editor program called "FAR - Find and Replace".


    Quote:
    Depends on how you are doing the search and replace, and exactly what is going wrong with the working pages. If it is a straight file-age-related thing then you could copy the offending files to a working directory, update them, then copy them back.
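The copy-out, update, copy-back routine can be sketched in Python so that the folder structure takes care of itself. This assumes file age is judged by the modification time on disk, which may not match the posting date if the files have been re-uploaded since. The names `stage_files` and `restore_files` are invented for the example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def stage_files(site_root, work_dir, modified_since, pattern="*.html"):
    """Copy files modified on or after the cutoff into work_dir, keeping relative paths.

    work_dir should sit outside site_root so the copies are not picked up again.
    Returns the relative paths that were staged.
    """
    site_root, work_dir = Path(site_root), Path(work_dir)
    cutoff = modified_since.timestamp()
    staged = []
    for src in sorted(site_root.rglob(pattern)):
        if src.is_file() and src.stat().st_mtime >= cutoff:
            rel = src.relative_to(site_root)
            dest = work_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 keeps the original timestamps
            staged.append(rel)
    return staged


def restore_files(site_root, work_dir, staged):
    """Copy each staged file back over the original it came from."""
    for rel in staged:
        shutil.copy2(Path(work_dir) / rel, Path(site_root) / rel)
```

Something like `stage_files("site", "work", datetime(2008, 6, 1))` would pull out everything touched from June 2008 onwards. The staged copies can then be treated with any tool, and `restore_files` puts them back where they came from.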


Looking back at the archived version of UT 310-327 it looks like I manually edited them individually to cut out all the board-specific gubbins at the top of the page (links to make replies, print etc. and a repeat of the title) and replace it with a much simpler header and first line of body text. I may have to look at doing that for other pages, but it's a very tedious job given the volume. Unfortunately the one line I do need to keep from the header is the title line, so a bulk edit gets even trickier.


    Quote:
    As noted, you may also be hitting the case where some of the odd strings are substrings of the others - for example if one code was ABCD and the other was ABC and you needed to map ABCD to WXYZ and ABC to MNO, you would have to remap ABCD first. If you remap ABC first you would end up with a lot of MNODs.
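The ABCD/ABC example is easy to try out. The sketch below (function names invented for illustration) shows the naive one-after-another replace going wrong, and a version that sorts the search strings longest first and handles them all in a single pass.

```python
import re


def naive_replace(text, mapping):
    """Apply the replacements one after another, in whatever order they are listed."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text


def replace_longest_first(text, mapping):
    """Replace every key in one pass, trying longer keys before shorter ones."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


codes = {"ABC": "MNO", "ABCD": "WXYZ"}  # shorter code deliberately listed first

print(naive_replace("ABCD and ABC", codes))          # MNOD and MNO
print(replace_longest_first("ABCD and ABC", codes))  # WXYZ and MNO
```

Doing it in one pass also means a replacement can never be re-matched by a later search string, which the one-after-another approach cannot promise.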


I'd have to identify the longer string first, though, and that's where I'm coming up blank.
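One possible way of finding the candidates is to let a script list them: pull out every run of consecutive non-ASCII characters from a page's text, count them, and sort the distinct runs longest first. This is only a sketch under the assumption that the jumbles are made of non-ASCII characters (a trailing question mark, for instance, would not be included), and the function names are made up.

```python
import re
from collections import Counter

# One or more consecutive characters outside the plain ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def suspect_runs(text):
    """Count every run of consecutive non-ASCII characters in the text."""
    return Counter(NON_ASCII_RUN.findall(text))


def longest_first(counts):
    """List the distinct runs, longest first, with ties in alphabetical order."""
    return sorted(counts, key=lambda run: (-len(run), run))


# Sample text written with escapes: two garbled apostrophes and one real accent.
sample = "It\u00e2\u20ac\u2122s a caf\u00e9, isn\u00e2\u20ac\u2122t it"
counts = suspect_runs(sample)
for run in longest_first(counts):
    print([f"U+{ord(c):04X}" for c in run], counts[run])
```

Legitimate accented letters will turn up in the list too, so it needs reading by eye, but the long runs that repeat often are the likely culprits and the ordering is already the one needed for the replacements.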






Posted with Mozilla Firefox 42.0 on Windows XP
On Topic™ © 2003-2024 Powermad Software