One of my visitors recently told me that whenever he updated a page on his website with a certain program (he didn't say which), all the pound symbols ("£") on his page turned into question marks ("?") in his web browser. He would then have to manually fix it with a web editor (in his case, Dreamweaver). He wanted to know why this was the case, and how he could fix it.
The short answer to the question, for those who know the technical lingo, is that there's a character encoding mismatch. If you have no idea what I just said in the previous sentence, please read on for the long answer. You should also read on if you want to know how to solve the problem.
I'm sure many of you know (or at least have heard) that at a very low level, computers store data as numbers (loosely speaking). When you type something into your web page, like the word "Chris", it is not actually stored as a sequence of alphabetic characters. Under the hood, those letters are probably stored as the numbers 67, 104, 114, 105, 115, where the number 67 represents "C", 104 "h", 114 "r", 105 "i", and 115 "s". When your web browser shows that page, it translates those numbers back into the letters of the alphabet for display, so that we don't have to be a digital superhero to figure out what it means.
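If you want to check those numbers for yourself, here is a minimal sketch in Python (my choice of language for illustration; the article itself doesn't depend on it). It uses the built-in functions ord() and chr() to go from characters to numbers and back.

```python
# Turn each letter of the word into its underlying number, then back again.
word = "Chris"

codes = [ord(ch) for ch in word]
print(codes)  # [67, 104, 114, 105, 115]

# chr() does the reverse: it turns a number back into a character.
restored = "".join(chr(n) for n in codes)
print(restored)  # Chris
```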
But how does any computer program know that 67 actually means "C" or vice versa? After all, it's not like when you look at the letter "C", you can say "I can feel in my bones that this is the number 67". Indeed, the mapping of a character to a particular number is arbitrary, and is done by convention. That is to say, some people got together and agreed that they would take 67 to mean the letter "C". Any software that follows that particular convention will automatically encode a "C" as 67 when it saves the file, and decode 67 as "C" when displaying it. Such a convention for encoding characters is, unsurprisingly, known as a "character encoding".
Unfortunately, there is not one character encoding that is used by everyone around the world. There are many. And therein lies the crux of the problem.
While a number of character encodings in use today map the basic English alphabet (and numerals) to the same numbers, the same cannot be said of the non-alphabetic (and non-numeric) characters. And the pound symbol ("£") is one of those that has a different value depending on which encoding you use.
For example, in one commonly-used character encoding, known as ISO 8859-1, the pound is encoded as the single number 163. In another, known as UTF-8 (an encoding of Unicode), it is encoded as a sequence of two numbers, 194 and 163. If your web page is in English, you are likely to be using one or the other of these 2 encodings, since web editors frequently use these by default. Very often, ISO 8859-1 is preferred by older software, and UTF-8 by newer software.
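The difference is easy to see with a short Python sketch. The pound sign is written here as the escape "\u00a3" so that the example doesn't depend on how the snippet itself happens to be saved; the variable names are just for illustration.

```python
pound = "\u00a3"  # the pound sign

# The same character, encoded under the two conventions.
latin1_numbers = list(pound.encode("iso-8859-1"))
utf8_numbers = list(pound.encode("utf-8"))

print(latin1_numbers)  # [163]
print(utf8_numbers)    # [194, 163]

# A basic English letter, by contrast, comes out the same in both.
same_for_c = "C".encode("iso-8859-1") == "C".encode("utf-8")
print(same_for_c)  # True
```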
If a piece of software saved "£" as 163 using ISO 8859-1, but another piece of software thinks that the web page was saved using UTF-8, the latter will not interpret 163 as "£", since under UTF-8, it will expect the page to say "194, 163" if it meant "£". Indeed, under UTF-8, a solitary 163 is not a valid character at all, let alone a printable (i.e., displayable) one, since that number can only appear as the second or later part of a multi-number sequence. When a web browser encounters a character it cannot display, it either shows a question mark ("?") or a rectangular block (depending on which browser you use).
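Here is a rough Python sketch of that mismatch: one side "saves" the pound sign as ISO 8859-1, and the other side tries to read it as UTF-8. This only imitates what the two programs do; a real browser has its own decoder, but the outcome is of the same kind.

```python
# One program saves the pound sign using ISO 8859-1: a single byte, 163.
saved = "\u00a3".encode("iso-8859-1")

# Another program insists on reading that byte as UTF-8.
try:
    saved.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False: a lone 163 is not valid UTF-8

# A lenient reader substitutes a placeholder instead of giving up.
shown = saved.decode("utf-8", errors="replace")
print(ascii(shown))  # '\ufffd', the Unicode "replacement character"
```

That replacement character (U+FFFD) is what many browsers draw as a question mark or a box.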
From the (all too brief) description given by my visitor, it seems like this was what had happened.
There are numerous situations that could have led to this mismatch of encodings. Here are a couple of common ones.
One possibility is that you used 2 different programs to update your site. One of them assumed the page was in ISO 8859-1 (or some other encoding) and the other in UTF-8 (or yet another encoding).
Another possibility is that you put a special marker (called a "meta tag") in your web page telling the web browser that you used a particular character encoding. But your editor either doesn't support that encoding or uses a different encoding by default, and saved your file accordingly.
This is more common than you think. For example, let's say you used Notepad, a Windows text editor, to create (or update) your web page. And you inserted the following HTML meta tag:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Many websites include such a line in their web page, including this very page on thesitewizard.com that you're reading. Even if you don't know how to read HTML, I'm sure you'll have guessed that it declares your page to be formatted with HTML (the part that says 'content="text/html') and that it is encoded as UTF-8 (the part that says "charset=utf-8").
So far so good. Except that Notepad, by default, saves files in what it calls the ANSI character encoding (technically, it's actually "Windows-1252"), which is similar to ISO 8859-1 but not 100% identical. If you used "£" in that document, it will be saved as 163, since that is the correct character code under Windows-1252 (and ISO 8859-1), but not UTF-8. Remember: Notepad is not a web browser or even a web editor. It doesn't know anything about HTML tags and does not read your document to say, "Oh, hello, there's something here that says 'UTF-8'. Maybe I should save it with that encoding."
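To see what "similar but not 100% identical" means in practice, here is a small Python sketch. Python calls Windows-1252 "cp1252". The euro sign is my own choice of example for a character where the two encodings part ways; the article itself only discusses the pound.

```python
pound = "\u00a3"  # pound sign
euro = "\u20ac"   # euro sign (an example chosen for illustration)

# The pound sign is the same single byte, 163, in both encodings.
pound_matches = pound.encode("cp1252") == pound.encode("iso-8859-1") == b"\xa3"
print(pound_matches)  # True

# The euro sign exists in Windows-1252 (as 128) ...
cp1252_euro = list(euro.encode("cp1252"))
print(cp1252_euro)  # [128]

# ... but has no place at all in ISO 8859-1.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)  # False
```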
When a web browser loads that page, it will notice that the document declares itself to be in UTF-8 (due to the HTML tag you inserted). As such, it will interpret and display everything according to the UTF-8 scheme. This leads to the problem being discussed, since, as you will recall from above, 163 in UTF-8 is not the pound sign.
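If you are comfortable running a script, a mismatch of this kind can often be detected mechanically. The following Python sketch is only a rough illustration: the function names declared_charset and matches_declaration are made up for this example, the meta tag is assumed to be of the usual "charset=..." form, and the pattern matching is far cruder than what a real browser does.

```python
import re

def declared_charset(raw: bytes):
    """Return the charset named in the page's meta tag, or None if absent."""
    match = re.search(rb"charset\s*=\s*[\"']?([\w-]+)", raw, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None

def matches_declaration(raw: bytes) -> bool:
    """Check whether the bytes can be read using the declared charset."""
    charset = declared_charset(raw)
    if charset is None:
        return True  # nothing declared, so nothing to contradict
    try:
        raw.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False

page = ('<meta http-equiv="content-type" content="text/html; charset=utf-8">'
        "<p>Only \u00a35</p>")

notepad_ansi = page.encode("cp1252")  # what Notepad's "ANSI" would write
saved_utf8 = page.encode("utf-8")     # what the meta tag promises

print(matches_declaration(notepad_ansi))  # False
print(matches_declaration(saved_utf8))    # True
```

Note that the check only works in one direction. A file that fails it is definitely mismatched, but a file that passes is not guaranteed to be correct, since some encodings (ISO 8859-1 among them) will accept any sequence of bytes without complaint.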
Perhaps, at this point, you may want to protest, "But Chris, I didn't even touch the wretched pound when I updated my page. Why did it change?"
When an editing program saves a web page, it has to write the entire file to the disk, even if you only changed a small part of it. It cannot simply save the paragraph you altered or deleted or added. Since it has to rewrite the entire file, it will encode everything in whatever character encoding it's set up to use. So even if your page had been correctly encoded in UTF-8 (or whatever) before, by the time the operation is over, it will have a new encoding.
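This can be imitated in a few lines of Python. The sketch below is a simplification that assumes the editor reads the file correctly and only gets the encoding wrong when saving; the page content is invented for the example.

```python
# A page that was correctly saved as UTF-8.
original = "<h1>Old title</h1><p>Only \u00a35</p>".encode("utf-8")

# The editor loads it, and you change only the heading.
text = original.decode("utf-8")
edited = text.replace("Old title", "New title")

# On saving, the WHOLE document is written out again, here as Windows-1252.
resaved = edited.encode("cp1252")

print(b"\xc2\xa3" in original)  # True: the pound was 194, 163
print(b"\xc2\xa3" in resaved)   # False: that sequence is gone
print(b"\xa3" in resaved)       # True: the pound is now a lone 163
```

The pound sign was never touched by the edit, yet its underlying numbers changed all the same.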
The quick fix is to insert "&pound;" (without the quotes) instead of "£". By design, when a web browser encounters the sequence of characters "&pound;", it will display it as "£". This works because "&pound;" is made up entirely of basic characters that have the same values in all the encodings mentioned above. However, for this to work, if you are using a visual web editor, you will have to insert "&pound;" as HTML code or you will just end up with the text "&pound;" being displayed instead of "£".
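For those who would rather not hunt down every symbol by hand, here is a Python sketch of the same idea. The helper to_entities is a made-up name for this example, not part of any library; it leans on the table of entity names in Python's standard html.entities module, and falls back to a numeric code where a character has no name.

```python
import html
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                          # plain ASCII stays as-is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named, e.g. &pound;
        else:
            out.append(f"&#{code};")                 # numeric fallback
    return "".join(out)

safe = to_entities("Only \u00a35 \u00a9 Example")
print(safe)  # Only &pound;5 &copy; Example

# The result is pure ASCII, so both encodings produce identical bytes.
print(safe.encode("iso-8859-1") == safe.encode("utf-8"))  # True

# A browser-style decoder turns the entity back into the pound sign.
print(html.unescape("&pound;") == "\u00a3")  # True
```

Be careful to run something like this only on the text of a page, since it makes no attempt to understand HTML tags; as it happens, tags are normally plain ASCII and pass through unchanged.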
For example, if you use the BlueGriffon web editor, you will need to follow the procedure I describe in How to Insert HTML Code in BlueGriffon to type "&pound;". Likewise, if you use Dreamweaver, use the method given in How to Insert Raw HTML Code in Dreamweaver. Having said that, it's quite unlikely that you will need to do this if the only software you use to update your web page is Dreamweaver or BlueGriffon, since these programs save the file under the correct character encoding (unless you manually altered the meta tag instead of changing it using the editor's configuration options).
Anyway, as I mentioned in the title to this section, this is just a quick fix. It works fine, and will solve the immediate display issue, but it doesn't remove the real problem that led to this glitch, which is that of conflicting character encodings being specified and used. Eventually, this issue will resurface when you insert other characters that have different underlying values, such as the copyright symbol.
The root problem is that the encoding declared (or expected) in your file is not the one actually used. The better solution is thus to make sure that the declaration matches the encoding.
I think the most common scenario is that you have a program that saves in ISO 8859-1 (or Windows-1252), but your page says it's in UTF-8. It may say this because you explicitly put a meta tag there yourself, or because you used yet another program that inserted it.
If you use Notepad, and you have a meta tag that says UTF-8, either change it to say Windows-1252, or even better, get Notepad to save in UTF-8 format. To do the latter, in the "Save As" dialog box, look for the field labelled "Encoding" (near the bottom of the dialog box). By default, it should say "ANSI". Click the down arrow for that drop-down box and select "UTF-8" instead. (Note: I have only tested this with the Notepad that comes with Windows 7. I don't know if the ones that come with earlier or later versions of Windows have this same facility.)
Those who use a different editor will also need to specify the encoding, especially if you use Windows, since many text editors on that system seem to use Windows-1252 by default. (This is not true of visual web editors though. For example, Dreamweaver uses UTF-8 by default, and KompoZer ISO 8859-1.) Unfortunately, I can't be specific about how you can alter this setting since the way to do it varies from software to software. I suggest that you first check to see if the "Save As" dialog box has an "Encoding" (or "Format") option. If not, try the configuration options of that editor (click the different menus and look for a line that says "Options" or "Configuration" or "Properties") and search for "Encoding", "File format", "Language", "Character set" or words to that effect.
If you use 2 or more programs to maintain the same page, you will need to synchronise ("synchronize" if you use a different variant of English) those programs so that they all agree on the character encoding. In general, if you have a choice of whether to use ISO 8859-1 or UTF-8, it's probably best to choose UTF-8, since the latter provides a lot more characters (having been designed to handle all human languages).
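If you have existing pages that were saved as Windows-1252 and you want to bring them over to UTF-8 without opening each one in an editor, a script can do the conversion. The Python sketch below is one possible approach, not a polished tool: convert_to_utf8 is a name invented for this example, it assumes you already know the file's current encoding, and it overwrites the file in place, so work on a copy.

```python
from pathlib import Path
import tempfile

def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite a file in place as UTF-8.

    Assumes the file really is in source_encoding at the moment. Running
    this on a file that is already UTF-8 would garble it.
    """
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))

# Demonstration on a throwaway file rather than a real page.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes("<p>Only \u00a35</p>".encode("cp1252"))

    convert_to_utf8(page)
    converted = page.read_bytes()

print(b"\xc2\xa3" in converted)  # True: the pound is now 194, 163
```

Remember that the meta tag in the page must still say UTF-8 after the conversion, and that every program you use on the file afterwards has to be set to UTF-8 as well.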
This article is copyrighted. Please do not reproduce or distribute this article in whole or part, in any form.