Although you may not think twice about the filenames you use for your web page, there are certain considerations about naming your files that you may want to take into account when creating a new page. These are no more than broad guidelines; there is no "web authority" as such that forces webmasters to name their files in a certain way. However, from experience, I have found that certain types of filenames work better than others. This article provides some tips on the naming of files for your website that may help you avoid problems with the operation of your site in the future.
This article does not actually tell you how to create a web page. It deals only with naming conventions for the files on your website. If you want a tutorial on making a website, please read The Beginner's A-Z Guide to Starting/Creating Your Own Website instead.
This article is also not about creating good titles for your web pages. That's a different topic altogether.
One of the common beginner's mistakes when naming files is to put spaces in the filenames. Modern operating systems like Windows, Linux and Mac OS X allow filenames to contain spaces. On your own computer, having spaces in the filenames improves the general usability of your system, since the spaces separate the words in the name and let you quickly locate the file you want at a glance (provided of course you named your files sensibly).
However, spaces in web files are problematic. Let's take a file named "lousy web page filename.html" as an example. How do you form a web address ("URL") from such a filename? Webmasters new to URLs may think that "http://www.example.com/lousy web page filename.html" is the right form, but they will be wrong. Web browsers and search engines do not expect spaces in URLs. Every space has to be replaced by "%20" (without the quotes). The correct URL for such a file should thus be "http://www.example.com/lousy%20web%20page%20filename.html".
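To make the encoding concrete, here is a minimal sketch in Python (my choice of language for illustration; the article itself does not depend on it). It uses the standard library's urllib.parse.quote to turn the example filename into the form that belongs in a URL. The variable names are just placeholders.

```python
from urllib.parse import quote, unquote

# The example filename and site address from the paragraph above.
filename = "lousy web page filename.html"
base = "http://www.example.com/"

# quote() percent-encodes characters that are not allowed in a URL path;
# each space becomes "%20".
encoded = quote(filename)
url = base + encoded
print(url)

# unquote() reverses the encoding and recovers the original filename.
print(unquote(encoded))
```

Note that a filename without spaces, such as "good-web-page-filename.html", passes through quote() unchanged, which is the whole point of avoiding spaces in the first place.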
The problem comes when you manually add a link to that page from another page, and you forget to replace all the spaces with the encoded "%20" form. Don't think that this is an unlikely event. I can't even begin to count the number of new webmasters who have written to me asking why they get a 404 Page Not Found error when they link to another page on their site that they know exists. When they tell me the filename of that page, the answer becomes obvious.
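If you already have pages with hand-written links, one way to catch this mistake is to scan your HTML for links that still contain raw spaces. The following is only a rough sketch using Python's built-in html.parser module; the class name SpaceLinkFinder and the function find_links_with_spaces are names I made up for this example, and it only looks at the href attribute of "a" tags.

```python
from html.parser import HTMLParser


class SpaceLinkFinder(HTMLParser):
    """Collects href values of <a> tags that contain an unencoded space."""

    def __init__(self):
        super().__init__()
        self.bad_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and " " in value.strip():
                self.bad_links.append(value)


def find_links_with_spaces(html_text):
    """Return the list of link targets in html_text that contain spaces."""
    finder = SpaceLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.bad_links


sample = (
    '<a href="lousy web page filename.html">bad</a> '
    '<a href="lousy%20web%20page%20filename.html">encoded</a> '
    '<a href="good-page.html">good</a>'
)
print(find_links_with_spaces(sample))
```

Only the first link in the sample is reported, since the other two contain no raw spaces.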
Avoid this problem by not using spaces at all in your filenames. Filenames with spaces work fine on your own hard disk. When you create files for the web, however, the mistakes that can happen with space-filled filenames, as well as the tedium involved in replacing all those spaces with "%20", are just not worth the trouble.
If you use a Windows machine, you may have got used to the fact that "MyWebPage.html" refers to the same file as "mywebpage.html". This is not the case with all operating systems. For example, Unix-based systems like Linux and FreeBSD consider them to be two different files.
This affects webmasters in a few ways:
If your web host uses a case-sensitive file system and you indiscriminately refer to your files using whatever capitalization (case) you wish, you will find that some of your links will point nowhere.
Even if your web host uses Windows for its web server, who's to say that you won't move to a different web host in some distant future, one that uses a different operating system? If you find yourself in such a situation, going through all your files with a fine-tooth comb to fix all your links is not an enviable task.
Perhaps you think that you'll be careful to always refer to the file using the correct case. That's what you think now. But a few years down the road, will you remember what the case is? You may link to that file with yet another new case combination. And when you test the link, it will work fine if your site is still hosted on a Windows server, so you may not know that you have created a link that will potentially break in the future.
Do you really think that others referring to your site will use your case system? Remember that your site does not exist in a vacuum. Others will link to your site, or talk about it in the web forums. Will they bother to learn a complicated mixed-case system so that they can refer to the page with accuracy?
If your page gets linked to from either your site or other sites using different case combinations, search engines will treat each combination as a distinct page. You will run into the duplicate content problem that I mentioned in my article on How to Create a Search Engine Friendly Website.
The simplest way to avoid all these problems is to just stick to using small letters (lowercase) in your filenames.
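If you want to check an existing site against this rule, something like the following Python sketch may help. It takes a list of filenames and reports which ones contain capital letters, and which ones differ from each other only by case (the names that would collide on a case-insensitive system like Windows but be separate files on Linux). The function name find_case_problems is hypothetical, and the sketch works on a plain list of names rather than reading your actual web directory.

```python
from collections import defaultdict


def find_case_problems(filenames):
    """Return (names that are not all lowercase, groups differing only by case)."""
    # Names that contain at least one uppercase letter.
    not_lowercase = [name for name in filenames if name != name.lower()]

    # Group names by their lowercase form to spot case-only differences.
    groups = defaultdict(list)
    for name in filenames:
        groups[name.lower()].append(name)
    collisions = [sorted(group) for group in groups.values() if len(group) > 1]

    return not_lowercase, collisions


names = ["MyWebPage.html", "mywebpage.html", "index.html"]
not_lowercase, collisions = find_case_problems(names)
print(not_lowercase)
print(collisions)
```

In this example, "MyWebPage.html" is flagged for its capital letters, and the pair "MyWebPage.html" and "mywebpage.html" is flagged as a collision.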
In my article on How to Make Your WordPress Blog Search-Engine-Friendly, I mentioned some of the benefits of having the title of your web page or at least the key portions of it as part of your URL. The main thing from that article that is relevant to the discussion here is, instead of naming your files "page1.html", "page2.html" and so on, name them "title-of-your-webpage.html".
A filename with your title, or at least the main keywords from your title, serves at least two purposes:
When someone posts about your site in a forum or their blog, very often they'll just dump the URL into that post. If your filename is sufficiently descriptive, a reader looking at the post will be able to decide whether or not to click the link to visit your web page.
The link to your site containing the title also gives the search engines a clue about what your page is about. The hint, along with the text occurring on your web page, will help the search engine decide whether to return it in the results when someone searches for a relevant topic.
However, it isn't wise to just load every important word you can think of into your filename. This makes your filenames look very "spammy". Internet-savvy users, seeing such a link in the forums, are less likely to click on such links. Long filenames also have their own problems, which leads me to the next point.
Although most operating systems allow extremely long filenames, it's best not to make them excessively long.
If you have a page that has a URL like "http://www.example.com/example-of-a-filename-that-is-extremely-lengthy.html", and someone refers to that page in a web message board, many forum software packages will shorten that long filename so that it fits within the confines of the browser window. The URL is thus shortened to something like "http://www.example.com/example-of-a-filename...html". At the same time, the software usually also turns the text into a link pointing to the correct full address. Which is fine. So far.
The problem comes when somebody else tries to copy that link to some other post by simply dragging their mouse across the actual text, copying and pasting. This results in a new link with an embedded ellipsis. Such a link will obviously not be pointing to your web page.
In view of this, you may want to restrain yourself from creating overly long filenames, no matter how descriptive you think they may be. If your page title is very long, just include the main words. You can do this, for example, by dropping things like the articles ("a", "an", "the") and prepositions ("to", "on", "from", etc).
Since you shouldn't use spaces in your filenames, how should you separate the words? The astute reader will probably have noticed that the example filenames I provide in this tutorial have their words separated by the hyphen character. The hyphen character is regarded by search engines as a word separator, much the way a space is. It doesn't have the disadvantages of the space character, however, in that you don't need to encode it in a URL. As such, it is a good character to use as a word separator.
Incidentally, you should not use the underscore character ("_") to separate words. Although the underscore visually separates words to humans, at this time, many search engines just see it as another letter of the alphabet. As such, if you write a word like "joined_word", the engine will not see it as two words "joined" and "word" but as a single word that has an embedded underscore.
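Putting the guidelines so far together (lowercase only, no spaces, hyphens between words, and a short name built from the main words of the title), here is one possible way to derive a filename from a page title in Python. This is merely a sketch under my own assumptions: the function name title_to_filename, the six-word limit and the short list of words to drop are arbitrary choices for illustration, not a standard of any kind.

```python
import re

# Articles and prepositions to drop; extend this set to suit your own titles.
STOP_WORDS = {"a", "an", "the", "to", "on", "from"}


def title_to_filename(title, max_words=6, extension=".html"):
    """Build a short, lowercase, hyphen-separated filename from a page title."""
    # Lowercase the title and drop apostrophes so "Beginner's" becomes "beginners".
    cleaned = title.lower().replace("'", "")
    # Keep only runs of letters and digits; spaces, underscores and
    # punctuation all act as word boundaries.
    words = re.findall(r"[a-z0-9]+", cleaned)
    kept = [word for word in words if word not in STOP_WORDS]
    if not kept:
        # The title consisted only of stop words; fall back to all words.
        kept = words
    return "-".join(kept[:max_words]) + extension


print(title_to_filename("The Beginner's Guide to Naming Files"))
print(title_to_filename("How to Create a Search Engine Friendly Website"))
```

The first call produces "beginners-guide-naming-files.html" and the second "how-create-search-engine-friendly-website.html". You will probably still want to look over the result by hand, since a mechanical rule cannot tell which words in your title matter most.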
Before you write to me to tell me that there are pages on howtohaven.com, thesitewizard.com and thefreecountry.com that violate one or more of these rules, let me say that when I wrote earlier that I've learned "from experience" that some filenames are better than others, I wasn't kidding. To put it another way, when I mentioned some of the undesirable consequences of lousy names, those consequences aren't hypothetical scenarios I conjured up in my head.
Over time, I have changed some of those bad filenames when I redesigned my sites (albeit with their own disastrous consequences), but I'm aware I still have many pages with problematic and silly filenames. They will probably stay as they are, since they have garnered a number of links to them over time, and I'm tired of creating cures that are worse than the disease.
On the bright side, of course, if I didn't make quite so many mistakes over the years, I wouldn't have been able to write this and help other webmasters. Since you have the benefit of my hindsight, you can of course avoid the problem altogether.
This article is copyrighted. Please do not reproduce this article in whole or part, in any form, without obtaining my written permission.