If your site is one of those websites where only a few pages seem to be indexed by the search engines, this article is for you. It describes how you can provide the major search engines with a list of all the pages on your website, thus allowing them to learn of the existence of pages which they may have missed in the past.
How do you know which pages of your site have been indexed by a search engine and which have not? One way is to use a "site:domain-name" query to search for your site. This works with Google, Bing and Yahoo, although not with Ask.
For example, if your domain is example.com, type "site:example.com" (without the quotes) into the search field of the search engine. From the results list, you should be able to see all the pages which the search engine knows about. If you find that a page from your site is not listed, and you have not intentionally blocked it using robots.txt or a meta tag, then perhaps that search engine does not know about that page or has been unable to access it.
Here's what to do when you discover that there are pages not indexed by the search engine.
The first thing to do is to check your robots.txt
file and make sure it complies with the
rules for robots.txt files.
Many webmasters, new and old, unintentionally block a search engine from a part of their site by having
errors in their robots.txt
file.
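To give a hypothetical example of such an error, consider the robots.txt file below. The webmaster intended to block only one private directory, but the extra rule at the end shuts search engines out of the entire site, since "/" matches every URL on the website.

```
# Intended: keep robots out of the /private/ directory only
User-agent: *
Disallow: /private/

# Mistake: this extra rule blocks the ENTIRE site,
# because "/" matches every URL on the website
Disallow: /
```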
Another thing you might want to do is to make sure that your web page does not have a meta tag that prevents a robot from indexing a particular page. This may occur if you have ever put a meta "noindex" tag on the page, and later wanted it indexed but forgot to remove it.
The major search engines, Google, Bing, Yahoo and Ask, all support something known as a Sitemap file. This is not the "Site Map" that you see on many websites, including thesitewizard.com. My Site Map and others like it are primarily designed to help human beings find specific pages on the website. The sitemap file that uses the Sitemap protocol is, instead, designed for search engines, and is not at all human-friendly.
Sitemaps have to adhere to a particular format. The detailed specifications for this can be found at the sitemaps.org website. It is not necessary to use every aspect of the specification to create a sitemap if all you want is to make sure the search engines locate all your web pages. Details on how to create your own sitemap are given later in this article.
As a result of the sitemap protocol, an extension to the robots.txt
file has been agreed upon by the search engines. Once you have
finished creating the sitemap file and uploaded it to your website, modify your robots.txt
file to include the following line:

Sitemap: http://www.example.com/name-of-sitemap-file.xml
You should change the web address ("URL") given to the actual location of your sitemap file. For example, change "www.example.com" to your domain name and "name-of-sitemap-file.xml" to the name that you have given your sitemap file.
If you don't have a robots.txt
file, please see my
article on robots.txt for more information on how to create one. The article can be found at
https://www.thesitewizard.com/archive/robotstxt.shtml
The search engines that visit your site will automatically look into your robots.txt
file before spidering your site.
When they read the file, they will see the sitemap file listed and load it for more information. This will enable them to
discover the pages that they have missed in the past. In turn, this will hopefully lead them to index those pages.
A sitemap file that follows the Sitemap protocol is just a straightforward plain text file. You can create it using any ordinary plain text editor. If you use Windows, Notepad can be used. If you use a Mac, try TextEdit. Do not use a word processor like Microsoft Word or Wordpad. For Windows users (Windows Vista, 7, 8.1 and later versions), you can start up Notepad by clicking the Start menu (or Start screen), typing "notepad" (without the quotes), then clicking the "Notepad" line that appears.
You will notice that a sitemap file begins with the text

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

and ends with

</urlset>

Those portions of the sitemap file are invariant. All sitemaps have to begin and end this way, so you can simply copy them from my example to your own file.
Next, notice that every page on the website (that you want indexed in the search engine) is listed in the sitemap, using the following format:

<url><loc>http://www.example.com/</loc></url>

where http://www.example.com/ should be replaced by the URL of the page you want indexed. In other words, if you want to add a page, say, http://www.example.com/sing-praises-for-thesitewizard.com.html, just put the web address for that page between <url><loc> and </loc></url>, and place the entire line inside the section demarcated by <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> and </urlset>.
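To see how the pieces fit together, here is a complete minimal sitemap. The three URLs are hypothetical examples which you would replace with the addresses of your own pages.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.example.com/</loc></url>
<url><loc>http://www.example.com/products.html</loc></url>
<url><loc>http://www.example.com/contact.html</loc></url>
</urlset>
```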
To make your job simpler, copy the entire example sitemap that I gave in the example above into an empty Notepad window (or TextEdit on the Mac). Then replace all the example URLs with your own page addresses, adding any more that you like, and you're done.
If you are wondering "But where do I copy it to? Should I paste it in the <head>
section or the
<body>
section?", it means you didn't read my instructions above. Close whatever program you have
running that allowed you to see all those things and made you confused. Start up Notepad per my instructions. The
window should be empty, without any content at all. Paste my example into that empty window. Then modify the lines as
mentioned above.
Save the file under any name you like. Most people save it with a ".xml" file extension. If you don't have any particular preference, call it "sitemap.xml". If you use Notepad, you should note the tips I gave in my article on how to save a file without the .txt extension in Notepad; otherwise Notepad may quietly append ".txt", leaving you with a file called "sitemap.xml.txt".
Remember to update your robots.txt
file as mentioned earlier to include the URL of your sitemap file, so that
the search engines can learn of the existence of the file.
Note: a sitemap file cannot have more than 50,000 URLs (web addresses) nor be bigger than 50 MB. If yours is bigger than that, you'll have to create multiple sitemap files. Please see the Sitemaps site on how this can be done.
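The protocol handles this case with a sitemap index file, which simply lists your individual sitemap files. A minimal sketch, using hypothetical file names, follows; you would then point the Sitemap line in your robots.txt at the index file instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>http://www.example.com/sitemap1.xml</loc></sitemap>
<sitemap><loc>http://www.example.com/sitemap2.xml</loc></sitemap>
</sitemapindex>
```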
If you have pages on your website that seem to be omitted from the search engine indices, following the tips in this article will help you make sure that the search engines learn of all the pages on your website. Of course, whether they actually go about spidering and listing them is another matter. However, with the sitemap file, you can at least be sure that they are aware of all the available pages on your site.
Copyright © 2008-2018 by Christopher Heng. All rights reserved.
This article is copyrighted. Please do not reproduce or distribute this article in whole or part, in any form.
How to Get Search Engines to Discover (Index) All the Web Pages on Your Site