
isham research

Missing pages - 404 errors from search engine bots

The author is a Google Bionic Poster and Top Contributor based in Sheffield UK. The opinions here are not Google's

Many webmasters believe that simply deleting a page from their server means it's gone forever. Then they start seeing 404 errors in their server logs or in search engine reports and wonder what is happening.

First of all - such 404 'errors' do no harm at all - they're merely advisory notices and can safely be ignored. They won't cause any search engine penalties and their frequency will gradually reduce.

But in fact, this is how the web is supposed to work. The IETF publishes what are known as RFCs which are the de facto 'laws' of the Internet, and HTTP 1.1 is the subject of one - RFC 2616. Section 10 is relevant to 404s:

RFC 2616 Status Code Definitions

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

Note the sentence recommending 410 (Gone) for resources that are permanently unavailable. Check further down the codes listed and you'll find 410 Gone:

10.4.11 410 Gone

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.

Whatever some people have said in the past (and it's now very difficult to get them to repeat it) a 404 will never be treated as a 410, no matter how often it is repeated. All that happens is that the frequency of attempts to reach such pages falls off, perhaps to as little as once every six months. And there are thousands of search engines - RFC 2616 defines their common interface with websites.

A 410 may be far from ideal in a business environment. If an e-commerce site has a useful custom 404 response page, it may be better to leave the 404 - a 410 essentially throws the hit away. Another alternative is a 301 redirect to a business-relevant page.
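The three options above (leave the 404, send a 410, or 301 redirect) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `choose_response`, the two tables and the example paths are hypothetical and do not come from any real server or site.

```python
from http import HTTPStatus

# Hypothetical tables for illustration only; a real site would build
# these from its own records of retired pages.
REDIRECTS = {"/old-catalogue.html": "/catalogue/"}  # retired, with a business-relevant replacement
GONE = {"/summer-promo.html"}                       # deliberately removed, no forwarding address


def choose_response(path, live_pages):
    """Return (status, location) for a requested path.

    location is only set when the answer is a 301 redirect.
    """
    if path in live_pages:
        return HTTPStatus.OK, None
    if path in REDIRECTS:
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    if path in GONE:
        return HTTPStatus.GONE, None
    # Unknown path: the server cannot say whether the absence is permanent,
    # so 404 is the right answer and any custom 404 page still gets shown.
    return HTTPStatus.NOT_FOUND, None
```

Anything not explicitly listed falls through to 404, which matches the RFC's advice to use 404 when the server does not know whether the condition is permanent.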

See also:

HTTP Error: 410 Gone

If you discover you need to build a large .htaccess file to contain all the names currently producing 404s, remember that you can download a CSV file from Google's Webmaster Tools. Any reasonable editor will be able to change this into directives. Note that the 410 response code is newer than 404 (it arrived with HTTP/1.1, first specified in RFC 2068 in 1997; HTTP/1.0 did not define it) - many people are not aware of its existence or function and a very few older clients do not support it.
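If an editor is not convenient, the CSV-to-directives step can also be scripted. The sketch below is a minimal example, not a definitive tool: it assumes the export has a header row with a column named `URL` holding full URLs (the real column name may differ, so it is a parameter), and the function name `csv_to_directives` is made up for illustration. It emits Apache mod_alias `Redirect gone` lines, which make the server answer 410 for the given path.

```python
import csv
import io
from urllib.parse import urlsplit


def csv_to_directives(csv_text, url_column="URL"):
    """Turn a CSV of missing URLs into Apache 'Redirect gone' lines.

    url_column is an assumed header name; adjust it to match the export.
    """
    directives = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the path; Redirect does not match on host or query string.
        path = urlsplit(row[url_column].strip()).path
        # Skip blanks, the site root, duplicates, and paths containing
        # whitespace (those would need quoting and a manual check).
        if not path or path == "/" or path in seen:
            continue
        if any(ch.isspace() for ch in path):
            continue
        seen.add(path)
        directives.append(f"Redirect gone {path}")
    return directives
```

The returned lines can be joined with newlines and pasted into .htaccess. Review the list before using it: `Redirect` matches by path prefix, so a directive for a directory also covers everything beneath it, and any page that deserves a 301 or a plain 404 should be taken out by hand.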
