Google's webmaster technical guidelines recommend that webmasters make sure their web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since it last crawled your site. Supporting it saves you bandwidth and overhead.
The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (Not Modified) response will be returned without any message-body.
If-Modified-Since = "If-Modified-Since" ":" HTTP-date
An example of the field is: If-Modified-Since: Fri, 02 Jul 2010 08:24:42 GMT
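As a rough illustration of the HTTP-date format, the following sketch formats a Unix timestamp as an HTTP-date and parses one back. The helper names formatHttpDate() and parseHttpDate() are hypothetical and chosen only for this example; the timestamp is arbitrary.

```php
<?php
// Format a Unix timestamp as an HTTP-date. HTTP dates are always in GMT.
// formatHttpDate() is a hypothetical helper name used for illustration.
function formatHttpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Parse an HTTP-date into a Unix timestamp; returns false if it cannot be parsed.
// parseHttpDate() is likewise a hypothetical helper name.
function parseHttpDate(string $value)
{
    return strtotime($value);
}

$stamp = gmmktime(8, 24, 42, 7, 2, 2010); // 02 Jul 2010 08:24:42 GMT
echo formatHttpDate($stamp), "\n";        // Fri, 02 Jul 2010 08:24:42 GMT
```

Using gmdate() rather than date() keeps the output in GMT regardless of the server's configured time zone.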
A GET method with an If-Modified-Since header and no Range header requests that the identified entity be transferred only if it has been modified since the date given by the If-Modified-Since header. The algorithm for determining this includes the following cases:
- If the variant has been modified since the If-Modified-Since date, the response is exactly the same as for a normal GET.
- If the variant has not been modified since a valid If-Modified-Since date, the server should return a 304 (Not Modified) response.
- If the request would result in anything other than a 200 (OK) status, or if the passed If-Modified-Since date is invalid, the response is exactly the same as for a normal GET. A date which is later than the server's current time is invalid.
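The cases above can be sketched as a small decision function. This is only a minimal illustration: conditionalGetStatus() and its parameters are hypothetical names, and it assumes the request would otherwise succeed with a 200 (OK) status.

```php
<?php
// Decide the status code for a conditional GET, following the three cases above.
// conditionalGetStatus() is a hypothetical helper, not part of any library.
function conditionalGetStatus(int $lastModified, ?string $ifModifiedSince, int $now): int
{
    if ($ifModifiedSince === null) {
        return 200; // no If-Modified-Since header: a normal GET
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false || $since > $now) {
        return 200; // unparseable date, or a date later than the server's current time
    }
    // Modified after the given date: send the page; otherwise 304 (Not Modified).
    return ($lastModified > $since) ? 200 : 304;
}
```

Passing the current time in as a parameter, rather than calling time() inside the function, makes the invalid-date case easy to exercise.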
However, there is a difference between static pages and dynamic pages. For static pages, the web server normally sets the Last-Modified header and handles If-Modified-Since requests for you, but for dynamic pages generated with a server-side scripting language like PHP you must handle these yourself.
By default, pages generated with PHP are not cached by browsers or proxies, as they are generated new every time the page is loaded by the server. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.
Whenever a browser requests a page, the server's response can include a Last-Modified header which indicates the last modification time. For static pages, this is the last modification time of the file; for dynamic pages, the header is typically absent or reflects the time the page was requested, unless the script sets it. Whenever a page is requested that has been seen before, browsers or proxies generally take the Last-Modified time from the cached version and populate an If-Modified-Since request header with it. If the page has not changed since then, the server should respond with a 304 response code to indicate that the cached version is still valid, rather than sending the page content again.
Handling this correctly for PHP pages requires two things:
- Identifying the last modification time for the page.
- Checking the request headers for If-Modified-Since.
Timestamps
There are two components to the last modification time: the date of the data used to generate the page and the date of the script itself. Both are equally important: we want the page to be updated when the data changes, and if the script has been changed the generated page may be different. The code incorporates both by defaulting to the modification time of the script, and allowing the caller to pass in the data modification time, which is used if it is more recent than the script. The last modification time is then used to generate a Last-Modified header and returned to the caller. Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.
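function setLastModified($last_modified = null)
{
    // Modification time of the main script being executed.
    $page_modified = getlastmod();
    if (empty($last_modified) || ($last_modified < $page_modified))
    {
        $last_modified = $page_modified;
    }
    // Modification time of this include file.
    $header_modified = filemtime(__FILE__);
    if ($header_modified > $last_modified)
    {
        $last_modified = $header_modified;
    }
    // HTTP dates must be expressed in GMT, so use gmdate() rather than date("r").
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $last_modified) . ' GMT');
    return $last_modified;
}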
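To make the "most recent wins" rule concrete, here is a small worked sketch with made-up dates. newestTimestamp() is a hypothetical helper introduced only for this illustration; it is not part of the function above.

```php
<?php
// Pick the most recent of several modification times, ignoring missing ones.
// newestTimestamp() is a hypothetical helper used only for this example.
function newestTimestamp(array $timestamps): int
{
    // array_filter() without a callback drops null/false/0 entries,
    // e.g. when no data modification time was supplied.
    $valid = array_filter($timestamps);
    return $valid ? (int) max($valid) : 0;
}

$script  = gmmktime(12, 0, 0, 6, 1, 2010);  // main script edited 01 Jun 2010
$include = gmmktime(12, 0, 0, 6, 15, 2010); // include file edited 15 Jun 2010
$data    = gmmktime(12, 0, 0, 7, 2, 2010);  // data changed 02 Jul 2010

echo gmdate('Y-m-d', newestTimestamp([$data, $script, $include])), "\n"; // 2010-07-02
echo gmdate('Y-m-d', newestTimestamp([null, $script, $include])), "\n";  // 2010-06-15
```

In the second call no data time is given, so the newer of the two script files determines the result.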
Handling If-Modified-Since
If the If-Modified-Since request header is present, it can be parsed to get a timestamp that can be compared against the modification time. If the page has not been modified since that timestamp, a 304 response can be returned instead of generating the page.
In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal with a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is at least as recent as the last-modified time, we can just return the 304 response code and quit, with no further output required. Here is the function that handles this:
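function exitIfNotModifiedSince($last_modified)
{
    if (array_key_exists("HTTP_IF_MODIFIED_SINCE", $_SERVER))
    {
        // Strip anything after a semicolon (added by some old browsers) before parsing.
        $if_modified_since = strtotime(preg_replace('/;.*$/', '', $_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        // strtotime() returns false for an unparseable date; treat that as a normal GET.
        if ($if_modified_since !== false && $if_modified_since >= $last_modified)
        {
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}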
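The header-parsing step can be tried in isolation with a sketch like the one below. parseIfModifiedSince() is a hypothetical helper name, and the "; length=1234" suffix is only an assumed example of the kind of extra data an old browser might append.

```php
<?php
// Parse a raw If-Modified-Since value, tolerating a trailing "; ..." suffix.
// parseIfModifiedSince() is a hypothetical helper used only for this sketch.
function parseIfModifiedSince(string $raw)
{
    // Remove everything from the first semicolon to the end of the value.
    $clean = trim(preg_replace('/;.*$/', '', $raw));
    return strtotime($clean); // false when the date cannot be parsed
}

$expected = gmmktime(8, 24, 42, 7, 2, 2010);
// The suffix is ignored, so both forms give the same timestamp.
var_dump(parseIfModifiedSince('Fri, 02 Jul 2010 08:24:42 GMT') === $expected);
var_dump(parseIfModifiedSince('Fri, 02 Jul 2010 08:24:42 GMT; length=1234') === $expected);
```

Keeping the parsing separate from the header() and exit() calls makes it possible to check from the command line.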
You can use the two functions together, or separately if that better suits your needs. Call them before the page produces any output, since headers cannot be sent once output has started:
exitIfNotModifiedSince(setLastModified()); // for pages with no data-dependency
exitIfNotModifiedSince(setLastModified($data_modification_time)); // for data-dependent pages
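Where the data modification time comes from depends on your application. As one possible sketch, assuming the page is built from files on disk, the newest file modification time could be used. latestFileMtime() and the example path are hypothetical.

```php
<?php
// Return the newest modification time among a set of data files, or 0 if none exist.
// latestFileMtime() is a hypothetical helper; adapt it to your own data source.
function latestFileMtime(array $paths): int
{
    $latest = 0;
    foreach ($paths as $path) {
        clearstatcache(true, $path); // avoid stale cached stat results
        if (is_file($path)) {
            $latest = max($latest, (int) filemtime($path));
        }
    }
    return $latest;
}

// Hypothetical usage with the functions from this article:
// $data_modification_time = latestFileMtime(['data/articles.json']);
// exitIfNotModifiedSince(setLastModified($data_modification_time));
```

For database-backed pages, a comparable value might come from an "updated at" column, if your schema has one.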
Benefits of If-Modified-Since HTTP header
Site Owners: The main benefit is reduced bandwidth usage. Sites that support the If-Modified-Since HTTP header avoid resending pages that have not changed.
Search Engines: Search engines that use this header also benefit, because they don't have to waste resources crawling pages that didn't change. That's why some of them advise you to make sure that your server supports the If-Modified-Since HTTP header.
1 comment:
Just a note that your Last-Modified header should carry an HTTP-date (RFC 1123) in UTC; otherwise the browser may mistake it for UTC and send back the wrong time in the If-Modified-Since header.
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');