martes, 24 de enero de 2012

.htaccess Files for the Rest of Us

.htaccess files are used to configure Apache, as well a range of other web servers. Despite the .htaccess file type extension, they are simply text files that can be edited using any text-editor. In this article, we’ll review what they are, and how you can use them in your projects.

Please note that .htaccess files don't work on Windows-based systems, although they can be edited and uploaded to a compatible web server, and on Linux-based systems they are hidden by default.

In order to work with htaccess files locally, to see how they work and generally play around with them, we can use XAMPP (or MAMP) on the Mac – a package that installs and configures Apache, PHP and MySQL. To edit these .htaccess files on Mac, we should use a text editor that allows for the opening of hidden files, such as TextWrangler.

A .htaccess file follows the same format as Apache’s main configuration file: httpd.conf. Many of the settings that can be configured using the main configuration file can also be configured with them, and vice versa.

A setting configured in an .htaccess file will override the same setting in the main configuration file for the directory which contains the file, as well as all of its subdirectories.

They are sometimes referred to as dynamic configuration files because they are read by the server on every request to the directory they are contained within. This means that any changes to an .htaccess file will take effect immediately, without requiring a reboot of the server, unlike changes made to the global configuration file. It also means that you pay a slight performance hit for using them, but they can be useful when you don't have access to the server's main configuration file.

So now we all know what .htaccess files are, how they are edited and worked with, and some of their pros and cons, let's look at how they can be used and some of the cool stuff they can do.


Redirects and URL Rewriting

A popular use of .htaccess files is to perform redirects or rewrite URLs. This can help with SEO following a domain name change, or file-structure reorganisation, or can make long unsightly URL more friendly and memorable.

Redirections

A redirection can be as simple as the following:

 Redirect 301 ^old\.html$ <a href="http://localhost/new.html" target="_blank">http://localhost/new.html</a>  

This sets the HTTP status code to 301 (moved permanently) and redirects all requests to old.html transparently to new.html. We use a regular expression to match the URL to redirect, which gives us a fine degree of control to ensure only the correct URL is matched for redirection, but adds complexity to the configuration and administration of it. The full URL of the resource being redirected to is required.

Rewrites

A rewrite rule can be as simple as this:

 RewriteEngine on RewriteRule ^old\.html$ new.html 

In this example, we just provide a simple file redirect from one file to another, which will also be performed transparently, without changing what is displayed in the address bar. The first directive, RewriteEngine on, simply ensures that the rewrite engine is enabled.

In order to update what is displayed in the address bar of the visitor's browser, we can use the R flag at the end of the RewriteRule e.g.

 RewriteRule ^old\.html$ <a href="http://hostname/new.html" target="_blank">http://hostname/new.html</a> [r=301] 

The r flag causes an external redirection which is why the full URL (an example URL here) to the new page is given. We can also specify the status code when using the flag. This causes the address bar to be updated in the visitor's browser.

One of the possible uses for URL rewriting I gave at the start of this section was to make unsightly URLs (containing query-string data) friendlier to visitors and search engines. Let's see this in action now:

 RewriteRule ^products/([^/]+)/([^/]+)/([^/<WBR>]+) product.php?cat=$1&brand=$2&<WBR>prod=$3 

This rule will allow visitors to use a URL like products/turntables/technics/sl1210, and have it transformed into product.php?cat=turntables&<WBR>brand=technics&prod=sl1210. The parentheses in between the forward slashes in the above regular expression are capturing groups – we can use each of these as $1, $2 and $3 respectively. The [^/]+ character class within the parentheses means match any character except a forward-slash 1 or more times.

In practice, URL rewriting can be (and usually is) much more complex and achieve far greater things than this. URL rewriting is better explained using entire tutorials so we won't look at them in any further detail here.


Serving Custom Error Pages

It's just not cool to show the default 404 page anymore. Many sites take the opportunity offered by a file not found error to inject a little humour into their site, but at the very least, people expect the 404 page of a site to at least match the style and theme of any other page of the site.

Very closely related to URL rewriting, serving a custom error page instead of the standard 404 page is easy with an .htaccess file:

 ErrorDocument 404 ";/404.html"; 

That's all we need; whenever a 404 error occurs, the specified page is displayed. We can configure pages to be displayed for many other server errors too.


Restricting Access to Specific Resources

Using .htaccess files, we can enable password protection of any file or directory, to all users, or based on things like domain or IP address. This is after all one of their core uses. To prevent access to an entire directory, we would simple create a new .htaccess file, containing the following code:

 AuthName ";Username and password required"; AuthUserFile /path/to/.htpasswd Require valid-user AuthType Basic 

This file should then be saved into the directory we wish to protect. The AuthName directive specifies the message to display in the username/password dialog box, the AuthUserFile should be the path to the .htpasswd file. The Require directive specifies that only authenticated users may access the protected file while the AuthType is set to Basic.

To protect a specific file, we can wrap the above code in a <files> directive, which specifies the protected file:

 Files ";protectedfile.html";> AuthName ";Username and password required"; AuthUserFile /path/to/.htpasswd Require valid-user AuthType Basic </Files> 

We also require an .htpasswd file for these types of authentication, which contains a colon-separated list of usernames and encrypted passwords required to access the protected resource(s). This file should be saved in a directory that is not accessible to the web. There are a range of services that can be used to generate these files automatically as the password should be stored in encrypted form.


Block Access to Certain Entities

Another use of .htaccess files is to quickly and easily block all requests from an IP address or user-agent. To block a specific IP address, simply add the following directives to your .htaccess file:

 order allow,deny deny from 192.168.0.1 allow from all 

The order directive tells Apache in which order to evaluate the allow/deny directives. In this case, allow is evaluated first, then deny. The allow from all directive is evaluated first (even though it appears after the deny directive) and all IPs are allowed, then if the client's IP matches the one specified in the deny directive, access is forbidden. This lets everyone in except the specified IP. Note that we can also deny access to entire IP blocks by supplying a shorter IP, e.g. 192.168.

To deny requests based on user-agent, we could do this:

 RewriteCond %{HTTP_USER_AGENT} ^OrangeSpider RewriteRule ^(.*)$ http://%{REMOTE_ADDR}/$ [r=301,l] 

In this example, any client with a HTTP_USER_AGENT string starting with OrangeSpider (a bad bot) is redirected back to the address that it originated from. The regular expression matches any single character (.) zero or more times (*) and redirects to the %{REMOTE_ADDR} environment variable. The l flag we used here instructs Apache to treat this match as the last rule so will not process any others before performing the rewrite.


Force an IE Rendering Mode

Alongside controlling how the server responds to certain requests, we can also do things to the visitor's browser, such as forcing IE to render pages using a specific rendering engine. For example, we can use the mod_headers module, if it is present, to set the X-UA-Compatible header:

 Header set X-UA-Compatible ";IE=Edge"; 

Adding this line to an .htaccess file will instruct IE to use the highest rendering mode available. As demonstrated by HTML5 Boilerplate, we can also avoid setting this header on files that don't require it by using a <FilesMatch directive like so:

 <FilesMatch ";\.(js|css|gif|png|jpe?g|pdf|<WBR>xml|oga|ogg|m4a|ogv|mp4|m4v|<WBR>webm|svg|svgz|eot|ttf|otf|<WBR>woff|ico|webp|appcache|<WBR>manifest|htc|crx|xpi|<WBR>safariextz|vcf)$";>;   Header unset X-UA-Compatible </FilesMatch> 

Implement Caching

Caching is easy to set up and can make your site load faster.

Caching is easy to set up and can make your site load faster. 'Nuff said! By setting a far-future expires date on elements of sites that don't change very often, we can prevent the browser from requesting unchanged resources on every request.

If you're running your site through Google PageSpeed or Yahoo's YSlow and you get the message about setting far-future expiry headers, this is how you fix it:

 ExpiresActive on ExpiresByType image/gif                 ";access plus 1 month"; ExpiresByType image/png                 ";access plus 1 month"; ExpiresByType image/jpg                 ";access plus 1 month"; ExpiresByType image/jpeg                ";access plus 1 month"; ExpiresByType video/ogg                 ";access plus 1 month"; ExpiresByType audio/ogg                 ";access plus 1 month"; ExpiresByType video/mp4                 ";access plus 1 month"; ExpiresByType video/webm                ";access plus 1 month"; 

You can add different ExpiresByType directives for any content that is listed in the performance tool you're using, or anything else that you want to control caching on. The first directive, ExpiresActive on, simply ensures the generation of Expires headers is switched on. These directives depend on Apache having the mod_expires module loaded.


Enabling Compression

Another warning we may get in a performance checker refers to enabling compression, and this is also something we can fix simply by updating our .htaccess file:

 FilterDeclare   COMPRESS FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/html FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/css FilterProvider  COMPRESS  DEFLATE resp=Content-Type $text/javascript FilterChain     COMPRESS FilterProtocol  COMPRESS  DEFLATE change=yes;byteranges=no 

This compression scheme works on newer versions of Apache (2.1+) using the mod_filter module. It uses the DEFLATE compression algorithm to compress content based on its response content-type, in this case we specify text/html, text/css and text/javascript (which will likely be the types of files flagged in PageSpeed/Yslow anyhow).

In the above example we start out by declaring the filter we wish to use, in this case COMPRESS, using the FilterDeclare directive. We then list the content types we wish to use this filter. The FilterChain directive then instructs the server to build a filter chain based on the FilterProvider directives we have listed. The FilterProtocol directive allows us to specify options that are applied to the filter chain whenever it is run, the options we need to use are change=yes (the content may be changed by the filter (in this case, compressed)) and byteranges=no (the filter must only be applied to complete files).

On older versions of Apache, the mod_deflate module is used to configure DEFLATE compression. We have less control of how the content is filtered in this case, but the directives are simpler:

 SetOutputFilter DEFLATE AddOutputFilterByType DEFLATE text/html text/css text/javascript 

In this case we just set the compression algorithm using the SetOutputFilter directive, and then specify the content-types we'd like to compress using the AddOutputFilterByType directive.

Usually your web server will use one of these modules depending on which version of Apache is in use. Generally, you will know this beforehand, but if you are creating a generic .htaccess file that you can use on a variety of sites, or which you may share with other people and therefore you don't know which modules may be in use, you may wish to use both of the above blocks of code wrapped in <IfModule module_name> directives so that the correct module is used and the server doesn't throw a 500 error if we try to configure a module that isn't included. You should be aware that it's also relatively common for hosts that run a large number of sites from a single box to disable compression as there is a small CPU performance hit for compressing on the server.


Summary

We looked at some of the most common uses for .htaccess files, and reviewed how we can achieve certain tasks that, as website builders/maintainers, are of particular interest to us. As is the case with any introductory tutorial of this nature, the subjects we've covered are presented as introductions to a particular topic. There are many other options and configurations than we have been able to look at, so I'd strongly recommend further reading on any subject that is of particular interest.



No hay comentarios:

Publicar un comentario