Sunday, August 22, 2010

URL Rewriting

What is URL Rewriting and Why is it needed?

URL Rewriting is a means or a mechanism for a site to provide the desired page, requested by the user in a convenient form which does not correspond to physical structure on the site.

e.g. User may request for a page http://www.product.com/Books.aspx while in reality, as per the structure in your site, you may be expecting http://www.product.com?Products.aspx?category=Books.

The reason for this variation could be : a.) Because your site has done some movements or shuffling of the items b) Or to make the URLs short or more understandable to your users c) To improve your hits on search engine and indexing by crawlers too.

How to do it?

Various ways that one can go about and do it.
1. Write a ISAPI filter, which would do the URL rewriting task
2. Write a HTTP Handler/Module to do the URL rewriting.
3. Use the URLMAPPING feature of ASP.NET 2.0

URLMAPPING feature is a neat feature of ASP.NET 2.0 in which we could specify the incoming URL and the URL to be mapped to, in the web configuration file.

Ofcourse there are other hassles that one would need to take care, irrespective of the approach:

1. Images and CSS files specifying relative paths and hence breaking on client side where the path is a virtual one and not the actual.

To circumvent this issue, one can use the ~syntax to reference files from the root of the application for example

2. Postbacks: If the request URL contains a form, the action property of this HTML element is auto-populated with the converted URL. And now when the postback occurs, it will send to the server the actual URL in action property and not the friendly/desired one.

Hmm…Problem…. No worries. Every problem has a solution and some have more than one.

One way for working around this problem is defining your own forms class by deriving it from System.Web.HtmlControls.HtmlForm class and then overriding the RenderAttribute() method and then doing everything in that method as the baseclass does, except not rendering the action attribute. You will need to compile this class and then add it to Web Application’s Reference folder. Then instead of using the HTMLFORM class, use this newly created form class in each of your ASP.NET pages. Cumbersome… Is it not?

Another approach and seems elegant too. This uses the ASP.NET 2.0 Control Adapter extensibility architecture to customize the rendering of the & control and override the value of the action attribute with the provided value. No changes are required to be done in your .ASPX pages. Just add a .browser file to your /app_browsers folder that registers the Control Adapter class, which would be used to output the new “action” attribute

3. Specifying only Folder Path: Sometimes the users may not specify the entire path but just the folder path e.g. http://www.MyBlogs/2009/ with the expectation to see all the entries of 2009. Now IIS will try to search for a DEFAULT.APSX OR DEFAULT.HTM within the directory mentioned, before handing over the request to the respective ISAPI extension. But in this case there is no folder likes 2009 nor may it be feasible to create a folder for each year with a default file in it.

Also if the user requests http://www.MyBlogs/2009/11/ then IIS would try to search for this folder with a default file in it. It may not be feasible or rather cumbersome to create a directory for each year and each month with default file.

Hence a neater approach would be, even if visiting URL /2004/03/, IIS will faithfully hand off the request to the ASP.NET engine even if there does not exist a /2004/03/ directory. Using this approach, however, makes the ASP.NET engine responsible for handling all types of incoming requests to the Web server, including images, CSS files, external JavaScript files, Macromedia Flash files, and so on.
.Text, an open-source blog engine, can be configured to have all requests mapped to the ASP.NET engine. It can handle serving all file types by using a custom HTTP handler that knows how to serve up typical static file types (images, CSS files, and so on).

IIS 7 specifies a way by which all requests will be directed to an HTTP module.

URL Rewriting in HTTP Modules

URL Rewriting can also be achieved via HTTP Modules and HTTP handlers.

HTTP modules respond to the specific events raised by HTTPApplication like Begin Request, Authenticate Request, Authorize Request, etc. Modules can be configured to respond to any of the events raised by the application before the request is transferred to the Handlers.

Now changing the URL can be done by creating a custom HTTP Module which handles the desired event(s) and uses the RewritePath() method of HttpContext class to rewrite the URL in the request object. Seems Simple… not really.. Changing the URL in the Request object may have an impact on the processing carried out by the existing modules provided by ASP.Net e.g.( FormsAuthenticationModule, FileAuthorizationModule, UrlAuthorizationModule). Example, if URL Rewriting is done in AuthorizeRequest Event, FileAuthorization module would not work as the requested URL would not have been replaced till then and hence FileAuthorization would be working on an incorrect file in the URL and would mostly not object.

Thus if you're not using any authentication, then it doesn't matter if URL rewriting happens in BeginRequest, AuthenticateRequest, or AuthorizeRequest.

If you are using forms authentication and are not using Windows authentication, place the URL rewriting in the AuthorizeRequest event handler. Finally, if you are using Windows authentication, schedule the URL rewriting during the BeginRequest or AuthenticateRequest events.

URL Rewriting in HTTP Handlers
URL Rewriting can be accomplished via HTTP Handler or HTTP Handler factory. HTTP handler is the class responsible for generating specific content for a type of request. To perform URL rewriting through an HTTP handler, we can create an HTTP handler factory whose GetHandler() method checks the requested path to determine if it needs to be rewritten. If it does, it can call the passed-in HttpContext object's RewritePath() method, as discussed earlier. Finally, the HTTP handler factory can return the HTTP handler returned by the System.Web.UI.PageParser class's GetCompiledPageInstance() method. (This is the same technique by which the built-in ASP.NET Web page HTTP handler factory, PageHandlerFactory, works.)Since all the HTTP modules are initialized before HTTP handlers, the FileAuthorizationModule will be triggered earlier to the URL being rewritten. So if Windows Authetication and file authorization are being used, one will have to use the HTTP Modules approach for URL Rewriting.In both the approaches that is HTTP Modules and HTTP Handler, the rules for rewriting could be stated in the Configuration file i.e. the pattern to look for and the string to replace could be stated in the Configuration file. This would make the application more susceptible to changes in the rules for rewriting. Also the pattern to look for could be expressed easily using regular expressions.

References
1. http://msdn.microsoft.com/en-us/library/ms972974.aspx
2. http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx
3. List of some good search engine optimization suggestions
a. SSW Rules to Better Google Rankings
b. how URLs can affect top search engine ranking.
4. Already built HttpModules available on the web for free URL Rewriting
a. UrlRewriter.net
b. UrlRewriting.net

5. ISAPIRewrite to enable Extension-less URL Rewriting for IIS5 and IIS6

a. Helicon Tech's ISAPI Rewrite: ISAPI Rewrite full product version for $99

b. Ionic's ISAPI Rewrite: Free download (both source and binary)

No comments:

Post a Comment