Thursday, August 26, 2010

Isolation levels in Transactions

Recently I was revisiting my understanding of isolation levels in transactions, a topic I had read about a long time ago and which had started to fade from my memory.

I read a couple of very interesting articles on the subject (references below) and have summarized my understanding in this article.

Need for Isolation levels in transactions

A transaction is a series of actions that needs to be executed in its entirety, the obvious reason being data consistency.

Every transaction acquires some resources in order to work on them. Depending on what the actions in the transaction need, these resources can be acquired either exclusively, blocking any other transaction that may need the same resources, or in shared mode, so that other transactions can also use them. Shared acquisition increases concurrency but potentially risks the consistency of the data (resource).

Isolation Levels

Traditionally there are four isolation levels, but SQL Server 2005 added two more that are optimistic in nature (the traditional four are considered pessimistic).

Read Uncommitted

When this isolation level is specified, the transaction reads data that may have been modified by another transaction but not yet committed. This can lead to dirty reads if the transaction that modified the data rolls back its update.

I am not sure exactly when this isolation level is used in practice, but it is available.
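A minimal T-SQL sketch of the behaviour, assuming a hypothetical Accounts table; session 2 reads a value that session 1 has changed but not yet committed:

    -- Session 1: modify a row but do not commit yet
    BEGIN TRANSACTION;
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
    -- ...no COMMIT yet...

    -- Session 2: does not block, reads the uncommitted (dirty) value
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT Balance FROM Accounts WHERE AccountId = 1;
    -- If session 1 now rolls back, session 2 has read a value that never officially existed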

Read Committed

This is the default isolation level for all transactions in SQL Server (at least up to SQL Server 2005).

This isolation level ensures that there are no dirty reads, i.e. if a record is being updated by another transaction, the transaction running at Read Committed is blocked until the other transaction commits the data.

The advantages of this isolation level are that there are no dirty reads and concurrency is high, since the shared lock is released as soon as the data is read and other transactions can then use the same resource/data.

The disadvantage is that, since only shared locks are obtained while reading the data, and even these are released once the read completes, there can be lost updates: Tx1 can read some data and update it some time later, and if Tx2 reads and updates the same data in the meantime, only the last update is retained; the others are lost.
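A sketch of that lost-update sequence, using the same hypothetical Accounts table; both transactions run at the default Read Committed level:

    -- Tx1: shared lock is released as soon as the SELECT completes
    BEGIN TRANSACTION;
    SELECT Balance FROM Accounts WHERE AccountId = 1;   -- reads 500

    -- Tx2 (another session) meanwhile reads the same row, updates it and commits:
    --   UPDATE Accounts SET Balance = 550 WHERE AccountId = 1;

    -- Tx1 now writes back a value computed from its stale read
    UPDATE Accounts SET Balance = 400 WHERE AccountId = 1;   -- Tx2's update is lost
    COMMIT;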

Repeatable Read

We can solve the problem of lost updates and make our reads repeatable with the Repeatable Read isolation level.

This isolation level ensures that data read within the transaction cannot be changed by other transactions until the transaction completes.

Voila! It seems like all issues are resolved... not really. Updates to the data being read by this transaction are restricted, but inserts of new rows matching the criteria of the data being used in the transaction can still happen. These are called phantom reads.
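A sketch of a phantom read, assuming a hypothetical Orders table; the rows already read by Tx1 cannot be changed, but a new matching row can still appear:

    -- Tx1
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    BEGIN TRANSACTION;
    SELECT COUNT(*) FROM Orders WHERE CustomerId = 7;   -- say, 3 rows

    -- Tx2 (another session) inserts a matching row and commits:
    --   INSERT INTO Orders (CustomerId, Amount) VALUES (7, 99);

    SELECT COUNT(*) FROM Orders WHERE CustomerId = 7;   -- now 4 rows: a phantom
    COMMIT;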

Serializable

This is the most stringent of all the isolation levels: phantom reads cannot occur either, since no inserts matching the criteria of the data being used by the transaction are allowed.

Yes, all the issues seen in the three levels above are resolved and consistency is fully ensured, but at the price of very low concurrency.

Read Committed Snapshot (Since SQL Server 2005)

This and the following one are the two isolation levels introduced in SQL Server 2005. They are more optimistic in nature, with the goal of maximizing concurrency while preserving data consistency. Both use a technique called row versioning.

The Read Committed Snapshot isolation level allows a transaction to read the last committed (original) value of a record that has been updated by another transaction but not yet committed, instead of blocking. Once Txn1 commits its update, subsequent reads by Txn2 will see the updated values.

Read Committed Snapshot is not set per transaction; it is enabled through the READ_COMMITTED_SNAPSHOT database option, and once that option is on, the default Read Committed level itself starts using row versioning.
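This option is switched on per database; a minimal sketch (the database name is hypothetical):

    -- Turn on row-versioned reads for the Read Committed level
    ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

After this, every transaction running at Read Committed in that database reads the last committed version of a row instead of blocking on writers.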

Snapshot (Since SQL Server 2005)

This isolation level is the same as Read Committed Snapshot, with the difference that Txn2 will see the original copy of the data throughout its lifetime, even if Txn1 updates and commits it midway.

Even if inserts are made that match the criteria of the data being read in Txn2, they will never be visible to Txn2.

This isolation level also prevents lost updates: if Txn2 reads the original value while it is being updated by Txn1, and later Txn2 tries to update values that Txn1 has already modified, an error is thrown to Txn2.
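A sketch of enabling and using Snapshot isolation (database and table names are hypothetical); the lost-update protection shows up as an update-conflict error:

    -- One-time database setting
    ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

    -- Txn2
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    BEGIN TRANSACTION;
    SELECT Balance FROM Accounts WHERE AccountId = 1;   -- snapshot taken here

    -- Txn1 (another session) updates the same row and commits

    UPDATE Accounts SET Balance = Balance + 10 WHERE AccountId = 1;
    -- Fails with error 3960: snapshot isolation transaction aborted due to update conflict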

Locks

In order to operate at the desired isolation level, transactions need to acquire certain locks. Below are the types of locks that are generally acquired:

Shared: This type of lock is generally acquired during reads. Multiple shared locks can be held on the same data by different transactions, and while they are held, a request for an exclusive lock has to wait until all the shared locks are released.

Exclusive: This lock is generally used during updates; no other locks are allowed on the data while it is held, until it is released.

Intent: This lock signals an intent to acquire an exclusive lock on data that currently has a shared lock, and thereby disallows any further shared locks until the exclusive lock is obtained.

Schema: This lock is used by the database engine, for example while compiling queries, to ensure that the schema is not modified while the lock is held.
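While a transaction is open, these locks can be observed (SQL Server 2005 onwards) through the sys.dm_tran_locks view:

    -- request_mode shows S (shared), X (exclusive), IS/IX (intent) and Sch-S/Sch-M (schema) locks
    SELECT resource_type, request_mode, request_status, request_session_id
    FROM sys.dm_tran_locks;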

The table below summarizes the isolation levels:

Isolation Level          | Dirty Reads | Lost Updates / Nonrepeatable Reads | Phantom Reads | Concurrency Model | Conflict Detection
-------------------------|-------------|------------------------------------|---------------|-------------------|-------------------
Read Uncommitted         | Yes         | Yes                                | Yes           | Pessimistic       | No
Read Committed           | No          | Yes                                | Yes           | Pessimistic       | No
Repeatable Read          | No          | No                                 | Yes           | Pessimistic       | No
Serializable             | No          | No                                 | No            | Pessimistic       | No
Snapshot                 | No          | No                                 | No            | Optimistic        | Yes
Read Committed Snapshot  | No          | Yes                                | Yes           | Optimistic        | No

References

1. http://www.sqlfundas.com/post/2009/11/26/Database-Lock-Block-Dead-Lock-e28093-What-is-it-Why-is-it-What-to-do-about-it-e28093-Part-1.aspx

2. http://www.sqllion.com/2009/07/transaction-isolation-levels-in-sql-server/

3. http://www.sql-server-performance.com/articles/dba/isolation_levels_2005_p1.aspx



Sunday, August 22, 2010

URL Rewriting

What is URL Rewriting and Why is it needed?

URL rewriting is a mechanism that lets a site serve the page a user requested through a convenient URL that does not correspond to the physical structure of the site.

e.g. a user may request the page http://www.product.com/Books.aspx while in reality, as per your site's structure, you are expecting http://www.product.com/Products.aspx?category=Books.

The reasons for this variation could be: a) items on your site have been moved or shuffled around; b) you want to make the URLs shorter or more understandable to your users; c) you want to improve your search engine hits and indexing by crawlers.

How to do it?

There are various ways one can go about it:
1. Write an ISAPI filter that does the URL rewriting.
2. Write an HTTP handler/module to do the URL rewriting.
3. Use the URL mapping feature of ASP.NET 2.0.

URL mapping is a neat feature of ASP.NET 2.0 in which we specify, in the web configuration file, the incoming URL and the URL it should be mapped to; a sketch is shown below.
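A minimal web.config sketch, reusing the hypothetical Books/Products URLs from the earlier example:

    <configuration>
      <system.web>
        <urlMappings enabled="true">
          <add url="~/Books.aspx" mappedUrl="~/Products.aspx?category=Books" />
        </urlMappings>
      </system.web>
    </configuration>

The limitation is that each mapping is a fixed pair of URLs; there is no pattern matching, so it suits only a small, known set of friendly URLs.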

Of course, there are other hassles that one needs to take care of, irrespective of the approach:

1. Images and CSS files that specify relative paths break on the client side, because the path the browser sees is the virtual one and not the actual one.

To circumvent this issue, one can use the ~ syntax to reference files from the root of the application, for example:
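For instance (file names are illustrative), server-side controls resolve the ~ against the application root no matter what URL the browser requested:

    <!-- '~' is resolved on the server to the application root -->
    <img src="~/images/logo.gif" runat="server" />
    <asp:Image ImageUrl="~/images/banner.gif" runat="server" />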

2. Postbacks: If the requested page contains a form, the action attribute of that HTML element is auto-populated with the rewritten (actual) URL. When the postback occurs, the browser sends the actual URL from the action attribute to the server, not the friendly/desired one.

Hmm... a problem. No worries; every problem has a solution, and some have more than one.

One way of working around this problem is to define your own form class by deriving it from the System.Web.UI.HtmlControls.HtmlForm class and overriding the RenderAttributes() method, doing everything in that method that the base class does except rendering the action attribute. You then compile this class, add it to the Web application's References folder, and use this new form class instead of HtmlForm in each of your ASP.NET pages. Cumbersome, is it not?
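A hedged sketch of such a form class (namespace and class name are illustrative); it renders the usual attributes but deliberately skips action:

    using System.Web.UI;
    using System.Web.UI.HtmlControls;

    namespace MySite.Controls
    {
        // Behaves like HtmlForm, but omits the action attribute so the
        // browser posts back to the friendly URL shown in the address bar.
        public class ActionlessForm : HtmlForm
        {
            protected override void RenderAttributes(HtmlTextWriter writer)
            {
                writer.WriteAttribute("name", this.Name);
                writer.WriteAttribute("method", this.Method);
                writer.WriteAttribute("id", this.ClientID);
                // No "action" attribute is written here.
            }
        }
    }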

Another approach, which also seems more elegant, uses the ASP.NET 2.0 control adapter extensibility architecture to customize the rendering of the form control and override the value of the action attribute with the desired value. No changes are required in your .aspx pages; just add a .browser file to your /App_Browsers folder that registers the control adapter class, which is then used to output the new action attribute.
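A sketch along the lines of the approach described in reference 2 (treat the class names as illustrative). First, the .browser file placed in /App_Browsers:

    <browsers>
      <browser refID="Default">
        <controlAdapters>
          <adapter controlType="System.Web.UI.HtmlControls.HtmlForm"
                   adapterType="FormRewriterControlAdapter" />
        </controlAdapters>
      </browser>
    </browsers>

Then the adapter itself, which swaps the rendered action value for the URL the browser actually requested:

    using System.Web;
    using System.Web.UI;
    using System.Web.UI.Adapters;

    public class FormRewriterControlAdapter : ControlAdapter
    {
        protected override void Render(HtmlTextWriter writer)
        {
            // Wrap the writer so we can intercept the action attribute
            base.Render(new RewriteFormHtmlTextWriter(writer));
        }
    }

    public class RewriteFormHtmlTextWriter : HtmlTextWriter
    {
        public RewriteFormHtmlTextWriter(HtmlTextWriter writer) : base(writer)
        {
            this.InnerWriter = writer.InnerWriter;
        }

        public override void WriteAttribute(string name, string value, bool fEncode)
        {
            // Substitute the friendly URL the browser asked for
            if (name == "action")
            {
                value = HttpContext.Current.Request.RawUrl;
            }
            base.WriteAttribute(name, value, fEncode);
        }
    }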

3. Specifying only a folder path: Sometimes users may not specify the entire path but just a folder path, e.g. http://www.MyBlogs/2009/ with the expectation of seeing all the entries from 2009. IIS will try to find a DEFAULT.ASPX or DEFAULT.HTM within the directory mentioned before handing the request over to the respective ISAPI extension. But in this case there is no folder called 2009, nor is it feasible to create a folder for each year with a default file in it.

Similarly, if the user requests http://www.MyBlogs/2009/11/, IIS would look for that folder with a default file in it. It is not feasible, or at least cumbersome, to create a directory for each year and each month with a default file.

Hence a neater approach is to configure IIS to map all requests to the ASP.NET engine, so that when a URL such as /2004/03/ is visited, IIS faithfully hands off the request to the ASP.NET engine even though no /2004/03/ directory exists. Using this approach, however, makes the ASP.NET engine responsible for handling all types of incoming requests to the Web server, including images, CSS files, external JavaScript files, Macromedia Flash files, and so on.
.Text, an open-source blog engine, can be configured to have all requests mapped to the ASP.NET engine. It can handle serving all file types by using a custom HTTP handler that knows how to serve up typical static file types (images, CSS files, and so on).

IIS 7, with its integrated pipeline, provides a way for all requests to be routed through ASP.NET HTTP modules.

URL Rewriting in HTTP Modules

URL Rewriting can also be achieved via HTTP Modules and HTTP handlers.

HTTP modules respond to specific events raised by HttpApplication, such as BeginRequest, AuthenticateRequest, AuthorizeRequest, etc. Modules can be configured to respond to any of the events raised by the application before the request is handed over to the handler.

Changing the URL can be done by creating a custom HTTP module that handles the desired event(s) and uses the RewritePath() method of the HttpContext class to rewrite the URL in the request object. Seems simple... not really. Changing the URL in the request object may affect the processing carried out by the existing modules provided by ASP.NET (e.g. FormsAuthenticationModule, FileAuthorizationModule, UrlAuthorizationModule). For example, if URL rewriting is done in the AuthorizeRequest event, the FileAuthorizationModule will not work as intended, because the requested URL has not been replaced by that point, so file authorization is performed against the wrong file in the URL and will mostly not object.

Thus if you're not using any authentication, then it doesn't matter if URL rewriting happens in BeginRequest, AuthenticateRequest, or AuthorizeRequest.

If you are using forms authentication and are not using Windows authentication, place the URL rewriting in the AuthorizeRequest event handler. Finally, if you are using Windows authentication, schedule the URL rewriting during the BeginRequest or AuthenticateRequest events.
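A minimal sketch of such a module (the rewrite rule reuses the hypothetical Books/Products URLs from earlier); here the rewrite is hooked to BeginRequest:

    using System;
    using System.Web;

    public class SimpleRewriteModule : IHttpModule
    {
        public void Init(HttpApplication app)
        {
            // See the event-ordering notes above when authentication is involved
            app.BeginRequest += new EventHandler(OnBeginRequest);
        }

        private void OnBeginRequest(object sender, EventArgs e)
        {
            HttpContext context = ((HttpApplication)sender).Context;

            // Hypothetical rule: /Books.aspx -> /Products.aspx?category=Books
            if (context.Request.Path.EndsWith("/Books.aspx", StringComparison.OrdinalIgnoreCase))
            {
                context.RewritePath("~/Products.aspx", string.Empty, "category=Books");
            }
        }

        public void Dispose() { }
    }

The module is then registered under <httpModules> (or <modules> in IIS 7 integrated mode) in web.config.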

URL Rewriting in HTTP Handlers
URL rewriting can also be accomplished via an HTTP handler or an HTTP handler factory. An HTTP handler is the class responsible for generating the content for a specific type of request. To perform URL rewriting through an HTTP handler, we can create an HTTP handler factory whose GetHandler() method checks the requested path to determine whether it needs to be rewritten. If it does, the factory calls the passed-in HttpContext object's RewritePath() method, as discussed earlier. Finally, the HTTP handler factory returns the HTTP handler produced by the System.Web.UI.PageParser class's GetCompiledPageInstance() method. (This is the same technique by which the built-in ASP.NET Web page HTTP handler factory, PageHandlerFactory, works.)

Since all HTTP modules are initialized before HTTP handlers, the FileAuthorizationModule is triggered before the URL is rewritten. So if Windows authentication and file authorization are being used, one has to use the HTTP module approach for URL rewriting.

In both approaches, HTTP modules and HTTP handlers, the rewriting rules, i.e. the pattern to look for and the string to replace it with, can be stated in the configuration file. This makes it easier to change the rewriting rules without touching the application code, and the patterns to look for can be expressed easily using regular expressions.
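A hedged sketch of that handler-factory technique (the rule and class names are illustrative):

    using System.Web;
    using System.Web.UI;

    public class RewriteHandlerFactory : IHttpHandlerFactory
    {
        public IHttpHandler GetHandler(HttpContext context, string requestType,
                                       string url, string pathTranslated)
        {
            string sendToUrl = url;

            // Hypothetical rule: /Books.aspx -> /Products.aspx?category=Books
            if (url.EndsWith("/Books.aspx"))
            {
                sendToUrl = "~/Products.aspx";
                context.RewritePath(sendToUrl, string.Empty, "category=Books");
            }

            // Return the compiled page for the (possibly rewritten) path,
            // just as the built-in PageHandlerFactory would
            return PageParser.GetCompiledPageInstance(
                sendToUrl, context.Server.MapPath(sendToUrl), context);
        }

        public void ReleaseHandler(IHttpHandler handler) { }
    }

The factory would be registered for *.aspx (or another extension) under <httpHandlers> in web.config.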

References
1. http://msdn.microsoft.com/en-us/library/ms972974.aspx
2. http://weblogs.asp.net/scottgu/archive/2007/02/26/tip-trick-url-rewriting-with-asp-net.aspx
3. List of some good search engine optimization suggestions
a. SSW Rules to Better Google Rankings
b. how URLs can affect top search engine ranking.
4. Already built HttpModules available on the web for free URL Rewriting
a. UrlRewriter.net
b. UrlRewriting.net

5. ISAPI Rewrite to enable extension-less URL rewriting for IIS5 and IIS6
a. Helicon Tech's ISAPI Rewrite: full product version for $99
b. Ionic's ISAPI Rewrite: free download (both source and binary)