ASP URL Rewriting like MOD_Rewrite

ASP Best Practices

Unix has Mod_Rewrite but IIS has nothing comparable. Here's how to accomplish complex URL rewrites with IIS.


Date : 2006-05-10
Remember when you wrote your first dynamic web application? You were so proud of all of the IDs you were passing back and forth, maybe even session IDs in a query string that only you knew the meaning of. Well, time goes by and we learn that users don't like ugly URLs and neither do search engines. It's true that search engines are getting better at deciphering dynamic query strings, and they will probably continue to get better at it. That being said, most people look for a way to clean up their URLs. Unix servers have MOD_REWRITE, which you can use to reassign a URL to a script and pull the URI apart into variables that can be passed to the underlying script with very little trouble. So where does that leave us ASP or .NET developers running IIS? Well, IIS6 introduced remapURL v1.0, which looks and works like a 0.001 version of a tool that didn't set its sights very high. With this tool you can redirect one static URL to another. That isn't going to clean up our dynamic query strings.

That leaves us looking for a MOD_REWRITE replacement for IIS6. The most important feature and the one left out of remapURL is the regex pattern matching. We need to be able to pull a URI stem apart into pieces that we can use. For instance the URIs here on BestCodingPractices are dynamic. Each article or page has an ID that is used to identify it and allow the page to display the correct content.

So this link: www.bestcodingpractices.com/naming_conventions-2.html actually ends up going to www.bestcodingpractices.com/page.php?id=2 . The pretty URL with the extra keywords in it is so much nicer to look at and gives an SEO boost as well.

In order to accomplish this there are 3 basic steps. First of all, it is important to stop thinking of this as a "URL Rewriting" operation. URLs are not rewritten for you; they are redirected, remapped maybe, but not rewritten. So your basic steps are:
1. Redirect static looking URLs to their Dynamic counterparts.
2. Change the URLs on your site to point to the Static looking URL.
3. Bonus Step: 301 redirect old dynamic URLs to their new Static versions.

Let's go through each of these steps.
First we need to redirect. For this step there are 2 different general methods in use. The first requires no additional software: you simply set up a script as the 404 error page that checks the URL the visitor was after and then uses that to direct them to the correct content. To find out exactly which server variable holds the requested URL, I would use something like this:
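Code:
<table border="1">
<%
' List every server variable so you can see which one holds the requested URL
Dim x
For Each x In Request.ServerVariables
  ' HTMLEncode the value so markup inside a variable cannot break the table
  Response.Write("<tr><td>" & x & "</td><td>" & _
    Server.HTMLEncode(Request.ServerVariables(x)) & "</td></tr>")
Next
%>
</table>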


I have seen this method work, and I have also seen servers where, once you ended up at the error page, there was no way to determine which URL the visitor had requested. So check this before you jump in too deeply with this method.

Assuming this works for you, once you have the old URL you can use RegExp to tear it apart and get the ID, or title, or whatever value it is you have encoded into your URL to tell your script which page is being requested.
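
As a rough sketch of that idea, the function below pulls the numeric ID out of the string a 404 script receives. It assumes the 404 page is configured in IIS as a URL-type custom error, in which case QUERY_STRING typically looks like "404;http://www.example.com:80/naming_conventions-2.html"; confirm this on your own server with the dump above before relying on it. The name ExtractPageId and the title-dash-ID URL shape are illustrative assumptions, not part of any library.

```vbscript
' Hypothetical helper: returns the page ID found at the end of the
' requested URL, or an empty string when the URL is not in that shape.
Function ExtractPageId(ByVal errorQuery)
  Dim re, matches
  ExtractPageId = ""                          ' default: no ID found
  Set re = New RegExp
  re.IgnoreCase = True
  ' "/title_text-123.html" at the end of the string; the ID is group 1
  re.Pattern = "/[a-z0-9_]+-([0-9]+)\.html$"
  Set matches = re.Execute(errorQuery)
  If matches.Count > 0 Then
    ExtractPageId = matches(0).SubMatches(0)
  End If
  Set re = Nothing
End Function

' Usage inside the 404 script (ASP only):
'   pageId = ExtractPageId(Request.ServerVariables("QUERY_STRING"))
'   If pageId = "" Then show the real "not found" page
```

An empty result is the signal to fall through to a genuine 404 response, so that requests for files that really are missing still get an error.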

That is the first method and I got it out of the way because I think the other method makes so much more sense but it requires some software. I have been using ISAPI_Rewrite for years and have never had any trouble with it. It works almost identically to MOD_Rewrite so if you are familiar with the Unix version it will be fairly easy for you to get up and running with ISAPI_Rewrite. With ISAPI_Rewrite you have a file called httpd.ini that you use to describe your redirect rules. Here is an example of a common rule:

Code:
  RewriteRule ^/[a-z0-9_]+/([0-9]+?)\.html$ /show_page.asp?id=$1


Without trying to teach a lesson in RegEx pattern matching, let's walk through this rule. The "^" at the beginning anchors the pattern to the beginning of the URL. That means this rule would match:
www.bestcodingpractices.com/keyword_rich_title_text/122.html and would be redirected to:
www.bestcodingpractices.com/show_page.asp?id=122
but it would not match:
www.bestcodingpractices.com/forum/keyword_rich_url/122.html

OK, let's move on. Next we see /[a-z0-9_]+/. That section is going to match "/any_lower_case_characters_and_numbers_and_the_underscore/". It also requires the forward slash on either side of it. This part is ignored because, when it comes down to it, your show_page.asp does not care what the title of the page is; it just wants the ID. Which brings us to our ID: ([0-9]+?) grabs any string of one or more numeric characters and stores it as $1 (because it's the first content enclosed in parentheses). Next we see \.html. You escape the period because in RegEx an unescaped period matches any character except newline (which should never exist in a URL anyway). Then the trailing $ forces the pattern to stick to the end of the string. Using the dollar sign at the end does not allow for a query string at the end of your URL, but we'll get to that later.

OK, now that we've walked through a simple pattern, I'll show you some other great things that can be done with this and give some examples. One SEO method that is popular right now is to set up Content Silos that are all served by one script.
For example:
Code:
  RewriteRule ^/(asp|php|cfm|net)/[a-z0-9_]+?-([0-9]+?)\.html$ /show_page.asp?id=$2&cs=$1

This pattern would match:
www.bestcodingpractices.com/asp/keyword_rich_url-245.html and redirect it to
www.bestcodingpractices.com/show_page.asp?id=245&cs=asp

Notice I use "-" between the title text and the ID. Many people use "/" instead, but some SEO experts will tell you that the fewer "/" characters you have in a URL, the more weight will be given to the final page, so it may make sense to avoid using "/" where possible.

Another useful pattern would be:
Code:
  RewriteRule ^/(asp|php)/[a-z0-9_]+?-([0-9]+?)\.html$ /show_list.asp?id=$2&cs=$1
  RewriteRule ^/(asp|php)/[a-z0-9_]+?-([0-9]+?)\.html[?](.*)$ /show_list.asp?id=$2&cs=$1&$3

These two rules work together. The first would match:
/php/keywords_in_a_url-3243.html and would redirect that to:
/show_list.asp?id=3243&cs=php

But what if show_list.asp has an option to change the sorting of the list and returns that as a querystring, or has pagination?

The second rule takes care of that. It would match:
/php/keywords_in_a_url-3243.html?sort=date&page=2 and would redirect that to:
/show_list.asp?id=3243&cs=php&sort=date&page=2
Allowing your paging and sorting to go on acting as they had before.

Be sure to test your rules carefully and make sure they are not so broad that they match URIs you don't want them to. Notice, for instance, that I'm not using ".*?" in my RegEx patterns. Using "." can be too broad and end up matching URLs that you don't want it to. Using the more restrictive pattern will require some more work while creating the URLs, but it gives you much more control over what is matched and a better chance of matching only the URLs you intend.
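
One way to sanity-check a rule before it goes into httpd.ini is to run the pattern against sample URLs in a scratch script. The sketch below uses VBScript's own RegExp object, which is not the engine ISAPI_Rewrite uses, so treat a pass here as a first check and not as proof. ApplyRule is a hypothetical helper name, and the patterns are written with the "^" anchor and the escaped period.

```vbscript
' Hypothetical helper: returns the rewritten URL when the pattern matches,
' or the original URL unchanged when it does not.
Function ApplyRule(ByVal pattern, ByVal target, ByVal url)
  Dim re
  Set re = New RegExp
  re.Pattern = pattern
  re.IgnoreCase = False
  If re.Test(url) Then
    ApplyRule = re.Replace(url, target)   ' $1, $2 ... are filled from the groups
  Else
    ApplyRule = url
  End If
  Set re = Nothing
End Function

Dim listRule, listTarget
listRule = "^/(asp|php)/[a-z0-9_]+?-([0-9]+?)\.html$"
listTarget = "/show_list.asp?id=$2&cs=$1"

' A URL the rule should rewrite
WScript.Echo ApplyRule(listRule, listTarget, "/php/keywords_in_a_url-3243.html")
' A URL the rule should leave alone (extra leading folder)
WScript.Echo ApplyRule(listRule, listTarget, "/forum/php/keywords_in_a_url-3243.html")
```

Saved as a .vbs file, this can be run from a command prompt with cscript. Keeping a short list of URLs that must match and URLs that must not match makes it easier to notice when a later change to a rule becomes too broad.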

Now that we know how to handle the shiny new URLs that we're going to make, we need to make them. So on to our second step.

Change the URLs on your site to point to the Static looking URL.

Before you start cramming all manner of text into your URL, you need to remember that not all characters are even allowed in a URL, and other characters can throw off your RewriteRule if you didn't expect them or had used that character as a delimiter. I highly recommend that you create a function right away that converts a title or other text into content that is usable as part of a URL, and stick with it. Here is an example to get you started.

Code:
Function title2url(ByVal str)
  Dim re, s
  str = Trim(str)
  s = InStr(str, "  ")               ' remove double spaces
  Do While s > 0
    str = Replace(str, "  ", " ")
    s = InStr(str, "  ")
  Loop                               ' ... all of them
  str = LCase(str)                   ' SEO guru says "Use lowercase characters in your URLs"
  str = Replace(str, " ", "_")       ' replace spaces with underscores
  Set re = New RegExp
  re.Pattern = "[^0-9a-z_]"          ' only allow lowercase characters, numbers and the underscore
  re.Global = True
  title2url = re.Replace(str, "")
  Set re = Nothing
End Function


I'm sure the function you decide to use will be different from this one, but this gives you an idea. The reason to create this function early on is that changing it later can cost you PageRank. Suppose a URL used to read /this-&-that/123.html, and then you realize the "&" is messing with your URL matching, so you remove it and the URL becomes /this_that/123.html. The page now lives at a new address. If you had thought it through to begin with, you would have replaced "&" with "and" and ended up with /this_and_that/123.html from the start.
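
A variant that handles the "&" case up front might look like the sketch below. Title2UrlSafe is a hypothetical name, and spelling the ampersand out as "and" is only one possible convention; the point is to settle such choices before any URLs are published.

```vbscript
' Hypothetical variant of title2url that spells out "&" before
' stripping disallowed characters.
Function Title2UrlSafe(ByVal str)
  Dim re
  ' Lowercase, turn "&" into the word "and", then trim stray outer spaces
  str = Trim(Replace(LCase(str), "&", " and "))
  Set re = New RegExp
  re.Global = True
  re.Pattern = "\s+"                 ' collapse each run of whitespace ...
  str = re.Replace(str, "_")         ' ... into a single underscore
  re.Pattern = "[^0-9a-z_]"          ' drop anything that is not a-z, 0-9 or "_"
  Title2UrlSafe = re.Replace(str, "")
  Set re = Nothing
End Function

' Example call:
'   Title2UrlSafe("This & That")
```

Collapsing whitespace with a single pattern also replaces the double-space loop, at the cost of treating tabs and line breaks the same as spaces.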

The first step, then, to actually changing the URLs on your site to point to your nifty new keyword rich URLs is to decide what characters are going to have which meaning. Now you need to go through your site and change all of your URLs to work with the patterns you have set up. For instance, you have a script that creates a dynamic URL like this:
Code:
url = "show_page.asp?id=" & RS("page_id")

That would need to be changed to:
Code:
  url="/" & title2url(RS("page_title")) & "-" & RS("page_id") & ".html"

And would be matched by the rule:
Code:
  RewriteRule ^/[a-z0-9_]+-([0-9]+?)\.html$ /show_page.asp?id=$1


Now our final step:
301 redirect old dynamic URLs to their new Static versions

It really makes the change from dynamic URLs to static URLs smoother if you forward your old dynamic URLs to their new static counterparts. Otherwise search engines are bound to find two copies of each page, which may cause duplicate content issues. The idea, then, is to check which URL the request came in on and, if it's the old URL, redirect with HTTP status 301. The 301 redirect tells search engines that this page has moved permanently and there is no reason to ever check the old URL again. Not that they will not check again, but they have been warned and they will get the idea eventually.
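
A minimal sketch of that check follows. It assumes the rewrite rule is changed to append a marker parameter, for example /show_page.asp?id=$1&rw=1, so the script can tell a rewritten request from a direct hit on the old URL. The parameter name "rw" and the Sub name RedirectOldUrl are both hypothetical; Response.Status, Response.AddHeader and Response.End are the standard ASP Response members. The code belongs inside an ASP script block.

```vbscript
' Hypothetical helper for show_page.asp: call it before any output is sent.
' staticUrl is the new static-looking address of the current page.
Sub RedirectOldUrl(ByVal staticUrl)
  ' Requests that came through the rewrite rule carry rw=1 (assumed marker);
  ' anything else reached the script by its old dynamic URL.
  If Request.QueryString("rw") <> "1" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", staticUrl
    Response.End
  End If
End Sub

' Usage, once the page record has been loaded:
'   RedirectOldUrl "/" & title2url(RS("page_title")) & "-" & RS("page_id") & ".html"
```

The marker is only a convention, so someone could still request the dynamic URL with rw=1 added by hand. For the purpose of steering search engines toward the static URL that is usually acceptable.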

There are many benefits to using static, keyword rich URLs and content silos on your sites and hopefully the pattern set out here will help you get started doing that yourself.

Comments :

mattfield 2006-06-26 #11

I downloaded the software and modified the httpd.ini file, but I'm not sure what to do to the .asp file to make the function work.
I'd like to change files that look like default.asp?hotel_ID=12 to be SEO friendly. Do I need to add something to the head of the .asp file? Do I need to load the ISAPI in IIS?

BeachBum 2006-06-26 #12

Mattfield,

Let's move this to the forum so we can discuss it more easily.
http://www.bestcodingpractices.com/forums.php?m=posts&q=21

Thanks
