You can download SilverStripe from this page: http://www.silverstripe.com/silverstripe-web-application-framework/. After downloading, please install the CMS according to the installation instructions provided by the vendor.
The SE-friendly URLs feature is actually enabled by default, but SilverStripe does not natively support IIS and ISAPI_Rewrite. On the first page of the installation you may see this text:
If you open your web site after installing SilverStripe, you will see a note saying that mod_rewrite is not enabled. SilverStripe checks for mod_rewrite (only), and if the check succeeds, it automatically creates the rules in the .htaccess file and you get the CMS working with SE-friendly URLs!
Thus under IIS we need to do some hacking. Let’s just disable the check for mod_rewrite.
Please go to the SilverStripe installation folder and open the rewritetest.php file. Find this code:
$testrewriting = file_get_contents($location);
if($testrewriting == 'OK') {
return true;
}
And comment it out as follows:
//$testrewriting = file_get_contents($location);
//if($testrewriting == 'OK') {
return true;
//}
So we have simply commented out the check, leaving only return true; the .htaccess configuration file required to run SEF URLs is already in place.
Done! Now you can use SilverStripe with pretty SE-friendly URLs powered by ISAPI_Rewrite.
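For reference, the rules SilverStripe 2.x typically generates in its root .htaccess look roughly like the sketch below. This is an illustration only; the exact file generated by your SilverStripe version may differ:

```
# Sketch of a typical SilverStripe 2.x .htaccess (may differ per version)
RewriteEngine On
# capture the requested URI
RewriteCond %{REQUEST_URI} ^(.*)$
# only rewrite if no physical file matches the request
RewriteCond %{REQUEST_FILENAME} !-f
# hand the request to SilverStripe's front controller
RewriteRule .* sapphire/main.php?url=%1&%{QUERY_STRING} [L]
```

With these rules in place, a request like /about-us is served by sapphire/main.php while real files (images, CSS) are served directly.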
You can download MediaWiki from this page: http://www.mediawiki.org/wiki/Download. After downloading, please install MediaWiki according to the installation instructions provided by the vendor.
Default MediaWiki URLs look like www.yoursite.com/index.php?title=TITLE, but you can change them by using two configuration variables in LocalSettings.php:
$wgScript and $wgArticlePath.
In this article we will try to make simple SEO-friendly URLs like www.yoursite.com/TITLE.
So, please open LocalSettings.php (located in the MediaWiki root folder) and append this code:
$wgScript = "";
$wgArticlePath = "$wgScript/$1";
Please note! The $wgScript variable must contain the path to your MediaWiki folder. For example, if your MediaWiki installation is C:\inetpub\wwwroot\mediawiki, then you must write $wgScript = "/mediawiki"; if your MediaWiki installation is C:\inetpub\wwwroot\, then you must write $wgScript = "".
Now please create .htaccess file in the installation folder of MediaWiki and put these rules into it:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?title=$1 [NC,L,QSA]
Done! Now you can use MediaWiki with pretty SE-friendly URLs.
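If your MediaWiki installation lives in a subfolder rather than the site root (e.g. the /mediawiki path used as an example above), the rules need a matching base. A sketch, assuming that hypothetical /mediawiki folder:

```
# .htaccess in the /mediawiki folder (hypothetical subfolder install)
RewriteEngine On
RewriteBase /mediawiki
# leave real files and folders alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# map /mediawiki/TITLE to index.php?title=TITLE
RewriteRule ^(.*)$ index.php?title=$1 [NC,L,QSA]
```

Remember to set $wgScript = "/mediawiki" in LocalSettings.php so generated links match the rewritten paths.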
You can download Drupal from the official web site: http://drupal.org/. Please install Drupal according to the installation instructions provided by the vendor.
Please note! In the installation wizard you can see the configuration for enabling SE-friendly URLs, but under IIS you can't use it because Drupal disables these controls, as shown in this picture:
Please just leave it as is, it is possible to change this setting later.
Please find the file %PATH_TO_DRUPAL%/sites/default/settings.php, where %PATH_TO_DRUPAL% is the root of Drupal’s installation folder. Open this document and append the code given below:
$conf[ 'clean_url' ] = 1;
Then please open .htaccess file located in the root folder of Drupal. There are many rules but only some of them are intended for ISAPI_Rewrite, others are for Apache Web Server.
Please remove all rules from .htaccess file and put these:
RewriteEngine on
# Modify the RewriteBase if you are using Drupal in a subdirectory or in a
# VirtualDocumentRoot and the rewrite rules are not working properly.
# For example if your site is at http://example.com/drupal uncomment and
# modify the following line:
# RewriteBase /drupal
#
# If your site is running in a VirtualDocumentRoot at http://example.com/,
# uncomment the following line:
# RewriteBase /
# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
Actually, these are the default Drupal rules for the Apache mod_rewrite configuration; we simply took them from the default .htaccess file.
Done! Now you can use Drupal CMS with pretty SEO-friendly URLs.
You can download WordPress from this web page: http://wordpress.org/download/.
Please don’t use versions lower than 2.6.1, because they will not work properly with ISAPI_Rewrite 3.
After downloading please install WordPress according to the installation instructions by the vendor. Note that in the installation wizard you may allow your blog to appear in search engines. This action enables indexing and will probably be important for you.
By default SE-friendly URLs are disabled. To enable this feature please go to WordPress administration panel. On the main page of administration panel please select “Settings” tab:
Now please select “Permalinks” tab. On the opened page you can see some default schemes of prettifying your URLs.
You may choose the necessary scheme, but note that WordPress does not use the .htaccess file and ISAPI_Rewrite to process these permalinks. You will need to write your own rules.
Let's write some simple rules for WordPress using ISAPI_Rewrite. Open the "Permalinks" configuration as described above. Select the "Custom Structure" radio button and in the adjacent field enter /%post_id%/%postname% as shown in this picture:
Please save changes. %post_id% and %postname% are special tag names. You can read more about tags here: http://codex.wordpress.org/Using_Permalinks. So we have configuration for URLs like http://domain.com/1/my-great-post, and now we need to write some rules in .htaccess file. Please create .htaccess file in the root folder of WordPress and write the following code into it:
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(\d+)/[^/]+/?$ index.php?p=$1 [NC,L]
These rules rewrite requests like /1/my-great-post to ones like index.php?p=1, but only if no such physical file or folder exists.
Done! Now you can use your WordPress blog with pretty URLs!
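As a further illustration (not part of the setup above), if you chose a plain /%postname%/ structure instead, a sketch of matching rules might look like this; it assumes WordPress's name query variable carries the post slug:

```
# Hypothetical variant for a /%postname%/ permalink structure
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# map /my-great-post to index.php?name=my-great-post
RewriteRule ^([^/]+)/?$ index.php?name=$1 [NC,L]
```

The same file-exists conditions keep static assets and the admin panel working.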
You can download Mambo from this page: http://mambo-code.org/gf/project/mambo/frs/. After downloading, please install Mambo according to the installation instructions provided by the vendor.
By default SE-friendly URLs are disabled. To enable this feature please go to the Mambo administration panel. On the main page of the administration panel please select "Global Configuration".
On the opened page you can see a few tabs. Please find the "SEO" tab and select it:
Now you can see the configuration for SEO. Please select “yes” for option “Search Engine Friendly URLs”. Then apply and save your settings. Also please go to Mambo installation folder and rename htaccess.txt file to .htaccess.
Now all URLs on your site are SE-friendly, but navigation will not work properly. This happens because Mambo uses the $_SERVER['REQUEST_URI'] variable, which is not supported by IIS.
You will need to make a little fix in the Mambo’s source code. Please copy this code to clipboard:
if ( isset( $_SERVER['HTTP_X_REWRITE_URL'] ) ) { $_SERVER['REQUEST_URI'] = $_SERVER['HTTP_X_REWRITE_URL']; }
Now you need to find index.php file in the root of Mambo installation folder. Please open this document with any editor and paste the above code at the top of index.php as shown on this picture:
Done! Now you can use Mambo CMS with pretty SE-friendly URLs. This article describes how to use Mambo with its default rules in .htaccess file, but of course you may replace them.
You can download Joomla from the official web site: http://joomla.org/download.html. After downloading, please install Joomla according to the installation instructions provided by the vendor.
By default SE-friendly URLs are disabled. To enable this feature please go to the Joomla administration panel. On the main page of the administration panel please select "Global Configuration".
On the opened page you can see the group-box called “SEO Settings”. Please select “yes” for options “Search Engine Friendly URLs” and “Use Apache mod_rewrite”. Then apply and save your settings. Also please go to Joomla installation folder and rename htaccess.txt file to .htaccess.
Now all URLs on your site are SE-friendly. But there are some hurdles left: whatever URL you type in the address bar, you always end up on the main page. This happens because Joomla uses the $_SERVER['REQUEST_URI'] variable, which is not supported by IIS.
You will need to add a little fix in the Joomla’s source code. Please copy this code to clipboard:
if ( isset( $_SERVER['HTTP_X_REWRITE_URL'] ) ) { $_SERVER['REQUEST_URI'] = $_SERVER['HTTP_X_REWRITE_URL']; }
Now find the index.php file in the root of the Joomla installation folder. Open it with any editor supporting Unix line endings (e.g. WordPad). Note: you can't use Notepad to edit index.php because Notepad does not support saving in Unix format. Paste the above code at the top of index.php as shown in this picture:
Done! Now you can use Joomla CMS with pretty SE-friendly URLs. This article describes how to use Joomla with its default rules in .htaccess file but of course you may replace them.
In the simplest case there are only two request processing contexts:
Server context is executed first and after that, if further processing is allowed (no redirect or proxy happened), root folder config is processed (if present).
Picture 1. Server-wide configuration (httpd.conf)
Picture 2. Per-site configuration (.htaccess)
Picture 3. Processing order for the configs on Pictures 1 and 2
***
Now let’s make it more complicated—we’ll have the rules in the root folder and in Directory1. The processing order then becomes:
DirectiveA
DirectiveB
Picture 4. Processing order in case of several .htaccess files
For the request to http://localhost/index.html only the first context is applied, while for http://localhost/Directory1/index.html (and other requests to deeper subfolders) the merged context 1+2 is executed. In our case it’s:
DirectiveA
DirectiveB
Thus, the child context complements and refines the parent one (but not the server one). This is true for nearly all Apache/Ape modules EXCEPT mod_rewrite. It’s one of a kind and behaves differently.
Historically, or for convenience purposes, mod_rewrite contexts do not complement but COMPLETELY OVERRIDE each other. So, if we have two configs
RewriteRule a b
RewriteRule b c
the resulting config to be applied to the request will be
RewriteRule b c
and NOT
RewriteRule a b
RewriteRule b c
which may be unobvious for newbies.
For experts! mod_rewrite has an option allowing to change this behavior and inherit the parent rules:
- /.htaccess
RewriteRule a b
- /Directory1/.htaccess
# inherit parent rules
RewriteOptions inherit
RewriteRule b c
makes up the following merged config:
RewriteRule b c
# parent rules are appended to the end of the merged config!
RewriteRule a b
<Directory> section is equivalent in meaning to writing rules in the .htaccess located inside this directory. The only difference is that <Directory> lives in httpd.conf.
If there are both <Directory> section and .htaccess for the same directory, they are merged; if the directives inside them interfere, the .htaccess directives are preferred.
Picture 5. Processing order when there are both .htaccess and <Directory> for the same location
Let’s see how the configs are merged for the request to http://localhost/Directory1/Directory2/index.html if each directory has both <Directory> section and corresponding .htaccess file.
Picture 6. Processing order when there are several .htaccess files and several <Directory> sections which are applicable for the same request
httpd.conf
<Directory C:/inetpub/wwwroot/>
DirectiveDirectoryA
</Directory>
<Directory C:/inetpub/wwwroot/Directory1/>
DirectiveDirectoryB
</Directory>
<Directory C:/inetpub/wwwroot/Directory1/Directory2/>
DirectiveDirectoryC
</Directory>
/.htaccess
DirectiveA
/Directory1/.htaccess
DirectiveB
/Directory1/Directory2/.htaccess
DirectiveC
The following logic is applied to form the merged config:
The resulting sequence of directives will be:
DirectiveDirectoryA
DirectiveA
DirectiveDirectoryB
DirectiveB
DirectiveDirectoryC
DirectiveC
Usually the order of directives is not that important, but with mod_rewrite it is; that's why understanding the principles of config merging may dramatically reduce development and debugging time.
Note! <DirectoryMatch> sections are applied not to all parts of the request (see above) but only to the deepest part, and all matches are searched for. For example, if there are two sections:
<DirectoryMatch C:/inetpub/wwwroot/Directory1/Directory*/>
and
<DirectoryMatch C:/inetpub/wwwroot/Directory1/*/>
then both of them get into the merged config.
One should remember that everything written in the server context is applied to all requests and all sites. Sometimes it may be necessary to limit the scope of a directive to one or several sites, and that's the case for the <VirtualHost> section.
<VirtualHost> can reside in httpd.conf only. It is merged with the server config, i.e. complements it. If both .htaccess and a <VirtualHost> section exist for the specific location, the latter has higher priority and can override server settings for the specific site (in our case localhost).
#httpd.conf
ServerDirective
<VirtualHost localhost>
VirtualHostDirectiveA
</VirtualHost>
Note! mod_rewrite offers another way to restrict the scope of rules to a specific host: RewriteCond %{HTTP_HOST}.
The difference is that RewriteCond %{HTTP_HOST} must appear before each RewriteRule, while <VirtualHost localhost> groups all rules for localhost together and affects all of them. Compare:
RewriteCond %{HTTP_HOST} localhost
RewriteRule . index.php [L]
RewriteCond %{HTTP_HOST} localhost
RewriteRule about$ about.php [L]
and
<VirtualHost localhost>
RewriteRule . index.php [L]
RewriteRule about$ about.php [L]
</VirtualHost>
On the other hand, the limitation of <VirtualHost> is that it can’t be used in .htaccess.
Picture 7. Processing order when <VirtualHost> section is present in httpd.conf
Note! <VirtualHost> sections are NOT merged together – if there are several <VirtualHost>s matching the request, the one with the best match is applied. E.g.:
<VirtualHost localhost> <VirtualHost localhost:80> <VirtualHost *>
For request to localhost:80/page.html the second line will be executed, whereas for localhost/page.html the first one will fire.
If a <Directory> section is specified inside <VirtualHost> (which is possible), the processing order is as follows: the <Directory> section of the main server config is considered first, then <Directory> inside <VirtualHost>, and finally .htaccess.
Thus, using a <Directory> section outside <VirtualHost> will lead to its rules being applied to all sites (in case they use this shared folder).
Picture 8. Processing order when there are <VirtualHost> and <Directory> sections as well as .htaccess
These two behave similarly to <DirectoryMatch> but are matched against file names, not the full path.
E.g., for http://localhost/Directory1/index.html#top they will resolve to C:\inetpub\wwwroot\Directory1\index.html in the file system, and all <FilesMatch> sections valid for this file name will be merged (e.g. <FilesMatch *.html> and <FilesMatch index.*> will be merged).
Note! <Files> and <FilesMatch> may reside in .htaccess as well!
These are applied to the corresponding virtual path, which for http://localhost/Directory1/index.html#top is /Directory1/index.html.
Let’s now put it all together. Here’s the final sequence of sections:
***
It seems every aspect of config processing has been covered. We understand this article may look somewhat complicated, but we are sure there are enthusiasts who will find it helpful.
It may happen that you get the “File not loaded” message in error.log or changes to the configuration file are not applied. Generally the issue in this case is with NTFS permissions.
RewriteEngine on
RewriteRule .? - [F]
Make any request to the site. If the result is "403 Forbidden", ISAPI_Rewrite works OK. If you get "404 Page not found", something might be wrong.
#enabling rewrite.log
RewriteLogLevel 9
#enabling error.log
LogLevel debug
Warning! Plesk creates a new user for each application pool, so it is necessary to grant Create & Write permissions to each of them.
Don’t forget to add logging directives (see clause 1 above) into your httpd.conf file.
Note: do not enable logging on a live server; it is intended for debugging purposes only.
If this is your case, you are likely using page-relative paths for your images and CSS, and the problem occurred because of the altered base path. Please consider changing the paths to root-relative (e.g. <img src="/image.jpg">) or absolute form, or specify the correct base path (e.g. <base href="/">).
For example you use the rule like:
RewriteEngine on
RewriteRule ^index\.php$ default.aspx [NC,R]
And when you request a URL with a query string, like http://www.site.com/index.php?param=value, you get http://www.site.com/default.aspx?param=value instead of the desired http://www.site.com/default.aspx (without the initial query string).
This happens because by default ISAPI_Rewrite appends the initial query string to the rewritten URL (if the rewritten URL doesn't have its own query string). To avoid this, add a question mark at the end of the substitution pattern:
RewriteEngine on
RewriteRule ^index\.php$ default.aspx? [NC,R]
The problem is that you have put the query string part of the URL into the RewriteRule statement, while ISAPI_Rewrite 3 processes the query string separately from the rest of the URL, in a RewriteCond %{QUERY_STRING} statement.
So this won’t work:
RewriteEngine on
RewriteRule ^index\.php\?param=(\d+)$ default.asp?param2=$1? [NC,L]
And this will work:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^param=(\d+)$ [NC]
RewriteRule ^index\.php$ default.asp?param2=%1? [NC,L]
Sometimes you don't know the exact number of parameters to be passed and want one universal rule to handle all cases. Here's a simple example for a situation where you may have either one or two parameters:
RewriteEngine on
RewriteRule ^folder/(\d+)(?:/(\d+))?/?$ index.asp?param1=$1(?2&param2=$2) [NC,L]
This rule will accept requests like http://www.site.com/folder/111/ and http://www.site.com/folder/111/222/ (with or without a trailing slash) and direct them respectively to http://www.site.com/index.asp?param1=111 and http://www.site.com/index.asp?param1=111&param2=222.
If it is necessary to exclude some folders from being processed by ISAPI_Rewrite and to redirect all others to, for example, index.asp, the following piece of code will be helpful:
RewriteEngine on
RewriteBase /
RewriteRule ^(?!(?:exfolder1|exfolder2|etc)/.*).+$ index.asp [NC,R=301,L]
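The same exclusion can also be expressed with a separate RewriteCond instead of a negative lookahead in the rule pattern; a sketch using the same hypothetical folder names:

```
RewriteEngine on
RewriteBase /
# skip the excluded folders (hypothetical names from the example above)
RewriteCond %{REQUEST_URI} !^/(?:exfolder1|exfolder2|etc)/
# redirect everything else to index.asp
RewriteRule .+ index.asp [NC,R=301,L]
```

Some find the RewriteCond form easier to read and extend than a long lookahead group.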
RewriteEngine on
RewriteBase /
RewriteMap mapfile txt:mapfile.txt
RewriteMap lower int:tolower
RewriteRule ^products/([^?/]+)\.asp$ productpage.asp?productID=${mapfile:${lower:$1}}
Note: entries in your mapfile.txt must also be lowercase.
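For illustration, a hypothetical mapfile.txt for the rule above could map lowercase product slugs to IDs (the slugs and IDs here are invented):

```
# mapfile.txt (hypothetical entries)
# keys must be lowercase so the ${lower:...} lookup can match them
widget-one   101
widget-two   102
```

With this map, a request to /products/Widget-One.asp would be rewritten to productpage.asp?productID=101.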
Note: In builds 3.1.0.62 and higher it’s possible to make case-insensitive comparison by simply adding [NC] flag after mapfile definition:
RewriteMap mapfile txt:mapfile.txt [NC]
RewriteCond ${mapfile:$1|NOT_FOUND} !NOT_FOUND
RewriteMap mymap txt:map.txt
RewriteMap mylower int:tolower
RewriteCond %{REQUEST_URI} ^/([^/]+)/?$
RewriteCond ${mymap:${mylower:%1}|NOT_FOUND} !NOT_FOUND
RewriteRule .? ${mymap:${mylower:%1}} [NC,L]
Line-by-line explanation:
1 line: we declare map file ‘mymap’ which will read content from map.txt file.
2 line: we declare map file ‘mylower’, which provides access to pre-defined internal function ToLower. It converts input string to lower case.
3 line: we are catching the request URI of the specified pattern.
4 line: we are trying to get a value from map file, using requested URI in lower case as a key.
5 line: if the value is found, RewriteRule fires.
mapfile.txt
param1=value1&param2=value2 keyword1
etc.
revmapfile.txt
keyword1 param1=value&param2=value2
etc.
And the rules resolving the issue look like:
RewriteEngine on
RewriteBase /
RewriteMap mapfile txt:mapfile.txt [NC]
RewriteMap revmapfile txt:revmapfile.txt [NC]
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^index\.asp$ ${mapfile:%1}? [NC,R=301,L]
RewriteCond ${revmapfile:$1|NOT_FOUND} !NOT_FOUND
RewriteRule ^([^/]+)$ index.asp?${revmapfile:$1} [NC,L]
It's not likely, but if you encounter trouble (un)installing ISAPI_Rewrite 3, first try re-downloading the installation package (it could have been corrupted during download) and start the installation again. If this doesn't help, please run the installation from the command line with the following keys to generate an (un)installation log:
For installation:
msiexec /i rewriteXXX.msi /l* log.txt
For uninstallation:
msiexec /x rewriteXXX.msi /l* log.txt
This log will be helpful for investigating your errors.
It is often necessary to prevent your site from being accessed from specific IP addresses or ranges of IP addresses. Here's an example of blocking two ranges:
203.207.64.0 – 203.208.19.255
203.208.32.0 – 203.208.63.255
ISAPI_Rewrite code is:
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^203\.20(?:7\.(?:[6-9][0-9]|\d{3})|8\.(?:1?[0-9]|4|5[0-9]|3[2-9]|6[0-3]))\.\d{1,3}$
RewriteRule .? - [F]
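For a simpler case, blocking a single /24-style range only needs a literal prefix; a sketch using an invented private-network range for illustration:

```
RewriteEngine on
# block 192.168.10.0 - 192.168.10.255 (hypothetical range)
RewriteCond %{REMOTE_ADDR} ^192\.168\.10\.\d{1,3}$
RewriteRule .? - [F]
```

The more ranges you need to cover, the more the alternation groups in the regex grow, as in the example above.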
The issue occurs because by default Visual Studio uses its internal web server, not IIS. To resolve the issue, right-click your project, go to Property pages -> Start Options -> Server -> select "Use custom server" and in the "Base URL" field specify the path to your site.
The issue is that IISPassword and ISAPI_Rewrite use the same configuration file name, .htaccess, but unlike ISAPI_Rewrite, IISPassword fails if it finds any unknown directives in the file. So to avoid conflicts you need to change the configuration file name for either product. To change the configuration file name for ISAPI_Rewrite, use the AccessFileName directive in the httpd.conf file.
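A minimal sketch, assuming a hypothetical alternative file name of your choosing:

```
# httpd.conf
# make ISAPI_Rewrite read .htaccess_rw instead of .htaccess
# (.htaccess_rw is an invented name; pick any name that suits you)
AccessFileName .htaccess_rw
```

After this change ISAPI_Rewrite reads .htaccess_rw in each folder, leaving .htaccess to IISPassword.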
Here’s the solution to get rid of lots of web-spiders:
RewriteEngine on
RewriteCond %{HTTP:User-Agent} (?:WebBandit|2icommerce|\
Accoona|ActiveTouristBot|adressendeutschland|aipbot|Alexibot|Alligator|\
AllSubmitter|almaden|anarchie|Anonymous|Apexoo|Aqua_Products|asterias|\
ASSORT|ATHENS|AtHome|Atomz|attache|autoemailspider|autohttp|b2w|bew|\
BackDoorBot|Badass|Baiduspider|Baiduspider+|BecomeBot|berts|Bitacle|Biz360|\
Black.Hole|BlackWidow|bladderfusion|BlogChecker|BlogPeople|BlogsharesSpiders|\
Bloodhound|BlowFish|BoardBot|Bookmarksearchtool|BotALot|BotRightHere|\
Botmailto:[email protected]|Bropwers|Browsezilla|BuiltBotTough|Bullseye|\
BunnySlippers|Cegbfeieh|CFNetwork|CheeseBot|CherryPicker|Crescent|charlotte/|\
ChinaClaw|Convera|Copernic|CopyRightCheck|cosmos|Crescent|c-spider|curl|Custo|\
Cyberz|DataCha0s|Daum|Deweb|Digger|Digimarc|digout4uagent|DIIbot|DISCo|DittoSpyder|\
DnloadMage|Download|dragonfly|DreamPassport|DSurf|DTSAgent|dumbot|DynaWeb|e-collector|\
EasyDL|EBrowse|eCatch|ecollector|edgeio|[email protected]|EirGrabber|EmailExtractor|EmailCollector|\
EmailSiphon|EmailWolf|EmeraldShield|Enterprise_Search|EroCrawler|ESurf|Eval|Everest-Vulcan|\
Exabot|Express|Extractor|ExtractorPro|EyeNetIE|FairAd|fastlwspider|fetch|FEZhead|FileHound|\
findlinks|FlamingAttackBot|FlashGet|FlickBot|Foobot|Forex|FranklinLocator|FreshDownload|\
FrontPage|FSurf|Gaisbot|Gamespy_Arcade|genieBot|GetBot|Getleft|GetRight|GetWeb!|Go!Zilla|\
Go-Ahead-Got-It|GOFORITBOT|GrabNet|Grafula|grub|Harvest|HatenaAntenna|heritrix|HLoader|\
HMView|holmes|HooWWWer|HouxouCrawler|HTTPGet|httplib|HTTPRetriever|HTTrack|humanlinks|\
IBM_Planetwide|iCCrawler|ichiro|iGetter|ImageStripper|ImageSucker|imagefetch|imds_monitor|\
IncyWincy|IndustryProgram|Indy|InetURL|InfoNaviRobot|InstallShieldDigitalWizard|InterGET|\
IRLbot|Iron33|ISSpider|IUPUIResearchBot|Jakarta|java/|JBHAgent|JennyBot|JetCar|jeteye|jeteyebot|JoBo|\
JOCWebSpider|Kapere|Kenjin|KeywordDensity|KRetrieve|ksoap|KWebGet|LapozzBot|larbin|leech|LeechFTP|\
LeechGet|leipzig.de|LexiBot|libWeb|libwww-FM|libwww-perl|LightningDownload|LinkextractorPro|Linkie|\
LinkScan|linktiger|LinkWalker|lmcrawler|LNSpiderguy|LocalcomBot|looksmart|LWP|MacFinder|MailSweeper|\
mark.blonin|MaSagool|Mass|MataHari|MCspider|MetaProductsDownloadExpress|MicrosoftDataAccess|MicrosoftURLControl|\
MIDown|MIIxpc|Mirror|Missauga|MissouriCollegeBrowse|Mister|Monster|mkdb|moget|Moreoverbot|mothra/netscan|\
MovableType|Mozi!|Mozilla/22|Mozilla/3.0(compatible)|Mozilla/5.0(compatible;MSIE5.0)|MSIE_6.0|MSIECrawler|\
MSProxy|MVAClient|MyFamilyBot|MyGetRight|nameprotect|NASASearch|Naver|Navroad|NearSite|NetAnts|netattache|\
NetCarta|NetMechanic|NetResearchServer|NetSpider|NetZIP|NetVampire|NEWTActiveX|Nextopia|NICErsPRO|ninja|\
NimbleCrawler|noxtrumbot|NPBot|Octopus|Offline|OKMozilla|OmniExplorer|OpaL|Openbot|Openfind|OpenTextSiteCrawler|\
OracleUltraSearch|OutfoxBot|P3P|PackRat|PageGrabber|PagmIEDownload|panscient|PapaFoto|pavuk|pcBrowser|\
perl|PerMan|PersonaPilot|PHPversion|PlantyNet_WebRobot|playstarmusic|Plucker|PortHuron|ProgramShareware|\
ProgressiveDownload|ProPowerBot|prospector|ProWebWalker|Prozilla|psbot|psycheclone|puf|PushSite|\
PussyCat|PuxaRapido|Python-urllib|QuepasaCreep|QueryN|Radiation|RealDownload|RedCarpet|RedKernel|\
ReGet|relevantnoise|RepoMonkey|RMA|Rover|Rsync|RTG30|Rufus|SAPO|SBIder|scooter|ScoutAbout|script|\
searchpreview|searchterms|Seekbot|Serious|Shai|shelob|Shim-Crawler|SickleBot|sitecheck|SiteSnagger|\
SlurpyVerifier|SlySearch|SmartDownload|sna-|snagger|Snoopy|sogou|sootle|So-net|SpankBot|\
spanner|SpeedDownload|Spegla|Sphere|Sphider|SpiderBot|sproose|SQWebscanner|Sqworm|Stamina|\
Stanford|studybot|SuperBot|SuperHTTP|Surfbot|SurfWalker|suzuran|Szukacz|tAkeOut|TALWinHttpClient|\
tarspider|Teleport|Telesoft|Templeton|TestBED|TheIntraformant|TheNomad|TightTwatBot|Titan|\
toCrawl/UrlDispatcher|True_Robot|turingos|TurnitinBot|TwistedPageGetter|UCmore|UdmSearch|\
UMBC|UniversalFeedParser|URLControl|URLGetFile|URLyWarning|URL_Spider_Pro|UtilMind|vayala|\
vobsub|VCI|VoidEYE|VoilaBot|voyager|w3mir|WebImageCollector|WebSucker|Web2WAP|WebaltBot|\
WebAuto|WebBandit|WebCapture|webcollage|WebCopier|WebCopy|WebEMailExtrac|WebEnhancer|WebFetch|\
WebFilter|WebFountain|WebGo|WebLeacher|WebMiner|WebMirror|WebReaper|WebSauger|WebSnake|Website|\
WebStripper|WebVac|webwalk|WebWhacker|WebZIP|WellsSearch|WEPSearch00|WeRelateBot|Wget|WhosTalking|\
Widow|WildsoftSurfer|WinHttpRequest|WinHTTrack|WUMPUS|WWWOFFLE|wwwster|WWW-Collector|Xaldon|Xenus|\
Xenus|XGET|Y!TunnelPro|YahooYSMcm|YaDirectBot|Yeti|Zade|ZBot|zerxbot|Zeus|ZyBorg) [NC]
RewriteRule .? - [F]
Hope this article has answered some of your questions and helped you master some of ISAPI_Rewrite's basic features. We are going to add more unobvious issues to this list to make you enjoy every rule you write.
Best wishes,
HeliconTech Team
mod_rewrite processes (verifies, changes, adds and deletes) any incoming request headers (highlighted in yellow). Usually most of the work is done on the URL part of the request. The URL is divided by the browser into parts that are then transferred to the server in separate headers (the first line of the request and the Host: header). That's why you are unlikely to meet the whole URL in mod_rewrite directives.
Despite its versatility mod_rewrite is NOT capable of processing response headers and response body.
The mod_rewrite module has 10 directives, but 2 of them are used much more often than the others: RewriteRule and RewriteCond.
Whenever you want to put RewriteRule directives into some context, you must start your config with the following directive:
RewriteEngine on
All mod_rewrite directives except RewriteCond and RewriteRule (and their extended alternatives RewriteHeader and RewriteProxy) may be written in any order and even several times; in that case the module will accept only the last value (the one closest to the bottom of the config).
RewriteRule directives are processed from top to bottom; RewriteCond directives refer only to the one RewriteRule that immediately follows them (see details below).
As an example we'll take a small .htaccess in the root of the site:
RewriteEngine on
RewriteRule robots\.txt robots.asp [NC]
and robots.asp:
<H1>Asp function "Now"</H1>
<%=now()%>
This rule replaces the requested robots.txt (this file may not even exist on the server) with the real, dynamic robots.asp file. The [NC] flag makes the string comparison case-insensitive. Clients requesting robots.txt have no idea what's happening on the server, i.e. they don't know that instead of a static robots.txt they will get a dynamically generated response (e.g. as in the picture below).
To understand the logic of config processing let’s make it a little more complicated and add the rule rewriting default.htm with default.asp to the end of .htaccess. How will this config be processed?
RewriteEngine on
RewriteRule robots\.txt robots.asp [NC]
RewriteRule default\.htm default.asp [NC]
For requests to any file in the root of the site the sequence will be the following:
or
Processing stops.
In other words: all requested URLs are compared with all rules regardless of whether one of the rules has already matched. This leads to unnecessary work. That's why we'll add the [L] flag, which terminates rule processing once the current rule has matched.
RewriteEngine on
RewriteRule robots\.txt robots.asp [NC,L]
RewriteRule default\.htm default.asp [NC,L]
That’s why it’s better to place frequently used rules at the top of the config.
Important note!
We'll wander off topic a little and explain one subtle point about string comparison in regular expressions. If the ^ character is not specified at the beginning of the match pattern, the regexp engine will look for the substring starting from every possible position in the string.
Example:
We request default.htm and the regular expression is robots\.txt. The regexp engine compares:
and only after all these attempts will the regexp engine report that the rule did not match!
But if we add ^ at the beginning and $ at the end of each RewriteRule,
RewriteEngine on
RewriteRule ^robots\.txt$ robots.asp [NC,L]
RewriteRule ^default\.htm$ default.asp [NC,L]
comparison will only occur once:
that is much faster. We strongly recommend adding ^ wherever possible.
It's often necessary to apply additional conditions to a rule. The RewriteCond directive is designed for this purpose.
Say we need to dynamically generate gif and jpg images and there are two scripts for that: render_gif.asp and render_jpg.asp. The mod_rewrite config will look like this:
RewriteEngine on
RewriteRule ^(.*)\.gif$ render_gif.asp?file=$1 [NC,L]
RewriteRule ^(.*)\.jpg$ render_jpg.asp?file=$1 [NC,L]
Now we'll add a condition: if the requested gif or jpg file physically exists on disk, return the real file. This check may be performed using the following directive:
RewriteCond %{REQUEST_FILENAME} !-f
And the config will become:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.gif$ render_gif.asp?file=$1 [NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.jpg$ render_jpg.asp?file=$1 [NC,L]
We would like to draw your attention to the following:
Let's take a config that prevents hotlinking. This config blocks requests to gif and jpg files that have no referrer value or whose referrer doesn't start with http://www.example.net:
RewriteEngine on
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://www\.example\.net [NC]
RewriteRule \.(jpe?g|gif)$ - [F]
The config is processed in the following order:
Comments:
The following picture illustrates the result of the above rule. As you can see, for the request /test.jpg the referrer did not start with http://www.example.net and the result was "403 – Forbidden".
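As a variation, instead of returning 403 you can serve hotlinked requests a placeholder image; a sketch, assuming a hypothetical /hotlink.gif placeholder file:

```
RewriteEngine on
# referrer is present...
RewriteCond %{HTTP_REFERER} .
# ...but does not start with our own host
RewriteCond %{HTTP_REFERER} !^http://www\.example\.net [NC]
# exclude the placeholder itself to avoid a rewrite loop
RewriteCond %{REQUEST_URI} !/hotlink\.gif$ [NC]
RewriteRule \.(jpe?g|gif)$ /hotlink.gif [NC,L]
```

The extra condition on the placeholder path is important: without it a hotlinked request for /hotlink.gif would match the rule again.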
Let's take an example that emulates several sites on one IIS site by placing them in different directories:
RewriteEngine on
RewriteCond %{HTTP:Host} ^(www\.)?(.+)
RewriteRule (.*) /%2$1 [L]
Say the request is http://example.net/robots.txt
Now all files requested on http://example.net/ will be searched not in C:\inetpub\wwwroot\ but in C:\inetpub\wwwroot\example.net\ folder.
In green areas you may and should use the following types of variables:
Yellow areas require regular expressions. $n and %n are not possible.
Here’s the config allowing redirection of non-www requests to www:
RewriteEngine on
RewriteCond %{HTTPS} (on)?
RewriteCond %{HTTP:Host} ^(?!www\.)(.+)$ [NC]
RewriteCond %{REQUEST_URI} (.+)
RewriteRule .? http(?%1s)://www.%2%3 [R=301,L]
a. This part (?%1s) checks whether %1 matched and if yes, it adds “s” character. http(?%1s) -> https
Please pay attention!
b. Then https://www.%2%3 -> https://www.example.com%3
c. Then https://www.example.com%3 -> https://www.example.com/index.php
RewriteRule directive deals only with the part of the request AFTER host name and BEFORE QueryString.
E.g. the request is http://localhost/test?param=foo (only the /test part, i.e. what comes after the host name and before the query string, is processed).
And the rule is:
So, there are 4 variants of passing initial QueryString:
RewriteRule ^/test$ /test.asp
the result is /test.asp?param=foo
RewriteRule ^/test$ /test.asp?bar=foo
the result is /test.asp?bar=foo.
RewriteRule ^/test$ /test.asp?bar=foo [QSA]
the result is /test.asp?bar=foo&param=foo.
RewriteRule ^/test$ /test.asp?
And the result will be: /test.asp.
If you need to work with QueryString parameters more selectively, use the RewriteCond directive and the %{QUERY_STRING} variable (remember that the RewriteRule directive doesn't match QUERY_STRING).
Example
Redirect /index.php?id=123 to /index/123. The config is:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^id=(.*)$
RewriteRule ^/index.php$ /index/%1? [NC,R=301,L]
Please note that the browser does not send anchor information (everything after the # character) to the server; that's why it's absolutely impossible to write a rule that uses this info. Nevertheless, the browser will correctly process relative links, because rewriting is completely transparent to it.
Hope we convinced you that mod_rewrite is much easier to use than you thought. We are happy if we managed to shed some light on this scary-looking subject and you now understand the issue better. The next article about mod_rewrite will tell the story of context merging and distributed configurations.
Best wishes, HeliconTech Team