
rewriting index.htm and index.php

Status
Not open for further replies.

glimbeek

Programmer
Nov 30, 2009
83
NL
I had a rewrite rule that rewrote index.htm to index.php. It worked like a charm, but looking back on it, I wanted them both to be redirected to
I did this using the following code I found using Google:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm\ HTTP/
RewriteRule ^index\.htm$ [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ [R=301,L]

Now, I tested this and it seems to be working just fine. However, I have no idea what the above code means, as it wasn't explained on the page where I found it.

I'm specifically wondering about the "[A-Z]{3,9}" before index.htm: what does it do, and why is it in front of index.htm? Could someone explain the (whole) code for me?
 
Those portions are called regular expressions. Regular expressions are a compact syntax used for pattern matching. For example, ^[A-Z]{3,9}\ /index\.htm\ HTTP/ means: at the start of the line, look for 3 to 9 capital letters (A to Z), followed by a space, /index.htm, another space, and HTTP/.
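To see the pattern in action, here is a small sketch using Python's re module rather than Apache itself. It assumes Python treats this simple pattern the same way Apache's PCRE-based engine does; `matches_request_line` is a hypothetical helper name, and the request lines are made-up samples.

```python
import re

# The RewriteCond pattern from the thread as a Python raw string.
# "\ " is an escaped literal space, exactly as written in the Apache config.
HTM_REQUEST = re.compile(r"^[A-Z]{3,9}\ /index\.htm\ HTTP/")


def matches_request_line(request_line: str) -> bool:
    """Return True if the request line asks for exactly /index.htm."""
    return HTM_REQUEST.search(request_line) is not None


if __name__ == "__main__":
    for line in [
        "GET /index.htm HTTP/1.1",         # matches
        "HEAD /index.htm HTTP/1.0",        # matches
        "GET /index.htm?page=2 HTTP/1.1",  # no: "?" sits where the space must be
        "GET /docs/index.htm HTTP/1.1",    # no: path must start with /index.htm
        "GET /INDEX.HTM HTTP/1.1",         # no: matching is case-sensitive
    ]:
        print(f"{line!r}: {matches_request_line(line)}")
```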

The basics of regular expressions are quite simple, as there are only about a dozen metacharacters with special meanings, such as ^, which means the start of the line; [], which is called a character class and matches any one of the characters inside it; and -, which specifies a range of characters inside a class. The difficulty with regular expressions lies in writing a pattern that matches what you want while missing what you don't, as the effects can be subtle.
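A minimal Python illustration of the ^ anchor and an [A-Z] range class. Python's re is used here only as a stand-in for Apache's engine; for patterns this simple the behaviour should be the same.

```python
import re

# ^ anchors the match to the start of the string.
# [A-Z] is a character class; the "-" makes it a range (any one of A..Z).
starts_with_capital = re.compile(r"^[A-Z]")

print(bool(starts_with_capital.search("GET /")))   # True
print(bool(starts_with_capital.search(" GET /")))  # False: first char is a space
print(bool(starts_with_capital.search("get /")))   # False: lowercase is outside A-Z

# Without ^ the class may match anywhere in the string.
print(bool(re.search(r"[A-Z]", "get /X")))         # True: "X" matches
```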

Regular expressions are used by many different tools and applications. As you noticed, Apache is one; grep, sed, awk, and vi are some others. The catch is that the syntax supported by each tool is slightly different, which adds to the complexity. If you are interested in more, the book Mastering Regular Expressions is well written and covers them in exhaustive detail.

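In the rest of your script above, each RewriteCond is a condition, here tested against THE_REQUEST (the request line the browser actually sent), and the RewriteRule directly after it is applied only if that condition matches. The RewriteRule holds the pattern to look for in the requested path and the text to replace it with; the flags R=301 and L make it a permanent (301) redirect and stop processing any further rules.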
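The sketch below models one condition/rule pair in Python to show the order of evaluation. It is a simplification, not how mod_rewrite is implemented: it assumes the rules sit in an .htaccess file, where Apache matches the RewriteRule pattern against the path without its leading slash, and `apply_rule` and its `target` parameter are hypothetical, since the redirect destination is not shown in the thread. It also illustrates a likely reason for checking THE_REQUEST at all: that variable holds only what the client sent, so if Apache internally serves index.htm for a request to / (for example via DirectoryIndex), the condition fails and the redirect does not loop.

```python
import re
from typing import Optional, Tuple

# Hypothetical model of the index.htm condition/rule pair from the thread.
# COND is tested against THE_REQUEST (the raw request line from the client).
# RULE is tested against the path; in .htaccess context Apache strips the
# leading "/", which is why the pattern is ^index\.htm$ rather than ^/index\.htm$.
COND = re.compile(r"^[A-Z]{3,9}\ /index\.htm\ HTTP/")
RULE = re.compile(r"^index\.htm$")


def apply_rule(the_request: str, path: str, target: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the condition and the rule both match, else None.

    `target` is a placeholder for the redirect destination, which the
    thread does not show.
    """
    if COND.search(the_request) is None:
        return None  # condition failed, so the rule is skipped
    if RULE.search(path) is None:
        return None  # rule pattern did not match the path
    return (301, target)  # R=301: permanent redirect; L: stop processing rules


# Client asked for /index.htm directly: redirect.
print(apply_rule("GET /index.htm HTTP/1.1", "index.htm", "/"))
# Client asked for /, and Apache internally mapped it to index.htm:
# THE_REQUEST still says "/", so no redirect (and no loop).
print(apply_rule("GET / HTTP/1.1", "index.htm", "/"))
```

The same reasoning applies to the index.php pair.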

 
Thanks for your reply. I think I get it, except for the following:

Why did the person who wrote this code use {3,9}?
Also, with [A-Z], the RewriteRule works for lowercase (index.htm) but not for INDEX.HTM, INDEX.htm, or index.HTM.
 
In regular expressions there are symbols that operate on the previous symbol to specify how many times it repeats. In your example, {3,9} means that the previous symbol must occur at least 3 times and no more than 9 times. Hence in your rewrite condition, there must be between 3 and 9 capital letters at the start of the line. Others include * for 0 or more occurrences, + for 1 or more occurrences, and ? for 0 or 1 occurrence.
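A quick Python check of these repetition operators. `whole_match` is a hypothetical helper that requires the pattern to cover the entire string, and the test strings are arbitrary.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the entire string `text`."""
    return re.fullmatch(pattern, text) is not None


# {3,9}: the previous item must repeat between 3 and 9 times
print(whole_match(r"[A-Z]{3,9}", "GE"))          # False: only 2 letters
print(whole_match(r"[A-Z]{3,9}", "GET"))         # True
print(whole_match(r"[A-Z]{3,9}", "ABCDEFGHI"))   # True: exactly 9
print(whole_match(r"[A-Z]{3,9}", "ABCDEFGHIJ"))  # False: 10 letters

# *, + and ? are shorthand for {0,}, {1,} and {0,1}
print(whole_match(r"ab*", "a"))    # True: zero b's allowed
print(whole_match(r"ab+", "a"))    # False: needs at least one b
print(whole_match(r"ab?", "abb"))  # False: at most one b
```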

Regarding why it works for lowercase and not uppercase, I am not sure. Note that [A-Z] only applies to the letters at the start of the line; index\.htm is matched literally, so my guess is that Apache matches it case-sensitively by default. As I mentioned, most applications differ slightly in how they handle expressions.
 
Thanks for the explanation. I figured as much and I also asked on a different forum.

The moderator on that forum told me that [A-Z] and {3,9} have to do with the type of request.

According to him, because I'm using THE_REQUEST, the request method (OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, or CONNECT) is part of the string being matched, and {3,9} combined with [A-Z] matches that method. So the method needs to be at least 3 characters but at most 9 characters. However, the same moderator didn't understand why you would use {3,9} and not {3,7}, because according to him there aren't any request types longer than 7 characters.
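The thread never settles why the limit is 9, so treat this as one plausible explanation rather than the original author's reason: WebDAV (RFC 4918), which Apache supports through mod_dav, adds methods such as PROPFIND (8 letters) and PROPPATCH (9 letters). Since [A-Z] only matches letters, any extra room in {3,9} can only go to longer method names. The sketch simply counts lengths.

```python
from typing import List, Tuple

# Methods listed in the thread, plus the WebDAV methods defined in RFC 4918.
HTTP_METHODS = ["OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT"]
WEBDAV_METHODS = ["PROPFIND", "PROPPATCH", "MKCOL", "COPY", "MOVE", "LOCK", "UNLOCK"]


def length_range(methods: List[str]) -> Tuple[int, int]:
    """Return the (shortest, longest) method name length."""
    lengths = [len(m) for m in methods]
    return min(lengths), max(lengths)


print(length_range(HTTP_METHODS))                   # (3, 7)
print(length_range(HTTP_METHODS + WEBDAV_METHODS))  # (3, 9)
```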
 
Try adding the NC (no case) flag to your conditions, and to your rules as well, since the RewriteRule pattern is also case-sensitive:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm\ HTTP/ [NC]
RewriteRule ^index\.htm$ [NC,R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/ [NC]
RewriteRule ^index\.php$ [NC,R=301,L]
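Here is a Python sketch of why NC matters on both lines of each pair, using re.IGNORECASE as a rough stand-in for Apache's [NC]; `redirects` and its flag parameters are hypothetical. The RewriteCond pattern and the RewriteRule pattern are matched separately, so making only the condition case-insensitive still leaves ^index\.htm$ case-sensitive. Note also that with NC the [A-Z] class accepts lowercase letters too.

```python
import re

COND = r"^[A-Z]{3,9}\ /index\.htm\ HTTP/"  # tested against THE_REQUEST
RULE = r"^index\.htm$"                     # tested against the .htaccess-relative path


def redirects(the_request: str, path: str, cond_nc: bool, rule_nc: bool) -> bool:
    """True if both patterns match; re.IGNORECASE stands in for Apache's [NC]."""
    cond_flags = re.IGNORECASE if cond_nc else 0
    rule_flags = re.IGNORECASE if rule_nc else 0
    return (re.search(COND, the_request, cond_flags) is not None
            and re.search(RULE, path, rule_flags) is not None)


req, path = "GET /INDEX.HTM HTTP/1.1", "INDEX.HTM"
print(redirects(req, path, cond_nc=False, rule_nc=False))  # False
print(redirects(req, path, cond_nc=True, rule_nc=False))   # False: rule still case-sensitive
print(redirects(req, path, cond_nc=True, rule_nc=True))    # True
```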
 
Thanks for your reply.

That would "solve" the uppercase/lowercase problem, although it worked well enough without [NC]. I'm just kinda stuck on the {3,9} bit now. I don't understand why the guy uses it in the first place, or why he uses 9 instead of 7.
 
I believe that Noway2 has already answered that
Noway2 said:
means that the previous symbol must occur at least three times and no more than 9 times
For whatever reason, that's what the original author was using for his condition; that might not necessarily be the condition that you need.
 
"For whatever reason, that's what the original author was using for his condition; that might not necessarily be the condition that you need."

I was kinda hoping someone could explain the reason behind using it, not just what it does.
 
I assume your confusion is over the {3,9} part, which sets length limits on the match. Specifically, I am guessing you are wondering why a limit of 9 was chosen if the methods it is trying to match are at most 7 characters. The only thing I can think of is that perhaps the extra characters allow for a CR-LF or some other character, but I am not sure why. I don't think it is hurting anything, though.

One thing to note about regular expressions: in most regex engines, quantifiers are "greedy", which means they will attempt to match the longest string they possibly can, even if a shorter match contained within it is the one you want. Placing length limits helps guarantee that the match is constrained. This is reasonable from a security standpoint, as it may help prevent a runaway buffer overflow from a malicious request string.
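A short Python illustration of greedy versus lazy quantifiers, and of how the {3,9} bound narrows what the pattern accepts compared with an unbounded +. It says nothing about buffer overflows; it only shows which strings match. The sample strings are made up.

```python
import re

text = "<a><b>"
greedy = re.match(r"<.*>", text).group()  # .* grabs as much as it can
lazy = re.match(r"<.*?>", text).group()   # .*? takes as little as it can
print(greedy, lazy)                        # <a><b> <a>

# A 50-letter "method" in front of an otherwise valid request line.
junk = "A" * 50 + " /index.htm HTTP/1.1"
unbounded_ok = re.match(r"[A-Z]+\ /index\.htm\ HTTP/", junk) is not None
bounded_ok = re.match(r"[A-Z]{3,9}\ /index\.htm\ HTTP/", junk) is not None
print(unbounded_ok, bounded_ok)            # True False
```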
 