A web app can be dropped into a directory anywhere on a web server, without any configuration or user intervention, and still be capable of rewriting application URLs with mod_rewrite. Install scripts are not necessary. We don't need any hard-coded paths for RewriteBase in the .htaccess file. Here's how to do it.
URLs get rewritten in Apache .htaccess files all the time. Sometimes, though, the developer defining the rewrite rules doesn't know where the .htaccess file will eventually end up. It might be right below webroot, or somewhere deep down in the directory structure of the web server. And that's a problem: knowing the exact location is vital for making the rewrite rules work.
Suppose that we are developing an application which is directly distributed to end users - some kind of WordPress clone, perhaps. We don't want to see the application file, index.php, appear in our URLs. But we have no idea where our end users are going to install the app. There are no guarantees that they will put it in docroot, or any specific subfolder. Neither do we know how the server is set up - maybe it is a shared server using mass virtual hosting. What we do know, however, is that our users would run away screaming if they read about any of this in a setup readme or the FAQ.
So here's the catch. Because our rewrite rules must work in any kind of environment, we need to use the RewriteBase directive in our .htaccess file and set it to the location of the current directory, relative to docroot. But RewriteBase doesn't accept any kind of variable or calculated result. We'd have to hard-code the location of our app here, yet we have no way of knowing what that would be.
I have struggled with this problem for a while, and it turns out there is a solution after all. Takes four lines. Explaining the thinking behind it takes more than four lines, though.
In this example, we are going to hide index.php from application URLs. More precisely, we will rewrite public URLs, which don't show index.php, to internal ones which do.
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$
RewriteRule ^.*$ %2index.php [QSA,L]
We can't set the RewriteBase dynamically, so we set it once and for all to the root URL of the server: RewriteBase /. This provides consistency, but it also means that we have to establish the url-path to the current directory ourselves and prefix it to the rewritten URL.
So which directory are we in? Let's assume a REQUEST_URI of /some/path/app-root/virtual/stuff. Our .htaccess file is in app-root. If we grab the virtual part - virtual/stuff - and remove it from the REQUEST_URI, we are left with the url-path to our app directory.
Capturing the virtual part is straightforward and can happen in the rewrite rule itself. RewriteRule ^.*$ ... makes it available in the $0 variable.
Now we do our little string operation and remove the virtual part from the request URI. We don't have string commands for that, but RewriteCond can match strings and capture substrings. So we'll add a RewriteCond with the sole purpose of extracting the url-path to the current directory. Other than that, the "condition" should stay out of our way and always be true.
We can use the $0 variable from the RewriteRule in the RewriteCond because mod_rewrite actually processes a ruleset backwards. It starts with the pattern in the rule itself, and if it matches, goes on to check the conditions. So the variable is available by then.
While the test string in the RewriteCond can use the variable, the actual condition regex can't. Inside the condition, we can only use internal back-references. So first, we assemble a test string "[virtual part][some separator][request uri]". The '#' char makes a good separator because it won't show up in the URL. Next, we match it against a condition of
([^#]*) - anything up to the separator, captures the virtual part
# - the separator
(.*?) - anything in the request uri up to what we've captured in group one, grabs the current directory url-path
\1$ - group one again, i.e. the virtual part of the request uri
So here's the full condition: RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*?)\1$.
The second captured group in the RewriteCond regex is our location. We just need to prefix it to the rewritten URL with a %2 reference. That leaves us with RewriteRule ^.*$ %2index.php [QSA,L]. Voilà.
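To make the mechanics concrete, here is the rule pair again, annotated with the values each part would take for the example request used above. The trace is worked out by hand from the explanation, not taken from a server log, so treat it as an illustration rather than as captured output.

```apache
# Request URI:  /some/path/app-root/virtual/stuff   (.htaccess sits in app-root)
#
#   $0 (rule match)   virtual/stuff
#   test string       virtual/stuff#/some/path/app-root/virtual/stuff
#   group 1           virtual/stuff
#   group 2 (%2)      /some/path/app-root/
#   rewritten to      /some/path/app-root/index.php
#
# Request URI:  /some/path/app-root/   (the app directory itself)
#
#   $0 (rule match)   (empty)
#   test string       #/some/path/app-root/
#   group 1           (empty)
#   group 2 (%2)      /some/path/app-root/
#   rewritten to      /some/path/app-root/index.php
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*?)\1$
RewriteRule ^.*$ %2index.php [QSA,L]
```

Note that %2 always ends in a slash, which is why index.php can be appended to it directly.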
This solution works with ordinary virtual hosts as well as mass virtual hosting (using VirtualDocumentRoot). By implication, other aliased locations should be fine as well.
Apache 1.3 was still around at the time of writing (not anymore), and it chokes on the RewriteCond pattern. Apache 1.3 doesn't support the ungreedy modifier - the "?" in (.*?) - and complains about the condition being syntactically invalid. The presence of the .htaccess file causes an internal server error (500).
Luckily, the modifier can be omitted. Because \1 is pinned to the end of the test string by $, a greedy (.*) can only capture what the ungreedy version captures: everything between the separator and the trailing virtual part. Reducing the second condition to RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$ takes care of the error without changing the behaviour in Apache 2.
Even though the construct doesn't totally nuke Apache 1.3 anymore, it still doesn't work as intended and may produce all sorts of subsequent errors which can be hard to track down. In my opinion, the best way to deal with it is by bailing out before the rewriting takes place. Discussing it in detail is beyond the scope of this post, but it is easy enough. This snippet, inserted before the URL rewriting, does the trick:
RewriteRule .* - [E=APACHE_VER:Apache13]
# Condition matches only in Apache 2 because
# Apache 1.3 doesn't recognize the \w wildcard
RewriteCond word \w$
RewriteRule .* - [E=APACHE_VER:Apache2]
RewriteCond %{ENV:APACHE_VER} Apache13
RewriteRule .* - [L]
Because we have skipped the rewriting in Apache 1.3, it is up to the application to react to it and inject index.php into the URLs, visibly, when the HTML is generated.
In the example above, the location of the current directory was captured in a RewriteCond back reference, %2, and used immediately in a RewriteRule. But there is an even better approach: the location can be stored in an environment variable (suggested by commenter #7 - thank you!). That way, detecting the current directory becomes a self-contained step. It is disentangled from any actual rewriting, with cleaner and more readable rewrite rules as a result.
This is what our original example looks like with an environment variable being set:
RewriteBase /
# Store the current location in an environment variable CWD
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$
RewriteRule ^.*$ - [E=CWD:%2]
# Just by prefixing the environment variable, we can safely rewrite anything now
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*$ %{ENV:CWD}index.php [QSA,L]
Rewrite rules always come across as slightly arcane, but this is quite a readable representation of what we have been up to. The actual transformation in the last line, at least, will make sense to most readers now. To my knowledge, what we have arrived at here is the cleanest way to solve the problem.
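Once CWD is set, any further rule can reuse it. The sketch below adds one extra rule to illustrate this: an external redirect, which is where a wrong base path tends to hurt most. The names old-page and new-page are made up for this example, and RewriteEngine On is included on the assumption that rewriting has not been switched on elsewhere.

```apache
RewriteEngine On
RewriteBase /

# Store the current location in an environment variable CWD
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$
RewriteRule ^.*$ - [E=CWD:%2]

# Hypothetical: redirect a retired URL to its replacement, wherever the app lives
RewriteRule ^old-page$ %{ENV:CWD}new-page [R=301,L]

# Route everything which is not an existing file to the application
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*$ %{ENV:CWD}index.php [QSA,L]
```

With the app installed in /some/path/app-root/, a request for /some/path/app-root/old-page should be redirected to /some/path/app-root/new-page, without that path appearing anywhere in the file.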
There is one more issue to consider. Depending on the kind of URLs the application has to handle, the # separator might turn up in its encoded form %23. For instance, if a visitor uses the # character in a site search and if search terms are turned into part of the URL instead of being passed as a query string, that would break the URL rewriting, as commenter #13 pointed out (search/%23oops).
The solution is to pick a separator which won't show up in a legitimate request. A backslash, as suggested, might be an option if # is not suitable, though at the expense of readability. For most, the # separator will likely be fine.
With these steps, it is finally feasible to make a web app independent of the location where it is installed, while at the same time keeping the full ability to rewrite URLs and paths as needed. Best of all, it all works without resorting to a config script or manual user intervention. Just drop the app somewhere and let it do its thing. Users will be grateful, and so will your support mail inbox.
(Last updated December 2024)
Hi,
I was searching for a proper rewrite rule to use in fixing URLs not rendering properly on a new site we just set up and found your post.
For instance, when the following URL is clicked on our website: https://www.websiteworthexplorer.com/facebook.com.html it loads the index page instead of rendering the appropriate URL which is supposed to display FB's worth.
To my understanding, a RewriteBase /InstallationDirectory needs to be placed in the .htaccess file either at the root level or at the directory level where the script we are using is installed.
The website in question is an addon domain and the PHP script was written to be installed on the root level, otherwise the .htaccess file needs to be modified if installed in subdomains by including a RewriteBase /InstallationDirectory in .htaccess file - but I am not familiar with how to write these rewrite rules.
So I am wondering if one of the rewrite rules you provided above can do the trick? Thanks in advance!
You can give it a shot, it might do what you want. But then again, you clearly know which directories you are working in. So I'd suggest to keep it simple and set an explicit RewriteBase, rather than use the solution I have outlined above.
Hey Michael,
Thank you for the feedback, was able to rectify issue with the rewrite rules 2 days later. It was a relief.
All the best.
It works, but causes infinite loops on Error404 if .htaccess is really in a / root directory not in a subdirectory.
I'm not sure I have completely understood what you are up to, but how did you even get a 404? If a URL doesn't match anything in the file system, the request is redirected to the application - that's the whole point. Your app might generate 404s programmatically, of course, but those are forwarded to the client without intervention by Apache.
In case the application file itself isn't there, you do get a loop and a 500 in the end. Fair enough, I think. After all, it's not just some file which is missing - the app is broken.
That said, let me know if I'm missing something here. Perhaps I haven't had enough coffee this morning ;) Something you did with ErrorDocument in Apache, perhaps?
Sorry! Your solution works fine, was all my fault. had a view() and a getContent() method which called each other again and again if a content wasn't found.
No problem! Glad to hear it works.
Thanks for this Michael, it has given me more of an understanding of the workings of .htaccess and mod_rewrite than I've found elsewhere, although I'm a bit of a novice on these subjects (hence the Google search for your page), so would appreciate if you could help me understand the following:
I'm developing a beta site which is currently in a subdirectory of the root of the current live site. Let's say, for instance, that's foo.com/beta - what I'm trying to do is some pretty basic SEF for one of the pages rather than the whole site, so when someone enters the URL foo.com/beta/section/XX it actually calls the page foo.com/beta/page.php?id=XX - the 'section' part of the public URL is purely SEO, so doesn't need to be passed to the page.php script, it's only the ID of the article (XX) to pull from the database that needs to make it through.
Your article would appear to be a solution for me to make the rewrite portable, but I don't know enough about the regex or rewrite conditions to tweak it to achieve my specific goal - any tips?
Thanks,
Pete
That is pretty much what I have described in the example, except that you'd like to get rid of the first subdir in the process. So your RewriteRule expression must capture ^(any-non-slash-chars/(anything))$. That way, the section before the slash gets thrown away, but the "anything" part - your id - is captured in a subgroup and ends up in the $2 variable.
In proper regex terms, the first part - capturing any non-slash characters - translates to [^/]+, and the second part to .*. So your RewriteRule should look like this:
The other lines stay as they are. Put it in your beta directory, and you are good to go.
hi,
seems quite interesting, but # are comment indicator... so i guess, thats the reason why it doesn't work on my apache :)
ah sorry, my fault... it works well, but only with RewriteCond %{REQUEST_FILENAME} !-f
can that be changed somehow? because i want to prevent direct access to any file.
You'll need to use a narrower filter, then. Replace !-f with what suits you best. You must always allow direct access to your application file, though. Assuming the file is called index.php, filter it out with !index\.php$. Every other request will get routed to your app.
You are probably aware of the pitfalls, but just in case, and for anyone having the same idea: If you pipe everything through PHP, make sure that you don't keep any static assets in the directories below. It should just be PHP code, nothing else. Having your app process image requests, for instance, is nothing short of abysmal for performance (and you have to set MIME types, caching headers etc yourself, whereas Apache handles it out of the box).
hi,
I think allowing direct access to any file is a bad practice, because it includes also executables like php files.
i commonly handle it this way: i have a specific folder, that contains all the assets (css, js, ..) and graphics. files, where direct access is (probably) nothing harmful. and for this folder, it's more preferable to make own rewriteConds and Rules and put that before your special one.
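A sketch of that setup, assuming the directly accessible files live in a folder called assets next to the .htaccess file. The folder name is made up for this example; the remaining lines are the ones discussed above.

```apache
RewriteBase /

# Requests into the asset folder are left alone and served by Apache as usual
RewriteRule ^assets/ - [L]

# Everything else, apart from the application file itself, goes to the app
RewriteCond %{REQUEST_FILENAME} !index\.php$
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$
RewriteRule ^.*$ %2index.php [QSA,L]
```

Keep in mind that the !index\.php$ filter lets through any path ending in index.php, including one in a subdirectory, so it is only as strict as the file layout allows.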
br
I am not sure I got why you'd want to block access to PHP files in each and every case, but never mind. Glad you got it working.
Hi, I tried to use your tip, but it doesn't work for paths of type : https://localhost/public/mywebsite/arg1/arg2
It works when i go on https://localhost/public/mywebsite/ but not in the other case ... it always tries to get the file arg2 in the folder arg1.
How can i fix it ? thanks in advance. Regards
Sorry for the late reply, your message got caught up in a ton of comment spam.
I'd assume it is the first line which is causing you trouble: RewriteCond %{REQUEST_FILENAME} !-f. In plain English, it says "Rewrite the URL only if the requested path does not exist as a regular file in the file system". If the URL points to an existing file, it does not get rewritten. (URLs of existing directories do get rewritten, however.)
I can't tell from your description if that's what is tripping you up, but it is a likely candidate. You'd need to tweak the condition, then. Have a look at the comments above, try RewriteCond %{REQUEST_FILENAME} !index\.php$ for the most radical approach.
Or perhaps you have .htaccess files in more than one directory in your path? The rules might not even get applied then. Have a look at this Stack Overflow question if .htaccess precedence could be an issue.
An awe-inspiring hack. :-)
Thanks! :)
Awesome trick! Though, since the location of the .htaccess file doesn’t usually change during its execution, why not put the result of your code into an environment variable for all RewriteRules to access? That way, you can use your dynamic rewrite base in every RewriteRule without restrictions on the pattern they match.
Excellent idea, thank you! I have updated the post to reflect your suggestion.
Gracias! :)
Great ! Thank you, very very useful and handy !
Actually the rule "RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)\1$" will not work when used with an url having multiple successive '/'
For example : https://mywebsite///base///to/////////home////// will not work because $0 will be /base/to/home/ and REQUEST_URI will be ///base///to/////////home//////
Is there a way to easily remove duplicated '/' from %{REQUEST_URI} ?
Thanx !
I’m not sure this is the most efficient solution, but you can redirect the client to a more correct URI until the URI is sanitized:
This should redirect ///base///to/////////home////// to /base///to/////////home//////, then to /base/to/////////home//////, then to /base/to/home////// and finally to /base/to/home/.
Slash duplication didn't derail the regex when I tried, everything worked as expected. But I guess it depends on the server setup. It's certainly safer to fix the request.
The idea of commenter #7 (hi there, and thanks!) is a straightforward approach. Unfortunately, a series of redirects will slow down your site a lot, particularly on a mobile network. You can avoid that overhead, but for that you have to decide how often multi-slash separators are allowed to appear in a URL, and set a reasonable limit.
If you go down that route, you can clean up the separators and store the result in an environment variable, perhaps like this:
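A minimal sketch of that idea, assuming that up to three runs of duplicate slashes are worth cleaning up. The limit and the patterns are assumptions chosen for illustration; only the variable name CLEAN_REQUEST_URI is taken from the text.

```apache
# Start with the request URI as it came in
RewriteRule ^ - [E=CLEAN_REQUEST_URI:%{REQUEST_URI}]

# Each pair collapses the first remaining run of slashes into a single one.
# Repeat the pair as many times as multi-slash runs should be tolerated.
RewriteCond %{ENV:CLEAN_REQUEST_URI} ^(.*?)//+(.*)$
RewriteRule ^ - [E=CLEAN_REQUEST_URI:%1/%2]

RewriteCond %{ENV:CLEAN_REQUEST_URI} ^(.*?)//+(.*)$
RewriteRule ^ - [E=CLEAN_REQUEST_URI:%1/%2]

RewriteCond %{ENV:CLEAN_REQUEST_URI} ^(.*?)//+(.*)$
RewriteRule ^ - [E=CLEAN_REQUEST_URI:%1/%2]
```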
You just have to use %{ENV:CLEAN_REQUEST_URI} instead of %{REQUEST_URI} from here on out, and that's it. Maybe there's a more elegant or efficient solution, but at least it should work.
Thank you Michael & commenter #7 for your answers! I will give a try.
Just so that you understand where my problem came from:
I have following rule : RewriteRule ^(.*)$ index.php?params=$1 [QSA,L]
because I built an mvc application and used urls as following : basepath/controllername/actionname/param1/param2/.../paramN
But when one parameter was empty (for example param1) I had : basepath/controllername/actionname//param2/.../paramN
Now I check these cases and replace empty parameters with a space : basepath/controllername/actionname/+/param2/.../paramN
Generally I don't know how many parameters I can have, so solution from commenter #7 suits me better ;)
GENIUS GENIUS GENIUS YOU SAVED MY LIFE THANK YOU VERY MUCH
You are welcome :)
Hi Michael,
This is a great solution, works like a charm for portability of apps regardless of context root :-) Thanks for posting it!
I was wondering what's the license like and whether I could use it for an open source project of mine licensed under MIT?
Absolutely, you can use it! Makes me happy to see it used in an open-source project!
I've never thought about a license because it is just such a short snippet, but you are right, any non-trivial code probably needs one even if it is just a few lines of code. So let's keep everyone happy and make it MIT. Enjoy :)
I want to prevent users from trying example.com/subdir/... or example.com/bla bla bla ..., and allow just example.com.
I am using the example above and it works fine, but if I try to add subdir or bla bla I get "Not Found".
In this case try:
RewriteCond $0#%{REQUEST_URI} ([^#]*)#(.*)/\1$
RewriteRule ^.*$ - [E=CWD:%2/]
P.S. I prefer to use the \ character instead of # in the condition rule:
RewriteCond $0\\%{REQUEST_URI} ([^\\]*)\\(.*)/\1$
For example:
https://www.example.com/search/%23test (%23 = # encoded)
This is a pretty common situation and this will not work. Because:
$0 = %23test
%{REQUEST_URI} = /search/%23test
$0#%{REQUEST_URI} = %23test#/search/%23test
And with ([^#]*)#(.*)\1$:
\1 = empty
%2 = test#/search/%23test
And you wanted %2 to be /search/
So using # as a separator is a bad choice! So don't use this solution to avoid problems in future.
Comments are disabled.
Comments have primarily been disabled because of a flood of comment spam. Turning them off has also been an easy way to comply with EU privacy and data protection regulations. User nicknames have been replaced by anonymous placeholders. All data relating to the original commenters has been deleted.