At work, I use Mediawiki for internal and external knowledge bases. This is distributed to both employees and customers. Our KB’s are large, at a few thousand articles each. We have been using these for a number of years, recently we began to run into a problem where people had difficulty locating a page. This was due to a new method of us distributing URL’s on printed media as opposed to online media. For example, we had a page like:
http://wiki.mydomain.com/SuperPage/
However, people would type in :
http://wiki.mydomain.com/Superpage/ and end up at a “Page does not exist, create a new page” screen. This was problematic as people would feel like they were typing everything in correctly. This is a non-desirable configuration and I decided to look for a solution.
I figured that I could do a quick Google search and solve this problem in a similar matter to the ‘remove index.php from URL’s’ problem. However, I found very little feedback online for this problem - when I did locate information, it was sketchy at best. I found the following URL useful to find out how the case sensitivity worked:
http://en.wikipedia.org/wiki/Help:Page_name#Case-sensitivity
This page has a bunch of people having this same problem:
http://meta.wikimedia.org/wiki/Case_sensitivity
There are several methods using mod_rewrite to make URL’s either all lowercase or all uppercase, however this would not work as we had pages that had a mix of both, and without renaming ALL pages to be lowercase, using tricky regular expressions you might think we could uppercase the first letter of each word. Yes, however we had some pages that used syntaxes like this (do to product names):
http://wiki.mydomain.com/MYPage
I started to dig through the code to look at why this was being caused. Title.php is where most of the crunching related to this gets done. I found that the queries that are used to retrieve the title are very trivial.
“select page_id, page_title from page where page title like ‘MYPage’”
I thought to myself, being an avid MySQL user - that using the ‘like’ matching in the where clause - that it should not be case sensitive. I did a bit of testing in MySQL Manually:
mysql > select page_id, page_title from page where page_title like 'mypage';
Empty set (0.26 sec)
mysql> select page_id, page_title from page where page_title like '%MYPage%';
+---------+------------+
| page_id | page_title |
+---------+------------+
| 2173 | MYPage|
+---------+------------+
1 row in set (0.01 sec)
How annoying! I started to dig a little deeper, suspecting that my SQL skills were failing me, and my assumptions about how ‘like’ worked was incorrect. Looking at the ’show create table page’ I noticed something pecular:
The Column:
`page_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '',
The index:
UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
I have never seen this ‘collate’ command in a create table before. A quick search online found that the ‘collate latin1_bin’ actually forces the column to be case sensitive! This was great news, as I found the source of the case sensitivity.
I copied the mediawiki db to a test db, and ran some tests. After performing an alter table on the column, I was able to use none case sensitive URL’s!! Here is what I did, the key drop is required for the alter table:
alter table page drop key name_title;
alter table page modify column page_title varchar(255) NOT NULL default '';
After performing this command on the mediawiki database, URLS would now work, in a non case-sensitive manner! IE:
http://wiki.mydomain.com/MYPage
would return the same results as:
http://wiki.mydomain.com/mypage
I do understand that this is a VERY hacky solution, however - it does get the job done. I have a list of possible caveats for this solution:
- Speed: Losing this key will cause a significant decrease in speed to the queries building the page titles, this will be more apparent on pages that have LOTS of links on them. On a page with 300 links, I did not notice this. Our wiki gets a few thousand hits a day and the page load difference was negligible.
- Page Titles: Title.php pulls the title displayed in <title> and on the Heading from the actual argument passed to index.php, so if you load a page like MYPAGE, the <title> will display it as such, this is NOT a redirect.
- Searching: Well, again I haven’t noticed a difference after a few weeks of running our wiki’s like this. However, the search box on the right interestingly enough does a POST with all CAPS, so if you hit a page dead-on with the search box it will be displayed in ALL CAPS.
- Internationalization: I don’t use any international or other ‘odd’ characters that may be supported by latin but not by varchar()
Again, I MUST STRESS that this is a hacky solution that may or may not work for you, dump everything to a temporary DB before doing anything permanent! mysqldump is your friend.