Rules of Web Redirects
Published on 9 February 2009 in Web Development, internet, Planet Bods, web development, web standards
It’s been roughly 12 years since I first unleashed a website onto the world. The original site had a catchy URL – www.dur.ac.uk/~d62b7h/, but later bits moved onto Geocities and Fortune City before finding a stable home at users.durge.org/~bods/ until I bought planetbods.org back in August 2000. For good measure, my blog spent many years sitting on bods.me.uk before I finally decided this was pointless and shoved it all onto the one domain.
One of the legacies of a long lasting site is that you end up moving things around a bit. Of course anyone who knows anything will tell you that a good URL should not need to change if you get it right in the first place. But frankly that’s nigh on impossible to actually achieve all the time. You don’t know how websites will change over the years – how your content will vary; how some big new idea will come along and dominate; or indeed how a big new idea will fail and take loads of other stuff with it.
What I did spot whilst trawling through the 757 lines of redirects (some for individual pages, some applied to whole directories) in my .htaccess files were some common themes. Some can be learnt from, and some will never be avoided.
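To make the two kinds of rule concrete, here is a minimal Python sketch of how a table of page-level and directory-level redirects might be resolved. It's an illustration rather than real .htaccess syntax: the table entries are hypothetical, an exact page match wins over a directory match, and the most specific (longest) directory prefix wins.

```python
from typing import Optional

# Hypothetical redirect table: single pages that moved...
PAGE_REDIRECTS = {
    "/old/page.html": "/new/page",
}
# ...and whole directories that moved (old prefix -> new prefix).
DIRECTORY_REDIRECTS = {
    "/old/": "/archive/",
    "/old/tv/": "/television/",
}

def resolve(path: str) -> Optional[str]:
    """Return the new location for `path`, or None if it hasn't moved."""
    # An exact page rule always takes priority over a directory rule.
    if path in PAGE_REDIRECTS:
        return PAGE_REDIRECTS[path]
    # Otherwise try directory rules, longest (most specific) prefix first,
    # carrying the rest of the path across unchanged.
    for old_prefix in sorted(DIRECTORY_REDIRECTS, key=len, reverse=True):
        if path.startswith(old_prefix):
            return DIRECTORY_REDIRECTS[old_prefix] + path[len(old_prefix):]
    return None
```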
Reason 1 – Change in technology
The BBC is currently introducing PHP on many of its new sites, like its Topics pages. Looking at the URLs you’d never know – there are no file extensions on any of them. The technology is well hidden.
This is in sharp contrast to its older pages, which all use SHTML, as is plain from URLs like www.bbc.co.uk/info/policies/switchover.shtml.
Technology change accounts for a substantial proportion of my redirects. My original site was built as flat HTML files, so it had a .html extension, which continued when I began using the Hitop HTML pre-processor (where you had to run a batch file to “compile” your pages!).
Later on, I moved to Hitoplive, which did the compilation on the fly, so everything became .live files.
The redirects for these scenarios are pretty simple – in one line, swap every occurrence of .html for .live and you’re away.
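A minimal Python sketch of that one-line rule, assuming paths are plain strings and the extension sits at the end of the path (optionally followed by a query string). On Apache itself, a single mod_alias RedirectMatch line would normally do the same job; this just shows the pattern.

```python
import re

# ".html" only when it ends the path, i.e. followed by end-of-string or a query.
HTML_EXT = re.compile(r"\.html(?=$|\?)")

def html_to_live(url: str) -> str:
    """Rewrite a legacy .html URL to its .live equivalent; other URLs pass through."""
    return HTML_EXT.sub(".live", url, count=1)
```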
However short, it’s a change that has an impact. Whilst compiling my 2008 Review of the Year I had to do a pile of data munging because during the year I’d ditched Hitoplive from this blog and rebuilt it in PHP. The result was that all the posts from before the summer had two URLs listed – one with a .live at the end, one without.
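The munging boils down to treating both URLs as the same post. A rough Python sketch, assuming hypothetical hit counts keyed by path and that the only difference between the two forms is the trailing .live:

```python
from collections import Counter

def canonical(path: str) -> str:
    """Treat /post.live and /post as the same post by dropping a trailing .live."""
    suffix = ".live"
    return path[: -len(suffix)] if path.endswith(suffix) else path

def merge_hits(hits: dict[str, int]) -> Counter:
    """Add up hit counts for paths that refer to the same post."""
    totals = Counter()
    for path, count in hits.items():
        totals[canonical(path)] += count
    return totals
```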
But it’s a problem I won’t be having again because, like the BBC, I’ve been removing file extensions from my site as I make the move (slowly but surely) to PHP. If, in the future, I switch from PHP to something else, I just flick the switch behind the scenes and the URLs can stay the same.
So that leads to the obvious Tip 1 – if you’re building a new site, don’t bother with the file extensions!
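Behind the scenes, the server just has to find whichever file currently implements an extensionless URL. A minimal Python sketch, assuming a hypothetical HANDLER_EXTENSIONS preference list and a set of filenames standing in for what's on disk; in Apache, MultiViews or a rewrite rule would usually do this for you.

```python
from typing import Optional

# Hypothetical: implementations tried in order. Moving from PHP to something
# else means editing this tuple, while the public URLs stay exactly the same.
HANDLER_EXTENSIONS = (".php", ".html")

def resolve_clean_url(url_path: str, files_on_disk: set[str]) -> Optional[str]:
    """Find the file behind an extensionless URL such as /about."""
    # "/" maps to an index page; no protection against ".." (sketch only).
    stem = url_path.strip("/") or "index"
    for ext in HANDLER_EXTENSIONS:
        candidate = stem + ext
        if candidate in files_on_disk:
            return candidate
    return None
```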
Reason 2 – Technology Failure
Now here’s an interesting one to analyse. I use blogging software to power this site (Movable Type, if you must know). Now everyone knows that software that manages websites is good and proper.
Getting a management system that produces good, proper URLs is another matter, as can be seen by trawling the URLs of many a big news website.
But Movable Type’s pretty good and consistent. Isn’t it?
Ironically, around half the redirects on my site are actually caused by Movable Type. It’s not a structural problem – blogs tend to have quite easy-to-define structures. It’s a filename problem.
The reason is actually a legacy one – I was an early Movable Type user, adopting it at Movable Type 2, I think. Back in those days, filenames were based pretty much on the title of the blog post. That’s fine for a title like “Please Wait”, which gets converted to please_wait.live (or whatever), but I’m a rather wordy person who ends up with titles like “Bods’s Blog gets what everyone else had decades ago – comments” or “Oh dear, no, not another bloomin’ blog post about Google Chrome? I mean! Come on! How many does the internet really need????”. Now imagine that as a URL…
I didn’t end up with one that bad, but looking through, I did have the incredibly long
www.planetbods.org/blog/2003/06/03/yes_freeserve_i_will_send_my_password_to_you_its_such_a_sensible_idea.live
Not until much later did Movable Type introduce truncation to stop filenames becoming ugly and cumbersome. However, that was applied retrospectively too, requiring redirects from all the old pages to their new, much shorter URLs.
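A rough Python sketch of the problem, assuming a made-up slug rule (lower-case, apostrophes dropped, every other run of non-alphanumerics turned into an underscore) and a hypothetical 30-character limit. This isn't Movable Type's actual code, though with a guessed title the rule does reproduce the Freeserve URL above.

```python
import re
from typing import Optional

MAX_LEN = 30  # hypothetical limit, not necessarily Movable Type's setting

def slugify(title: str, max_len: Optional[int] = None) -> str:
    """Turn a post title into a filename stem, optionally truncated."""
    # Drop straight and curly apostrophes so "it's" becomes "its".
    slug = title.lower().replace("'", "").replace("\u2019", "")
    # Collapse every run of other non-alphanumerics into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", slug).strip("_")
    if max_len is not None:
        slug = slug[:max_len].rstrip("_")
    return slug

def truncation_redirects(date_dir: str, titles: list[str], ext: str = ".live") -> dict[str, str]:
    """Map each old full-length URL to its new truncated URL, where they differ."""
    redirects = {}
    for title in titles:
        old = f"{date_dir}/{slugify(title)}{ext}"
        new = f"{date_dir}/{slugify(title, MAX_LEN)}{ext}"
        if old != new:
            redirects[old] = new
    return redirects
```

Short titles like “Please Wait” come out identical either way, so only the long ones need a redirect.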
There’s a similar problem when you convert between blog software – when I moved the Transdiffusion Media Blog from Blogger to MT, I initially thought all the URLs looked pretty much the same. Only later did I find out that when converting a title into a filename, one threw away hyphens and the other put them in.
That leads to Tip 2 – when using software to generate your URLs, get something that generates sensible (and replicable!) URLs from day 1.
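To see how two tools can quietly disagree, here's a Python sketch of two hypothetical slug rules, one that drops hyphens and one that keeps them as separators, plus the redirects you'd need when migrating from one to the other. The rules and titles are illustrative, not Blogger's or Movable Type's real behaviour.

```python
import re

def slug_drop_hyphens(title: str) -> str:
    """Hypothetical rule A: hyphens inside words simply vanish."""
    cleaned = title.lower().replace("-", "")
    return re.sub(r"[^a-z0-9]+", "-", cleaned).strip("-")

def slug_keep_hyphens(title: str) -> str:
    """Hypothetical rule B: hyphens are kept as word separators."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def migration_redirects(titles: list[str]) -> dict[str, str]:
    """Redirect rule-A URLs to rule-B URLs wherever the two disagree."""
    redirects = {}
    for title in titles:
        old, new = slug_drop_hyphens(title), slug_keep_hyphens(title)
        if old != new:
            redirects[old] = new
    return redirects
```

Most titles come out the same under both rules, which is exactly why the difference is easy to miss until the 404s turn up.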
Reason 3 – Organisational Problems
Of course, one of the major causes of redirects is simply site re-organisation. And this is one that’s almost impossible to avoid as websites change and vary. My original site didn’t have a blog, and was just a motley collection of random, disorganised content. Over the years more and more content was added, and occasionally some removed completely. Structures that worked in 1999 don’t necessarily work in 2002 or 2005. No matter what you do, you’ll never avoid the fact that your website grows and changes in ways you can’t know or anticipate.
The tip instead is just to try and reduce the impact – and this is where it ties in with Tip 2, because Tip 3 is… change the directory, but keep the filenames of everything in the directory the same.
It’s not always going to be possible – you might split pages apart, or recompile them in a whole new way. But if you can keep the original filenames and just put them in a different directory, you’ll have far fewer redirects to maintain.
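The payoff is that a directory move where filenames survive collapses into a single prefix rule. A minimal Python sketch, assuming a hypothetical moves table that lists every page in each old directory (otherwise a directory rule would also catch pages that never moved):

```python
import posixpath
from collections import defaultdict

def _as_prefix(directory: str) -> str:
    """Normalise a directory to a prefix ending in exactly one slash."""
    return directory.rstrip("/") + "/"

def compress(moves: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split page moves into directory rules and leftover per-page rules."""
    by_old_dir = defaultdict(list)
    for old, new in moves.items():
        by_old_dir[posixpath.dirname(old)].append((old, new))

    dir_rules, page_rules = {}, {}
    for old_dir, pairs in by_old_dir.items():
        new_dirs = {posixpath.dirname(new) for _, new in pairs}
        names_kept = all(posixpath.basename(o) == posixpath.basename(n) for o, n in pairs)
        if names_kept and len(new_dirs) == 1:
            # Every file kept its name and landed in one place: one rule covers them all.
            dir_rules[_as_prefix(old_dir)] = _as_prefix(new_dirs.pop())
        else:
            # Renamed or scattered pages each need their own rule.
            page_rules.update(pairs)
    return dir_rules, page_rules
```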
Why bother at all?
For me as a user, there’s nothing worse than landing on a webpage only to be hit with a terse message informing me that the website has been redesigned, without telling me where the new content is (I generally don’t bother and bog off elsewhere).
Linkrot like this is one of the curses of the internet, and causes confusion, disappointment and anger in various degrees. That’s why I’ve always gone for the mantra of trying to ensure every link, no matter how old, will still work, just in case. Well, as many as I can, anyway – the University of Durham killed off my original site a few days after graduation. But certainly since around 1999, I’ve done my best to keep up.
And it’s worthwhile doing because if you don’t, you’re potentially losing visitors.
Old URLs have a surprising habit of still appearing, many years later. You’ll still find links to www.planetbods.org/tv/tynetees even though the site itself moved to the ttlp.org.uk domain in 2002 and then became Transdiffusion’s City Road in 2007.
But thanks to a little divergence and management, I still get those visitors even though the originating web page must be about eight years old.
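One catch with long-lived redirects is chains: an old URL pointing at a page that has itself moved since. A minimal Python sketch of flattening them so each old URL reaches its final home in one hop, assuming you control every hop; the ttlp.example and cityroad.example addresses in the usage are placeholders, not the real URLs.

```python
def collapse_chains(redirects: dict[str, str]) -> dict[str, str]:
    """Point every old URL straight at its final destination."""
    collapsed = {}
    for start in redirects:
        seen = {start}
        target = redirects[start]
        # Follow the chain until we reach a URL that isn't itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        collapsed[start] = target
    return collapsed

# Placeholder chain: old TV section -> intermediate site -> current home.
chain = {
    "/tv/tynetees": "https://ttlp.example/",
    "https://ttlp.example/": "https://cityroad.example/",
}
print(collapse_chains(chain))
```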
Of course, that’s why I have a huge swathe of redirects on my site that I need to maintain. But by learning from the mistakes of the past, we’ll cut down on the mistakes of the future. I’ll end up with a few more redirects in the site as I move the final pages over to PHP, but hopefully after that there will be far fewer.
Well, at least, that’s the aim…