Save us from software that tries to be ‘clever’

Published on 17 January 2008

Since it launched in 2003, Transdiffusion’s MediaBlog has been running on Blogger.

The blog was a bit of an experiment at the time – and with no content management system in place, a quick and dirty implementation in Blogger was deemed by the powers that be to be the best solution.

Then in October 2005, the first stage of a massive project to move the entire site to Movable Type was started. Two and a bit years on, it’s still not finished but most of the site is now in there. And with that state of affairs, the MediaBlog being on Blogger was looking a bit of an anachronism. It’s been on my todo list to migrate it for at least a year – and today was finally the day.

There is a Blogger Import plugin available for Movable Type, but I had no joy with it. Thankfully the less clever process is well documented. The only problem was removing Blogger’s “Labels” code (their term for tags), which they crowbar onto the end of the entry, and a few other bits of Blogger-specific code.

Cue lots of fun with regular expressions, sed and, after much torment, Perl. But in the end they were gone, and with just a few extra tweaks (like changing usernames to match ones already in our copy of Movable Type) and a quick migration of the templates, I was ready to publish. A few minor tweaks and the new version was live! The only thing left was to tidy up the pieces – move the monthly redirects to new places etc. – and have a cursory check through to make sure that all the old Blogger files had been replaced by new ones.

Easy. Or so I thought. In fact it turned out to be a far bigger task than the rest of the process put together.

The problem is all in the file names Blogger and Movable Type publish out to.

Unless you tell it otherwise, Movable Type will just take your entry title, put some dashes or underscores in (depending on your preference), truncate it and away you go. I dutifully set the truncation length to be roughly what I thought Blogger used (turned out I’d made it too long, but most of the file names were fine) and went on a directory crawl.

That’s when I discovered that Blogger tries to be clever. It also takes your entry title, puts some dashes in it and truncates it. But it also does something else.

It strips certain words like “and”, “at”, “a” and “the” from the filenames…

I’m sure someone thought they were doing something clever, but it makes migrating to Movable Type’s style a right pain in the backside – especially as it’s inconsistent. For example, put “and” at the start of your entry title and Blogger won’t publish it out.
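To make the mismatch concrete, here is a rough Python sketch of the two naming schemes as described above. The stop-word list (just the four words named in this post), the truncation length of 30 and both function names are assumptions for illustration only; the real rules in either product may well differ, and Blogger's were less consistent than this.

```python
import re

# Assumption: only the four words named in the post; Blogger's real list is unknown.
STOP_WORDS = {"and", "at", "a", "the"}


def words_of(title):
    """Lowercase the title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


def mt_style_name(title, max_len=30, sep="-"):
    """Movable Type-like: join every word with a separator, then truncate."""
    return sep.join(words_of(title))[:max_len].rstrip(sep)


def blogger_style_name(title, max_len=30):
    """Blogger-like: drop stop words first, then join with dashes and truncate."""
    kept = [w for w in words_of(title) if w not in STOP_WORDS]
    return "-".join(kept)[:max_len].rstrip("-")


title = "The Rise and Fall of a Channel"
print(mt_style_name(title))
print(blogger_style_name(title))
```

Even with identical truncation lengths, the two functions produce different names for any title containing a stripped word, which is why matching the length alone was never going to line the files up.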

In the end I had to auto-generate a batch of potential redirect statements for all 900-odd entries and check each one manually.

To add to the fun, in times gone by the site used to use .htm for the files produced. This was later changed to .html but the old files were still in place and still potentially accessible.
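Generating the candidate redirects, including the old .htm variants, could look something like the Python sketch below. It is not the script actually used: the function name, the `/2008/01/` folder and the example file names are all made up for illustration, and the output assumes Apache's `Redirect` directive in an .htaccess file.

```python
def candidate_redirects(old_names, new_name, folder="/2008/01/"):
    """Return one 'Redirect 301' line per old path that differs from the new path."""
    new_path = f"{folder}{new_name}.html"
    lines = []
    for old in old_names:
        # The site published .htm files at first and .html later, so try both.
        for ext in (".htm", ".html"):
            old_path = f"{folder}{old}{ext}"
            if old_path != new_path:
                lines.append(f"Redirect 301 {old_path} {new_path}")
    return lines


# Hypothetical entry: one guessed Blogger name, one Movable Type name.
for line in candidate_redirects(["rise-fall-of-channel"],
                                "the-rise-and-fall-of-a-channel"):
    print(line)
```

Each line is only a guess at what the old file was called, so every one still needs checking against the files actually on disk before it goes live.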

As I write, I’ve spent most of the afternoon trying to sort out the mess and it’s still not quite done. As a perfectionist, I’m damned if I’m going to break any links, or leave old content floating around. But it’s no fun way to spend a day – and all because someone decided their software should be a bit clever…

2 Comments

  • Martin Belam says:

    I’ve also got 900+ redirects in place from having to rebuild the site after two database crashes – and that was just moving from MT install to MT install.
    It’s getting worse than that .htaccess file that controls all the BBC’s TLD re-directs 😉

  • Andrew Bowden says:

    I’ve had fun with Movable Type myself – but at least there’s some element of consistency about what it does – and the new total backup function should help things.
    Finally finished all the changes now and have just totalled up – there’s now 1139 lines in the .htaccess file for the Media Blog. That works out as 1.34 per entry in the blog…
    I know there’s some I could easily do with regexes and things, but they’re the ones I wrote a simple PHP script to generate anyway!
    Other fun I encountered along the way was “and” not being filtered out of filenames by Blogger prior to mid 2004 and the fact that Blogger and Movable Type handle hyphens in titles differently – Blogger puts them into filenames, Movable Type doesn’t. Oh and despite changing extensions from .htm to .html, Blogger still resolutely published entries from before 2005 using the .htm extension, with no .html equivalent.
    On the plus side, the old Blogger version of the site (with its multiple copies of everything, plus Blogger’s “labels” pages – it puts a separate page out for each tag) saw the site come in at a whopping 91 meg – the labels pages were 30 meg alone. The new Movable Type copy comes in at just 22 meg. And that’s without changing the HTML from tables (yes I know…) to CSS.
    Now I’m just off to check the 404 logs and do a linklint check – try and catch any errors that have snook in.