
Hyperlinks need to live forever - Blog edition

THE bummer mistake in any web revamp is a total disregard for page addresses. The most you usually find is a nice 404 page with a notice that things have been revamped and an invitation to search. What a waste of human time and what disregard for a site's users!
The links to the original pages live outside the site's control, and Jakob Nielsen already stated in 1998: pages need to live forever. So what can you do when swapping blog platforms?
If your new platform runs behind an Apache HTTP server (or IBM HTTP Server, IHS, which is based on it), there is mod_rewrite, which lets you translate incoming addresses (the old links) into the new destinations based on a pattern match (other HTTP servers have similar functions, but that's a story for another time).
HTTP knows two redirection codes that matter here:
  • 302 for temporary redirections
  • 301 for permanent ones.
You want to use the latter, so at least the search engines update their links.
Now your new URL pattern most likely uses a different structure than the old one, so a simple Regex might not help for that transition. E.g. your existing format might be /myblog.nsf/d6plinks/ABCDEF while the new pattern would be /blog/2001/10/is-this-on.html.
For this case mod_rewrite provides the RewriteMap, where you can use your old value (ABCDEF in our case) to find the new URL. Unfortunately mod_rewrite is very close to dark magic. A RewriteMap can be anything from a simple key/value lookup to invoking an external program to get the result. For the key/value lookup you need to make your key case insensitive, so all the possible case variations work. This is what I figured out:
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteMap blog-map dbm:/var/www/blogmap.map
RewriteRule ^/myblog.nsf/d6plinks/(.*) /blog/${blog-map:${lowercase:$1}} [NC,R=301,L]
Let me take that apart for you:
  1. RewriteEngine on
    This switches the rewrite engine on. It requires that mod_rewrite is loaded (check your documentation for that)
  2. RewriteMap lowercase int:tolower
    This enables an internal conversion of the incoming string into its lower case form
  3. RewriteMap blog-map dbm:/var/www/blogmap.map
    This defines the actual lookup. The simplest case would be a text file with the key and result on one line, separated by a space. However, that might not perform well enough for larger numbers of links, so I chose an indexed table format. It is very easy to create, since the tool is included in the Apache install. I generated my translation list as a text file and then invoked httxt2dbm -v -i /var/www/blogmap.txt -o /var/www/blogmap.map, which creates or updates the indexed file
  4. RewriteRule ^/myblog.nsf/d6plinks/(.*) /blog/${blog-map:${lowercase:$1}} [NC,R=301,L]
    This is the rewrite rule with a nested set of parameters that first converts the key to lower case and then looks up the new URL. If a key isn't found, the lookup returns an empty string, so the rule redirects to /blog/, which suits my needs; you might want to handle things differently.
    In detail:
    1. ^/myblog.nsf/d6plinks/(.*) matches all links inside d6plinks; the () "captures" ABCDEF (from our example), so it can be used as $1
    2. ${lowercase:$1} converts ABCDEF into abcdef
    3. ${blog-map: ... } finally looks it up in the map file
    4. [NC,R=301,L] are the switches governing the execution of the rewrite rule:
      • NC stands for NoCase. It allows the pattern to match /MyBlog.nsf/ /MYBLOG.NSF/ /myblog.NSF/ etc. It doesn't, however, convert the string
      • R=301 issues a permanent redirect response (default is 302, temporary)
      • L stops the evaluation of further rewrite rules
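Since the lookup only works if the keys in the map are already lower case, it can help to generate the text file with a script and to dry-run a few old links before touching the server. What follows is a minimal sketch in Python, not the way the list for this post was actually produced. The sample entries, the function names and the second key are made up for illustration; only the URL pattern and the fall-through to /blog/ are taken from the rule above.

```python
import re

# Hypothetical sample data: old d6plinks keys -> new paths relative to /blog/.
# Only ABCDEF and its target come from the example in the post.
OLD_TO_NEW = {
    "ABCDEF": "2001/10/is-this-on.html",
    "GhIjKl": "2003/05/another-post.html",
}

# Same pattern as the RewriteRule; IGNORECASE plays the role of the NC flag.
OLD_LINK = re.compile(r"^/myblog.nsf/d6plinks/(.*)", re.IGNORECASE)


def build_map_lines(mapping):
    """Return 'key value' lines with lowercased keys for the text map file."""
    seen = {}
    for old_key, new_path in mapping.items():
        key = old_key.lower()
        # Two old keys that differ only in case would silently overwrite each other.
        if key in seen and seen[key] != new_path:
            raise ValueError(f"keys collide after lowercasing: {old_key!r}")
        seen[key] = new_path
    return [f"{key} {path}" for key, path in sorted(seen.items())]


def parse_map_lines(lines):
    """Read 'key value' lines back into a dict, as a stand-in for the map lookup."""
    return dict(line.split(None, 1) for line in lines if line.strip())


def resolve(url_path, lookup):
    """Mimic the rule: capture the key, lowercase it, look it up.

    Returns None when the pattern does not match (the rule would not fire).
    An unknown key yields an empty lookup result, so the target is /blog/.
    """
    match = OLD_LINK.match(url_path)
    if match is None:
        return None
    return "/blog/" + lookup.get(match.group(1).lower(), "")
```

Writing the result of build_map_lines(OLD_TO_NEW), one entry per line, to /var/www/blogmap.txt gives the input that httxt2dbm expects, and resolve() lets you check a handful of old addresses against the same list before going live.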
As usual YMMV

Comments

1 - Thanks for the nice tutorial on the rewrite rules. However, if a site is heavily reorganized there's often not much you can do with pattern matching. Sometimes you need individual redirects for the most popular pages and a good site map for the rest.

2 - David,
I respectfully disagree. All reorganisations I came across change substantially in the base path - and that is all you need. The MAP will then help you to match an arbitrary old page to the exact new page.
AND the preservation of the old links needs to be an integrated part of the revamp considerations. Most commonly you actually CAN identify what pages are old by using a pattern. What never works is to use a simple pattern transformation to get from old to new. But that is exactly the point of this blog post: once you get hold of an old page you simply look it up.
Of course that requires the creation of the mapping table in the revamp project. Without that mapping nothing will work.
A variation of this approach would be to alter the 404 page and look up whether there is a replacement page.

3 - Tim Berners-Lee in 1998: Cool URIs don't change

4 - @David: mod_rewrite is voodoo. Damned cool voodoo, but still voodoo

Using the RewriteMap directive you could even handcraft or pre-generate substitution tables, so you do not have to rely only on regular expressions. With that you could cover the most complex sites and have no excuse to send your users to a 404 page.

Disclaimer

This site is in no way affiliated, endorsed, sanctioned, supported, nor enlightened by Lotus Software nor IBM Corporation. I may be an employee, but the opinions, theories, facts, etc. presented here are my own and are in no way given in any official capacity. In short, these are my words and this is my site, not IBM's - and don't even begin to think otherwise. (Disclaimer shamelessly plugged from Rocky Oliver)
© 2003 - 2017 Stephan H. Wissel - some rights reserved as listed here: Creative Commons License
Unless otherwise labeled by its originating author, the content found on this site is made available under the terms of an Attribution/NonCommercial/ShareAlike Creative Commons License, with the exception that no rights are granted -- since they are not mine to grant -- in any logo, graphic design, trademarks or trade names of any type. Code samples and code downloads on this site are, unless otherwise labeled, made available under an Apache 2.0 license. Other license models are available on written request and written confirmation.