April 1, 2009
Ok, this post is going to get a little technical. It’s even going to include some open source code, which, if you’re so inclined, you can use under a Creative Commons license. But before I bore you with the technical details, let’s talk a little about some interesting product trends that have really taken off lately, and why, if you’re not careful, you could end up shooting yourself in the foot.
Short URLs worth $46 Million?
You already know I’m a big fan of Twitter. Well, one of the interesting side effects of the rise of Twitter is a comparable rise in the use of URL shortening services like TinyURL. In fact, the recent news that bit.ly raised a $2 million A round suggests that venture money is even taking notice of these services. TechCrunch recently speculated that TinyURL, the dominant player in this space, could be worth as much as $46 million.
Why are services like TinyURL getting so much attention? Well, since Twitter limits posts to 140 characters, most users try to save as much space for content as they can, by using URL shortening services for any embedded links. Twitter will even automatically use TinyURL if your tweet includes long URLs.
Worth more, BUT Worthless for SEO
The problem with using these services is that your domain loses the SEO juice normally associated with inbound links. While it’s true that most links on Twitter pages are tagged as “nofollow”, which limits their SEO power somewhat, one thing is certain: a link to tinyurl.com or bit.ly or any other URL shortening service will never give you SEO juice. However, if you include shortened URLs that point to your own domain, those links can be copied and pasted around the internet, giving you both direct traffic and SEO link power for your domain.
So, hopefully I’ve convinced you that you don’t want to continue to give away links to generic URL shortening services, and you’re ready to tackle making your own URL shortener. Here’s where we’ll get a little more technical. If you’re not a developer, then have no fear, just take notes and send your dev team to come read the article and add url shortening to your product road map.
How to Build a URL Shortening Service
Developers, let’s talk design for a second. Assuming you’re building out your own web property and you’ve done some integration with Twitter, Facebook, or other social networking platforms, then you’re probably already familiar with the APIs available from services like TinyURL to post URLs and get back shortened forms.
These services have a much more challenging problem than you do. They need to shorten URLs from an effectively unlimited number of domains, and more importantly they need to support arbitrary resources from those domains. Granted, it’s not too hard to implement a solution for this: you can basically hash the URL into a sufficiently sparse ID space and get an identifier for each URL that is unique for practical purposes. A truncated MD5 hash is probably a good solution. Then store the hash alongside the long URL in a database, and whenever someone requests the shortened URL you can do a lookup and return the correct long URL from the database.
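To make the hash-and-lookup idea concrete, here is a minimal Python sketch. It is not how TinyURL or bit.ly actually work internally; the class name `HashShortener`, the 8-character key length, and the in-memory dict standing in for the database are all assumptions made for illustration.

```python
import hashlib


class HashShortener:
    """Toy lookup-table shortener. A dict stands in for the database."""

    def __init__(self, key_length=8):
        # key_length is an assumed value: shorter keys give shorter URLs
        # but make collisions more likely.
        self.key_length = key_length
        self.table = {}  # short key -> long URL

    def shorten(self, long_url):
        # Hash the URL and keep only the first few hex characters.
        digest = hashlib.md5(long_url.encode("utf-8")).hexdigest()
        key = digest[: self.key_length]
        existing = self.table.get(key)
        if existing is not None and existing != long_url:
            # A truncated hash can collide; a real service has to handle this.
            raise ValueError("hash collision for key " + key)
        self.table[key] = long_url
        return key

    def expand(self, key):
        # The hash cannot be reversed, so the stored mapping is required.
        return self.table.get(key)
```

Note that `expand` only works because of the stored table. That dependency on a lookup is exactly what the asset ID approach described next avoids.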
There are some great services out there that have taken this idea to the next level, and even include analytics, click-through tracking, toolbars that frame the content and allow comments, bells and whistles galore. But what they make up for in splash they sacrifice in SEO juice and direct references to your domain. And so you’ll want a solution you can host on your own domain.
In all likelihood, your content is database driven already, and so each potential URL is probably already associated with some unique asset ID. So instead of implementing a one-way, database-backed solution that maps arbitrary URLs into short URLs, you could implement a solution that maps your asset IDs into short URLs.
For example, let’s say you wanted to do this for a WordPress blog, or even a WordPressMU blog network. Since Sweat365.com is based on the WordPressMU core, we developed a solution that allows us to map any blog post in our network onto a shortened URL in our domain.
Our goal was to implement a solution that wouldn’t require a new database table mapping between short and long URLs. We wanted a two-way programmatic solution, so that we could map between shortened and long URLs using only the characters of the URL. Since we already had asset IDs to work with (in this case a blog_id and a post_id), we could map those IDs into a compressed form like a base46 encoding.
Base46? You Mean Base64? No, Actually Base35!
What do I mean by base 46? Well, imagine a numerical set that is made up of all the digits 0-9, all of the alpha characters a-z, and the 10 URL-safe characters “$-_.!*'(),” allowed by RFC 1738 (the URL spec). If you used those characters as digits for your encoding, then you’d have 46 characters to work with, and you’d be able to encode your asset IDs in base 46. It’s a pretty good system: the asset ID 9,999,999,999 would be shortened to “f'*ip21”, a 7-character string in place of 10 digits, and so most platforms could save a lot of space on URLs.
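Here is a rough Python sketch of that kind of encoding, just to show the mechanics. The ordering of the characters in `ALPHABET_46` is my assumption (digits, then letters, then the ten special characters); a different ordering produces different strings, so don't expect this to reproduce the example string above exactly.

```python
import string

# Assumed ordering: digits, then a-z, then the ten characters from RFC 1738.
# The apostrophe here is the plain ASCII one.
ALPHABET_46 = string.digits + string.ascii_lowercase + "$-_.!*'(),"


def encode(number, alphabet=ALPHABET_46):
    """Write a non-negative integer using the alphabet's characters as digits."""
    if number < 0:
        raise ValueError("only non-negative IDs can be encoded")
    base = len(alphabet)
    if number == 0:
        return alphabet[0]
    digits = []
    while number:
        # Peel off the least significant digit each time around.
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))
```

With this alphabet, `encode(9_999_999_999)` comes out 7 characters long, since 46^6 is a bit under 9.5 billion and 46^7 is far larger.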
There are a couple of gotchas though. First of all, you might want to think about what happens when an asset ID like 1973507, 60546, or 2861642 randomly pops up and your encoded url ends up with words that might be considered offensive. I’ll go ahead and let you figure out what those IDs would encode into, but suffice to say, we wanted to protect against that. One solution, which we ended up choosing, is to simply remove the vowels from the allowed character set. It’s pretty hard to come up with randomly generated dirty words if you have no vowels to work with.
The second problem you might notice is that even though $-_.!*'() and , are allowed in URLs, they are rarely used, and as a result both Twitter and Facebook get confused when they see these characters in a web URL and will truncate the link at the character that confuses them. Through testing we determined that ‘-’ ‘_’ and ‘.’ are really the only special characters that Twitter and Facebook allow in URLs.
So, if you just use the digits, the consonants, and the characters ‘-’ ‘_’ and ‘.’, you end up with 35 characters to work with. And as a result, you can encode your asset IDs in base 35. Now, 9,999,999,999 becomes “sb5.fh5”, which is still pretty short, and certainly moves you toward your URL shortening goal!
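A two-way version of the vowel-free scheme might look something like the Python sketch below. This is not the plugin's code. The alphabet is rebuilt from the description only: digits, the 21 consonants, and the three special characters, which comes to 34 symbols, so the plugin's actual set and ordering presumably differ a little. The function names are hypothetical too, and the base is taken from the alphabet length rather than hard-coded. How the two tokens get laid out in the final URL is left out, since that isn't covered here.

```python
import string

VOWELS = set("aeiou")
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)

# Assumed alphabet and ordering, rebuilt from the description in the post.
ALPHABET = string.digits + CONSONANTS + "-_."
BASE = len(ALPHABET)
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_id(number):
    """Turn a non-negative integer ID into a short vowel-free token."""
    if number < 0:
        raise ValueError("only non-negative IDs can be encoded")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_id(token):
    """Reverse encode_id using nothing but the token's characters."""
    if not token:
        raise ValueError("empty token")
    value = 0
    for ch in token:
        if ch not in INDEX:
            raise ValueError("character not in alphabet: " + repr(ch))
        value = value * BASE + INDEX[ch]
    return value


def encode_ids(blog_id, post_id):
    """Encode the two asset IDs separately (hypothetical helper)."""
    return encode_id(blog_id), encode_id(post_id)


def decode_ids(blog_token, post_token):
    """Recover (blog_id, post_id) from their tokens, no database needed."""
    return decode_id(blog_token), decode_id(post_token)
```

For example, `decode_ids(*encode_ids(12, 3456))` gives back `(12, 3456)`, and no token ever contains a vowel because none is in the alphabet.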
Quit Your Jibber Jabber, Give Me Some Code, Fool!
Ok, ok, so here’s a link to a WordPressMU plugin that will do URL shortening in base35. There are a couple of important caveats. First of all, it’s only designed to work in WordPressMU, not WordPress. Second, it’s only designed to work in WPMU running in VHOST mode. And finally, this code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license. That means you are free to use it for non-commercial purposes under the following restrictions: you must attribute the work in the manner specified by the licensor; you may not use this work for commercial purposes; if you alter, transform, or build upon this work, you may distribute the resulting work only under the same or a similar license; and for any reuse or distribution, you must make clear to others the license terms of this work.
The code is pretty self-explanatory, but it’s also got tons of comments. Out of the box it will trap and redirect any shortened URLs that reach your server. In order to encode long URLs into shorter ones, you can call either kmxt_url_to_shorturl($url) or kmxt_shorturl_from_ids($blog_id,$post_id) from somewhere else in your WordPress code. For example, if you’ve implemented a Twitter auto-tweeting plugin, you could replace your calls to TinyURL with a call to this shortener.
Good luck and Happy URL Shortening!