Archive: May, 2010
Since everyone is using short URLs these days, and sometimes we just need to know where a URL leads, I wrote this handy little function that finds out for us. Redirection can be a tricky thing. We have 301 (“permanent”) and 302 (“temporary”) style status codes, and there can be multiple layers of redirection. I think the simplest approach is this: whenever the server returns a Location HTTP header and the value of that header is not the same as the URL we requested, we can be pretty sure it’s a redirect. The function below uses the HTTP HEAD method to request only the headers, so as not to waste bandwidth, and recursively calls itself until it gets a non-redirecting result. As a safeguard against infinite recursion it keeps a depth counter and gives up after more than 10 redirects.
# Python 2 (urlparse and httplib modules)
import urlparse
import httplib

# Recursively follow redirects until there isn't a Location header
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        # str() is needed here: concatenating a str and an int raises TypeError
        raise Exception("Redirected " + str(depth) + " times, giving up.")
    o = urlparse.urlparse(url, allow_fragments=True)
    conn = httplib.HTTPConnection(o.netloc)
    # Fall back to "/" so a URL without a path still yields a valid request
    path = o.path or '/'
    if o.query:
        path += '?' + o.query
    # HEAD asks for the headers only, not the body
    conn.request("HEAD", path)
    res = conn.getresponse()
    headers = dict(res.getheaders())
    if 'location' in headers and headers['location'] != url:
        return resolve_http_redirect(headers['location'], depth + 1)
    else:
        return url
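For anyone on Python 3, where `urlparse` and `httplib` became `urllib.parse` and `http.client`, here is a rough sketch of the same idea. It is not the function above ported line for line. It uses a loop instead of recursion, and it makes a few assumptions of my own: it picks `HTTPSConnection` for `https` URLs, resolves relative `Location` values with `urljoin`, and only treats a response as a redirect when the status is one of the usual 3xx codes. The names `head_location`, `fetch_location`, `max_depth` and `REDIRECT_CODES` are made up for this example.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Assumption: only these status codes count as redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_location(url, timeout=10):
    """Send one HEAD request; return the absolute redirect target, or None."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        location = res.getheader("Location")
        if res.status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def resolve_http_redirect(url, fetch_location=head_location, max_depth=10):
    """Follow redirects until a URL no longer redirects, then return it."""
    for _ in range(max_depth + 1):
        target = fetch_location(url)
        if target is None or target == url:
            return url
        url = target
    raise RuntimeError("Redirected more than %d times, giving up." % max_depth)
```

Passing the lookup in as `fetch_location` means the loop can be tried without touching the network, for example with a plain dict: `resolve_http_redirect("http://a/", fetch_location={"http://a/": "http://b/"}.get)` follows the single hop and returns `"http://b/"`.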