Resolving HTTP Redirects in Python

Since everyone is using short urls these days and sometimes we just need to know where that URL leads I wrote this handy little function which finds out for us. Redirection can be a kind of tricky thing. We have 301 (“permanent”) and 302 (“temporary”) style status codes and multiple layers of redirection. I think the simplest approach to take is whenever the server returns a Location http header and the value in that location field is not the same as what you made the request to, we can pretty well be sure that it’s a redirect. The function below uses the http HEAD verb/method to request only the headers so as not to waste bandwidth and recursively calls itself until it gets a non-redirecting result. As a safeguard against infinite recursion I have a depth counter.

import urlparse
import httplib

# Recursively follow redirects until there isn't a location header
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        raise Exception("Redirected "+depth+" times, giving up.")
    o = urlparse.urlparse(url,allow_fragments=True)
    conn = httplib.HTTPConnection(o.netloc)
    path = o.path
    if o.query:
        path +='?'+o.query
    conn.request("HEAD", path)
    res = conn.getresponse()
    headers = dict(res.getheaders())
    if headers.has_key('location') and headers['location'] != url:
        return resolve_http_redirect(headers['location'], depth+1)
        return url