The Yahoo Firehose "feed" isn’t a feed at all

Real-time has been a big trend on the web for the past couple of years. FriendFeed was one of the first services to show real-time updates across your social network, and real-time feeds took the stage in a big way when Twitter started its streaming API. In April, Yahoo! announced its Firehose API, claiming “it includes a real-time feed of every public action taken on our network”. The thing is, this isn’t a “feed” or a “stream” in the same sense that Twitter’s streaming API is. It’s a database you can poll with Yahoo’s YQL, an SQL-like query language. Sure, the updates may be available in their database in near real-time, but to receive them you need to issue a new request. In fact, the only way to know whether there are updates is to poll the service continuously. A feed would be something like long-polling with HTTP server push (what Twitter does) or PubSubHubbub.

It may be just semantics to some, but this bothers me. To those of us who build applications that publish or consume real-time information, this is a very important distinction. I plan on writing a Python library that wraps Flickr’s polling API into a “real-time” blocking continuous stream for a project I’m working on. I’ll publish the code on GitHub and post it here when done.
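
To make the polling-versus-stream distinction concrete, here is a rough sketch of how a polling API could be wrapped in a blocking generator. This is not the library mentioned above, just an illustration of the general idea. The names `poll_as_stream`, `fetch_updates`, and `fake_fetch` are hypothetical, and the `(items, new_cursor)` return shape is an assumption made for the example, not how YQL or any Yahoo API actually responds.

```python
import time


def poll_as_stream(fetch_updates, interval=5.0, sleep=time.sleep, max_polls=None):
    """Yield updates one at a time by repeatedly polling `fetch_updates`.

    `fetch_updates(since)` is a hypothetical callable standing in for one
    polling request. It receives the last cursor seen (None on the first
    call) and returns an (items, new_cursor) pair.
    """
    cursor = None
    polls = 0
    # With max_polls=None this loops forever, like a real stream consumer.
    while max_polls is None or polls < max_polls:
        items, cursor = fetch_updates(cursor)
        polls += 1
        for item in items:
            yield item
        if not items:
            # Nothing new: wait before asking again so we don't hammer the service.
            sleep(interval)


def fake_fetch(since):
    """Stand-in for a real polling request: hands out the integers 1 to 4."""
    start = 0 if since is None else since
    if start >= 4:
        return [], start
    return [start + 1, start + 2], start + 2


if __name__ == "__main__":
    for update in poll_as_stream(fake_fetch, interval=0.1, max_polls=4):
        print(update)
```

The consumer just iterates and blocks until something arrives, but under the hood every update still costs a request, which is exactly the difference from a true push feed.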

Resolving HTTP Redirects in Python

Since everyone is using short URLs these days, and sometimes we just need to know where a URL leads, I wrote this handy little function to find out. Redirection can be tricky: there are 301 (“permanent”) and 302 (“temporary”) style status codes, and there can be multiple layers of redirection. I think the simplest approach is this: whenever the server returns a Location HTTP header and its value is not the same as the URL you requested, you can be pretty sure it’s a redirect. The function below uses the HTTP HEAD method to request only the headers, so as not to waste bandwidth, and recursively calls itself until it gets a non-redirecting result. As a safeguard against infinite recursion, it keeps a depth counter.

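import urlparse
import httplib

# Recursively follow redirects until there isn't a Location header
def resolve_http_redirect(url, depth=0):
    if depth > 10:
        raise Exception("Redirected %d times, giving up." % depth)
    o = urlparse.urlparse(url, allow_fragments=True)
    # Use an HTTPS connection when the URL asks for one
    if o.scheme == 'https':
        conn = httplib.HTTPSConnection(o.netloc)
    else:
        conn = httplib.HTTPConnection(o.netloc)
    # An empty path is not a valid request target, so fall back to "/"
    path = o.path or '/'
    if o.query:
        path += '?' + o.query
    # HEAD asks for the headers only, so no body is downloaded
    conn.request("HEAD", path)
    res = conn.getresponse()
    location = res.getheader('location')
    conn.close()
    if location and location != url:
        return resolve_http_redirect(location, depth + 1)
    # No different Location header, so this is the final URL
    return url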
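
On newer Python versions, where `urlparse` and `httplib` have become `urllib.parse` and `http.client`, the same idea might look roughly like the sketch below. It is a variation, not a drop-in replacement: it loops instead of recursing, only treats 3xx responses as redirects, and resolves relative Location values with `urljoin`. The names `head_request` and `resolve_redirects`, and the injectable `head` parameter (there so the logic can be exercised without a network), are my own assumptions for illustration.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit


def head_request(url):
    """Send a HEAD request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_class = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_class(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        res = conn.getresponse()
        return res.status, res.getheader("Location")
    finally:
        conn.close()


def resolve_redirects(url, max_redirects=10, head=head_request):
    """Follow redirects one hop at a time and return the final URL."""
    for _ in range(max_redirects + 1):
        status, location = head(url)
        if not (300 <= status < 400 and location):
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("Redirected more than %d times, giving up." % max_redirects)
```

Calling `resolve_redirects("http://example.com/some-short-link")` would issue real HEAD requests; passing a fake `head` function lets you check the hop-following logic offline.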