A continuous, blocking Python interface for streaming Flickr photos

As I explained in my last post, Yahoo! claims its Firehose is a real-time streaming API, but it's not. So to make life a bit easier for app developers, I wrote a Python wrapper that provides a continuous, blocking interface to the Flickr polling API. Effectively, it emulates a streaming API by stringing together frequent calls to flickr.photos.getRecent. And it's dead simple.

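from PyFlickrStreamr import PyFlickrStreamr

# Iterating blocks until the next new photo comes back from Flickr
fs = PyFlickrStreamr('your_api_key_here', extras=['date_upload', 'url_m'])
for row in fs:
    # 'url_m' is present in each row because it was requested in extras
    print(str(row['id']) + "   " + row['url_m'])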
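For the curious, here is roughly how the polling-to-stream trick can work. This is a minimal sketch written for illustration, not the package's actual source: `fetch_recent` calls the public Flickr REST endpoint for flickr.photos.getRecent with JSON output, and `photo_stream` turns repeated polls into one blocking iterator by sleeping between requests and skipping photo ids it has already yielded. The function names, the 5-second default interval, the 10,000-id memory, and the assumption that getRecent lists the newest photos first are illustrative choices, not documented behaviour of the wrapper.

```python
import json
import time
import urllib.parse
import urllib.request
from collections import deque

REST_URL = "https://api.flickr.com/services/rest/"


def fetch_recent(api_key, extras=("date_upload", "url_m"), per_page=100):
    """Call flickr.photos.getRecent once and return its list of photo dicts."""
    params = urllib.parse.urlencode({
        "method": "flickr.photos.getRecent",
        "api_key": api_key,
        "extras": ",".join(extras),  # Flickr expects a comma-separated list
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,  # plain JSON instead of a JSONP wrapper
    })
    with urllib.request.urlopen(REST_URL + "?" + params, timeout=30) as resp:
        data = json.load(resp)
    if data.get("stat") != "ok":
        raise RuntimeError(data.get("message", "Flickr API error"))
    return data["photos"]["photo"]


def photo_stream(fetch, interval=5.0, max_polls=None, remember=10000,
                 sleep=time.sleep):
    """Turn repeated polls into one blocking, continuous iterator.

    fetch() returns the latest photos, assumed newest first. Photos that were
    already yielded are skipped using a bounded memory of seen ids.
    """
    seen = set()
    order = deque()  # ids in the order they were seen, so the oldest can be dropped
    polls = 0
    while max_polls is None or polls < max_polls:
        if polls:
            sleep(interval)  # pause between requests instead of hammering the API
        polls += 1
        batch = fetch()
        # Walk the batch oldest first so the stream reads chronologically
        for photo in reversed(batch):
            pid = photo["id"]
            if pid in seen:
                continue
            seen.add(pid)
            order.append(pid)
            if len(order) > remember:
                seen.discard(order.popleft())  # forget the oldest id
            yield photo


# Example (needs a real API key and network access):
# for row in photo_stream(lambda: fetch_recent("your_api_key_here")):
#     print(row["id"], row.get("url_m"))
```

Passing the fetch and sleep functions in keeps the streaming logic testable without touching the network. The weak spot of any approach like this shows up under load: if more photos are uploaded between two polls than fit in one page, the overflow is never seen, which is exactly why a true push feed is preferable.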

You can download the package from PyPI or fork the source code on GitHub. Have fun.

The Yahoo Firehose "feed" isn’t a feed at all

The web has been trending toward real-time for the past couple of years. FriendFeed was one of the first services to show real-time updates across your social network, and real-time feeds took the stage in a big way when Twitter launched its streaming API. In April, Yahoo! announced its Firehose API, claiming “it includes a real-time feed of every public action taken on our network”. The thing is, this isn’t a “feed” or a “stream” in the same sense that Twitter’s streaming API is. It’s a database you can poll with Yahoo’s YQL, an SQL-like query language. Sure, the updates may be available in their database in near real-time, but to receive them you need to issue a new request. In fact, the only way to know whether there are updates is to poll the service continuously. A feed would be something like HTTP server push over a long-lived connection (what Twitter does) or PubSubHubbub.
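To make the distinction concrete: with a push feed the client makes one request and then just reads, while the server writes each new event down the open connection as it happens. No new request is needed to learn that something changed. The sketch below assumes newline-delimited JSON with blank lines as keep-alives, loosely modelled on how Twitter's streaming API frames its messages; `read_push_stream` is a hypothetical helper, not part of any library.

```python
import json


def read_push_stream(stream):
    """Yield one parsed JSON message per line from a long-lived response body.

    stream is any iterable of byte lines, such as a binary file-like HTTP
    response kept open by the server. Blank lines are treated as keep-alives.
    """
    for raw in stream:  # blocks until the server pushes the next line
        line = raw.strip()
        if not line:
            continue  # keep-alive: the connection is alive, but nothing new
        yield json.loads(line)
```

Compare this with the polling sketch for the Flickr wrapper: there is no sleep interval and no duplicate tracking, because the server only sends each event once, when it happens.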

It may be just semantics to some, but this bothers me. To those of us who build applications that publish or consume real-time information, this is a very important distinction. I plan on writing a Python library that wraps Flickr’s polling API into a “real-time”, blocking, continuous stream for a project I’m working on. I’ll publish the code on GitHub and post it here when it’s done.