I recently read an article on TechCrunch by James Altucher entitled “Why Entrepreneurs Should NOT Buy Homes”. In discussing the topic with some friends, I was turned on to an article in the Wall Street Journal by economist Robert Bridges showing statistically why home ownership is almost always a poor investment. The idea rings true to me. Altucher focuses on entrepreneurs, but I believe his argument could be extended to almost anyone, and it has to do with specialization of skill.

You could summarize Altucher’s article in two core points:

  1. Entrepreneurs should not own homes because the investment is illiquid, meaning you can’t use the money for other things when you need it. In the same sense, when you need to pick up and move to a new city for a job, or your level of income changes, it can be difficult and expensive to sell your house.
  2. Owning a home takes time and new skills, and adds a lot of stress that distracts you from the things that are most important: being really good at what you do and enjoying your life.

Altucher’s argument is similar to the idea of specialization of skill and division of labor. Economic prosperity and development really took off when people stopped doing everything for themselves and started to focus more on what they’re good at. If you’re good at making furniture it’s more efficient for you to spend your time doing that and investing money in better tools rather than spending half your day farming and investing money in farming equipment. Leave the food to someone who does that for a living and sell him some furniture in exchange. Likewise, leave the property management and investment to someone who does that for a living and focus your time and money on being better at whatever it is you do – be it startups or photography or teaching economics.

If, on the other hand, you really ENJOY fixing up houses and aren’t worried about moving soon, it could make sense to buy a house for personal reasons, but less as an economic investment. Also, this argument ignores the psychological side: forced mortgage payments impose discipline on people who otherwise wouldn’t invest. But you can get a similar effect by maxing out your 401k or using other investment strategies.

Here’s a simple sigmoid function, which produces a logistic curve, written in JavaScript:

function sigmoid(t) {
    // logistic function: 1 / (1 + e^(-t))
    return 1 / (1 + Math.exp(-t));
}
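For anyone who wants to check a few values, here is a rough Python sketch of the same curve. The two-branch form is my own addition, not part of the JavaScript above: Python's math.exp raises an error on very large arguments, so the negative side uses the algebraically equal form e^t / (1 + e^t).

```python
import math

def sigmoid(t):
    """Logistic function 1 / (1 + e^-t), arranged to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t use the equal form e^t / (1 + e^t), so exp() never overflows.
    z = math.exp(t)
    return z / (1.0 + z)

# Print a few sample points along the curve.
for t in (-6, -2, 0, 2, 6):
    print(t, round(sigmoid(t), 4))
```

The curve passes through 0.5 at t = 0 and is symmetric around that point, so sigmoid(t) + sigmoid(-t) is always 1.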

This tutorial is intended for beginners who aren’t familiar with EC2 yet, but are generally familiar with MongoDB. EC2 is actually pretty easy, but a lot of the basic info you need to get started is scattered across numerous websites and articles. This post hopefully puts all the necessary details in one place.

The first thing to understand is that every EC2 instance runs an AMI (Amazon Machine Image), which is basically a bundle of one or more EBS (Elastic Block Store) snapshots. The physical machine that your instance is hosted on has built-in hard drive space, but it isn’t persistent. When you stop or terminate the instance, whatever is on that disk is wiped. Amazon already has a database of community AMIs, including basic Ubuntu installs. We can use one of these, then install the necessary packages, update configs, etc. and save the configured snapshot as our own AMI. Problem is, when you search the community AMIs for ‘ubuntu’ you get some 500 results, so which one do we pick? http://alestic.com is a good resource for things related to EC2 and Ubuntu, and they have a list of ‘official’ AMIs from Canonical. I’m basing my EC2 instance in Amazon’s us-east-1 region, so the AMI identifier for Ubuntu 11.04 EBS 64-bit is ami-1aad5273. If your EC2 instances are located somewhere else, you’ll need the corresponding AMI identifier for that region, which can be found on alestic.com.

To start off, you can follow the EC2 getting started guide, except instead of the Basic Linux AMI you can use the Ubuntu AMI that I mentioned above. There’s also no need to terminate the instance at the end since we’ll just roll right into customizing this instance for MongoDB.

I like to start by getting any system updates that have come out since the AMI was created:

sudo apt-get update
sudo apt-get upgrade

I also like to install the Linux tools dstat and htop to monitor system performance.

After following Amazon’s Getting Started Guide you should have a blank Ubuntu box and be SSH’ed into it. The Linux root partition is usually an EBS volume, and I like to make a second EBS volume that I can mount for just the MongoDB database directory. This way I can detach the database volume and move it to another running instance. So go into the AWS Management Console and click on Volumes on the left. Create a new volume that has ample space for your database. You can’t resize these things, so leave room to grow. After you create the EBS volume you need to attach it to your EC2 instance and choose a device name. I usually use /dev/sde.

Next, let’s log into the EC2 instance over SSH. We need to format the new volume, mount it, and add it to /etc/fstab so it auto-mounts when we restart. (Note: on Ubuntu Natty 11.04 the drive ends up appearing as /dev/xvde, but on older systems and other flavors of Linux it might still be /dev/sde.)

sudo mkfs -t ext4 /dev/xvde

I’m going to mount my new volume at /db

sudo mkdir /db
sudo vim /etc/fstab

Add the following line to the bottom of your /etc/fstab:

/dev/xvde        /db     auto    noatime,noexec,nodiratime 0 0

We can either restart to auto-mount it or we can manually mount it now using

sudo mount /dev/xvde /db

Now let’s install MongoDB. Here are the official docs.

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
# append the 10gen repository line to /etc/apt/sources.list
echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install mongodb-10gen

sudo mkdir /db/mongodb
sudo chown mongodb:mongodb /db/mongodb

Now let’s edit /etc/mongodb.conf and change the location of the database. Near the top, change dbpath so it looks like this:

dbpath=/db/mongodb

I also like to change my oplogSize to something larger than the default so if a secondary instance is down I have longer to bring it back up before it becomes too stale to re-sync. I also recommend turning on journaling to prevent data corruption.

oplogSize = 10000
replSet = myReplicaSet
journal = true

If you’re using a hostname in the replica set configuration instead of the IP address, you need to configure that in /etc/hostname and /etc/hosts

/etc/hostname:

db1

/etc/hosts:

127.0.0.1     db1   db1.mydomain.com    localhost.localdomain    localhost
xxx.xxx.xxx.xxx    db1   db1.mydomain.com

(where xxx.xxx.xxx.xxx is this machine’s IP address that you use in the replica set config. Usually the elastic IP.)

After changing hostname information you’ll need to restart the instance for it to take effect.

You need to open a hole in the EC2 firewall for the other replica nodes. Do this by going to the Security Groups section of the EC2 dashboard. Click on the security group you’re using and add a custom TCP rule for port 27017, with xxx.xxx.xxx.xxx/32 as the source for each node (where xxx.xxx.xxx.xxx is that instance’s IP address). Each node of the replica set needs to be able to access every other node of the replica set. The best way to do this is to use the same security group for all of them and add all of their IP addresses to the allowed list.

When you have the instance basically set, go back into the AWS control panel, right click the instance and choose Create Image. You can start up any number of these for the replica set, but on each one you need to change the /etc/hostname and /etc/hosts files to reflect the individual IP address and hostname of the box (db1, db2, db3, etc.).

From here on, the instructions in the MongoDB Replica Set Configuration docs are valid. You don’t need to specify the replSet name on the command line since we already set it in the config file. MongoDB should already be running, but you can restart it with /etc/init.d/mongodb restart if you change any configuration parameters.

As a continuation of my previous post on how to run cherrypy as an HTTPS server (port 443), this tutorial shows how to run a single cherrypy instance on multiple ports for both HTTP (port 80) and HTTPS (port 443).

We need to do a few things differently than in most examples out there, such as setting configs without the quickstart() function and creating multiple Server() objects. But after reading through the source code a little, it becomes clear.

import cherrypy

class RootServer:
    @cherrypy.expose
    def index(self, **keywords):
        return "it works!"

if __name__ == '__main__':
    site_config = {
        '/static': {
            'tools.staticdir.on': True,
            'tools.staticdir.dir': "/home/ubuntu/my_website/static"
        },
        '/support': {
            'tools.staticfile.on': True,
            'tools.staticfile.filename': "/home/ubuntu/my_website/templates/support.html"
        }
    }
    # pass site_config so the static paths defined above actually take effect
    cherrypy.tree.mount(RootServer(), config=site_config)

    cherrypy.server.unsubscribe()

    server1 = cherrypy._cpserver.Server()
    server1.socket_port=443
    server1._socket_host='0.0.0.0'
    server1.thread_pool=30
    server1.ssl_module = 'pyopenssl'
    server1.ssl_certificate = '/home/ubuntu/my_cert.crt'
    server1.ssl_private_key = '/home/ubuntu/my_cert.key'
    server1.ssl_certificate_chain = '/home/ubuntu/gd_bundle.crt'
    server1.subscribe()

    server2 = cherrypy._cpserver.Server()
    server2.socket_port=80
    server2._socket_host="0.0.0.0"
    server2.thread_pool=30
    server2.subscribe()

    cherrypy.engine.start()
    cherrypy.engine.block()

It took me more time than it should have to piece together the right bits of current information for using SSL with cherrypy. Here’s a fully working example of cherrypy 3.2.0 serving up HTTPS requests.

Quick notes: if you haven’t tried cherrypy, do it. It’s awesome in its simplicity. Also, I got my SSL cert from GoDaddy, which was the cheapest I found. This particular cert uses a certificate chain, so when all is said and done we have my_cert.crt, my_cert.key, and gd_bundle.crt.

ssl_server.py:

import cherrypy

class RootServer:
    @cherrypy.expose
    def index(self, **keywords):
        return "it works!"

if __name__ == '__main__':
    server_config={
        'server.socket_host': '0.0.0.0',
        'server.socket_port':443,

        'server.ssl_module':'pyopenssl',
        'server.ssl_certificate':'/home/ubuntu/my_cert.crt',
        'server.ssl_private_key':'/home/ubuntu/my_cert.key',
        'server.ssl_certificate_chain':'/home/ubuntu/gd_bundle.crt'
    }

    cherrypy.config.update(server_config)
    cherrypy.quickstart(RootServer())

Launch the server like:

sudo python ssl_server.py

You need to use sudo because it runs on port 443. You should see the prompt “Enter PEM pass phrase”, which asks for the pass phrase you set when generating your key.

Update: In a follow-up post I show how to run an HTTPS server (port 443) and an HTTP server (port 80) at the same time.

Mick Ebeling recently gave a TED talk about the homemade eye-tracking device he and a bunch of hackers made to allow a paralyzed man to communicate, Stephen Hawking style. They did this with an off-the-shelf PS3 camera and some open source software for $50. That’s what I call a righteous hack. Most importantly, it has real-world significance. And it’s totally something I or many people I know could have done if we had thought of the idea.

I think a lot of hackers are hungry for this kind of meaningful work. We need a repository of project ideas like the Eyewriter: immediate needs that have a tangible social effect and can be done in a weekend or two. Organize the ideas by skills required and offer a platform for organizing groups of hackers to tackle each problem. There are a lot of developers out there looking for a side project and a way to have an impact. And we also need idea people: social workers, NGOs, and everyday people to tell us how technology could solve the problems they see in the field.

Some resources:
Random Hacks of Kindness
Code For America
Public Equals Online
Applications for Good

For goodness sake, hack!

Test-driven development is great as long as you have proper tests. The problem is that it’s very hard to predict enough edge cases to cover the field of possible scenarios. Code coverage analysis will help developers make sure all code blocks are executed, but it doesn’t do anything to ensure an application correctly handles the variations in data, user interaction, failure scenarios, or how it behaves under different stress conditions.

The fact that tests are helpful but never complete is something most developers are already conscious of. The danger is that better tests make worse developers! It’s very easy to lean too heavily on passing tests, wildly changing code until the light goes green without spending enough time thinking through the application’s logic.

I’m basically saying that, psychologically speaking, passing tests gives us a false sense of security. They can be a distraction from carefully crafted and thought through code. That’s why I advocate writing tests only for the purposes of regression testing. It should be a follow-up step, not an integral part of initial development.

The democratic process hasn’t changed much. The public elects a representative; representatives make decisions. The average citizen is essentially limited to three tools for effecting change. First, we elect the politician who best convinces us they’ll make the decisions we would want to make. Second, we can send correspondence to our representatives hoping to influence their decisions. Third, we can hold protests and demonstrations to broadcast our opinions.

But there can be a better way. Modern communication tools give the public better access to government and can revolutionize the democratic process by letting citizens vote directly on the issues. Using internet, text message, and phone voting systems, citizens can be directly involved in the decision-making process. The role of the representative is reduced to more of an organizer than a decision maker because constituents decide most of the issues themselves. This is Democracy 2.0 not because of the invention of new tools, but because it changes the way people behave. I believe we’re headed in the direction of popular governing, but it’s not a perfect world.

Old school tools of Democracy 1.0

There are three essential tools. Ballot voting is infrequent and highly constrained. Decisions are simply yes/no or choose your favorite candidate. No second choices, no weighted scores.

Correspondence with representatives is a free-form way to express opinions and ideas, but it is usually ignored or not accurately counted because representatives simply can’t handle the volume. It lacks transparency and accountability in that only the representative knows the aggregate opinion. Sending correspondence requires knowledge of the system and time.

Protests show what a sample of the population thinks about an issue. To decision makers, a protest roughly quantifies two things about an issue: the proportion of the constituency that holds an opinion and the intensity of that opinion.

There’s a better way with Democracy 2.0

With more ubiquitous methods of communication we now have the tools for a more involved democracy, one where citizens can take a greater part in the decision-making process. Arranging “micro-polls” through the web, text message, or telephone is easy enough that we can weigh in on any range of issues. By making these polls open, transparent, and frequent, we could have truly participatory democracy. Micro-polls can be used to develop policy decisions with rapid iteration, harnessing the “wisdom of the crowds”. First determine whether any action should be taken, then what kind of action, and keep iterating until the constituency arrives at the most agreeable outcome.

Round 1:
Poll: Should the federal government adopt policies to reduce the number of illegal immigrants?

Round 2:
Poll: Which method of reducing the number of illegal immigrants do you most agree with?
– Forced deportation
– Keeping non-citizen status, but documenting and taxing
– Naturalizing to eventually be US Citizens

Round 3:
Poll: For non-citizen, documented workers, which services and taxes should apply?
– All normal social services and Medicare except social security, and all normal taxes except social security
– No social services and impose all normal taxes except social security

Public surveys on non-policy issues, such as a yearly report card for representatives, give feedback so they can do their job better without waiting for the next election cycle. Constituents rate various qualities of their representative on a scale from 1 to 10. Descriptive statistics, such as the standard deviation and a breakdown of ratings by demographics, would provide more valuable information than a single average.
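To make the report card idea concrete, here is a rough sketch of the kind of summary I have in mind, using Python’s standard statistics module. The responses, the age groups, and the function names are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical survey responses: (age group, rating on a 1-10 scale)
responses = [
    ("18-29", 7), ("18-29", 9),
    ("30-49", 4), ("30-49", 6),
    ("50+", 8), ("50+", 2),
]

def summarize(ratings):
    """Return the count, mean, and population standard deviation of the ratings."""
    return {"n": len(ratings), "mean": mean(ratings), "stdev": pstdev(ratings)}

def summarize_by_group(rows):
    """Group (group, rating) rows by group and summarize each group separately."""
    groups = defaultdict(list)
    for group, rating in rows:
        groups[group].append(rating)
    return {group: summarize(ratings) for group, ratings in groups.items()}

overall = summarize([rating for _, rating in responses])
by_group = summarize_by_group(responses)
```

In this made-up data the 18-29 and 50+ groups sit near the same average, but the 50+ group is far more divided, which is exactly what a lone average would hide.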

Crowd sourced idea generation could lead to novel solutions for hard problems. Anyone can submit ideas and the best ideas are voted up. Discussions revolve around the ideas to help them mature.

The web could be used as an official forum for discourse between representatives and their constituency where everyone gets a voice. Think of it as a town hall meeting on a national scale. Voting and crowd filtering would make quality questions and comments rise to the top, while trolls and inappropriate comments would be voted down and hidden by popular disapproval. The forum also gives the representative a venue for communicating their personal beliefs on the issues as a well-informed political professional. This is where the organizing and rallying happens.

Being more involved in decisions, with a high frequency of iteration, strengthens the feedback loop that citizens feel when exercising their rights. Psychologically, many people would feel more connected to the democratic process and would hopefully be willing to contribute a greater amount of mental energy toward the running of their country through a greater sense of ownership.

Not without its problems

No political system is perfect. While Democracy 2.0 could be a vast improvement, it doesn’t solve every problem of Democracy 1.0 and it also exposes new dangers. People are quick to point out that electronic voting is unsafe and error-prone. There are secure, open source options, but that is the subject of many other blog posts. Some countries, like Estonia, already employ internet voting using the national ID card as authentication. Great care would need to be taken by the election commission to certify any system used.

The general public is often more susceptible to manipulation by persuasive individuals than an elected official would be. News pundits and politically motivated propaganda are no strangers to Americans today, but their effect could be even greater, and more damaging, when the people have direct control over more decisions. Democracy depends on a well-informed public, and that value is something we need to instill in our culture regardless.

Mob dynamics don’t always result in the best decision. There are numerous studies showing that the sense of moral responsibility declines when people act in groups rather than individually. And sometimes the “most agreeable” compromise is worse than either extreme. We’ve all heard the maxim “A camel is a horse designed by committee.” This is where the role of an inspirational representative is so important in getting their constituents behind a cause and guiding the conversation toward an optimal outcome.

The idea of a representative has been necessary because people don’t have the time or knowledge to vote wisely on every issue, so they place their trust in their elected official. That problem is still valid, but there are ways of having it both ways. For example, if a constituent doesn’t vote, their vote could be allocated to the representative. Alternatively, unused votes could simply go uncounted: those who care deeply about an issue will be represented and those who care less will not be heard. Voting is and always should be optional.
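The first option, handing unused votes to the representative, is simple enough to sketch in a few lines of Python. This is only an illustration of the counting rule; the names, the yes/no choices, and the tally function are hypothetical.

```python
from collections import Counter

def tally(constituents, direct_votes, representative_choice):
    """Count one vote per constituent.

    Constituents who voted directly are counted as cast. Everyone else's
    vote is allocated to whatever the representative chose.
    """
    counts = Counter()
    for person in constituents:
        counts[direct_votes.get(person, representative_choice)] += 1
    return counts

# Hypothetical district of five people, three of whom voted directly.
constituents = ["ana", "ben", "cho", "dev", "eli"]
direct_votes = {"ana": "yes", "ben": "no", "cho": "no"}

result = tally(constituents, direct_votes, representative_choice="yes")
```

Here the direct vote alone would be 2 to 1 against, but the two allocated votes flip the outcome to 3 to 2 in favor, which shows how much weight this rule leaves with the representative when turnout is low.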

There is still a lot of thought and experimentation that needs to go into Democracy 2.0, but we can move forward in baby steps.

More Resources

http://usnowfilm.com
http://opengov.ideascale.com/a/dtd/2865-4049
http://www.nytimes.com/2009/09/12/world/americas/12iht-currents.html
http://en.wikipedia.org/wiki/Electronic_voting_in_Estonia
http://en.wikipedia.org/wiki/Direct_democracy#Electronic_direct_democracy

While working on my forthcoming checkin.to project, I needed to use the MediaWiki API to get the summary paragraph of Wikipedia articles pertaining to places. Checkin.to relies on the Yahoo Where On Earth Identifiers (woeid). Yahoo also conveniently offers a concordance API, so from the woeid I get the Geonames ID and the Wikipedia page ID among other things. As far as I can tell, the MediaWiki API doesn’t allow you to request page content using the page ID, so the first step here is to resolve the page ID into a unique page title. This can be done using the query action like so:

http://en.wikipedia.org/w/api.php?action=query&pageids=49728&format=json

It gives a response resembling:

{"query":{"pages":{"49728":{"pageid":49728,"ns":0,"title":"San Francisco"}}}}

Step 2 is to get the actual page content. There are a variety of formats available including the raw wiki markup, but for my purpose the formatted HTML is much more useful. We also need to convert the spaces in the page title to underscores. The request looks like this:

http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=San_Francisco&format=json

And a response resembling:

{"parse":{"text":{"*":"<div class=\"dablink\">This article is about the place in California. [...] "}}}

Step 3 is to parse the resulting article HTML and extract just the first body paragraph, which typically summarizes the whole article. The problem here is that a bunch of other stuff, including all the sidebar content, comes before the first body paragraph, and that other stuff can itself include p tags. jQuery is a big help here, as usual. First, let’s wrap the entire resulting wiki page in a div element to give everything a root. Then we can look at just the direct children of that wrapper element to find the first root-level p tag.

var wikipage = $("<div>"+data.parse.text['*']+"</div>").children('p:first');
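If you ever need to do the same extraction outside the browser, the idea carries over without jQuery. Below is a rough Python sketch using only the standard library’s html.parser. The class and function names are my own, and it returns plain text rather than a DOM node: it tracks nesting depth, starts capturing at the first p tag seen at depth zero, and skips sup citations along the way.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class FirstTopLevelParagraph(HTMLParser):
    """Collect the text of the first <p> that is not nested in another element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0          # current nesting depth
        self.capturing = False  # True while inside the target paragraph
        self.done = False       # True once the target paragraph has closed
        self.skip = 0           # >0 while inside a <sup> citation
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth == 0 and tag == "p":
            self.capturing = True
        elif self.capturing and tag == "sup":
            self.skip += 1
        self.depth += 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS:
            return
        self.depth -= 1
        if tag == "sup" and self.skip:
            self.skip -= 1
        if self.capturing and self.depth == 0 and tag == "p":
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.capturing and not self.skip:
            self.parts.append(data)

def first_paragraph_text(html):
    """Return the text of the first root-level paragraph in an HTML fragment."""
    parser = FirstTopLevelParagraph()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()

# Invented fragment shaped like the parse response: sidebar first, body after.
sample = ('<div class="dablink"><p>Not this one.</p></div>'
          '<table><tr><td><p>Sidebar</p></td></tr></table>'
          '<p>San <b>Francisco</b> is a city.<sup>[1]</sup></p><p>Second.</p>')
summary = first_paragraph_text(sample)
```

This assumes reasonably well-formed markup with matching close tags; messy real-world HTML may need a more forgiving parser.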

Below I have the entire resulting function that goes from page id to summary paragraph and appends it to a <div> somewhere in my DOM called #wiki_container. I also perform some optional cleanup including removing citations, updating the relative hrefs to absolute hrefs pointing to http://en.wikipedia.org, and adding a read more link.

function getAreaMetaInfo_Wikipedia(page_id) {
  $.ajax({
    url: 'http://en.wikipedia.org/w/api.php',
    data: {
      action:'query',
      pageids:page_id,
      format:'json'
    },
    dataType:'jsonp',
    success: function(data) {
      // replace every space, not just the first, and keep title out of the global scope
      var title = data.query.pages[page_id].title.replace(/ /g,'_');
      $.ajax({
        url: 'http://en.wikipedia.org/w/api.php',
        data: {
          action:'parse',
          prop:'text',
          page:title,
          format:'json'
        },
        dataType:'jsonp',
        success: function(data) {
          // wrap the page in a div so the first root-level paragraph can be selected
          var wikipage = $("<div>"+data.parse.text['*']+"</div>").children('p:first');
          wikipage.find('sup').remove();
          wikipage.find('a').each(function() {
            $(this)
              .attr('href', 'http://en.wikipedia.org'+$(this).attr('href'))
              .attr('target','wikipedia');
          });
          $("#wiki_container").append(wikipage);
          $("#wiki_container").append("<a href='http://en.wikipedia.org/wiki/"+title+"' target='wikipedia'>Read more on Wikipedia</a>");
        }
      });
    }
  });
}