How-to Set Up Ubuntu w/ MongoDB Replica Sets on Amazon EC2

This tutorial is intended for beginners who aren’t familiar with EC2 yet, but are generally familiar with MongoDB. EC2 is actually pretty easy, but a lot of the basic info you need to get started is scattered across numerous websites and articles. This post hopefully puts all the necessary details in one place.

The first thing to understand is that every EC2 instance runs an AMI (Amazon Machine Image), which for EBS-backed images is basically a bundle of one or more EBS (Elastic Block Store) snapshots. The physical machine that your instance is hosted on has built-in hard drive space (the instance store), but it isn’t persistent. It survives a reboot, but when you stop or terminate the instance whatever is on that disk is wiped.

Amazon already has a catalog of community AMIs, including basic Ubuntu installs. We can use one of these, then install the necessary packages, update configs, etc. and save the configured snapshot as our own AMI. The problem is that when you search the community AMIs for ‘ubuntu’ you get some 500 results, so which one do we pick? http://alestic.com is a good resource for things related to EC2 and Ubuntu, and it has a list of ‘official’ AMIs from Canonical. I’m basing my EC2 instance in Amazon’s us-east-1 region, so the AMI identifier for Ubuntu 11.04 EBS 64-bit is ami-1aad5273. If your EC2 instances are located somewhere else, you’ll need the corresponding AMI identifier for that region, which can be found on alestic.com.

To start off, you can follow the EC2 getting started guide, except instead of the Basic Linux AMI you can use the Ubuntu AMI that I mentioned above. There’s also no need to terminate the instance at the end since we’ll just roll right into customizing this instance for MongoDB.

I like to start by getting any system updates that have come out since the AMI was created:

sudo apt-get update
sudo apt-get upgrade

I also like to install the Linux tools dstat and htop to monitor system performance.

After following Amazon’s Getting Started Guide you should have a blank Ubuntu box and be SSH’ed into it. The Linux root partition is usually an EBS volume, and I like to make a second EBS volume that I can mount just for the MongoDB database directory. This way I can detach the database volume and move it to another running instance. So go into the AWS Management Console and click on Volumes on the left. Create a new volume that has ample space for your database, in the same availability zone as your instance. Resizing later is extra work, so leave room to grow. After you create the EBS volume you need to attach it to your EC2 instance and choose a device name. I usually use /dev/sde.

Next, go back to the SSH session on the EC2 instance. We need to format the new volume, mount it, and add it to /etc/fstab so it auto-mounts when we restart. (Note: on Ubuntu Natty 11.04 the drive ends up appearing as /dev/xvde, but on older systems and other flavors of Linux it might still be /dev/sde.)
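Since the device name depends on the system, it can be worth checking which one actually exists before formatting anything. A small sketch of that check; the function name find_db_device is made up for this example, and the candidate paths are simply the two mentioned in the note:

```shell
# Print the first candidate path that exists as a block device.
# Returns non-zero if none of the candidates exist.
# (find_db_device is a hypothetical helper, not a standard tool.)
find_db_device() {
    for dev in "$@"; do
        if [ -b "$dev" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}
```

Calling find_db_device /dev/xvde /dev/sde prints whichever of the two is present, so you know which name to use in the commands that follow.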

sudo mkfs -t ext4 /dev/xvde

I’m going to mount my new volume at /db:

sudo mkdir /db
sudo vim /etc/fstab

Add the following line to the bottom of your /etc/fstab:

/dev/xvde        /db     auto    noatime,noexec,nodiratime 0 0

We can either restart to auto-mount it or we can manually mount it now using:

sudo mount /dev/xvde /db
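If you script this step instead of editing /etc/fstab by hand, it helps to guard against adding the entry twice. A minimal sketch, assuming the same /dev/xvde device and /db mount point as above; the function name add_db_fstab_entry is hypothetical:

```shell
# Append the /db entry to an fstab-style file, but only if the file
# has no entry for /dev/xvde yet.
# (add_db_fstab_entry is a made-up name for this example.)
add_db_fstab_entry() {
    fstab_file="$1"
    entry='/dev/xvde        /db     auto    noatime,noexec,nodiratime 0 0'
    if grep -q '^/dev/xvde[[:space:]]' "$fstab_file" 2>/dev/null; then
        echo "entry for /dev/xvde already present in $fstab_file"
    else
        printf '%s\n' "$entry" >> "$fstab_file"
        echo "added entry for /dev/xvde to $fstab_file"
    fi
}
```

Try it against a copy first (for example a copy of /etc/fstab under /tmp), then run it as root against the real file. Afterwards, sudo mount -a mounts everything listed in /etc/fstab, which is a quick way to confirm the new line parses.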

Now let’s install MongoDB, following the official docs.

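sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo vim /etc/apt/sources.list

Add the following line to the bottom of your /etc/apt/sources.list:

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Then refresh the package index and install MongoDB:

sudo apt-get update
sudo apt-get install mongodb-10gen

Finally, create the database directory on the new volume and hand it to the mongodb user: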

sudo mkdir /db/mongodb
sudo chown mongodb:mongodb /db/mongodb

Now let’s edit /etc/mongodb.conf and change the location of the database. Near the top change dbpath so it looks like this:

dbpath=/db/mongodb

I also like to change my oplogSize (specified in megabytes) to something larger than the default, so if a secondary instance is down I have longer to bring it back up before it becomes too stale to catch up from the oplog. The replSet line sets the name of the replica set. I also recommend turning on journaling to protect against data corruption after a crash.

oplogSize = 10000
replSet = myReplicaSet
journal = true
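Before restarting MongoDB it can be useful to confirm what the config file actually contains. The following is only a sketch: conf_value and show_replica_settings are hypothetical helper names, and it assumes the simple key = value layout shown above, ignoring commented-out lines:

```shell
# Print the value of every uncommented line that sets the given key.
# $1 = config file, $2 = setting name
# (conf_value is a made-up helper for this example.)
conf_value() {
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*//p" "$1"
}

# Show the four settings this tutorial changes.
show_replica_settings() {
    for key in dbpath oplogSize replSet journal; do
        echo "$key=$(conf_value "$1" "$key")"
    done
}
```

Running show_replica_settings /etc/mongodb.conf should list /db/mongodb, 10000, myReplicaSet and true if the edits above were applied. An empty value means the setting is missing or still commented out.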

If you’re using a hostname in the replica set configuration instead of the IP address, you need to configure it in /etc/hostname and /etc/hosts.

/etc/hostname:

db1

/etc/hosts:

127.0.0.1     db1   db1.mydomain.com    localhost.localdomain    localhost
xxx.xxx.xxx.xxx    db1   db1.mydomain.com

(where xxx.xxx.xxx.xxx is this machine’s IP address that you use in the replica set config, usually the Elastic IP.)

After changing hostname information you’ll need to restart the instance for it to take effect.

You need to open a hole in the EC2 firewall for the other replica nodes. Do this by going to the Security Groups section of the EC2 dashboard. Click on the security group you’re using and add a custom TCP rule for port 27017 with xxx.xxx.xxx.xxx/32 as the source, one rule for each node (where xxx.xxx.xxx.xxx is the instance’s IP address). Each node of the replica set needs to be able to access every other node of the replica set. The best way to do this is to use the same security group for all of them and add all of the IP addresses to the allowed list.
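The same rules can also be added from the command line instead of the dashboard. This is a sketch under several assumptions: the AWS command line tools are installed and configured, the security group is named mongo-replica (a made-up name), and the 203.0.113.x addresses are placeholders for your own nodes. The helper only prints the commands so you can review them before running anything:

```shell
# Print one "allow TCP 27017 from this node" command per address.
# Nothing is executed here; review the output before running it.
# (print_mongo_ingress_rules is a hypothetical helper name.)
print_mongo_ingress_rules() {
    group="$1"   # security group name, e.g. the made-up mongo-replica
    shift
    for ip in "$@"; do
        echo "aws ec2 authorize-security-group-ingress --group-name ${group} --protocol tcp --port 27017 --cidr ${ip}/32"
    done
}

# Placeholder addresses; substitute the IPs of your replica set nodes.
print_mongo_ingress_rules mongo-replica 203.0.113.11 203.0.113.12 203.0.113.13
```

Printing rather than executing is a deliberate choice here, since a wrong address in a firewall rule is easier to catch on screen than after the fact.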

When you have the instance basically set, go back into the AWS control panel, right-click the instance and choose Create Image. You can start up any number of instances from this image for the replica set, but on each one you need to change /etc/hostname and /etc/hosts to reflect the individual IP address and hostname of the box (db1, db2, db3, etc.).
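Since every node needs the same two /etc/hosts lines with a different name and address, a small helper can generate them. A sketch only: hosts_lines is a hypothetical function, mydomain.com is the placeholder domain used earlier, and 203.0.113.12 stands in for the node's real address:

```shell
# Print the two /etc/hosts lines for one node, following the db1 example.
# $1 = node name (db1, db2, ...), $2 = IP used in the replica set config
# (hosts_lines is a made-up helper; mydomain.com is a placeholder.)
hosts_lines() {
    node="$1"
    ip="$2"
    printf '127.0.0.1     %s   %s.mydomain.com    localhost.localdomain    localhost\n' "$node" "$node"
    printf '%s    %s   %s.mydomain.com\n' "$ip" "$node" "$node"
}

# Example for the second node, with a placeholder address.
hosts_lines db2 203.0.113.12
```

Paste the output into /etc/hosts in place of the db1 lines, put the bare node name (db2 here) into /etc/hostname, and restart the instance as described above.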

From here on the instructions in the MongoDB Replica Set Configuration docs are valid. You don’t need to specify the replSet name on the command line since we already set it in the config file. MongoDB should already be running, but you can restart it with sudo /etc/init.d/mongodb restart if you change any configuration parameters.
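To give a rough idea of what that last step looks like, here is a sketch that builds the rs.initiate() call for a list of hosts. The helper name rs_initiate_js and the three db*.mydomain.com hostnames are assumptions for illustration; the set name has to match the replSet value in /etc/mongodb.conf, and every member is assumed to listen on the default port 27017:

```shell
# Print an rs.initiate() call for the given hosts, all on port 27017.
# $1 = replica set name, remaining arguments = member hostnames
# (rs_initiate_js is a hypothetical helper name.)
rs_initiate_js() {
    set_name="$1"
    shift
    members=""
    id=0
    for host in "$@"; do
        if [ -n "$members" ]; then
            members="${members}, "
        fi
        members="${members}{_id: ${id}, host: \"${host}:27017\"}"
        id=$((id + 1))
    done
    echo "rs.initiate({_id: \"${set_name}\", members: [${members}]})"
}

# Placeholder hostnames following the db1, db2, db3 naming used above.
rs_initiate_js myReplicaSet db1.mydomain.com db2.mydomain.com db3.mydomain.com
```

On one node only, you could pass the printed line to the mongo shell, for example with mongo --eval, and then check the result with rs.status(). Treat the official replica set docs as the authority if anything here disagrees with them.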