This tutorial is intended for beginners who aren't familiar with EC2 yet but are generally familiar with MongoDB. EC2 is actually pretty easy, but a lot of the basic info you need to get started is scattered across numerous websites and articles. This post hopefully puts all the necessary details in one place.

The first thing to understand is that every EC2 instance runs an AMI (Amazon Machine Image), which is basically a bundle of one or more EBS (Elastic Block Storage) snapshots. The physical machine that your instance is hosted on has built-in hard drive space, but it isn't persistent: when you shut down or reboot the server, whatever is on that disk will be wiped. Amazon already has a database of community AMIs, including basic Ubuntu installs. We can use one of these, then install the necessary packages, update configs, etc., and save the configured snapshot as our own AMI. The problem is, when you search the community AMIs for 'ubuntu' you get some 500 results, so which one do we pick? http://alestic.com is a good resource for things related to EC2 and Ubuntu, and they have a list of 'official' AMIs from Canonical. I'm basing my EC2 instance in Amazon's us-east-1 data center, so the AMI identifier for Ubuntu 11.04 EBS 64-bit is ami-1aad5273. If your EC2 instances are located somewhere else, you'll need the corresponding AMI identifier for that data center, which can be found on alestic.com.

To start off, you can follow the EC2 getting started guide, except instead of the Basic Linux AMI you can use the Ubuntu AMI that I mentioned above. There’s also no need to terminate the instance at the end since we’ll just roll right into customizing this instance for MongoDB.

I like to start by getting any system updates that have come out since the AMI was created:

sudo apt-get update
sudo apt-get upgrade

I also like to install the Linux tools dstat and htop to monitor system performance.
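Both are in the stock Ubuntu repositories, so grabbing them is one command:

```shell
# Install the dstat and htop monitoring tools from the Ubuntu repos
sudo apt-get install -y dstat htop
```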

After following Amazon's Getting Started Guide you should have a blank Ubuntu box and be SSH'ed into it. The Linux root partition is usually an EBS volume, and I like to make a second EBS volume to mount for just the MongoDB database directory. This way I can detach the database volume and move it to another running instance. So go into the AWS Management Console and click on Volumes on the left. Create a new volume that has ample space for your database. You can't resize these things, so leave room to grow. After you create the EBS volume you need to attach it to your EC2 instance and choose a device name. I usually use /dev/sde.

Next, let's log into the EC2 instance over SSH. We need to format the new volume, mount it, and add it to /etc/fstab so it auto-mounts when we restart. (Note: on Ubuntu Natty 11.04 the drive ends up appearing as /dev/xvde, but on older systems and other flavors of Linux it might still be /dev/sde.)

sudo mkfs -t ext4 /dev/xvde

I’m going to mount my new volume at /db

sudo mkdir /db
sudo vim /etc/fstab

Add the following line to the bottom of your /etc/fstab:

/dev/xvde        /db     auto    noatime,noexec,nodiratime 0 0
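For reference, an fstab entry is six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag, and fsck pass number. A quick sanity check that the line above splits the way fstab expects:

```shell
# Split the fstab entry into its six fields to confirm it is well-formed
entry='/dev/xvde        /db     auto    noatime,noexec,nodiratime 0 0'
set -- $entry
echo "device=$1 mount=$2 type=$3 options=$4 dump=$5 pass=$6"
```

The noatime and nodiratime options skip access-time bookkeeping on reads, which is a common tweak for database volumes.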

We can either restart to auto-mount it or we can manually mount it now using

sudo mount /dev/xvde /db
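Either way, it's worth confirming the volume really is mounted at /db before pointing MongoDB at it:

```shell
# df shows the device, size, and usage for the /db mount point
df -h /db
# mount shows the filesystem type and the noatime/noexec/nodiratime options
mount | grep /db
```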

Now let's install MongoDB. Here are the official docs.

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo vim /etc/apt/sources.list

Add this line to the bottom of sources.list:

deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen

Then update and install:

sudo apt-get update
sudo apt-get install mongodb-10gen
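Before going further you can confirm the package landed and the daemon came up (the ubuntu-upstart repo registers mongodb as an upstart job):

```shell
# Confirm the package is installed and the daemon is running
dpkg -l mongodb-10gen   # lists the installed package and version
sudo status mongodb     # upstart job status; expect start/running
```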

sudo mkdir /db/mongodb
sudo chown mongodb:mongodb /db/mongodb

Now let's edit /etc/mongodb.conf and change the location of the database. Near the top, change dbpath so it looks like this:

dbpath=/db/mongodb

I also like to increase oplogSize beyond the default, so that if a secondary instance goes down I have longer to bring it back up before it becomes too stale to re-sync. I also recommend turning on journaling to prevent data corruption.

oplogSize = 10000
replSet = myReplicaSet
journal = true
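Once the replica set is running, you can check how much headroom a given oplogSize actually buys you; db.printReplicationInfo() in the mongo shell reports the oplog's configured size and the time window it currently covers. A one-liner, assuming mongod is up on the default port on this machine:

```shell
# Ask the local mongod how much time the oplog currently spans
mongo --eval 'db.printReplicationInfo()'
```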

If you’re using a hostname in the replica set configuration instead of the IP address, you need to configure that in /etc/hostname and /etc/hosts

/etc/hostname:

db1

/etc/hosts:

127.0.0.1     db1   db1.mydomain.com    localhost.localdomain    localhost
xxx.xxx.xxx.xxx    db1   db1.mydomain.com

(where xxx.xxx.xxx.xxx is this machine’s IP address that you use in the replica set config. Usually the elastic IP.)
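If you're not sure which address to use, EC2's instance metadata service can tell you the instance's public-facing IP (the elastic IP, when one is associated). Note this endpoint only answers from inside the instance itself:

```shell
# Query the EC2 instance metadata service for this machine's public IP
curl -s http://169.254.169.254/latest/meta-data/public-ipv4
```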

After changing hostname information you'll need to restart the instance for it to take effect.

You need to open a hole in the EC2 firewall for the other replica nodes. Do this by going to the Security Groups section of the EC2 dashboard. Click on the security group you're using and add a custom TCP rule for port 27017 with xxx.xxx.xxx.xxx/32 as the source for each node (where xxx.xxx.xxx.xxx is that node's IP address). Each node of the replica set needs to be able to reach every other node. The best way to do this is to use the same security group for all of them and add every node's IP address to the allowed list.
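The same rules can be added from the command line with Amazon's EC2 API tools. A sketch, where the group name mongodb-nodes and the three addresses are hypothetical examples:

```shell
# Allow each replica set member to reach port 27017 (mongod's default port)
# 'mongodb-nodes' and these addresses are placeholders; use your own
ec2-authorize mongodb-nodes -P tcp -p 27017 -s 10.1.1.1/32
ec2-authorize mongodb-nodes -P tcp -p 27017 -s 10.1.1.2/32
ec2-authorize mongodb-nodes -P tcp -p 27017 -s 10.1.1.3/32
```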

When you have the instance basically set up, go back into the AWS control panel, right-click the instance, and choose Create Image. You can start any number of these for the replica set, but you need to change the /etc/hostname and /etc/hosts files on each one to reflect that box's individual IP address and hostname (db1, db2, db3, etc.).

From here on, the instructions in the MongoDB Replica Set Configuration docs are valid. You don't need to specify the replSet name on the command line since we already set it in the config file. MongoDB should already be running, but you can restart it with /etc/init.d/mongodb restart if you change any configuration parameters.
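As a sketch of that last step, initiating a three-node set from the mongo shell on db1 might look like this (the hostnames are the hypothetical ones used above, and the _id matches the replSet name from the config file):

```shell
# Run once, from any one member; the other members will sync from it
mongo --eval 'rs.initiate({
  _id: "myReplicaSet",
  members: [
    {_id: 0, host: "db1.mydomain.com:27017"},
    {_id: 1, host: "db2.mydomain.com:27017"},
    {_id: 2, host: "db3.mydomain.com:27017"}
  ]})'
```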

  • http://blog.matthewbrunelle.com/ ciferkey

Excellent article.  I didn’t need to set up a replica set at this point yet, but this was a perfect getting-started guide and will be a big help later on.

  • Donald Kainama

(where xxx.xxx.xxx.xxx is this machine’s IP address)?
Is this the private IP address or an Elastic IP address?

    • http://www.zacwitte.com Zac Witte

      This is the elastic IP, what you’ll be putting in your replica set config. I’ll edit the text to clarify. Thanks for the comment.

  • Zed

    Zac – I tried implementing this on an AWS EC2 and followed your instructions closely (changing the name of the disk, of course, as it would apply to my ami).
    The problem I had was that the root directory is mounted on a relatively small 8.5G disk. I mounted the larger disk (about 450G) onto /db, per your instructions and started filling up that Mongo DB nicely for a while… until the root directory got “full” (and now I can’t do anything on the server except watch the Mongo DB get bigger).
    What’s happening, in your opinion, is it that the root (/) is mounted onto a smaller disk than the /db directory? Or is it that I should configure Mongo differently as to have it not overwhelm the smaller disk somehow? Your insight is greatly appreciated…

    • http://www.zacwitte.com Zac Witte

      Hey Zed, first make sure that the bigger disk you mounted in /db is correctly mounted. An easy way to do that is by running the command “df -h” which will give you the list of all devices mounted in your filesystem, the size of them, and what percent is used. Second, make sure that you changed the location of your database in /etc/mongodb.conf so that it uses /db instead of the default location. If the mongo database is, in fact, using the larger volume mounted at /db, then there might be something else filling up your root directory. You can play around with the command “du -sh /” in various directories to see how much stuff is in there.

  • Dharshan

    MongoDirector (www.mongodirector.com) is a great hosting solution for MongoDB on Amazon EC2. It completely automates the entire process of deploying and managing Mongo replica sets and shards using a simple two step wizard. You can pick the number of replicas and shards and the regions in which you want to place them. Provisioned IOPS and RAID can be used for optimal performance. Automatic backups can also be configured. LVM snapshots are used for backup – so backups take the same amount of time irrespective of the size of data.