Setting up Amazon Web Services


This guide describes how to set up your system for homework 01 and the homeworks that follow. It was tested on an Ubuntu 12.10 system, but it should work on any Linux system, including the Linux machines in the PC cluster on the first floor of WVH, which all CCIS grad students should have access to.

When working with AWS (Amazon Web Services), I strongly suggest that you use a Linux or Mac machine and build your own scripts to automate access to AWS. The AWS web interface is fairly time consuming, so avoid it as much as possible. This guide introduces some tricks, hacks, hints, and scripts I built to automate the programming & testing process.

Good Luck!

Sign up for Amazon Web Services

  • Signing up requires a phone number.
  • Activate your account from the activation email (it arrives immediately).
  • Write down or copy and paste your ACCESS_KEY and SECRET_ACCESS_KEY from the activation page.
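
Since later steps reuse these two keys, one option is to keep them out of your scripts and read them from the environment. The following is only a minimal sketch: the environment variable names ACCESS_KEY and SECRET_ACCESS_KEY are borrowed from the placeholders in this guide and are an assumption, not names that AWS tools read automatically, and load_aws_keys is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def load_aws_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (ACCESS_KEY, SECRET_ACCESS_KEY) read from env.

    The variable names are this guide's placeholders (an assumption),
    so export them yourself before running your scripts.
    """
    names = ("ACCESS_KEY", "SECRET_ACCESS_KEY")
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return env["ACCESS_KEY"], env["SECRET_ACCESS_KEY"]


if __name__ == "__main__":
    access_key, secret_key = load_aws_keys()
    # Never print the secret key itself; confirming that it loaded is enough.
    print("Loaded access key ending in", access_key[-4:])
```

Passing a plain dictionary instead of os.environ makes the helper easy to check without touching your real keys.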

Set up AWS S3 (Simple Storage Service)

S3 stores data in containers called "buckets". For our purposes it plays a role similar to HDFS (Hadoop Distributed File System) as the place where job input and output live, but S3 is an object store, not HDFS itself.

  • Go to the AWS Management Console. The link to the console is at the top right corner of the AWS main page.
  • Click S3 in the console to open the S3 console page.
  • Follow the Getting Started guide and write down your BUCKET_NAME somewhere. Name the bucket using only lowercase letters, numbers, periods (.), and dashes (-). AWS does not allow you to run a MapReduce job on a bucket whose name contains any other characters.
  • Uploading, moving, and deleting files through the browser is tedious, so let's automate it. Download and set up s3cmd from s3tools. On Ubuntu, you can type "sudo apt-get install s3cmd" in your console. On other systems, download the source from s3tools and set it up. FYI, s3cmd does not require installation; it can be run from the source directory.
  • Type "s3cmd --configure" in the console. It asks for an "Access Key" and a "Secret Key", which are the ACCESS_KEY and SECRET_ACCESS_KEY mentioned above. If you are not sure about the other settings, accept the defaults (just press Enter until the 'Save settings?' question). At 'Save settings?', enter 'y' to save them, because the default is not to save. Settings are saved to "/home/$USER/.s3cfg".
  • Download the file to somewhere on your machine.
  • Type "s3cmd sync [Path to Downloaded file] s3://[BUCKET_NAME]/helloworld.txt"
  • Check in the S3 console that your file was uploaded properly. "s3cmd sync" can also sync a whole folder and its subfolders, so you now have an easy way to upload multiple files to S3.
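
The bucket naming rule and the upload command above can be combined into a small script. This is a minimal sketch rather than a definitive tool: it checks only the character rule stated in this guide (not any other restrictions AWS may apply), and the function names and the bucket name "my-hw01-bucket" are hypothetical. It builds the "s3cmd sync" command without running it.

```python
import re
import shlex
from typing import List

# Character rule as stated in this guide: lowercase letters, digits,
# periods, and dashes only. Other AWS naming rules are not checked here.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9.-]+")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name uses only the characters this guide allows."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


def build_sync_command(local_path: str, bucket: str, key: str = "") -> List[str]:
    """Build the argument list for: s3cmd sync <local_path> s3://<bucket>/<key>."""
    if not is_valid_bucket_name(bucket):
        raise ValueError("bucket name has characters outside a-z, 0-9, '.', '-': %r" % bucket)
    return ["s3cmd", "sync", local_path, "s3://%s/%s" % (bucket, key)]


if __name__ == "__main__":
    # "my-hw01-bucket" is a placeholder; substitute your own BUCKET_NAME.
    command = build_sync_command("helloworld.txt", "my-hw01-bucket", "helloworld.txt")
    # Print the command for review; to actually upload, pass the list to
    # subprocess.run(command, check=True) on a machine where s3cmd is configured.
    print(shlex.join(command))
```

Building the command as a list rather than a single string keeps paths with spaces intact if you later hand it to subprocess.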

Set up Hadoop & EMR client
