This is the second lab: writing and running a basic Hadoop program.
This is the basic wordcount program as discussed in the lecture.
The Vagrantfile above creates a blank Ubuntu Trusty 64-bit VM and installs mrjob (a Python MapReduce library).
Install the mrjob library, then run the mrjob.py program. See the quickstart guide:
https://pythonhosted.org/mrjob/guides/quickstart.html
Alter the example program to produce a word count.
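
As a starting point, here is a minimal sketch of the word count logic in plain Python, using only the standard library. It assumes the usual MapReduce shape: a mapper that emits (word, 1) for each word on a line, and a reducer that sums the counts for one word. The names WORD_RE and run_local are hypothetical helpers for local experimentation and are not part of mrjob. In an mrjob program, the same two generators would become the mapper and reducer methods of an MRJob subclass, as described in the quickstart guide.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumption: a "word" is a run of letters, digits, underscores or apostrophes.
WORD_RE = re.compile(r"[\w']+")


def mapper(_, line):
    """Emit (word, 1) for every word found on one input line."""
    for word in WORD_RE.findall(line):
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all the counts collected for a single word."""
    yield word, sum(counts)


def run_local(lines):
    """Hypothetical helper: simulate map, shuffle/sort and reduce in one process."""
    # Map phase: run the mapper over every line and collect the pairs.
    pairs = [pair for line in lines for pair in mapper(None, line)]
    # Shuffle/sort phase: bring identical keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: hand each key and its stream of counts to the reducer.
    results = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            results[key] = total
    return results
```

Calling run_local on a list of strings returns a dict mapping each lowercased word to its count, which is handy for checking the logic before running the real job on Hadoop.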
The hadoop classpath command prints the classpath of requisite libraries, which you
will need if you are compiling from the command line.
Follow the instructions here to make and run the jar
- Run this on Elastic MapReduce (you'll need API keys from your AWS Console).