MapReduce

This exercise helped me understand the MapReduce paradigm as I designed and implemented MapReduce algorithms for a variety of common data processing tasks.

The repository contains five Python files, each implementing one task:

size_count.py : Implements a MapReduce algorithm that counts the words of each size (large, medium, small, tiny) in a document.
list_friends.py : Given a simple social network dataset of key-value pairs (person, friend), each representing a friend relationship between two people, implements a MapReduce algorithm that produces the complete list of friends for each person.
pairs_of_friends.py : Implements a MapReduce algorithm that identifies symmetric friendships in the input data. It outputs a pair of friends when personA is a friend of personB and personB is also a friend of personA. If the friendship is asymmetric (only one person in the pair considers the other a friend), no output is emitted for that pair.
mutual_friends.py : Implements a MapReduce algorithm that identifies the mutual friends of a pair of friends, outputting the list of people that both personA and personB are friends with.
relational_join.py : Implements a MapReduce algorithm that joins two relations on the Movie ID / Rated Movie ID value. The input file contains both relations: one with user information (names) and one with user movie ratings.
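
For size_count.py, the map/shuffle/reduce flow can be sketched in plain Python. This is a minimal in-memory illustration, not the repository's code: the input format `(document_id, text)`, the whitespace tokenization, and the size thresholds in the hypothetical `size_of` helper (tiny = 1 character, small = 2 to 4, medium = 5 to 9, large = 10 or more) are all assumptions, and the hypothetical `run` function stands in for whatever MapReduce framework performs the shuffle.

```python
from collections import defaultdict

def size_of(word):
    # Hypothetical thresholds; the README does not define the size classes
    n = len(word)
    if n >= 10:
        return "large"
    if n >= 5:
        return "medium"
    if n >= 2:
        return "small"
    return "tiny"

def mapper(record):
    # record: (document_id, text); emit (size_class, 1) for each whitespace-separated word
    _doc_id, text = record
    for word in text.split():
        yield size_of(word), 1

def reducer(size_class, counts):
    # Total the 1s emitted for this size class
    yield size_class, sum(counts)

def run(records):
    # In-memory stand-in for the shuffle phase: group mapper values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

print(run([("doc1", "a big elephant walked extraordinarily slowly")]))
# [('large', 1), ('medium', 3), ('small', 1), ('tiny', 1)]
```

Emitting a constant 1 per word keeps the mapper stateless; all counting happens in the reducer after the values for each size class are grouped.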
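
For list_friends.py, a minimal in-memory sketch of the grouping follows. It assumes each input record is a `(person, friend)` tuple and that duplicate pairs should be collapsed; the names `mapper`, `reducer`, and `run` are hypothetical, and `run` only simulates the shuffle that a real MapReduce framework would perform.

```python
from collections import defaultdict

def mapper(record):
    # record: (person, friend); key each pair by the person who lists the friend
    person, friend = record
    yield person, friend

def reducer(person, friends):
    # Merge every friend reported for this person, dropping duplicate pairs
    yield person, sorted(set(friends))

def run(records):
    # In-memory stand-in for the shuffle phase: group mapper values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

pairs = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Alice"), ("Alice", "Bob")]
print(run(pairs))
# [('Alice', ['Bob', 'Carol']), ('Bob', ['Alice'])]
```

The mapper is an identity function here; the shuffle does the real work by bringing every friend of one person to the same reducer call.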
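
For pairs_of_friends.py, one common way to detect symmetric friendships is to key each `(person, friend)` record by the unordered pair and keep its direction as the value; a pair is symmetric when both directions reach the same reducer. The sketch below is an illustration under that assumption, not the repository's implementation, and `mapper`, `reducer`, and `run` are hypothetical names.

```python
from collections import defaultdict

def mapper(record):
    # record: (person, friend); key by the unordered pair, keep the direction as value
    person, friend = record
    yield tuple(sorted((person, friend))), (person, friend)

def reducer(pair, directions):
    # Symmetric only if both A->B and B->A appeared; otherwise emit nothing
    if len(set(directions)) == 2:
        yield pair

def run(records):
    # In-memory stand-in for the shuffle phase: group mapper values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

edges = [("Alice", "Bob"), ("Bob", "Alice"), ("Alice", "Carol"), ("Alice", "Bob")]
print(run(edges))
# [('Alice', 'Bob')]
```

Counting distinct directions with a `set`, rather than counting records, keeps a repeated one-way record such as ("Alice", "Bob") from being mistaken for a mutual friendship.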
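
For mutual_friends.py, a typical approach sends each person's whole friend set to every pair that person belongs to; the reducer for a pair then intersects the two sets it receives. This sketch assumes input records of the form `(person, [friends])` with no duplicate entries in a list, which the README does not confirm; `mapper`, `reducer`, and `run` are hypothetical names, and `run` simulates the shuffle in memory.

```python
from collections import defaultdict

def mapper(record):
    # record: (person, [friends]); send this person's friend set to each pair they belong to
    person, friends = record
    for friend in friends:
        yield tuple(sorted((person, friend))), frozenset(friends)

def reducer(pair, friend_sets):
    # Both people must have listed each other; intersect their two friend sets
    if len(friend_sets) == 2:
        yield pair, sorted(friend_sets[0] & friend_sets[1])

def run(records):
    # In-memory stand-in for the shuffle phase: group mapper values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

network = [("A", ["B", "C", "D"]), ("B", ["A", "C", "D"]), ("C", ["A", "B"]), ("D", ["A", "B"])]
for pair, mutual in run(network):
    print(pair, mutual)
# ('A', 'B') ['C', 'D']
# ('A', 'C') ['B']
# ('A', 'D') ['B']
# ('B', 'C') ['A']
# ('B', 'D') ['A']
```

Sorting the pair into a canonical key is what makes A's list and B's list land in the same reducer call.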
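
For relational_join.py, a reduce-side join keys every record from both relations by its movie ID, so that matching rows meet in the same reducer, which then pairs them up. The record layouts below, `("info", movie_id, name)` and `("rating", user_id, rated_movie_id, stars)`, are hypothetical because the README does not give the file's schema; `mapper`, `reducer`, and `run` are likewise illustrative names, with `run` simulating the shuffle in memory.

```python
from collections import defaultdict

# Hypothetical record layouts (the README does not give the schema):
#   ("info",   movie_id, name)
#   ("rating", user_id, rated_movie_id, stars)

def mapper(record):
    # Key every record by its movie ID so matching rows meet in one reducer
    if record[0] == "info":
        yield record[1], record
    else:
        yield record[2], record

def reducer(movie_id, records):
    infos = [r for r in records if r[0] == "info"]
    ratings = [r for r in records if r[0] == "rating"]
    # Inner join: concatenate every info row with every rating row for this ID
    for info in infos:
        for rating in ratings:
            yield info + rating

def run(records):
    # In-memory stand-in for the shuffle phase: group mapper values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

rows = [("info", 1, "Movie A"), ("info", 2, "Movie B"),
        ("rating", "u1", 1, 5), ("rating", "u2", 1, 3), ("rating", "u3", 3, 4)]
for joined in run(rows):
    print(joined)
# ('info', 1, 'Movie A', 'rating', 'u1', 1, 5)
# ('info', 1, 'Movie A', 'rating', 'u2', 1, 3)
```

Tagging each record with its relation name lets a single reducer tell the two sides apart; IDs that appear in only one relation (2 and 3 above) produce no output, which is inner-join behavior.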
