MapReduce

This exercise helped me understand the MapReduce paradigm by designing and implementing MapReduce algorithms for a variety of common data processing tasks.

It contains five Python files, each performing its own task.

  1. size_count.py : A Python program that implements a MapReduce algorithm to count the words of each size (large, medium, small, tiny) in a document.
  2. list_friends.py : The input is a simple social network dataset of key-value pairs (person, friend), each representing a friend relationship between two people. This Python program implements a MapReduce algorithm to produce a complete list of friends for each person.
  3. pairs_of_friends.py : A Python program that implements a MapReduce algorithm to identify symmetric friendships in the input data. It outputs pairs of friends where personA is a friend of personB and personB is also a friend of personA. If the friendship is asymmetric (only one person in the pair considers the other a friend), it emits no output for that pair.
  4. mutual_friends.py : A Python program that implements a MapReduce algorithm to identify mutual friends for a pair of friends. It outputs the list of people that personA and personB are both friends with.
  5. relational_join.py : Implements a MapReduce algorithm to join two relations on the Movie ID/Rated Movie ID value. The input is a file containing two relations that represent user information names and user movie ratings.
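
To make the map, shuffle, and reduce steps behind these tasks concrete, here is a minimal single-process sketch of tasks 1 and 3. It is not the code from the files above: the helper `run_mapreduce`, the function names, and the word-length thresholds for large/medium/small/tiny are all assumptions chosen for illustration, since the README does not define them.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle step: collect values per key
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Task 1: count words of each size.
# The length thresholds below are assumed, not taken from size_count.py.
def size_category(word):
    n = len(word)
    if n >= 10:
        return "large"
    if n >= 5:
        return "medium"
    if n >= 2:
        return "small"
    return "tiny"


def size_mapper(line):
    # Emit (category, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield size_category(word), 1


def size_reducer(category, counts):
    yield category, sum(counts)


# Task 3: symmetric friendships.
def pair_mapper(record):
    person, friend = record
    # Key on the unordered pair so both directions land in the same group.
    yield tuple(sorted((person, friend))), (person, friend)


def pair_reducer(pair, edges):
    # The pair is symmetric only if both directed edges were seen.
    if len(set(edges)) == 2:
        yield pair


if __name__ == "__main__":
    lines = ["a quick brown fox jumped over extraordinary fences"]
    print(run_mapreduce(lines, size_mapper, size_reducer))

    friends = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
    print(run_mapreduce(friends, pair_mapper, pair_reducer))
```

Keying the friendship records on the sorted pair is one common way to bring both directions of an edge to the same reducer; the actual scripts may use a different key design.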