
Map Reduce Programming Help | Map-Reduce Example in Python | Realcode4you

Word Count

Suppose we have a text file consisting of multiple lines and we wish to count how many times each word appears in that file. We will use the MapReduce framework to do that, as follows:

  • First, pick some text content and save it into a text file. Here, I copied the definition of MapReduce from Wikipedia (https://en.wikipedia.org/wiki/MapReduce) and saved it as ‘MapReduce_wiki.txt’.


  • Then, define the mapper function to split each input line into a list of words and output (word, 1) for each word found.

  • Next, define the reducer function to sum the 1s emitted for each word and output the count of each word.



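from mrjob.job import MRJob
import re

# Matches runs of word characters (letters, digits, underscore).
WORD_REGEX = re.compile(r"[\w]+")


class WordCount(MRJob):
    def mapper(self, _, line):
        # Emit (word, 1) for every word on the input line, lowercased.
        for word in WORD_REGEX.findall(line):
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum the 1s emitted for this word to get its total count.
        yield word, sum(counts)


if __name__ == "__main__":
    WordCount.run()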

  • Finally, test it on your computer: open a command prompt (cmd.exe), change the current working directory to the folder in which your Python script ‘WordCount.py’ and input file ‘MapReduce_wiki.txt’ are stored (for example, mine are stored in 'D:\COMP6210\Example'), and run ‘python WordCount.py MapReduce_wiki.txt > output_wordcount.txt’.
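
To see what happens between the mapper and the reducer, the steps above can be traced in plain Python without mrjob. This is only a minimal sketch under stated assumptions, not how mrjob works internally: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process in memory rather than across a cluster.

```python
import re
from collections import defaultdict

WORD_REGEX = re.compile(r"[\w]+")  # same pattern as in WordCount.py


def map_phase(lines):
    """Yield (word, 1) for every word in every line, like the mapper."""
    for line in lines:
        for word in WORD_REGEX.findall(line):
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group values by key; a framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the 1s collected for each word, like the reducer."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run the three phases in order on an iterable of text lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, used only for illustration.
    sample = ["MapReduce is a programming model", "Map and Reduce"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The grouping step is the part you never write yourself in the mrjob version: the framework collects every value emitted for the same word and hands them to the reducer as `counts`.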

Big data technologies have become popular in recent years because of the large demand from data analysis firms. Many other big data technologies are used alongside MapReduce, including PySpark, Hive, and Hadoop.


We are a group of experienced big data experts and professionals who can help you with all your big data projects at a reasonable price. For more details, contact us or send your project requirements to the email address below:


realcode4you@gmail.com