
Count the Number of Characters, Words, and Lines in a Text File Using MRJOB | MapReduce Python Example

An Example: Let’s use MRJOB in Python to run a simple map-reduce algorithm. The program counts the number of characters, words, and lines in a text document.


First, pick any text content and save it to a text file. Here, I copied the definition of MapReduce from Wikipedia (https://en.wikipedia.org/wiki/MapReduce) and saved it as ‘MapReduce_wiki.txt’.



Then, define the mapper and reducer functions with MRJOB to count the characters, words, and lines in ‘MapReduce_wiki.txt’.




Code (Installation_instruction_example2.py):
# -*- coding: utf-8 -*-
"""
From:
https://mrjob.readthedocs.io/en/latest/guides/quickstart.html#writing-your-first-job
Description: A simple example that counts the characters, words, and lines of a text file.
"""
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Called once per input line; the input key is unused
        yield "chars", len(line)  # number of characters in this line
        yield "words", len(line.split())  # number of whitespace-separated words
        yield "lines", 1  # each line adds 1 to the line count

    def reducer(self, key, values):
        # Sum all values emitted for the same key
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()  # entry point: parse the arguments and run the job
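
To see what the mapper and reducer do together, here is a small sketch in plain Python that imitates the three phases (map, group by key, reduce) without MRJOB. The function name run_job and the two sample lines are made up for illustration, and the sketch assumes each line arrives without its trailing newline; the real job is still the MRJob class above.

```python
from collections import defaultdict


def mapper(line):
    """Emit the same three (key, value) pairs as the MRJob mapper."""
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def run_job(lines):
    """Imitate map, shuffle (group by key), and reduce in one process."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)  # shuffle: collect the values per key
    # reduce: sum the values collected for each key
    return {key: sum(values) for key, values in grouped.items()}


# Two made-up input lines, without trailing newlines
sample = ["map reduce", "is a programming model"]
print(run_job(sample))
```

Each input line produces three pairs, and the reducer only ever sees one key together with all of its values, which is why a single sum is enough for all three counts.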

Finally, run the job. There are two options:


Option 1: Testing locally on your computer: ‘python Installation_instruction_example2.py MapReduce_wiki.txt >output_instruction_example2.txt’. Before running the command, open a command prompt (cmd.exe) and change the current working directory to the folder that contains the Python file ‘Installation_instruction_example2.py’ and the input file ‘MapReduce_wiki.txt’. For example, my Python file and input file are stored in ‘E:\My work(laptop)\Comp6210\Python’.


Option 2: Running on VirtualBox: ‘sudo python3.5 Installation_instruction_example2.py MapReduce_wiki.txt >output_instruction_example2.txt’


(Note that the default version of Python in VirtualBox is 2.6.6, which is too old to support MRJOB, so you need to install a newer version of Python. I installed Python 3.5 in VirtualBox to support MRJOB.)


First, start VirtualBox and copy the Python file ‘Installation_instruction_example2.py’ and the input file ‘MapReduce_wiki.txt’ into VirtualBox. To do this, click ‘Machine’ and choose ‘File Manager’.



Enter the user name and password (both are ‘cloudera’), then click the ‘Create Session’ button.


Right-click and select ‘Open in Terminal’.


Enter the following commands in the terminal one by one to install Python 3.5.


python --version

wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz

tar -xvzf Python-3.5.2.tgz

cd Python-3.5.2

./configure --prefix=/usr/local

sudo make altinstall

python3.5 --version




Install mrjob with pip3.5, which comes with Python 3.5. The command is ‘sudo pip3.5 install mrjob’.
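
To confirm that the new interpreter and mrjob are both in place, a short check like the following can be run with python3.5. This is only a sketch: the helper names module_available and describe_environment are made up here, and it uses nothing beyond the standard library.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


def describe_environment(module_name="mrjob"):
    """Report the interpreter version and whether the module is installed."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"python": version, module_name: module_available(module_name)}


# Prints the running Python version and True/False for mrjob
print(describe_environment())
```

If the second value is False, mrjob was probably installed for a different Python than the one running the script.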

Now Python 3.5 is installed in VirtualBox, so we can run the job there. Change the current working directory to the desktop folder where ‘Installation_instruction_example2.py’ and the input file ‘MapReduce_wiki.txt’ are stored. The command is ‘cd /home/cloudera/Desktop’.


Finally, enter the execution command:

‘sudo python3.5 Installation_instruction_example2.py MapReduce_wiki.txt >output_instruction_example2.txt’


Output: Open the output file ‘output_instruction_example2.txt’ to see the results.
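
With its default settings, mrjob writes one line per key, with the key and the value each encoded as JSON and separated by a tab. The sketch below reads text in that format back into a dictionary. The function name parse_mrjob_output is made up, and the numbers in example_text are placeholders for illustration, not the real counts for ‘MapReduce_wiki.txt’.

```python
import json


def parse_mrjob_output(text):
    """Parse tab-separated JSON key/value lines into a dict."""
    counts = {}
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        key, value = raw.split("\t", 1)
        counts[json.loads(key)] = json.loads(value)
    return counts


# Placeholder numbers for illustration only, not the real results
example_text = '"chars"\t100\n"lines"\t3\n"words"\t20\n'
print(parse_mrjob_output(example_text))
```

To use it on the real output, read the contents of ‘output_instruction_example2.txt’ and pass them to the function.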



If you want to learn more about MRJOB in Python, visit https://mrjob.readthedocs.io/en/latest/


Run map-reduce jobs on the Hadoop environment


You need to install Java and Hadoop.

The main steps are:

(1) Install Java

(2) Download Hadoop binaries

(3) Set up environment variables

(4) Configure Hadoop cluster

(5) Format name node

(6) Start Hadoop services
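
After steps (1) to (3), a quick check like the one below can show whether the environment looks ready. It is only a sketch under assumptions: it supposes that the setup defines the JAVA_HOME and HADOOP_HOME variables and puts the ‘java’ and ‘hadoop’ commands on PATH, which may differ on your installation. The function name missing_prerequisites is made up.

```python
import os
import shutil


def missing_prerequisites(env=None,
                          required_vars=("JAVA_HOME", "HADOOP_HOME"),
                          required_tools=("java", "hadoop")):
    """Return the names of variables and commands that could not be found."""
    env = os.environ if env is None else env
    # Environment variables that are unset or empty (assumed names)
    missing = [var for var in required_vars if not env.get(var)]
    # Commands that cannot be located on the given PATH
    path = env.get("PATH")
    missing += [tool for tool in required_tools
                if shutil.which(tool, path=path) is None]
    return missing


print(missing_prerequisites())
```

An empty list means everything this sketch looks for was found; it does not check that the cluster is configured or that the Hadoop services are running.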


For Windows operating systems, here is one installation guide:


(1) https://kontext.tech/column/hadoop/377/latest-hadoop-321-installation-on-windows-10-step-by-step-guide


After installation, you can run example2 in the Hadoop environment. The command is


‘python Installation_instruction_example2.py -r hadoop MapReduce_wiki.txt >output_instruction_example2.txt’
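
The local command and the Hadoop command differ only in the ‘-r hadoop’ runner option. The sketch below builds either command as an argument list and can save the job output to a file, which is what the ‘>’ redirection does in the shell. The helper names build_command and run_and_save are made up for illustration.

```python
import subprocess


def build_command(script, input_file, runner=None, python="python"):
    """Build the argument list for running an mrjob script."""
    cmd = [python, script]
    if runner is not None:
        cmd += ["-r", runner]  # for example "hadoop"; omitted for a local test
    cmd.append(input_file)
    return cmd


def run_and_save(cmd, output_path):
    """Run the command and write its standard output to output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


hadoop_cmd = build_command("Installation_instruction_example2.py",
                           "MapReduce_wiki.txt", runner="hadoop")
print(" ".join(hadoop_cmd))
```

Calling run_and_save(hadoop_cmd, "output_instruction_example2.txt") would then start the job, provided the script, the input file, and a working Hadoop installation are available.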




