This script reads a text file, extracts all words, counts their occurrences, and identifies non-English words using SpellChecker. It processes the text through mapping and reducing steps to determine the frequency of both English and non-English words. Finally, it prints the word count for all words and highlights the non-English words.

lokesh9899/Map-Reduce

Word Count and Non-English Word Detection

Project Overview

This project analyzes text files to extract words, count their occurrences, and identify non-English words with the SpellChecker class from the pyspellchecker library. The script reports counts for all words and, separately, for the words flagged as non-English.

Features

  • Word Extraction: Extracts all words from the specified text file.
  • Word Count: Counts the occurrences of each word.
  • Non-English Word Detection: Identifies and counts non-English words.
  • Sorted Output: Outputs results sorted by word frequency.

Getting Started

Prerequisites

Before running the script, ensure you have the following installed:

  • Python: Version 3.x
  • Required Libraries:
    pip install pyspellchecker

File Preparation

  1. Place your text files (e.g., file1.txt and file2.txt) in the project directory.

Running the Script

To run the script, use the following command in your terminal or command prompt:

python script.py

Function Descriptions

  • reading(f_path): Reads the content of the specified text file.
  • mapping(text): Splits the text into words and maps each word to a count of 1.
  • reducing(mapped_words): Counts the total occurrences of each word.
  • finding_non_english_words(text): Identifies non-English words using the SpellChecker.
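The four functions above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the function names come from the list above, but the bodies, the letter-only tokenization regex, the lowercasing, and the optional `spell` parameter are assumptions made for the example. `SpellChecker.unknown()` is the pyspellchecker call that returns the words missing from its dictionary.

```python
import re
from collections import defaultdict


def reading(f_path):
    """Return the full content of a text file (UTF-8 is assumed here)."""
    with open(f_path, encoding="utf-8") as f:
        return f.read()


def mapping(text):
    """Map step: split text into lowercase words and pair each with 1."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [(word, 1) for word in words]


def reducing(mapped_words):
    """Reduce step: sum the 1s emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in mapped_words:
        counts[word] += n
    return dict(counts)


def finding_non_english_words(text, spell=None):
    """Return {word: count} for words the spell checker does not know.

    The `spell` parameter is a hypothetical addition so a checker can be
    passed in; by default an English SpellChecker is created.
    """
    if spell is None:
        from spellchecker import SpellChecker  # pip install pyspellchecker
        spell = SpellChecker()  # English dictionary by default
    counts = reducing(mapping(text))
    unknown = spell.unknown(list(counts))
    return {word: counts[word] for word in unknown if word in counts}


print(reducing(mapping("The cat and the hat")))
# {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```

Note that a spell checker only reports words missing from its English dictionary, so misspelled English words and uncommon proper nouns would also be counted as non-English under this approach.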

Output

Upon execution, the script will print:

  • A list of all words with their respective counts.
  • A list of non-English words along with their counts, sorted by count in descending order.
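As an illustration of the sorted output, the sketch below orders a word-count dictionary by descending count and formats it for printing. The helper names `sort_by_count` and `format_report`, the alphabetical tie-break, the `word: count` layout, and the sample data are all assumptions; the README does not specify the script's exact output format.

```python
def sort_by_count(counts):
    """Return (word, count) pairs, highest count first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_report(title, counts):
    """Build a printable block: a title line, then one 'word: count' per line."""
    lines = [title]
    for word, n in sort_by_count(counts):
        lines.append(f"{word}: {n}")
    return "\n".join(lines)


# Made-up sample data standing in for the script's real results.
all_counts = {"the": 3, "gato": 2, "cat": 2, "hola": 1}
non_english_counts = {"gato": 2, "hola": 1}

print(format_report("All words:", all_counts))
print(format_report("Non-English words:", non_english_counts))
```

Sorting on `(-count, word)` keeps the order stable between runs when several words share the same count.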

Conclusion

This project is a small text analysis tool that counts word occurrences and flags non-English words in text files. It can serve as a starting point for further text analysis.
