Word Count and Non-English Word Detection

Project Overview

This project analyzes text files to extract and count word occurrences and to identify non-English words using the SpellChecker class from the pyspellchecker library. The script reports word counts for both English and non-English words.

Features

Word Extraction: Extracts all words from the specified text file.
Word Count: Counts the occurrences of each word.
Non-English Word Detection: Identifies and counts non-English words.
Sorted Output: Outputs results sorted by word frequency.

Getting Started

Prerequisites

Before running the script, ensure you have the following installed:

Python: Version 3.x
Required Libraries:
```
pip install pyspellchecker
```

File Preparation

Place your text files (e.g., file1.txt and file2.txt) in the project directory.

Running the Script

To run the script, use the following command in your terminal or command prompt:

```
python script.py
```

Function Descriptions

reading(f_path): Reads the content of the specified text file.
mapping(text): Splits the text into words and maps each word to a count of 1.
reducing(mapped_words): Counts the total occurrences of each word.
finding_non_english_words(text): Identifies non-English words using the SpellChecker.
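
The first three functions can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names come from the list above, while the UTF-8 encoding, the lowercasing, and the whitespace-based splitting are assumptions made for the example.

```python
from collections import defaultdict


def reading(f_path):
    """Return the full contents of a text file as one string."""
    # Assumption: the input files are UTF-8 encoded.
    with open(f_path, encoding="utf-8") as f:
        return f.read()


def mapping(text):
    """Map step: emit a (word, 1) pair for every word in the text."""
    # Assumption: words are lowercased and separated by whitespace.
    return [(word, 1) for word in text.lower().split()]


def reducing(mapped_words):
    """Reduce step: sum the 1s emitted for each distinct word."""
    counts = defaultdict(int)
    for word, one in mapped_words:
        counts[word] += one
    return dict(counts)


pairs = mapping("the cat and the hat")
print(reducing(pairs))  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```

Keeping the map and reduce steps separate mirrors the MapReduce pattern: mapping only emits pairs, and all counting happens in reducing.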

Output

Upon execution, the script will print:

A list of all words with their respective counts.
A list of non-English words along with their counts, sorted by count in descending order.
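
The non-English list described above could be produced along these lines. This is a hedged sketch rather than the project's implementation: the `find_unknown` parameter is a hypothetical addition that lets the lookup be swapped out, the letter-only tokenization is an assumption, and by default the sketch relies on `SpellChecker().unknown()` from pyspellchecker, which returns the subset of words missing from its dictionary.

```python
import re
from collections import Counter


def finding_non_english_words(text, find_unknown=None):
    """Return (word, count) pairs for unrecognized words, most frequent first."""
    if find_unknown is None:
        # Imported lazily so the function can also run with a custom lookup.
        from spellchecker import SpellChecker  # pip install pyspellchecker
        find_unknown = SpellChecker().unknown

    # Assumption: a word is a run of letters; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    unknown = set(find_unknown(words))

    # Count only the words the lookup did not recognize.
    counts = Counter(word for word in words if word in unknown)
    return counts.most_common()
```

With pyspellchecker installed, `finding_non_english_words(reading("file1.txt"))` would return the flagged words ordered by frequency, assuming a `reading` helper like the one listed under Function Descriptions. Note that a spell checker flags anything outside its dictionary, so misspelled English words and uncommon proper nouns may also appear in the result.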

Conclusion

This project is a starting point for text analysis: it counts word occurrences and flags words that the spell checker does not recognize as English.
