In this project, we applied Natural Language Processing and ensemble learning to Covid-19 documents and their summaries to uncover hidden insights. We also implemented text-summarization techniques and extracted the following features: sentence_score, cue_phase_score, sentence_position, sentence_length, heading_score, upper_letters, digits, and pronoun_words. In addition, we calculated the TF-IDF (Term Frequency-Inverse Document Frequency) score, which signifies the importance of a word in a document relative to the corpus. The repository includes a text-summarization web app built with Streamlit and deployed on Heroku.
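To make the TF-IDF score and some of the surface features more concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names (`tokenize`, `tf_idf`, `sentence_features`), the tokenization rule, and the exact feature definitions are assumptions chosen for illustration, and the project's own definitions may differ. The TF-IDF variant shown is the basic unsmoothed form, tf × log(N / df).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def sentence_features(sentences):
    """Compute a few simple surface features per sentence (assumed definitions)."""
    features = []
    for index, sentence in enumerate(sentences):
        features.append({
            "sentence_position": index,                    # 0-based position in the document
            "sentence_length": len(sentence.split()),      # whitespace-separated words
            "upper_letters": sum(ch.isupper() for ch in sentence),
            "digits": sum(ch.isdigit() for ch in sentence),
        })
    return features
```

With this variant, a term that appears in every document gets a score of zero, since log(N / N) = 0, while terms concentrated in a few documents score higher. Library implementations often add smoothing and normalization, so their numbers will not match this sketch exactly.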
Through this project, we learned about Natural Language Processing techniques, feature extraction, and the conversion of unsupervised data into supervised data.
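One common way to turn unlabeled documents into supervised training data for extractive summarization is to label each sentence by how much it overlaps with the reference summary. The sketch below illustrates that idea only; the description above does not say how the conversion was done in this project, so the function names (`word_set`, `label_sentences`) and the 0.5 threshold are hypothetical.

```python
import re


def word_set(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_sentences(sentences, summary, threshold=0.5):
    """Label a sentence 1 if enough of its words also occur in the summary.

    The overlap ratio and the threshold are illustrative assumptions,
    not the project's actual labeling rule.
    """
    summary_tokens = word_set(summary)
    labels = []
    for sentence in sentences:
        tokens = word_set(sentence)
        # Fraction of the sentence's distinct words that appear in the summary.
        overlap = len(tokens & summary_tokens) / len(tokens) if tokens else 0.0
        labels.append(1 if overlap >= threshold else 0)
    return labels
```

The resulting 0/1 labels can then be paired with per-sentence features such as sentence_position or sentence_length to train a classifier, for example an ensemble model.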