Ting-Wei Shen
[email protected]
Feb 7, 2019
This is a project to design a model that detects sentences with spelling mistakes and produces corrected sentences. The data I have chosen for now is twenty popular books from Project Gutenberg (starting with a small portion). For the analysis, I will focus on the computational methods first, then explore additional data that could improve my model's performance. This idea comes from Currie32's GitHub project.
I will use a small portion of the data for now to check the performance of the model. The data will be twenty popular books from Project Gutenberg. If the model works, I will then explore more suitable data.
I will eliminate the unnecessary information (such as the Project Gutenberg license header and footer) and keep only the core text as the training data.
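As a rough sketch of this cleaning step (assuming the books keep the standard `*** START OF ... ***` / `*** END OF ... ***` license markers that Project Gutenberg texts carry), the boilerplate could be stripped like this:

```python
# Sketch: strip the Project Gutenberg license header/footer so only
# the book text remains. Assumes the standard "*** START OF" and
# "*** END OF" marker lines are present.
def strip_gutenberg_boilerplate(text):
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1  # the body begins after the START marker
        elif line.startswith("*** END OF"):
            end = i        # the body ends before the END marker
            break
    return "\n".join(lines[start:end]).strip()

sample = ("*** START OF THE PROJECT GUTENBERG EBOOK ***\n"
          "Call me Ishmael.\n"
          "*** END OF THE PROJECT GUTENBERG EBOOK ***")
print(strip_gutenberg_boilerplate(sample))  # -> Call me Ishmael.
```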
The overall data size will need to be determined.
Do you have an existing data source in mind that you can start with, and if so, what are the URLs or references?
- The end goal is to design a model that detects the mistakes in a sentence and corrects them.
- By understanding the methods behind these models, we may improve the performance of spell checkers.
- We may use these models to assess the quality of essays written by different L2 learners, and correct the mistakes within the sentences.
Utilize a spell-checker algorithm to help me check spelling accuracy.
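One common spell-checker approach (a Norvig-style frequency model, offered here only as a possible baseline, not the project's committed method) picks the most frequent known word within one edit of the misspelling. The tiny corpus below is purely illustrative; in practice the word counts would come from the Gutenberg training text:

```python
import re
from collections import Counter

# Illustrative word frequencies; in the real project these would be
# counted from the Project Gutenberg training books.
WORDS = Counter(re.findall(r"[a-z]+",
                           "the quick brown fox jumps over the lazy dog"))

def edits1(word):
    """All strings one edit (delete/transpose/replace/insert) from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the known candidate with the highest corpus frequency."""
    candidates = {w for w in edits1(word) if w in WORDS} or {word}
    return max(candidates, key=lambda w: WORDS[w])

print(correct("teh"))  # -> the
```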
Are you planning to do any predictive analysis (machine learning, classification, etc.), and using what methods?
I am not sure which methods to use yet. I may try some pandas functions for data cleaning first, then examine the data to figure out which methods to apply.
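A minimal sketch of what that pandas cleaning pass might look like (the column name and the toy sentences are placeholders, not the project's actual data):

```python
import pandas as pd

# Toy data: raw sentences with whitespace, blanks, missing values,
# and duplicates that the cleaning pass should remove.
df = pd.DataFrame({"sentence": ["  Call me Ishmael.  ", "",
                                "Call me Ishmael.", None,
                                "It was the best of times."]})

clean = (
    df.assign(sentence=df["sentence"].str.strip())  # trim whitespace
      .dropna(subset=["sentence"])                  # drop missing rows
      .query("sentence != ''")                      # drop empty strings
      .drop_duplicates(subset=["sentence"])         # remove duplicates
      .reset_index(drop=True)
)
print(clean)
```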
I may weight the work across data/analysis/presentation as 60-60-60, and display the results in a Jupyter notebook.