Compound-aware automatic spelling correction
SymSpellCompound supports compound-aware automatic spelling correction of multi-word input strings.
It is built on top of SymSpell's 1 million times faster spelling correction algorithm.
1. Compound splitting & decompounding
SymSpell assumes every input string to be a single term. SymSpellCompound supports compound splitting / decompounding and handles three cases:
- a mistakenly inserted space within a correct word, which produces two incorrect terms
- a mistakenly omitted space between two correct words, which produces one incorrect combined term
- multiple input terms with or without spelling errors
Splitting errors, concatenation errors, substitution errors, transposition errors, deletion errors and insertion errors can be mixed within the same word.
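The three cases above can be illustrated with a small Python sketch. This is not SymSpellCompound's implementation: the `WORDS` dictionary, the function names, and the cost rule (one extra edit for every inserted or removed space) are assumptions made for illustration. It also scans the whole dictionary by brute force and uses plain Levenshtein distance, so a transposition counts as two edits here.

```python
# Hypothetical toy frequency dictionary (word -> count); not SymSpell's real data.
WORDS = {"where": 60, "is": 100, "the": 200, "love": 30,
         "and": 150, "inspired": 5, "him": 40}


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def best_word(term, max_dist=2):
    """Closest dictionary word as (distance, word), or None if too far away.

    Ties on distance are broken in favour of the more frequent word.
    """
    dist, _, word = min((edit_distance(term, w), -freq, w)
                        for w, freq in WORDS.items())
    return (dist, word) if dist <= max_dist else None


def correct_compound(text, max_dist=2):
    units = []  # each unit: (cost, corrected words, raw input it covers)
    for tok in text.lower().split():
        # Fallback: keep an unknown token unchanged at a cost above max_dist.
        options = [(max_dist + 1, [tok])]

        # Case 3: the token is a single term, possibly misspelled.
        single = best_word(tok, max_dist)
        if single:
            options.append((single[0], [single[1]]))

        # Case 2: a space was omitted, so try every split into two words.
        for i in range(1, len(tok)):
            left = best_word(tok[:i], max_dist)
            right = best_word(tok[i:], max_dist)
            if left and right:
                options.append((left[0] + right[0] + 1, [left[1], right[1]]))

        cost, words = min(options)

        # Case 1: a space was inserted, so try merging with the previous token.
        if units:
            prev_cost, _, prev_raw = units[-1]
            merged = best_word(prev_raw + tok, max_dist)
            if merged and merged[0] + 1 < prev_cost + cost:
                units[-1] = (merged[0] + 1, [merged[1]], prev_raw + tok)
                continue

        units.append((cost, words, tok))
    return " ".join(w for _, ws, _ in units for w in ws)


print(correct_compound("whereis th elove"))  # where is the love
print(correct_compound("ins pired him"))     # inspired him
```

With a dictionary this small, short fragments match many words within two edits, so the sketch only behaves sensibly on inputs close to its vocabulary. A realistic version would need a full frequency dictionary and a fast candidate lookup in place of the linear scan.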
2. Automatic spelling correction
- Large document collections make manual correction infeasible and require unsupervised, fully-automatic spelling correction.
- In conventional spelling correction of a single token, the user is presented with spelling correction suggestions.
For automatic spelling correction of long multi-word text, the algorithm itself has to make an educated choice.
Examples:
- whereis th elove hehad dated forImuch of thepast who couqdn'tread in sixthgrade and ins pired him
+ where is the love he had dated for much of the past who couldn't read in sixth grade and inspired him (9 edits)
- in te dhird qarter oflast jear he hadlearned ofca sekretplan y iran
+ in the third quarter of last year he had learned of a secret plan by iran (10 edits)
- the bigjest playrs in te strogsommer film slatew ith plety of funn
+ the biggest players in the strong summer film slate with plenty of fun (9 edits)
- Can yu readthis messa ge despite thehorible sppelingmsitakes
+ can you read this message despite the horrible spelling mistakes (9 edits)
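The "educated choice" that replaces the user's selection can be sketched as a ranking rule over candidate suggestions. The snippet below is a minimal illustration under assumptions: the `Suggestion` type, the function name, the example counts, and the rule itself (smallest edit distance first, then highest word frequency) are hypothetical and not taken from SymSpellCompound's code.

```python
from typing import List, NamedTuple, Optional


class Suggestion(NamedTuple):
    """Hypothetical candidate correction for one input term."""
    term: str
    distance: int   # edit distance between the input and the candidate
    frequency: int  # how often the candidate occurs in a reference corpus


def pick_automatically(suggestions: List[Suggestion]) -> Optional[Suggestion]:
    """Choose one candidate without user interaction.

    Assumed rule: prefer the smallest edit distance and break ties
    by the highest frequency.
    """
    if not suggestions:
        return None
    return min(suggestions, key=lambda s: (s.distance, -s.frequency))


candidates = [
    Suggestion("there", 1, 500),
    Suggestion("three", 1, 800),
    Suggestion("the", 2, 9000),
]
print(pick_automatically(candidates).term)  # three
```

Interactive spelling correction would show all three candidates to the user. The automatic variant has to commit to one, which is why the tie-breaking rule matters.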
Performance
0.2 milliseconds / word
5000 words / second (single core on a 2012 MacBook Pro)
Applications
- Query correction (10–15% of queries contain misspelled terms),
- Chatbots,
- OCR post-processing,
- Automated proofreading.