Some rules of Dutch compounding currently make JWordSplitter less suitable for Dutch. The most difficult part is filtering the detected compounds.
autoonderdeel is not acceptable, even though auto and onderdeel are both valid parts: a split that breaks up a vowel written with two letters is unacceptable (e.g. a-a, a-e, a-i, a-u, i-j, A-a and more). A regexp-like filter could prevent this, and also prevent other boundary mistakes.
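A minimal Java sketch of such a filter, assuming the splitter can hand over each candidate split point as a left and a right part. The class DutchBoundaryFilter, the method isSplitAllowed and the exact digraph list are hypothetical (the list combines the examples above with other common Dutch vowel pairs and is not exhaustive); they are not part of JWordSplitter's API.

```java
import java.util.regex.Pattern;

// Hypothetical boundary filter: rejects a split point that falls inside a
// two-letter vowel. The digraph list is an assumption, not an exhaustive set.
class DutchBoundaryFilter {
    private static final Pattern SPLIT_DIGRAPH = Pattern.compile(
            "(?i)(?:aa|ae|ai|au|ee|ei|eu|ie|ij|oe|oo|ou|ui|uu)");

    /** Returns false if the last letter of left and the first letter of right form a digraph. */
    static boolean isSplitAllowed(String left, String right) {
        if (left.isEmpty() || right.isEmpty()) {
            return true; // nothing to check at an empty boundary
        }
        String boundary = "" + left.charAt(left.length() - 1) + right.charAt(0);
        return !SPLIT_DIGRAPH.matcher(boundary).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSplitAllowed("auto", "onderdeel"));  // false: the split breaks up "oo"
        System.out.println(isSplitAllowed("fiets", "onderdeel")); // true: "so" is not a digraph
    }
}
```

Because the check is a regular expression over the letters around the split point, other boundary patterns could be added to the same alternation later.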
The second issue is that the linking element between parts can be s, - or s-, but not every word allows all of them. This could be solved by adding each part together with its ~s, ~s- and ~- variants, but not all of those are allowed at the end of a compound: some are allowed at the start, some in the middle, some at the end, and some everywhere.
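One way to picture the per-position restriction is to store each linking-element variant as its own form (bedrijf and bedrijfs for bedrijf~s) and record where in a compound each form may occur. This is a hypothetical sketch: PositionalLexicon, Position and the sample entries are illustrations, not JWordSplitter API or real lexicon data, and the code only validates a split that has already been proposed.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical lexicon: every form, including variants with a linking element,
// records the positions in a compound where it may occur.
class PositionalLexicon {
    enum Position { START, MIDDLE, END }

    private final Map<String, Set<Position>> entries = new HashMap<>();

    void add(String form, Set<Position> allowed) {
        entries.put(form, allowed);
    }

    /** Accepts a split only if every part is allowed at its position in the compound. */
    boolean isValid(List<String> parts) {
        if (parts.size() < 2) {
            return false; // a compound needs at least two parts
        }
        for (int i = 0; i < parts.size(); i++) {
            Position pos = (i == 0) ? Position.START
                    : (i == parts.size() - 1) ? Position.END
                    : Position.MIDDLE;
            Set<Position> allowed = entries.get(parts.get(i));
            if (allowed == null || !allowed.contains(pos)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PositionalLexicon lex = new PositionalLexicon();
        lex.add("auto", EnumSet.allOf(Position.class));
        lex.add("bedrijf", EnumSet.of(Position.END));                    // autobedrijf
        lex.add("bedrijfs", EnumSet.of(Position.START, Position.MIDDLE)); // bedrijfsauto
        System.out.println(lex.isValid(List.of("bedrijfs", "auto"))); // true
        System.out.println(lex.isValid(List.of("auto", "bedrijf")));  // true
        System.out.println(lex.isValid(List.of("auto", "bedrijfs"))); // false: s-form at the end
        System.out.println(lex.isValid(List.of("bedrijf", "auto")));  // false: bare form at the start
    }
}
```

Storing the variants as separate forms keeps the check a plain lookup; the cost is that the lexicon has to list the allowed positions for every variant.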
Even then, checking could be made stricter by giving the parts flags (POS tags?) and filtering out compounds whose tag sequence is not a valid order. But since there are exceptions, one could also add an exception list.
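A minimal sketch of the tag-based check with an exception list, assuming each lexicon part carries a single tag. The tags (N, ADJ, V), the toy rule "any modifiers followed by a noun head" and the class TagSequenceFilter are all hypothetical. The rule deliberately rejects a valid adjective compound such as ijskoud, which is then admitted through the exception list.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical filter: tag each part, accept the split only if the tag sequence
// matches an allowed order, and let an exception list override the rule.
class TagSequenceFilter {
    private final Map<String, String> tags;  // part -> tag, e.g. "N", "ADJ", "V"
    private final Pattern allowedOrder;      // matched against space-separated tags
    private final Set<String> exceptions;    // whole compounds accepted regardless of tags

    TagSequenceFilter(Map<String, String> tags, Pattern allowedOrder, Set<String> exceptions) {
        this.tags = tags;
        this.allowedOrder = allowedOrder;
        this.exceptions = exceptions;
    }

    boolean accept(List<String> parts) {
        if (exceptions.contains(String.join("", parts))) {
            return true;
        }
        StringBuilder seq = new StringBuilder();
        for (String part : parts) {
            String tag = tags.get(part);
            if (tag == null) {
                return false; // untagged part: reject rather than guess
            }
            if (seq.length() > 0) {
                seq.append(' ');
            }
            seq.append(tag);
        }
        return allowedOrder.matcher(seq).matches();
    }

    public static void main(String[] args) {
        TagSequenceFilter filter = new TagSequenceFilter(
                Map.of("fiets", "N", "pomp", "N", "ijs", "N", "koud", "ADJ"),
                Pattern.compile("(?:(?:N|ADJ|V) )*N"), // toy rule: modifiers, then a noun head
                Set.of("ijskoud"));
        System.out.println(filter.accept(List.of("fiets", "pomp"))); // true: N N
        System.out.println(filter.accept(List.of("pomp", "koud")));  // false: N ADJ
        System.out.println(filter.accept(List.of("ijs", "koud")));   // true: exception list
    }
}
```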