UnicodeEncodeError: 'charmap' codec can't encode characters in position 578-694: character maps to <undefined> #429

Open
ryntml opened this issue Oct 4, 2024 · 0 comments

I am trying to train a model on Ottoman Turkish, which is written in a mixture of the Arabic and Persian alphabets. I created all the datasets, but as soon as I run train.py I get the following error:

Screenshot 2024-10-04 212521

A small example from labels.txt:

Screenshot 2024-10-04 213535

Even though I use UTF-8 encoding, I still get the error.
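For reference, a 'charmap' error of this kind typically appears when Python encodes text with a legacy Windows code page instead of UTF-8, for example when a file is opened without an explicit `encoding` argument. The following is a minimal sketch under that assumption. The sample string and the file name `labels_demo.txt` are hypothetical, and it does not show what train.py actually does.

```python
import os
import tempfile

# Hypothetical sample label: the word "noon" written in Arabic script.
sample = "\u0646\u0648\u0646"

# Encoding with a legacy Windows code page (cp1252 here, as an assumption
# about the default) fails because it has no mapping for Arabic letters.
try:
    sample.encode("cp1252")
    legacy_ok = True
except UnicodeEncodeError:
    legacy_ok = False

# Passing encoding="utf-8" explicitly avoids depending on the platform default.
path = os.path.join(tempfile.mkdtemp(), "labels_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample + "\n")
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read().strip()

print(legacy_ok, round_trip == sample)
```

If this is the cause, the place to look would be any `open(...)` call in the training script that reads or writes the labels without an `encoding` argument.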

There is one thing about the characters that may be relevant:

Like Arabic, this script writes each letter differently at the beginning, middle, and end of a word, so I included every positional form of each character.
For example, I added three forms of the letter Noon.
Could this cause a problem? Does anyone know?
Thank you.
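
To check how the positional forms relate to each other, one option is to inspect them with Python's `unicodedata` module. This is a minimal sketch, assuming the extra spellings are the Unicode presentation-form code points for Noon (U+FEE5 to U+FEE8); the actual contents of the character list are not shown in the issue. NFKC normalization folds these forms back to the base letter U+0646.

```python
import unicodedata

# Assumed code points: Arabic presentation forms of Noon
# (isolated, final, initial, medial).
noon_forms = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]

# Print each code point with its official Unicode name (ASCII-only output).
for ch in noon_forms:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFKC normalization maps every presentation form to the base letter.
normalized = {unicodedata.normalize("NFKC", ch) for ch in noon_forms}
print([f"U+{ord(ch):04X}" for ch in normalized])
```

Either set of code points can be written as UTF-8, so the positional forms on their own would not explain an encoding error; they only matter for whether the character list and the labels use the same code points.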
