sanitize_word can fail on special characters #2

svisser · 2014-05-27T00:11:45Z

I'm using Python 2.6.

If I run `crosswords ?sunción`, the program doesn't display `asuncion`; instead it fails with a `UnicodeDecodeError`. This happens because `.decode()` falls back to the default encoding, ASCII, which can't handle non-ASCII bytes:

>>> 'Asunción'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

It may be worth passing the encoding explicitly to `decode()` rather than relying on the default encoding. It may also be a good idea to inform the user, though I'm not sure where it's best to catch this error (in `compile_pattern`, letting that function return `None` instead?).
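A minimal sketch of both suggestions, assuming a hypothetical `compile_pattern` that receives the raw command-line bytes and treats `?` as a single-letter wildcard (both assumptions, not the project's actual code). It is written in Python 3, where the raw input is a `bytes` object; on Python 2.6 the same role is played by a plain `str`.

```python
import re


def compile_pattern(raw_pattern, encoding="utf-8"):
    """Hypothetical sketch: turn a raw '?'-wildcard pattern into a regex.

    Returns None instead of raising when the input can't be decoded,
    so the caller can print a friendly message to the user.
    """
    try:
        # Decode with an explicit encoding instead of the ASCII default.
        pattern = raw_pattern.decode(encoding)
    except UnicodeDecodeError:
        return None
    # Assumption: '?' matches exactly one character; everything else is literal.
    body = "".join("." if ch == "?" else re.escape(ch) for ch in pattern.lower())
    return re.compile("^" + body + "$")
```

With an explicit `"utf-8"` encoding, `compile_pattern("?sunción".encode("utf-8"))` matches `"asunción"`, while bytes that are invalid in the chosen encoding yield `None` rather than a traceback.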


bfontaine · 2014-05-27T12:38:15Z

Right. When I wrote the script some years ago I only used non-accented words on the CLI to avoid that, but it should be fixed. We also have a lot of accented words in French, but accents are not needed in crosswords.
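Since accents don't matter in crosswords, one common approach is to strip them after decoding. The sketch below is a hypothetical `sanitize_word`, not the project's actual implementation: it assumes the word is already a decoded Unicode string and uses Unicode NFKD normalization to split accented letters into a base letter plus combining marks, then drops the marks.

```python
import unicodedata


def sanitize_word(word):
    """Hypothetical sketch: lowercase a word and drop its accents."""
    # NFKD splits e.g. 'ó' into 'o' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", word.lower())
    # Keep only base characters, discarding the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, `sanitize_word("Asunción")` gives `"asuncion"` and `sanitize_word("Élève")` gives `"eleve"`. Letters with no decomposition, such as `œ`, pass through unchanged, so a French word list may still need an extra mapping for those.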

svisser mentioned this issue May 27, 2014 in #4, Special characters in word lists are not ignored (Open)

bfontaine added the bug label May 27, 2014
