Problems with encoding #107

Closed
Rengervn opened this issue Jun 15, 2023 · 7 comments

@Rengervn

Hi

I have the following code (the RDS file is attached in zipped form). My actual code is more complex, but this simplified version reproduces the problem on my computer. I was able to run this code without problems a few weeks ago, so I assume that some update (Windows?) is causing the problem; I don't know whether it is a bug in readODS. The test data frame contains some text in German and French.

library(tidyverse)
library(readODS)
test <- readRDS("legend.rds")
write_ods(test, "my.ods", sheet = "Legend", update = TRUE)
write_ods(test, "my.ods", sheet = "Legend", append = TRUE)

legend.zip

This gives me the following error:

Error in read_xml.character(contentfile) : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x72 0x69 0x6F [9]

Any idea what might be causing this?
I am using Windows 10, R version 4.0.4 and readODS 1.8.

Cheers
Renger
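
For readers hitting the same message: the four bytes quoted in the error are consistent with Latin-1 / Windows-1252 text being handed to an XML parser that expects UTF-8. A minimal base-R sketch of that diagnosis follows. The variable names are illustrative, and treating the source encoding as Latin-1 is an assumption, not something the error message itself states.

```r
# The bytes quoted in the error message above.
bytes <- as.raw(c(0xE9, 0x72, 0x69, 0x6F))
s <- rawToChar(bytes)

# In UTF-8, 0xE9 opens a three-byte sequence and must be followed by
# continuation bytes (0x80-0xBF). 0x72 is plain ASCII "r", so the
# string is not valid UTF-8.
is_valid <- validUTF8(s)

# Assumption: the text was originally Latin-1, where 0xE9 is "é".
# Converting rewrites that byte as the two-byte UTF-8 sequence C3 A9.
converted <- iconv(s, from = "latin1", to = "UTF-8")
converted_bytes <- charToRaw(converted)
```

Under that assumption the bytes spell "ério", which would fit a French word such as "période" in the data.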

chainsawriot added a commit that referenced this issue Jun 15, 2023
@chainsawriot
Collaborator

@Rengervn I added some test cases (#108) to the continuous integration, which also covers Windows Server 2022 with the release version of R (4.3.0), and I can't reproduce the error.

Could you try this first?

test <- readRDS("legend.rds")
write_ods(test, "my.ods")
read_ods("my.ods")

chainsawriot added a commit that referenced this issue Jun 16, 2023
@Rengervn
Author

Rengervn commented Jun 16, 2023

I tried and it gives me

 Error in read_xml.character(con, options = c("NOBLANKS", "HUGE")) : 
  Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x72 0x69 0x6F [9] 

It looks like the encoding of the data frame is the problem (it is "unknown"). I tried to change it to "UTF-8" but have not been successful so far. I posted a question on Stack Overflow.
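
One possible way to approach the re-encoding described above, as a base-R sketch. `to_utf8` is a hypothetical helper, not part of readODS, and the `from = "latin1"` default is an assumption about how the data was originally encoded. Note that `Encoding(x) <- "UTF-8"` only changes the declared mark and leaves the bytes untouched, which is why the sketch uses `iconv()` instead.

```r
# Hypothetical helper: convert every character column of a data frame
# to UTF-8, touching only strings that are not already valid UTF-8.
to_utf8 <- function(df, from = "latin1") {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    bad <- !is.na(x) & !validUTF8(x)
    x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
    x
  })
  df
}

# Illustrative data: "période" stored as Latin-1 bytes with no encoding mark.
legend <- data.frame(
  id = 1:2,
  FR = c(rawToChar(as.raw(c(0x70, 0xE9, 0x72, 0x69, 0x6F, 0x64, 0x65))), NA),
  stringsAsFactors = FALSE
)
fixed <- to_utf8(legend)
```

Strings that are already valid UTF-8 (including plain ASCII) and missing values are left as they are, so the helper can be applied to mixed data without double-converting.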

@chainsawriot
Collaborator

Now #113 is fixed. We might need to widen the tests to cover Windows with R earlier than 4.2. cpp11 is "UTF-8 everywhere", but it's better to verify.

chainsawriot added a commit that referenced this issue Aug 19, 2023
* Checking windows 3.6

* But what is its encoding?

* GH check with Windows R 3.6

* Possibly fix std error for 3.6?

* Remove printing [no ci]

* Update NEWS

GitHub please check this
@chainsawriot
Collaborator

chainsawriot commented Aug 19, 2023

@Rengervn Sorry for taking so long. This issue should now be fixed, and it is routinely checked on CI with R 3.6 on Windows (where the default encoding is Windows-1252). That said, it would be great if you could confirm on your setup that this is really the case. Please install the default branch.

remotes::install_github("ropensci/readODS")

@chainsawriot
Collaborator

chainsawriot commented Aug 19, 2023

If it is working on Windows pre-4.2

readODS/README.Rmd

Lines 98 to 102 in 39df36a

### Text Encoding
In older versions of R (<4.2) on Windows, the default encoding for text is not UTF-8, and instead depends on your locale. This can cause problems processing characters that are not part of the character set R is using (usually [Windows-1252](https://en.wikipedia.org/wiki/Windows-1252)). Sheets written using these characters generally contain errors. The problem can be fixed by upgrading to a version of R >= 4.2.
**Radian:** Even for up-to-date versions of R, these issues with character encoding are still a known issue with Radian. Their suggested workaround is [here](https://github.com/randy3k/radian/issues/269#issuecomment-1169663251).
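
A session can be checked against the conditions this README section describes with a few base-R calls. The sketch below is only illustrative: `at_risk_of_encoding_issues` is a hypothetical name, and the check mirrors the README wording rather than anything readODS does internally.

```r
# Hypothetical helper: TRUE when the session matches the setup the
# README describes (Windows, R older than 4.2, non-UTF-8 native encoding).
at_risk_of_encoding_issues <- function() {
  on_windows  <- .Platform$OS.type == "windows"
  old_r       <- getRversion() < "4.2.0"
  utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])
  on_windows && old_r && !utf8_locale
}

at_risk <- at_risk_of_encoding_issues()
```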

@Rengervn
Author

Hi

It is now working on my machine (R 4.0.4).

Thanks!

Renger

> library(tidyverse)
-- Attaching core tidyverse packages -------------------------------------- tidyverse 2.0.0 --
v dplyr     1.1.1     v readr     2.1.4
v forcats   1.0.0     v stringr   1.5.0
v ggplot2   3.4.1     v tibble    3.2.1
v lubridate 1.9.2     v tidyr     1.3.0
v purrr     1.0.1     
-- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
> library(readODS)
Warning message:
package ‘readODS’ was built under R version 4.2.1 
> packageVersion("readODS")
[1] ‘2.0.2’
> test <- readRDS("legend.rds")
> write_ods(test, "my.ods", sheet = "Legend")
> read_ods("my.ods")
# A tibble: 4 x 5
  Description EN                                                             DE    FR    IT   
  <chr>       <chr>                                                          <chr> <chr> <chr>
1 PERIOD      The period is in the format YYYY where YYYY is the year, e.g.~ Die ~ La p~ Il p~
2 SECTOR      Institutional sector                                           Inst~ Sect~ Sett~
3 COFOG       Classification of the functions of government                  Klas~ Clas~ Clas~
4 UNIT_MEAS   Million Francs, at current prices and share (in %)             In M~ En m~ In m~
> write_ods(test, "my.ods", sheet = "Legend", update = TRUE)

@chainsawriot
Collaborator

@Rengervn Wonderful! Merci!
