You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have written about this issue already in various places of the openxlsx issue tracker and probably in other places and this issue is just to raise awareness and not something to solve or something I'm going to work on.
The reason why formulas are not evaluated is, because there is nothing to evaluate them.
"But Excel shows me number if I open a file with formulas?" "Yes."
"And for some formulas wb_to_df() also shows values?" "Yes again."
Why is this the case?
The XML file structure of a formula written by us looks like this:
<c ...>
<f>SUM(A1:B1)</f>
</c>
Here we have a cell with a simple formula: SUM(A1:B1). If A1 is 1 and B1 is 1 the formula will evaluate 2. In your spreadsheet software you can see the value 2. If you save the file to disk and load it again, the XML structure of the cell has changed. Now it looks like this:
<c ...>
<f>SUM(A1:B1)</f>
<v>2</v>
</c>
It contains a value field now and this value is 2. Now if we read the value from openxlsx2 we also get a 2. But only because this value is now written into the cell. It is "cached" into the cell. And this might actually become an issue.
Image if you replace the cell A1 with a value of -1 (wb_add_data(dims = "A1", x = -1). Your cell still prints 2if readwb_to_df(), and the formula is obviously not evaluated, because it should show a value of 0`.
But why is the value not updated?
That is because, computers programs - in this case your spreadsheet software - need a way to evaluate the formula in the cell with actual data in the spreadsheet. The software needs a way to calculate the formula and it needs a reason to evaluate this formula. The implementation of the formula evaluation is somewhat simple with basic formulas like A1 + A2 and SUM(A1:A2), but it will get tricky if you ever try to implement something like nested VLOOKUP() / CHOOSE() functions, functions where R and and your spreadsheet software produce different results and last but not least, someone has to program all/many/most/at least a few of the functions and their quirks into a function we can evaluate in R. The first part is, to actually provide functions that reproduce what the complementary spreadsheet functions do. Since we cannot use a spreadsheet software, we would have to program them ourselves. And I do not want to do this and I do not want to be the one who does this and if you want me to be the one, lets talk about my salary ..., but good news!
You could be the one!
It is not really hard to begin with. First, most likely you do not want EVERY function (this could take a while, because there are a lot), but instead focus on only a few. Second, begin with something easy and not with an overly complex 300 lines of code containing nested INDEX() functions. Instead let us have a look at the basics, like ABS(), SUM() or AVERAGE() and maybe begin with something like this. The idea is get the sheet, reference and operators from the formula and evaluate everything in some R formula just like base R functions.
Draft code
Here I use spreadsheet formulas using + and SUM() and my code matches them with Rs + and sum(). Similar I could use MIN()/MAX() and other functions that have identical names. For AVERAGE() I might need something like average <- function(...) mean(...) and other more complex functions might require actual custom written functions:
library(openxlsx2)
dat<-data.frame(
num1=1,
num2=2,
fml1="A2 + B2",
fml2="SUM(A2, B2)",
fml3="'Sheet 1'!A2 + B2",
fml4="FST + B2"
)
class(dat$fml1) <-"formula"
class(dat$fml2) <-"formula"
class(dat$fml3) <-"formula"
class(dat$fml4) <-"formula"wb<- wb_workbook()$add_worksheet()$add_data(x=dat)
wb$add_named_region(dims="A2", name="FST")
# wb$open()dat<-data.frame(
x=0:10,
y=-5:5
)
dat_fml<-data.frame(
x="SUM(A2:A12)",
y="SUM(B2:B12)"
)
class(dat_fml$x) <-"formula"
class(dat_fml$y) <-"formula"wb$add_worksheet()$add_data(x=dat)$add_data(x=dat_fml, dims="A14")
wb_eval_excel_fml<-function(wb, sheet, dims) {
# example function that works with a tiny subset of formulas# if you use this function in production you are braver or# desparater than you look.wb<-wb$clone()
openxlsx2:::assert_workbook(wb)
sheetid<-wb$validate_sheet(sheet)
fmls<- as.character(wb$to_df(dims=dims, sheet=sheetid, show_formula=TRUE, col_names=FALSE))
message("Input formula is: ", fmls)
tkns<-tidyxl::xlex(fmls)
sel<-tkns$type=="ref"vars<-tkns$token[sel]
rnd<-openxlsx2:::random_string(n= length(vars), pattern="[a-z]")
sheets<-wb$get_sheet_names()[sheetid]
sheets<- rep(sheets, length(vars))
if (any(sel2<-tkns$type=="sheet")) {
shts<-tkns$token[sel2]
sel3<- which(tkns$type=="ref") %in% ( which(tkns$type=="sheet") +1)
sheets[sel3] <-stringi::stri_extract_first_regex(shts, "([^']+)")
# remove this from our formulatkns<- as.data.frame(tkns)
tkns<-tkns[!sel2, ]
sel<-tkns$type=="ref"
}
fml_env<- new.env()
for (iin seq_along(vars)) {
num<- wb_to_df(wb, dims=vars[i], sheet=sheets[i], col_names=FALSE)
assign(rnd[i], num, fml_env)
tkns$token[sel][i] <-rnd[i]
message("Var ", vars[i], " is: ", num)
}
lwr_fmls<- parse(text= tolower(paste0(tkns$token, collapse="")))
res<- as.numeric(eval(lwr_fmls, envir=fml_env))
message("Result: ", res)
sel<-wb$worksheets[[sheetid]]$sheet_data$cc$r==dimswb$worksheets[[sheetid]]$sheet_data$cc$v[sel] <-reswb
}
# wb$open()
wb_to_df(wb)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 NA NA NA NAwb<- wb_eval_excel_fml(wb, sheet=1, dims="C2")
#> Input formula is: A2 + B2#> Var A2 is: 1#> Var B2 is: 2#> Result: 3wb<- wb_eval_excel_fml(wb, sheet=1, dims="D2")
#> Input formula is: SUM(A2, B2)#> Var A2 is: 1#> Var B2 is: 2#> Result: 3wb<- wb_eval_excel_fml(wb, sheet=1, dims="E2")
#> Input formula is: 'Sheet 1'!A2 + B2#> Var A2 is: 1#> Var B2 is: 2#> Result: 3# # one of many cases that does not work# wb <- wb_eval_excel_fml(wb, sheet = 1, dims = "F2")wb<- wb_eval_excel_fml(wb, sheet=2, dims="A15")
#> Input formula is: SUM(A2:A12)#> Var A2:A12 is: c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)#> Result: 55wb<- wb_eval_excel_fml(wb, sheet=2, dims="B15")
#> Input formula is: SUM(B2:B12)#> Var B2:B12 is: c(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)#> Result: 0
wb_to_df(wb, show_formula=TRUE)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 A2 + B2 SUM(A2, B2) 'Sheet 1'!A2 + B2 FST + B2
wb_to_df(wb)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 3 3 3 NA
wb_to_df(wb, sheet=2, show_formula=TRUE) |> tail()
#> x y#> 10 8 3#> 11 9 4#> 12 10 5#> 13 <NA> <NA>#> 14 x y#> 15 SUM(A2:A12) SUM(B2:B12)
wb_to_df(wb, sheet=2) |> tail()
#> x y#> 10 8 3#> 11 9 4#> 12 10 5#> 13 <NA> <NA>#> 14 x y#> 15 55 0
The text was updated successfully, but these errors were encountered:
I have written about this issue already in various places of the
openxlsx
issue tracker and probably in other places and this issue is just to raise awareness and not something to solve or something I'm going to work on.The reason why formulas are not evaluated is, because there is nothing to evaluate them.
wb_to_df()
also shows values?" "Yes again."Why is this the case?
The XML file structure of a formula written by us looks like this:
Here we have a cell with a simple formula:
SUM(A1:B1)
. If A1 is1
and B1 is1
the formula will evaluate 2. In your spreadsheet software you can see the value2
. If you save the file to disk and load it again, the XML structure of the cell has changed. Now it looks like this:It contains a value field now and this value is
2
. Now if we read the value fromopenxlsx2
we also get a2
. But only because this value is now written into the cell. It is "cached" into the cell. And this might actually become an issue.Image if you replace the cell A1 with a value of
-1
(wb_add_data(dims = "A1", x = -1). Your cell still prints
2if read
wb_to_df(), and the formula is obviously not evaluated, because it should show a value of
0`.But why is the value not updated?
That is because, computers programs - in this case your spreadsheet software - need a way to evaluate the formula in the cell with actual data in the spreadsheet. The software needs a way to calculate the formula and it needs a reason to evaluate this formula. The implementation of the formula evaluation is somewhat simple with basic formulas like
A1 + A2
andSUM(A1:A2)
, but it will get tricky if you ever try to implement something like nestedVLOOKUP()
/CHOOSE()
functions, functions where R and and your spreadsheet software produce different results and last but not least, someone has to program all/many/most/at least a few of the functions and their quirks into a function we can evaluate in R. The first part is, to actually provide functions that reproduce what the complementary spreadsheet functions do. Since we cannot use a spreadsheet software, we would have to program them ourselves. And I do not want to do this and I do not want to be the one who does this and if you want me to be the one, lets talk about my salary ..., but good news!You could be the one!
It is not really hard to begin with. First, most likely you do not want EVERY function (this could take a while, because there are a lot), but instead focus on only a few. Second, begin with something easy and not with an overly complex 300 lines of code containing nested INDEX() functions. Instead let us have a look at the basics, like
ABS()
,SUM()
orAVERAGE()
and maybe begin with something like this. The idea is get the sheet, reference and operators from the formula and evaluate everything in some R formula just like base R functions.Draft code
Here I use spreadsheet formulas using
+
andSUM()
and my code matches them with Rs+
andsum()
. Similar I could useMIN()
/MAX()
and other functions that have identical names. ForAVERAGE()
I might need something likeaverage <- function(...) mean(...)
and other more complex functions might require actual custom written functions:The text was updated successfully, but these errors were encountered: