Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text_short() etc. and current API thoughts? #153

Open
bwiernik opened this issue Apr 9, 2021 · 13 comments
Open

text_short() etc. and current API thoughts? #153

bwiernik opened this issue Apr 9, 2021 · 13 comments
Labels
docs 📚 Something to be adressed in docs and/or vignettes

Comments

@bwiernik
Copy link
Contributor

bwiernik commented Apr 9, 2021

It's not exactly clear to me what the current status of text_short() etc. are. They still appear in the pkgdown site, but are commented out in various places in the source. Has that API been abandoned?

One API thought I had looking at the various functions is that format() might be a good generic to use here because it is so extensible. Table/text/numbers only, as well as choosing specific parameters or quantities could controlled with arguments. Not saying to discard other existing approaches, but might be useful in lieu of an ever expanding list of specific functions?

@DominiqueMakowski
Copy link
Member

DominiqueMakowski commented Apr 9, 2021

text_short was rather deprecated in favour of the summary() generic report() %>% summary(). Where you would see format fitting? In lieu of summary?

One nice API in theory would be to have report() and then pipe it to format(interpet_p = FALSE, display_intercept = TRUE, show_this = TRUE) that would format the report according to specifications, but that seems impossible as there's hardly any modularity possible at the report level, currently the workaround for the summary() is to have both versions (long and short) prepared upfront by report() and then summary() just extracts the short version but doesn't do anything in itself.

Basically, one of the issue with report (and why it had several entire rewrites, see the discussion around the last one here) is the balance between modularity and specificity. In some cases, it would be easier to just have a report() function for each model that does everything specifically for it, but it's also hard to test and a leads to lot of semi-redundant and non-optimized code. But full-modularity with pieces that can be manipulated afterwards seems also impossible.

So currently we split the reports into several submodules (report_parameters, report_intercept, report_performance, etc.) [that we also export] which seem to make sense for a lot of models and share a lot of commonalities, but once the main report() assembles them that's it it's hard to preserve the piece-wise information. In fact, going back to the issue, each of this piece also makes the long and "summarized" version, which is arguably not the most efficient way of doing it (even though the pieces are relatively lightweight since there are just some text strings or tables) but I didn't find any better way of doing it.

One possibility would be to drop the summarized version altogether, but I realized (early on) when using proto-versions of reports that in my own work for instance if I have several similar models, having all the full info for all of them made the files super verbose and not digest with a lot of redundant info, so it was nice to have like one time the full model and then just the critical information (although I agree that what is "critical" is highly debatable)

in any case I forgot what I wanted to say

@bwiernik
Copy link
Contributor Author

bwiernik commented Apr 9, 2021

The main things I was considering were (1) interpret_p or similar arguments to exclude p values and/or their interpretation and (2) a format to report only formatted numeric values without accompanying text (ala #86 and #108).

@bwiernik
Copy link
Contributor Author

bwiernik commented Apr 9, 2021

text_short was rather deprecated in favour of the summary() generic report() %>% summary().

The pkgdown site still has these functions documented: https://easystats.github.io/report/reference/text_long.html

@IndrajeetPatil

This comment has been minimized.

@IndrajeetPatil IndrajeetPatil added the docs 📚 Something to be adressed in docs and/or vignettes label Apr 9, 2021
@bwiernik
Copy link
Contributor Author

I'm wondering what the purpose of storing static text_full and text values is, versus keeping the components as a list until print/format time.

So, for example:

format_report_coefficient <- function(rr, digits = 3, down_with_nhst = FALSE) {
	if (down_with_nhst) {
		rr$interpret_significance <- NA
		rr$p_value <- NA
	}
	with(
		rr,
		glue::glue(
			"The effect of {term} was",
			if (!is.na(interpret_significance)) " {interpret_significance}" else "",
			if (all(!is.na(interpret_significance), !is.na(interpret_es))) " and" else "",
			if (!is.na(interpret_es)) " {interpret_es}" else "",
			" (β = {format(es, digits = digits)}",
			if (!is.na(ci)) ", [{ci}% CI {format(ci_low, digits = digits)}, {format(ci_high, digits = digits)}]" else "",
			if (!is.na(p_value)) ", p = {format(p_value, digits = digits)}" else "",
			")",
			"."
		)
	)
}

rr <- list(term = "predictor", interpret_significance = "statistically significant", interpret_es = "large", es = .90, ci = 99, ci_low = .85, ci_high = .95, p_value = .001)

format_report_coefficient(rr)
# The effect of predictor was statistically significant and large (ß = 0.9, [99% CI 0.85, 0.95], p = 0.001).

format_report_coefficient(rr, down_with_nhst = TRUE)
# The effect of predictor was large (ß = 0.9, [99% CI 0.85, 0.95]).

Here, the idea is that report() generates a list of components and then this function is what would glue the pieces together according to the specified parameters. This enables modularity for things like omitting significance or p values, reporting just formatted numeric results without textual interpretations, or even adjustment of things like number of digits, CI width, etc. at print time.

(Using glue() here is mostly to make things easier to write, but it also is what permits easy use of digit formatting, etc.)

@bwiernik
Copy link
Contributor Author

Other formatting bits could include things like localization (e.g., US English "we fit a model" vs UK English "we fitted a model", perhaps eventually non-English translation).

@DominiqueMakowski
Copy link
Member

That seems like a good idea, we could start experimenting/implementing such increased modularity little-by-little, starting in places where it's easy and "stable"?

though we might want to keep not having "glue" as a dependency (or we add it, after all, report does some heavy gluing throughout... @strengejacke ?).

Also, I don't think we'll be able to have ß symbols, since CRAN only allows ASCII symbols in the package (hence that discussion #97 and my proposal to spell out the symbol as beta as a replacement).

@strengejacke
Copy link
Member

though we might want to keep not having "glue" as a dependency (or we add it, after all, report does some heavy gluing throughout...

Although glue has no further dependencies, I don't see the benefit, i.e. if using glue outweigh potential issues. I always have the example of "withr" and "rstanarm" in mind, which totally broke the package. Unless we really need specific functions, I'd say we stick to base R functions.

@strengejacke
Copy link
Member

have ß symbols

Yes, I think so, too. I'm not sure if expression() or bquote() help, or if these are only working for graphics?

@bwiernik
Copy link
Contributor Author

The way to use the beta symbol is to Unicode escape it. "\u03B2"

@DominiqueMakowski
Copy link
Member

I'm worried that it will introduce a lot of \u03B2 = 42 when formatting is not done. Isn't easier to keep, at its core, beta = 42 and then eventually replace all of the beta = by ``\u03B2 =` during postprocessing/formatting/displaying?

@bwiernik
Copy link
Contributor Author

If the printing environment can’t support unicode (such as R GUI on Windows with a US code page), it prints fallback characters, which for Greek letters are English characters (b, etc). There are some tests we could run to check the environment and unicode support, but they probably aren’t necessary.

If report is going to be used in, for example, latex or markdown report generation, it really needs to be able to include correct character formatting.

@DominiqueMakowski
Copy link
Member

cool! Indeed we should test it a bit first but otherwise it seems like a good idea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs 📚 Something to be adressed in docs and/or vignettes
Projects
None yet
Development

No branches or pull requests

4 participants