improve SMART parameters standardization #708

DominiqueMakowski · 2019-08-20T06:07:05Z

imo this is a critical issue: we need to be able to retrieve "refit"-like standardized parameters with a posthoc method.

Improvements are possible, especially for interactions, but a more robust and systematic testing framework might be needed. Also, knowing how model matrices are built when factors are involved appears to be key.

strengejacke · 2019-09-22T20:59:22Z

❓

I have no idea what you're talking about 😆 I could look at the code, but am too lazy... Can you elaborate a bit more, maybe with an example?

DominiqueMakowski · 2019-09-23T00:41:02Z

Long story short, there are still some cases where the SMART method does not perform well in comparison to refit, and also to "classic". This appears mainly for interaction terms, especially interactions between a continuous variable and a factor.

I have no idea how to improve on that, and I think that to address this we would need someone with a deep understanding of how model matrices are built for interaction terms, and of how we can standardize the interaction term so that it reflects the interaction of two standardized variables...

That's more a longterm issue tho, if by any chance we meet someone with some understanding of the model matrices and formulas...

DominiqueMakowski · 2019-09-23T00:43:04Z

In a nutshell, the goal here is to reconstruct the standardized model.matrix (the 2nd one below) with the original model matrix and the Mean/SD of each variable...

```r
df <- iris
dfZ <- parameters::standardize(iris)

head(model.matrix(~ df$Sepal.Length * df$Species))
#>   (Intercept) df$Sepal.Length df$Speciesversicolor df$Speciesvirginica
#> 1           1             5.1                    0                   0
#> 2           1             4.9                    0                   0
#> 3           1             4.7                    0                   0
#> 4           1             4.6                    0                   0
#> 5           1             5.0                    0                   0
#> 6           1             5.4                    0                   0
#>   df$Sepal.Length:df$Speciesversicolor df$Sepal.Length:df$Speciesvirginica
#> 1                                    0                                   0
#> 2                                    0                                   0
#> 3                                    0                                   0
#> 4                                    0                                   0
#> 5                                    0                                   0
#> 6                                    0                                   0
head(model.matrix(~ dfZ$Sepal.Length * dfZ$Species))
#>   (Intercept) dfZ$Sepal.Length dfZ$Speciesversicolor dfZ$Speciesvirginica
#> 1           1       -0.8976739                     0                    0
#> 2           1       -1.1392005                     0                    0
#> 3           1       -1.3807271                     0                    0
#> 4           1       -1.5014904                     0                    0
#> 5           1       -1.0184372                     0                    0
#> 6           1       -0.5353840                     0                    0
#>   dfZ$Sepal.Length:dfZ$Speciesversicolor
#> 1                                      0
#> 2                                      0
#> 3                                      0
#> 4                                      0
#> 5                                      0
#> 6                                      0
#>   dfZ$Sepal.Length:dfZ$Speciesvirginica
#> 1                                     0
#> 2                                     0
#> 3                                     0
#> 4                                     0
#> 5                                     0
#> 6                                     0
```

<sup>Created on 2019-09-23 by the reprex package (v0.3.0)</sup>
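For this particular case (one numeric predictor crossed with a treatment-coded factor), the reconstruction can be sketched in base R. This is only an illustration under those assumptions, not what the SMART method currently does; the object names `X_rec` and `X_ref` are made up for the example. It relies on the identity (x - m) / s * dummy = (x * dummy - m * dummy) / s, so each interaction column needs the matching dummy column as well as the mean and SD.

```r
# Mean and SD of the only numeric predictor
m <- mean(iris$Sepal.Length)
s <- sd(iris$Sepal.Length)

# Original (unstandardized) model matrix
X <- model.matrix(~ Sepal.Length * Species, data = iris)

# Reference: model matrix built from data where the numeric predictor is z-scored
iris_z <- iris
iris_z$Sepal.Length <- (iris$Sepal.Length - m) / s
X_ref <- model.matrix(~ Sepal.Length * Species, data = iris_z)

# Reconstruction using only X, m and s
X_rec <- X
X_rec[, "Sepal.Length"] <- (X[, "Sepal.Length"] - m) / s
for (lvl in c("Speciesversicolor", "Speciesvirginica")) {
  int_col <- paste0("Sepal.Length:", lvl)
  # Subtract m times the dummy column before dividing by s
  X_rec[, int_col] <- (X[, int_col] - m * X[, lvl]) / s
}

# Compare the reconstruction with the reference matrix
all.equal(X_rec, X_ref, check.attributes = FALSE)
```

The dummy columns themselves are left untouched, which matches the fact that standardizing the data does not change factors.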

DominiqueMakowski · 2019-10-08T03:35:23Z

I believe one of the reasons for the issues with interactions stems from the fact that, as we know, each coefficient in a regression model is estimated with the other predictors "fixed" at 0. When a standardized dataset is passed, 0 corresponds to the mean.

Let's say we have the interaction x * y. The coefficient corresponding to "x" is the slope of x when y = 0. This slope changes as y changes (following the interaction coefficient). Now, if a standardized dataset is passed, it is normal that the "x" parameter is different, as it corresponds to the effect of "x" at the mean of "y" (whereas with unstandardized data it is the effect at y = 0, which may be far from the mean).

Hence, my hint is that the posthoc standardization should somehow take the mean of the variables into account in the case of interactions. I have no idea how, though.

To facilitate the exploration, I've refactored parameters_standardize and created the standardize_info function, which returns values useful for parameter standardization, such as the deviations of the response and of the variables.

```r
model <- lm(Sepal.Width ~ Petal.Width * Sepal.Length, data = iris)
info <- parameters::standardize_info(model)
info$Refit <- parameters::parameters_standardize(model, method = "refit")[, 2]
info$Raw <- insight::get_parameters(model)[, 2]
info[sapply(info, is.numeric)] <- sapply(info[sapply(info, is.numeric)], round, digits = 1)
info
#>                  Parameter        Type Factor Deviation_Response
#> 1              (Intercept)   intercept   <NA>                0.4
#> 2              Petal.Width     numeric  FALSE                0.4
#> 3             Sepal.Length     numeric  FALSE                0.4
#> 4 Petal.Width:Sepal.Length interaction  FALSE                0.4
#>   Mean_Response Deviation_Classic Mean_Classic Deviation_Smart Mean_Smart
#> 1           3.1               0.0          0.0             0.0        0.0
#> 2           3.1               0.8          1.2             0.8        1.2
#> 3           3.1               0.8          5.8             0.8        5.8
#> 4           3.1               5.3          7.5             0.8        5.8
#>   Refit  Raw
#> 1  -0.2  3.4
#> 2  -0.7 -1.5
#> 3   0.4  0.0
#> 4   0.3  0.2
```

<sup>Created on 2019-10-08 by the reprex package (v0.3.0)</sup>

In a nutshell, the problem is to try to FIND the "Refit" column from the "Raw" column using the remaining information...
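For a plain `lm()` with an interaction between two numeric predictors, this can be worked out by substituting x = m_x + s_x * x_z (and likewise for the other predictor and the response) into the raw equation and collecting terms. The sketch below uses base R only and assumes exactly that setting (OLS, no factors, no formula transformations); the names `posthoc` and `refit` are just labels for the example, not package output.

```r
model <- lm(Sepal.Width ~ Petal.Width * Sepal.Length, data = iris)
# b[1] intercept, b[2] Petal.Width, b[3] Sepal.Length, b[4] interaction
b <- unname(coef(model))

m_y <- mean(iris$Sepal.Width);  s_y <- sd(iris$Sepal.Width)
m_x <- mean(iris$Petal.Width);  s_x <- sd(iris$Petal.Width)
m_z <- mean(iris$Sepal.Length); s_z <- sd(iris$Sepal.Length)

# Each main effect picks up the interaction coefficient times the mean of
# the *other* predictor, because standardizing moves the zero point.
posthoc <- c(
  (b[1] + b[2] * m_x + b[3] * m_z + b[4] * m_x * m_z - m_y) / s_y,
  (b[2] + b[4] * m_z) * s_x / s_y,
  (b[3] + b[4] * m_x) * s_z / s_y,
  b[4] * s_x * s_z / s_y
)

# "Refit" reference: z-score all three variables and fit the same formula
vars <- c("Sepal.Width", "Petal.Width", "Sepal.Length")
iris_z <- as.data.frame(scale(iris[, vars]))
refit <- unname(coef(lm(Sepal.Width ~ Petal.Width * Sepal.Length, data = iris_z)))

# Compare the post-hoc conversion with the refitted coefficients
all.equal(posthoc, refit)
```

With the rounded values from the table above, the Petal.Width line gives roughly (-1.5 + 0.2 * 5.8) * 0.8 / 0.4 ≈ -0.7, in line with the "Refit" column. Note that only the interaction coefficient can be rescaled by SDs alone; the main effects need the means as well.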

DominiqueMakowski · 2019-10-08T03:39:49Z

At the same time, it suggests that the issues with interactions are not real issues: the estimates simply correspond to something different, but they are not wrong per se (I think).

DominiqueMakowski · 2019-10-08T04:08:17Z

I reckon that's the reason for partial standardized coefficients (https://www.jstor.org/stable/2684719), but we need VIF for that

mattansb · 2020-09-28T08:28:24Z

@DominiqueMakowski Can you explain what exactly the "smart" method is trying to do? It seems to break somewhat when there are formula transformations (log(y) ~ sqrt(x)) and I'd like to try and fix that, but I don't know what I'm actually trying to achieve 😅 (Currently, methods "refit" and "basic" are the most stable, but that is only because I know what they're supposed to return...)

(In cases with transformations it will never be equal to method "refit" because the parameters themselves are estimated differently)

(As you mention above, this is also the issue with interactions - the centering changes the simple/conditional slope parameters, so it also cannot be the same as with "refit", no?)

So broadly asking, what would method "smart" do to this model:

log(y) ~ sqrt(x) * some_factor

Also, what exactly is the conceptual difference between methods "smart" and "posthoc"?

DominiqueMakowski · 2020-09-28T08:57:20Z

TLDR;

> So broadly asking, what would method "smart" do to this model:

No idea

Longer story:

So in simpler models, basic and refit are equivalent. But for more complex models (especially with interactions, transformations etc.), basic starts to depart from refit. IMO, refit gives the "gold standard" results, because it doesn't involve any posthoc transformation. However, the problem is that "refit" is computationally heavy (e.g., for Bayesian models). So the goal of "smart" is to be a posthoc method (one that does not refit the model from scratch but simply transforms the parameters with the info it has) that gives the same results as "refit".

So in simple cases, there should be no difference between the 3 methods basic, refit and smart. In more complex cases, smart aims at giving the same results as "refit".

Basically "smart" is supposed to be "refit" minus the model refitting.

So to get back to your initial question of what the result should be in that particular case, the expected result should be the same as the one given by "refit"...

About how it works: basically, standardize() doesn't change factors, which are still 0/1 dummy columns in the model matrix. So in order to adjust the parameters to mimic the "refit" method, one must know, for example, whether a given parameter refers to a continuous predictor (in which case the parameter must be scaled by both the SD of the outcome and the SD of the predictor) or to a factor (in which case only the SD of the outcome is used). I reckon there might be other cases where something particular should be done (reverse the transformation when transformations are specified?)
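As a minimal sketch of that rule, assuming a plain `lm()` with no interactions and no transformations (base R only; `posthoc` and `refit` are just labels for the example, and the intercept is left out):

```r
model <- lm(Sepal.Width ~ Sepal.Length + Species, data = iris)
b <- coef(model)

s_y <- sd(iris$Sepal.Width)   # SD of the outcome
s_x <- sd(iris$Sepal.Length)  # SD of the numeric predictor

posthoc <- c(
  # numeric predictor: scale by SD of predictor and SD of outcome
  b["Sepal.Length"] * s_x / s_y,
  # factor dummies: scale by SD of outcome only
  b[c("Speciesversicolor", "Speciesvirginica")] / s_y
)

# "Refit" reference: z-score the numeric variables, leave the factor as is
iris_z <- iris
iris_z$Sepal.Width <- as.numeric(scale(iris$Sepal.Width))
iris_z$Sepal.Length <- as.numeric(scale(iris$Sepal.Length))
refit <- coef(lm(Sepal.Width ~ Sepal.Length + Species, data = iris_z))

# Compare the rescaled coefficients with the refitted ones
all.equal(posthoc, refit[names(posthoc)])
```

Once an interaction is added, this simple rescaling stops matching the refit for the main effects, because centering also shifts the point at which the conditional slopes are evaluated.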

DominiqueMakowski · 2020-09-28T09:00:33Z

But we can move slowly here, it doesn't need to be perfect from the start: smart can always fall back to "basic" when we don't know how to retrieve the "refit"-like coefs. Basically it's like a "basic+" method, i.e., it's fast and will match "refit" in most cases, and in the remaining cases it will just give you the straightforward "basic" standardization.

DominiqueMakowski changed the title ~~improve SMART parameters standardization (ex 'full')~~ improve SMART parameters standardization Sep 23, 2019

DominiqueMakowski referenced this issue Oct 8, 2019

simplified parameters_standardize (added standardize_info() as new fu…

e388e71

…nction) #97

strengejacke transferred this issue from easystats/parameters Oct 11, 2019

mattansb assigned DominiqueMakowski Apr 22, 2020

mattansb added the enhancement 🔥 label Jul 27, 2020

mattansb self-assigned this Sep 28, 2020

mattansb mentioned this issue Sep 28, 2020

Tag "posthoc" / "smart" as *experimental* easystats/effectsize#139

Closed

3 tasks

mattansb removed their assignment Sep 28, 2020

mattansb mentioned this issue Sep 28, 2020

improve POSTHOC parameters standardization easystats/effectsize#140

Closed

DominiqueMakowski mentioned this issue Jun 16, 2021

(Adjusted) Cohen's d for contrasts in linear models easystats/effectsize#351

Open

mattansb transferred this issue from easystats/effectsize May 3, 2022

strengejacke added Enhancement 💥 Implemented features can be improved or revised and removed enhancement 🔥 labels May 16, 2022
