From 1dcaf78271dd611591442eab3a7e22c5fc15cdab Mon Sep 17 00:00:00 2001 From: Brian Ward Date: Mon, 5 Aug 2024 12:32:31 -0400 Subject: [PATCH] Doc clarifications --- src/reference-manual/statements.qmd | 28 +++++++++++++++---------- src/reference-manual/user-functions.qmd | 12 +++++++---- src/stan-users-guide/user-functions.qmd | 19 ++++++++++++----- 3 files changed, 39 insertions(+), 20 deletions(-) diff --git a/src/reference-manual/statements.qmd b/src/reference-manual/statements.qmd index 39b1fdf7a..b44ad928a 100644 --- a/src/reference-manual/statements.qmd +++ b/src/reference-manual/statements.qmd @@ -15,7 +15,7 @@ are used --- if they do not, the behavior is undefined. The basis of Stan's execution is the evaluation of a log probability function (specifically, a log probability density function) for a given set of (real-valued) parameters. Log probability functions can be -constructed by using distribution statements and log probability increment +constructed by using distribution statements and log probability increment statements. Statements may be grouped into sequences and into for-each loops. In addition, Stan allows local variables to be declared in blocks and also allows an empty @@ -248,7 +248,8 @@ of the posterior up to an additive constant. Data and transformed data are fixed before the log density is evaluated. The total log probability is initialized to zero. Next, any log Jacobian adjustments accrued by the variable constraints are added to the log -density (the Jacobian adjustment may be skipped for optimization). +density (the Jacobian adjustment may be skipped for maximum likelihood estimation +via optimization). Distribution statements and log probability increment statements may add to the log density in the model block. 
A log probability increment statement directly increments the log density with the value of an expression as @@ -370,9 +371,9 @@ or in functions ending with `_jacobian` to mimic the log Jacobian adjustments accrued by built-in variable transforms. Similarly to those implemented for the built-in transforms, these Jacobian adjustment -may be skipped for optimization. +may be skipped for maximum likelihood estimation via optimization. -For example, here is a program which re-creates the existing +For example, here is a program which recreates the existing [`<upper=ub>` transform](transforms.qmd#upper-bounded-scalar) on real numbers: ```stan @@ -391,16 +392,21 @@ parameters { transformed parameters { real b = upper_bound_jacobian(b_raw, ub); } +model { + // use b as if it was declared `real<upper=ub> b;` in parameters + // e.g. + // b ~ lognormal(0, 1); +} ``` ### Accessing the log density {-} -To access accumulated log density up to the current execution point, +To access the accumulated log density up to the current execution point, the function `target()` may be used. ## Sampling statements {#sampling-statements.section} -The term "sampling statement" has been replaced with +The term "sampling statement" has been replaced with [distribution statement](#distribution-statements.section). ## Distribution statements {#distribution-statements.section} @@ -464,7 +470,7 @@ terms in the model block. Equivalently, each $\sim$ statement corresponds to a multiplicative factor in the unnormalized posterior density. -Distribution statements (`~`) accept only built-in or user-defined +Distribution statements (`~`) accept only built-in or user-defined distributions on the right side. The left side of a distribution statement may be data, parameter, or a complex expression, but the evaluated type needs to @@ -484,8 +490,8 @@ target += normal_lpdf(sigma | 0, 1); ``` Stan models can mix distribution statements and log probability -increment statements. 
Although statistical models -are usually defined with distributions in the literature, +increment statements. Although statistical models +are usually defined with distributions in the literature, there are several scenarios in which we may want to code the log likelihood or parts of it directly, for example, due to computational efficiency (e.g. censored data model) or coding language limitations @@ -517,7 +523,7 @@ target += dist_lpmf(y | theta1, ..., thetaN); This will be well formed if and only if `dist_lpdf(y | theta1, ..., thetaN)` or `dist_lpmf(y | theta1, ..., thetaN)` is a -well-formed expression of type `real`. User defined distributions +well-formed expression of type `real`. User defined distributions can be defined in functions block by using function names ending with `_lpdf`. @@ -913,7 +919,7 @@ The equivalent code for a vectorized truncation depends on which of the variables are non-scalars (arrays, vectors, etc.): 1. If the variate `y` is the only non-scalar, the result is the same as - described in the above sections, but the `lcdf`/`lccdf` calculation is + described in the above sections, but the `lcdf`/`lccdf` calculation is multiplied by `size(y)`. 2. If the other arguments to the distribution are non-scalars, then the diff --git a/src/reference-manual/user-functions.qmd b/src/reference-manual/user-functions.qmd index 324f9b8f6..c581d95f7 100644 --- a/src/reference-manual/user-functions.qmd +++ b/src/reference-manual/user-functions.qmd @@ -95,7 +95,7 @@ arguments to produce an expression, which has a value when executed. ### Functions as statements {-} Functions with void return types may be applied to arguments and used -as [statements.qmd](statements). +as [statements.qmd](statements). These act like distribution statements or print statements. 
Such uses are only appropriate for functions that act through side effects, such as incrementing the log probability @@ -161,7 +161,11 @@ used in place of parameterized distributions on the right side of Functions of certain types are restricted on scope of usage. Functions whose names end in `_lp` assume access to the log probability accumulator and are only available in the transformed -parameter and model blocks. +parameters and model blocks. + +Functions whose names end in `_jacobian` assume access to the log +probability accumulator and may only be used within the transformed +parameters block. Functions whose names end in `_rng` assume access to the random number generator and may only be used @@ -293,8 +297,8 @@ a function elsewhere results in a compile-time error. ### Log probability access in functions {-} -Functions that include -[statements.qmd#distribution-statements.section](distribution statements) or +Functions that include +[statements.qmd#distribution-statements.section](distribution statements) or [statements.qmd#increment-log-prob.section](log probability increment statements) must have a name that ends in `_lp`. Attempts to use distribution statements or increment log probability diff --git a/src/stan-users-guide/user-functions.qmd b/src/stan-users-guide/user-functions.qmd index eb71a35cc..554ffa77b 100644 --- a/src/stan-users-guide/user-functions.qmd +++ b/src/stan-users-guide/user-functions.qmd @@ -293,15 +293,20 @@ Functions whose names end in `_jacobian` can use the `jacobian +=` statement. This can be used to implement a custom change of variables for arbitrary parameters. 
-For example, here is a program which re-creates the built-in +For example, this function recreates the built-in `<upper=ub>` transform on real numbers: +```stan +real upper_bound_jacobian(real x, real ub) { + jacobian += x; + return ub - exp(x); +} +``` + +It can be used as a replacement for `real<upper=ub>` as follows: ```stan functions { - real upper_bound_jacobian(real x, real ub) { - jacobian += x; - return ub - exp(x); - } + // upper_bound_jacobian as above } data { real ub; @@ -312,6 +317,10 @@ parameters { transformed parameters { real b = upper_bound_jacobian(b_raw, ub); } +model { + b ~ lognormal(0, 1); + // ... +} ``` ## Functions acting as random number generators