From 1dcaf78271dd611591442eab3a7e22c5fc15cdab Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Mon, 5 Aug 2024 12:32:31 -0400
Subject: [PATCH] Doc clarifications

---
 src/reference-manual/statements.qmd     | 28 +++++++++++++++----------
 src/reference-manual/user-functions.qmd | 12 +++++++----
 src/stan-users-guide/user-functions.qmd | 19 ++++++++++++-----
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/src/reference-manual/statements.qmd b/src/reference-manual/statements.qmd
index 39b1fdf7a..b44ad928a 100644
--- a/src/reference-manual/statements.qmd
+++ b/src/reference-manual/statements.qmd
@@ -15,7 +15,7 @@ are used --- if they do not, the behavior is undefined.
 The basis of Stan's execution is the evaluation of a log probability
 function (specifically, a log probability density function) for a given
 set of (real-valued) parameters. Log probability functions can be
-constructed by using distribution statements and log probability increment 
+constructed by using distribution statements and log probability increment
 statements.  Statements may be grouped
 into sequences and into for-each loops.  In addition, Stan allows
 local variables to be declared in blocks and also allows an empty
@@ -248,7 +248,8 @@ of the posterior up to an additive constant.  Data and transformed
 data are fixed before the log density is evaluated.  The total log
 probability is initialized to zero.  Next, any log Jacobian
 adjustments accrued by the variable constraints are added to the log
-density (the Jacobian adjustment may be skipped for optimization).
+density (the Jacobian adjustment may be skipped for maximum likelihood estimation
+via optimization).
 Distribution statements and log probability increment statements may add to the log
 density in the model block.  A log probability increment statement
 directly increments the log density with the value of an expression as
@@ -370,9 +371,9 @@ or in functions ending with `_jacobian` to mimic the log Jacobian
 adjustments accrued by built-in variable transforms.
 
 Similarly to those implemented for the built-in transforms, these Jacobian adjustment
-may be skipped for optimization.
+may be skipped for maximum likelihood estimation via optimization.
 
-For example, here is a program which re-creates the existing
+For example, here is a program which recreates the existing
 [`<upper=x>` transform](transforms.qmd#upper-bounded-scalar) on real numbers:
 
 ```stan
@@ -391,16 +392,21 @@ parameters {
 transformed parameters {
   real b = upper_bound_jacobian(b_raw, ub);
 }
+model {
+  // use b as if it were declared `real<upper=ub> b;` in parameters
+  // e.g.
+  // b ~ lognormal(0, 1);
+}
 ```
 
 ### Accessing the log density {-}
 
-To access accumulated log density up to the current execution point,
+To access the accumulated log density up to the current execution point,
 the function `target()` may be used.
 
 ## Sampling statements {#sampling-statements.section}
 
-The term "sampling statement" has been replaced with 
+The term "sampling statement" has been replaced with
 [distribution statement](#distribution-statements.section).
 
 ## Distribution statements {#distribution-statements.section}
@@ -464,7 +470,7 @@ terms in the model block.  Equivalently, each $\sim$ statement
 corresponds to a multiplicative factor in the unnormalized posterior
 density.
 
-Distribution statements (`~`) accept only built-in or user-defined 
+Distribution statements (`~`) accept only built-in or user-defined
 distributions on the
 right side. The left side of a distribution statement may be data,
 parameter, or a complex expression, but the evaluated type needs to
@@ -484,8 +490,8 @@ target += normal_lpdf(sigma | 0, 1);
 ```
 
 Stan models can mix distribution statements and log probability
-increment statements. Although statistical models 
-are usually defined with distributions in the literature, 
+increment statements. Although statistical models
+are usually defined with distributions in the literature,
 there are several scenarios in which we may want to code the log
 likelihood or parts of it directly, for example, due to computational
 efficiency (e.g. censored data model) or coding language limitations
@@ -517,7 +523,7 @@ target += dist_lpmf(y | theta1, ..., thetaN);
 
 This will be well formed if and only if `dist_lpdf(y | theta1,
   ..., thetaN)` or `dist_lpmf(y | theta1, ..., thetaN)` is a
-well-formed expression of type `real`. User defined distributions 
+well-formed expression of type `real`. User defined distributions
 can be defined in functions block by using function names ending with `_lpdf`.
 
 
@@ -913,7 +919,7 @@ The equivalent code for a vectorized truncation depends on which of the
 variables are non-scalars (arrays, vectors, etc.):
 
 1. If the variate `y` is the only non-scalar, the result is the same as
-   described in the above sections, but the `lcdf`/`lccdf` calculation is 
+   described in the above sections, but the `lcdf`/`lccdf` calculation is
    multiplied by `size(y)`.
 
 2. If the other arguments to the distribution are non-scalars, then the
diff --git a/src/reference-manual/user-functions.qmd b/src/reference-manual/user-functions.qmd
index 324f9b8f6..c581d95f7 100644
--- a/src/reference-manual/user-functions.qmd
+++ b/src/reference-manual/user-functions.qmd
@@ -95,7 +95,7 @@ arguments to produce an expression, which has a value when executed.
 ### Functions as statements {-}
 
 Functions with void return types may be applied to arguments and used
-as [statements.qmd](statements). 
+as [statements.qmd](statements).
 These act like distribution statements or print
 statements.  Such uses are only appropriate for functions that act
 through side effects, such as incrementing the log probability
@@ -161,7 +161,11 @@ used in place of parameterized distributions on the right side of
 Functions of certain types are restricted on scope of usage.
 Functions whose names end in `_lp` assume access to the log
 probability accumulator and are only available in the transformed
-parameter and model blocks.
+parameters and model blocks.
+
+Functions whose names end in `_jacobian` assume access to the log
+probability accumulator and may only be used within the transformed
+parameters block.
 
 Functions whose names end in `_rng`
 assume access to the random number generator and may only be used
@@ -293,8 +297,8 @@ a function elsewhere results in a compile-time error.
 
 ### Log probability access in functions {-}
 
-Functions that include 
-[statements.qmd#distribution-statements.section](distribution statements) or 
+Functions that include
+[statements.qmd#distribution-statements.section](distribution statements) or
 [statements.qmd#increment-log-prob.section](log probability increment statements)
 must have a name that ends in `_lp`.
 Attempts to use distribution statements or increment log probability
diff --git a/src/stan-users-guide/user-functions.qmd b/src/stan-users-guide/user-functions.qmd
index eb71a35cc..554ffa77b 100644
--- a/src/stan-users-guide/user-functions.qmd
+++ b/src/stan-users-guide/user-functions.qmd
@@ -293,15 +293,20 @@ Functions whose names end in `_jacobian` can use the
 `jacobian +=` statement. This can be used to implement a custom
 change of variables for arbitrary parameters.
 
-For example, here is a program which re-creates the built-in
+For example, this function recreates the built-in
 `<upper=x>` transform on real numbers:
+```stan
+real upper_bound_jacobian(real x, real ub) {
+  jacobian += x;
+  return ub - exp(x);
+}
+```
+
+It can be used as a replacement for `real<upper=ub>` as follows:
 
 ```stan
 functions {
-  real upper_bound_jacobian(real x, real ub) {
-    jacobian += x;
-    return ub - exp(x);
-  }
+  // upper_bound_jacobian as above
 }
 data {
   real ub;
@@ -312,6 +317,10 @@ parameters {
 transformed parameters {
   real b = upper_bound_jacobian(b_raw, ub);
 }
+model {
+  b ~ lognormal(0, 1);
+  // ...
+}
 ```
 
 ## Functions acting as random number generators