Generating Quantities of Interest from a Fitted Model


The generate_quantities method allows you to generate additional quantities of interest from a fitted model without re-running the sampler. Instead, you write a modified version of the original Stan program that adds or modifies the generated quantities block to specify how to compute the new quantities of interest. Running the generate_quantities method on the new program, together with sampler outputs (i.e., a set of draws) from the fitted model, runs the generated quantities block of the new program on the existing sample, plugging the per-draw parameter estimates into the computations in the generated quantities block.

This method requires the sub-argument fitted_params which takes as its value an existing Stan CSV file that contains parameter values from an equivalent model, i.e., a model with the same parameters block, conditioned on the same data.

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

  • generate simulated data for model testing by forward sampling
  • generate predictions for new data
  • calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
  • calculate posterior expectations
  • transform parameters for reporting
  • apply full Bayesian decision theory
  • calculate log likelihoods, deviances, etc. for model comparison (a sketch follows this list)
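As a sketch of the last item: for the bernoulli example model (whose data block declares N and y, with parameter theta), a generated quantities block that computes per-observation log likelihoods might look like the following; the name log_lik is a convention used by downstream model-comparison tools, not a requirement:

generated quantities {
  // per-draw, per-observation log likelihood,
  // useful for model comparison (e.g., PSIS-LOO)
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_lpmf(y[n] | theta);
  }
}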

For an overview of the uses of this feature, see the Stan User’s Guide section on Stand-alone generated quantities and ongoing prediction.


Example

To illustrate how this works, we use the generate_quantities method to do posterior predictive checks using the estimate of theta given the example bernoulli model and data, following the posterior predictive simulation procedure in the Stan User’s Guide.

We write a program bernoulli_ppc.stan which contains the following generated quantities block, with comments to explain the procedure:
generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

The rest of the program is the same as in bernoulli.stan.
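For reference, the shared portion of the program (the data, parameters, and model blocks of the example model) is essentially:

data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);  // uniform prior on theta
  y ~ bernoulli(theta);
}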


The generate_quantities method requires the sub-argument fitted_params which takes as its value the name of a Stan CSV file. The per-draw parameter values from the fitted_params file will be used to run the generated quantities block.

If we run the bernoulli.stan program for a single chain to generate a sample in file bernoulli_fit.csv:

> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv

Then we can run the bernoulli_ppc executable to carry out the posterior predictive checks:

> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
                   data file=bernoulli.data.json \
                   output file=bernoulli_ppc.csv

The output file bernoulli_ppc.csv contains only the values for the variables declared in the generated quantities block, i.e., theta_rep and the elements of y_sim:

# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 1 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2983956445 (Default)
# output
#   file = output.csv (Default)
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
0,1,0,1,0,1,0,1,0,0,0.4
1,0,0,0,0,0,0,0,0,0,0.1
0,0,0,0,0,1,1,1,0,0,0.3
0,0,1,0,1,0,0,0,0,0,0.2
1,0,1,0,1,1,0,1,1,0,0.6
...

Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:

generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // copy of theta, reported alongside the new quantities
  real<lower=0, upper=1> theta_cp = theta;
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

Now the output is slightly more interpretable: theta_cp is the same as the theta used to generate the values y_sim[1] through y_sim[N]. Comparing columns theta_cp and theta_rep allows us to see how the uncertainty in our estimate of theta is carried forward into our predictions:

y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
0,1,1,0,1,0,0,1,1,0,0.545679,0.5
1,1,1,1,1,1,0,1,1,0,0.527164,0.8
1,1,1,1,0,1,1,1,1,0,0.529116,0.8
1,0,1,1,1,1,0,0,1,0,0.478844,0.6
0,1,0,0,0,0,1,0,1,0,0.238793,0.3
0,0,0,0,0,1,1,0,0,0,0.258294,0.2
1,1,1,0,0,0,0,0,0,0,0.258465,0.3

Errors


The fitted_params file must be a Stan CSV file; attempts to use a regular CSV file will result in an error message of the form:

Error reading fitted param names from sample csv file <filename.csv>

The fitted_params file must contain columns corresponding to legal values for all parameters defined in the model. If any parameters are missing, the program will exit with an error message of the form:

Error reading fitted param names from sample csv file <filename.csv>

The parameter values of the fitted_params are on the constrained scale and must obey all constraints. For example, if we modify the contents of the first reported draw in bernoulli_fit.csv so that the value of theta is outside the declared bounds real<lower=0, upper=1>, the program will return the following error message:

Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)

CmdStan Installation

Installation via conda

With conda, you can install CmdStan from the conda-forge channel. This will install a pre-built version of CmdStan along with the required dependencies (i.e. a C++ compiler, a version of Make, and required libraries). The conda installation is designed so one can use the R or Python bindings to CmdStan seamlessly. Additionally, it provides the command cmdstan_model to activate the CmdStan makefile from anywhere.

Note: This requires that conda has already been installed on your machine. You can either install miniconda, a free, minimal installer for conda, or you can get the full Anaconda system, which provides graphical installer wizards for MacOS and Windows users.

We recommend installing CmdStan in a new conda environment:

 conda create -n stan -c conda-forge cmdstan
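After creating the environment, activate it in each new shell session before using CmdStan (standard conda usage):

 conda activate stan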
To check the installation, build and run the example model:

> make examples/bernoulli/bernoulli

# fit to provided data (results of 10 trials, 2 out of 10 successes)
> ./examples/bernoulli/bernoulli sample \
    data file=examples/bernoulli/bernoulli.data.json

# default output written to file `output.csv`

Using GNU Make

4. Build the diagnose utility bin/diagnose
5. Build all libraries and object files, compile and link an executable Stan program

Note: to build using multiple cores, use the -j option to make, e.g., for 4 cores:

> make build -j4

  • To open a Windows command shell, first open the Start Menu (usually in the lower left of the screen), select option All Programs, then option Accessories, then program Command Prompt. Alternatively, enter [Windows+r] (both keys together on the keyboard), and enter cmd into the text field that pops up in the Run window, then press [Return] on the keyboard to run.


    Laplace sampling


    The laplace method produces a sample from a normal approximation centered at the mode of a distribution in the unconstrained space. If the mode is a maximum a posteriori (MAP) estimate, the samples provide an estimate of the mean and standard deviation of the posterior distribution. If the mode is a maximum likelihood estimate (MLE), the sample provides an estimate of the standard error of the likelihood. In general, the posterior mode in the unconstrained space doesn’t correspond to the mean (nor mode) in the constrained space, and thus the sample is needed to infer the mean as well as the standard deviation. (See this case study for a visual illustration.)

    This is computationally inexpensive compared to exact Bayesian inference with MCMC. The goodness of this estimate depends on both the estimate of the mode and how much the true posterior in the unconstrained space resembles a Gaussian.
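A sketch of a typical workflow for the bernoulli example model (the exact sub-argument spellings here are assumptions; the method’s help-all output is the authoritative reference): first find the mode with the optimize method, then pass the resulting CSV file to the laplace method through its mode argument:

> ./bernoulli optimize jacobian=1 data file=bernoulli.data.json \
              output file=bernoulli_mode.csv
> ./bernoulli laplace mode=bernoulli_mode.csv data file=bernoulli.data.json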


          MCMC Sampling using Hamiltonian Monte Carlo

          The sample method provides Bayesian inference over the model conditioned on data using Hamiltonian Monte Carlo (HMC) sampling. By default, the inference engine used is the No-U-Turn sampler (NUTS), an adaptive form of Hamiltonian Monte Carlo sampling. For details on HMC and NUTS, see the Stan Reference Manual chapter on MCMC Sampling.


          Running the sampler


          To generate a sample from the posterior distribution of the model conditioned on the data, we run the executable program with the argument sample or method=sample together with the input data. The executable can be run from any directory.


          The full set of configuration options available for the sample method is available by using the sample help-all subcommand. The arguments with their requested values or defaults are also reported at the beginning of the sampler console output and in the output CSV file’s comments.


          Here, we run it in the directory which contains the Stan program and input data, <cmdstan-home>/examples/bernoulli:

> cd examples/bernoulli
> ls
  bernoulli  bernoulli.data.json  bernoulli.data.R  bernoulli.stan

          To execute sampling of the model under Linux or Mac, use:

> ./bernoulli sample data file=bernoulli.data.json

          In Windows, the ./ prefix is not needed:

> bernoulli.exe sample data file=bernoulli.data.json

          The output is the same across all supported platforms. First, the configuration of the program is echoed to the standard output:

method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = 0 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 3252652196 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

          After the configuration has been displayed, a short timing message is given.

Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!

Next, the sampler reports the iteration number and the percentage complete.

Iteration:    1 / 2000 [  0%]  (Warmup)
...
Iteration: 2000 / 2000 [100%]  (Sampling)

          Finally, the sampler reports timing information:

 Elapsed Time: 0.007 seconds (Warm-up)
               0.017 seconds (Sampling)
               0.024 seconds (Total)

          Stan CSV output file


          Each execution of the model results in draws from a single Markov chain being written to a file in comma-separated value (CSV) format. The default name of the output file is output.csv.


          The first part of the output file records the version of the underlying Stan library and the configuration as comments (i.e., lines beginning with the pound sign (#)).


          When the example model bernoulli.stan is run via the command line with all default arguments, the following configuration is displayed:

# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
...
#         metric_file =  (Default)
#         stepsize = 1.000000 (Default)
#         stepsize_jitter = 0.000000 (Default)
#     num_chains = 1 (Default)
...
# output
#   file = output.csv (Default)
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)

          This is followed by a CSV header indicating the names of the values sampled.

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta

          The first output columns report the HMC sampler information:

  • lp__ - the total log probability density (up to an additive constant) at each sample
  • accept_stat__ - the average Metropolis acceptance probability over each simulated Hamiltonian trajectory
  • stepsize__ - integrator step size
  • treedepth__ - depth of tree used by NUTS (NUTS sampler)
  • n_leapfrog__ - number of leapfrog calculations (NUTS sampler)
  • divergent__ - has value 1 if trajectory diverged, otherwise 0 (NUTS sampler)
  • energy__ - value of the Hamiltonian
  • int_time__ - total integration time (static HMC sampler)

          Because the above header is from the NUTS sampler, it has columns treedepth__, n_leapfrog__, and divergent__ and doesn’t have column int_time__. The remaining columns correspond to model parameters. For the Bernoulli model, it is just the final column, theta.


          The header line is written to the output file before warmup begins. If option save_warmup is set to 1, the warmup draws are output directly after the header. The total number of warmup draws saved is num_warmup divided by thin, rounded up (i.e., ceiling).


Following the warmup draws (if any) are comments which record the results of adaptation: the step size and the inverse mass metric used during sampling:

# Adaptation terminated
# Step size = 0.884484
# Diagonal elements of inverse mass matrix:
# 0.535006

          The default sampler is NUTS with an adapted step size and a diagonal inverse mass matrix. For this example, the step size is 0.884484, and the inverse mass contains the single entry 0.535006 corresponding to the parameter theta.


          Draws from the posterior distribution are printed out next, each line containing a single draw with the columns corresponding to the header.

-6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853
-6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295
-7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299
-6.88712,1,0.884484,1,1,0,7.02101,0.188229
-7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596
...

          The output ends with timing details:

#  Elapsed Time: 0.007 seconds (Warm-up)
#                0.017 seconds (Sampling)
#                0.024 seconds (Total)

          Iterations

At every sampler iteration, the sampler returns a set of estimates for all parameters and quantities of interest in the model. During warmup, the NUTS algorithm adjusts the HMC algorithm parameters metric and stepsize in order to efficiently sample from the typical set, the neighborhood of substantial posterior probability mass through which the Markov chain will travel in equilibrium. After warmup, the fixed metric and stepsize are used to produce a set of draws.


          Sampler diagnostic file

          The output keyword sub-argument diagnostic_file=<filepath> specifies the location of the auxiliary output file which contains sampler information for each draw, and the gradients on the unconstrained scale and log probabilities for all parameters in the model. By default, no auxiliary output file is produced.


          Running multiple chains


          A Markov chain generates samples from the target distribution only after it has converged to equilibrium. In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound. In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available. One way to monitor whether a chain has approximately converged to the equilibrium distribution is to compare its behavior to other randomly initialized chains. For robust diagnostics, we recommend running 4 chains.


          The preferred way of using multiple chains is to run them all from the same executable using the num_chains argument. There is also the option to use the Unix or DOS shell to run multiple executables.


          Using the num_chains argument to run multiple chains


          The num_chains argument can be used for all of Stan’s samplers with the exception of the static HMC engine. This will run multiple chains of MCMC from the same executable, which can save on memory usage due to only needing one copy of the model and data. Depending on whether the model was compiled with STAN_THREADS=true, these will either run in parallel or one after the other.
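One way to enable parallel chains (a sketch; the flag can also be set in make/local) is to rebuild the model with the STAN_THREADS makefile flag, from the CmdStan home directory:

> make clean-all
> make STAN_THREADS=true examples/bernoulli/bernoulli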

          The num_chains argument changes the meanings of several other arguments when it is greater than 1 (the default). Many arguments are now interpreted as a “template” which is used for each chain.

For example, when num_chains=2, the argument output file=foo.csv no longer produces a file foo.csv, but instead produces two files, foo_1.csv and foo_2.csv. If you also supply id=5, the files produced will be foo_5.csv and foo_6.csv; id=5 gives the id of the first chain, and the remaining chains are numbered sequentially from there.

This also applies to input files, like those used for initialization. For example, if num_chains=3 and init=bar.json are specified, CmdStan will first look for bar_1.json. If it exists, it will use bar_1.json for the first chain, bar_2.json for the second, and so on. If bar_1.json does not exist, it falls back to looking for bar.json and, if that exists, uses the same initial values for each chain. The numbers in these filenames are also based on the id argument, which defaults to 1.
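For instance, continuing the hypothetical bar.json naming above, the following sketch runs 3 chains, each initialized from its own bar_<chain>.json if those files exist:

> ./bernoulli sample num_chains=3 init=bar.json data file=bernoulli.data.json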


          For example, this will run 4 chains:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv

          This will produce samples in output_1.csv, output_2.csv, output_3.csv, output_4.csv. A suffix with the chain id is appended to the provided output filename (output.csv in the above command).


          If the model was not compiled with STAN_THREADS=true, the above command will run 4 chains sequentially.


          If the model was compiled with STAN_THREADS=true, the chains can run in parallel, with the num_threads argument defining the maximum number of threads used to run the chains. If the model uses no within-chain parallelization (map_rect or reduce_sum calls), the below command will run 4 chains in parallel, provided there are cores available:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4

If the model uses within-chain parallelization (map_rect or reduce_sum calls), the threads are automatically scheduled to run the parallel parts of a single chain or the sequential parts of other chains. The below call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads, 1 thread, anything in between, or it can wait for a thread to be available. The scheduling is left to the Threading Building Blocks scheduler.

./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16

          Summarizing sampler output(s) with stansummary


          The stansummary utility processes one or more output files from a run or set of runs of Stan’s HMC sampler given a model and data. For all columns in the Stan CSV output file stansummary reports a set of statistics including mean, standard deviation, percentiles, effective number of samples, and \(\hat{R}\) values.


To run stansummary on the output files generated by the above run of the bernoulli model on Mac or Linux:

<cmdstan-home>/bin/stansummary output_*.csv

On Windows, use backslashes in the path to call stansummary.exe:

<cmdstan-home>\bin\stansummary.exe output_*.csv

          The stansummary output consists of one row of statistics per column in the Stan CSV output file. Therefore, the first rows in the stansummary report statistics over the sampler state. The final row of output summarizes the estimates of the model variable theta:

Inference for Stan model: bernoulli_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -7.3  1.8e-02    0.75   -8.8  -7.0  -6.8  1.8e+03  2.4e+04  1.0e+00
accept_stat__   0.89  2.7e-03    0.17   0.52  0.96   1.0  3.9e+03  5.1e+04  1.0e+00
stepsize__       1.1  7.5e-02    0.11   0.93   1.2   1.2  2.0e+00  2.6e+01  2.5e+13
treedepth__      1.4  8.1e-03    0.49    1.0   1.0   2.0  3.6e+03  4.7e+04  1.0e+00
n_leapfrog__     2.3  1.7e-02    0.98    1.0   3.0   3.0  3.3e+03  4.3e+04  1.0e+00
divergent__     0.00      nan    0.00   0.00  0.00  0.00      nan      nan      nan
energy__         7.8  2.6e-02     1.0    6.8   7.5   9.9  1.7e+03  2.2e+04  1.0e+00
theta           0.25  2.9e-03    0.12  0.079  0.23  0.46  1.7e+03  2.1e+04  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

In this example, we conditioned the model on data consisting of the outcomes of 10 bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95% percentile values for theta reflect the uncertainty in our estimate, due to the small amount of data, given the prior of beta(1, 1).
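The reported posterior mean is easy to check analytically, since the beta prior is conjugate to the bernoulli likelihood: with 2 successes and 8 failures, the posterior is \(\textrm{beta}(1 + 2, 1 + 8) = \textrm{beta}(3, 9)\), whose mean \(3 / (3 + 9) = 0.25\) matches the stansummary estimate for theta.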


          Examples - older parallelism

          Note: Many of these examples can be simplified by using the num_chains argument.


          When the num_chains argument is not available or is undesirable for whatever reason, built-in tools in the system shell can be used.


          To run multiple chains given a model and data, either sequentially or in parallel, we can also use the Unix or DOS shell for loop to set up index variables needed to identify each chain and its outputs.


          On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters is:

for NAME [in LIST]; do COMMANDS; done

          The list can be a simple sequence of numbers, or you can use the shell expansion syntax {1..N} which expands to the sequence from \(1\) to \(N\), e.g. {1..4} expands to 1 2 3 4. Note that the expression {1..N} cannot contain spaces.


          To run 4 chains for the example bernoulli model on MacOS or Linux:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv
    done

          The backslash (\) indicates a line continuation in Unix. The expression ${i} substitutes in the value of loop index variable i. To run chains in parallel, put an ampersand (&) at the end of the nested sampler command:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done

          This pushes each process into the background which allows the loop to continue without waiting for the current chain to finish.


          On Windows, the DOS for-loop syntax is one of:

for %i in (SET) do COMMAND COMMAND-ARGUMENTS
for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS

          To run 4 chains in parallel on Windows:

>for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^
                                    data file=bernoulli.data.json ^
                                    output file=output_%i.csv

          The caret (^) indicates a line continuation in DOS. The expression %i is the loop index.


          In the following extended examples, we focus on just the nested sampler command for Unix.

          Running multiple chains with a specified RNG seed

          For reproducibility, we specify the same RNG seed across all chains and use the chain id argument to specify the RNG offset.


    Optimization


    The CmdStan executable can run Stan’s optimization algorithms which provide a deterministic method to find the posterior mode. If the posterior is not convex, there is no guarantee Stan will be able to find the global mode as opposed to a local optimum of log probability.


    The executable does not need to be recompiled in order to switch from sampling to optimization, and the data input format is the same. The following is a minimal call to Stan’s optimizer using defaults for everything but the location of the data file.

> ./bernoulli optimize data file=bernoulli.data.json

Executing this command prints output both to the console and to a CSV file.


    The first part of the console output reports on the configuration used. The above command uses all default configurations, therefore the optimizer used is the L-BFGS optimizer and its default initial stepsize and tolerances for monitoring convergence:

 ./bernoulli optimize data file=bernoulli.data.json
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)
        tol_obj = 1e-12 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 87122538 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

The second part of the output indicates how well the algorithm fared, here converging and terminating normally. The numbers reported indicate that it took 5 iterations and 8 gradient evaluations. This is, not surprisingly, far fewer iterations than required for sampling; even fewer iterations would be used with less stringent user-specified convergence tolerances. The alpha value is the step size used. In the final state the change in parameters was roughly \(0.002\) and the length of the gradient roughly 3e-05 (\(0.00003\)).

Initial log joint probability = -6.85653
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
       5      -5.00402    0.00184936   3.35074e-05           1           1        8   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

    The output from optimization is written into the file output.csv by default. The output follows the same pattern as the output for sampling, first dumping the entire set of parameters used as comment lines:

# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = optimize
#   optimize
#     algorithm = lbfgs (Default)
...

Following the config information are two lines of output: the CSV headers and the recorded values:

lp__,theta
-5.00402,0.200003

    Note that everything is a comment other than a line for the header, and a line for the values. Here, the header indicates the unnormalized log probability with lp__ and the model parameter theta. The maximum log probability is -5.0 and the posterior mode for theta is 0.20. The mode exactly matches what we would expect from the data. Because the prior was uniform, the result 0.20 represents the maximum likelihood estimate (MLE) for the very simple Bernoulli model. Note that no uncertainty is reported.
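This can be verified by hand: for the bernoulli likelihood with a flat prior, the MLE is just the sample proportion of successes,

\[
\hat{\theta} = \frac{\sum_{n=1}^{N} y_n}{N} = \frac{2}{10} = 0.20.
\]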


    All of the optimizers stream per-iteration intermediate approximations to the command line console. The sub-argument save_iterations specifies whether or not to save the intermediate iterations to the output file. Allowed values are \(0\) or \(1\), corresponding to False and True respectively. The default value is \(0\), i.e., intermediate iterations are not saved to the output file. Running the optimizer with save_iterations=1 writes both the initial log joint probability and values for all iterations to the output CSV file.


    Running the example model with option save_iterations=1, i.e., the command

> ./bernoulli optimize save_iterations=1 data file=bernoulli.data.json

    produces CSV file output rows:

lp__,theta
-6.85653,0.493689
-6.10128,0.420936
-5.02953,0.22956
-5.00517,0.206107
-5.00403,0.200299
-5.00402,0.200003


    Pathfinder Method for Approximate Bayesian Inference


    The CmdStan method pathfinder uses the Pathfinder algorithm of Zhang et al. (2022), which is further described in the Stan Reference Manual.


    A single run of the Pathfinder algorithm generates a set of approximate draws. Inference is improved by running multiple Pathfinder instances and using Pareto-smoothed importance resampling (PSIS) of the resulting sets of draws. This better matches non-normal target densities and also eliminates minor modes.

    The pathfinder method runs multi-path Pathfinder by default, which returns a PSIS sample over the draws from several individual (“single-path”) Pathfinder runs. Argument num_paths specifies the number of single-path Pathfinders, the default is \(4\). If num_paths is set to 1, then only one individual Pathfinder is run without the PSIS reweighting of the sample.
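For example, a sketch of a single-path run on the example model (the draws are then returned without PSIS reweighting):

> ./bernoulli pathfinder num_paths=1 data file=bernoulli.data.json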


The full set of configuration options for the pathfinder method is available by using the pathfinder help-all subcommand. The arguments with their requested values or defaults are also reported at the beginning of the algorithm’s console output and in the output CSV file’s comments.


The following is a minimal call to the Pathfinder algorithm, using defaults for everything but the location of the data file.

> ./bernoulli pathfinder data file=bernoulli.data.R

Executing this command prints output both to the console and to CSV files.


    The first part of the console output reports on the configuration used.

method = pathfinder
  pathfinder
    init_alpha = 0.001 (Default)
    tol_obj = 9.9999999999999998e-13 (Default)
    tol_rel_obj = 10000 (Default)
    tol_grad = 1e-08 (Default)
    tol_rel_grad = 10000000 (Default)
    tol_param = 1e-08 (Default)
    history_size = 5 (Default)
    num_psis_draws = 1000 (Default)
    num_paths = 4 (Default)
    psis_resample = 1 (Default)
    calculate_lp = 1 (Default)
    save_single_paths = 0 (Default)
    max_lbfgs_iters = 1000 (Default)
    num_draws = 1000 (Default)
    num_elbo_draws = 25 (Default)
id = 1 (Default)
data
  file = examples/bernoulli/bernoulli.data.json
init = 2 (Default)
random
  seed = 1995513073 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

    The rest of the output describes the progression of the algorithm.


    By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, then uses importance resampling on the set of returned draws to produce the specified number of draws.

Path [1] :Initial log joint density = -11.543343
Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      1.070e-03   1.707e-05    1.000e+00  1.000e+00       126 -6.220e+00 -6.220e+00
Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126)
Path [2] :Initial log joint density = -7.443345
Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      9.936e-05   3.738e-07    1.000e+00  1.000e+00       126 -6.164e+00 -6.164e+00
Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126)
Path [3] :Initial log joint density = -18.986308
Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      2.996e-04   4.018e-06    1.000e+00  1.000e+00       126 -6.201e+00 -6.201e+00
Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126)
Path [4] :Initial log joint density = -8.304453
Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      2.814e-04   2.034e-06    1.000e+00  1.000e+00       126 -6.221e+00 -6.221e+00
Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126)
Total log probability function evaluations:8404

    Pathfinder Configuration

      @@ -512,18 +542,43 @@


    L-BFGS Configuration

Arguments init_alpha through history_size are the full set of arguments to the L-BFGS optimizer and have the same defaults as for optimization.

    Multi-path Pathfinder CSV files

    By default, the pathfinder method uses 4 independent Pathfinder runs, each of which produces 1000 approximate draws, which are then importance resampled down to 1000 final draws. The importance resampled draws are output as a StanCSV file.
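Both counts are configurable; for example, the following illustrative invocation runs 8 single-path Pathfinders and resamples 2000 final draws:

> ./bernoulli pathfinder num_paths=8 num_psis_draws=2000 data file=bernoulli.data.json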

    The CSV files have the following structure:

• The full set of configuration options available for the pathfinder method is reported at the beginning of the sampler output file as CSV comments.
• The CSV header row consists of columns lp_approx__, lp__, and the Stan model parameters, transformed parameters, and generated quantities in the order in which they are declared in the Stan program.
• The data rows contain the draws from the single- or multi-path run.
• Final comments containing timing information.

    The initial CSV comment rows contain the complete set of CmdStan configuration options.

    +
    ...
    +# method = pathfinder
    +#   pathfinder
    +#     init_alpha = 0.001 (Default)
    +#     tol_obj = 9.9999999999999998e-13 (Default)
    +#     tol_rel_obj = 10000 (Default)
    +#     tol_grad = 1e-08 (Default)
    +#     tol_rel_grad = 10000000 (Default)
    +#     tol_param = 1e-08 (Default)
    +#     history_size = 5 (Default)
    +#     num_psis_draws = 1000 (Default)
    +#     num_paths = 4 (Default)
    +#     psis_resample = 1 (Default)
    +#     calculate_lp = 1 (Default)
    +#     save_single_paths = 0 (Default)
    +#     max_lbfgs_iters = 1000 (Default)
    +#     num_draws = 1000 (Default)
    +#     num_elbo_draws = 25 (Default)
    +...
    +

Next is the column header line, followed by the set of approximate draws. The Pathfinder algorithm first outputs lp_approx__, the log density in the approximating distribution, and lp__, the log density in the target distribution, followed by estimates of the model parameters, transformed parameters, and generated quantities.

    +
    lp_approx__,lp__,theta
    +-2.4973, -8.2951, 0.0811852
    +-0.87445, -7.06526, 0.160207
    +-0.812285, -7.07124, 0.35819
    +...
    +

    The final lines are comment lines which give timing information.

    +
    # Elapsed Time: 0.016000 seconds (Pathfinders)
    +#               0.003000 seconds (PSIS)
    +#               0.019000 seconds (Total)
    +

    Pathfinder provides option save_single_paths which will save output from the single-path Pathfinder runs.

    Single-path Pathfinder Outputs.

    @@ -591,29 +646,15 @@


Option num_paths=1 runs one single-path Pathfinder, and the output CSV file contains the draws from that run without PSIS reweighting. The combination of arguments num_paths=1 save_single_paths=1 creates just two output files, the CSV sample and the set of ELBO iterations. In this case, the default output file name is “output.csv” and the default diagnostic file name is “output.json”.
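For example, the following illustrative invocation produces exactly those two files:

> ./bernoulli pathfinder num_paths=1 save_single_paths=1 data file=bernoulli.data.json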

    +


    References

Zhang, Lu, Bob Carpenter, Andrew Gelman, and Aki Vehtari. 2022. “Pathfinder: Parallel Quasi-Newton Variational Inference.” Journal of Machine Learning Research 23 (306): 1–49. http://jmlr.org/papers/v23/21-0889.html.
diff --git a/docs/2_34/cmdstan-guide/pathfinder_intro.html b/docs/2_34/cmdstan-guide/pathfinder_intro.html
deleted file mode 100644
index 92918fd7a..000000000
--- a/docs/2_34/cmdstan-guide/pathfinder_intro.html
+++ /dev/null
@@ -1,965 +0,0 @@
-Pathfinder for Variational Inference

    Variational Inference using Pathfinder

    -

    The CmdStan method pathfinder uses the Pathfinder algorithm of Zhang et al. (2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the L-BFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.

    -

    Pathfinder differs from the ADVI method in that it uses quasi-Newton optimization on the log posterior instead of stochastic gradient descent (SGD) on the Monte Carlo computation of the evidence lower bound (ELBO). Pathfinder’s approach is both faster and more stable than that of ADVI. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.

    -

    A single run of the Pathfinder algorithm generates a set of approximate draws. Inference is improved by running multiple Pathfinder instances and using Pareto-smoothed importance resampling (PSIS) of the resulting sets of draws. This better matches non-normal target densities and also eliminates minor modes. By default, the pathfinder method uses 4 independent Pathfinder runs, each of which produces 1000 approximate draws, which are then importance resampled down to 1000 final draws.

    -

The following is a minimal call to the Pathfinder algorithm using defaults for everything but the location of the data file.

    -
    > ./bernoulli pathfinder data file=bernoulli.data.R
    -

    Executing this command prints both output to the console and csv files.

    -

    The first part of the console output reports on the configuration used.

    -
    method = pathfinder
    -  pathfinder
    -    init_alpha = 0.001 (Default)
    -    tol_obj = 9.9999999999999998e-13 (Default)
    -    tol_rel_obj = 10000 (Default)
    -    tol_grad = 1e-08 (Default)
    -    tol_rel_grad = 10000000 (Default)
    -    tol_param = 1e-08 (Default)
    -    history_size = 5 (Default)
    -    num_psis_draws = 1000 (Default)
    -    num_paths = 4 (Default)
    -    psis_resample = 1 (Default)
    -    calculate_lp = 1 (Default)
    -    save_single_paths = 0 (Default)
    -    max_lbfgs_iters = 1000 (Default)
    -    num_draws = 1000 (Default)
    -    num_elbo_draws = 25 (Default)
    -id = 1 (Default)
    -data
    -  file = examples/bernoulli/bernoulli.data.json
    -init = 2 (Default)
    -random
    -  seed = 1995513073 (Default)
    -output
    -  file = output.csv (Default)
    -  diagnostic_file =  (Default)
    -  refresh = 100 (Default)
    -  sig_figs = -1 (Default)
    -  profile_file = profile.csv (Default)
    -num_threads = 1 (Default)
    -

    The rest of the output describes the progression of the algorithm.

    -

By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, then uses importance resampling on the set of returned draws to produce the specified number of draws.

    -
    Path [1] :Initial log joint density = -11.543343
    -Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
    -              5      -6.748e+00      1.070e-03   1.707e-05    1.000e+00  1.000e+00       126 -6.220e+00 -6.220e+00
    -Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126)
    -Path [2] :Initial log joint density = -7.443345
    -Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
    -              5      -6.748e+00      9.936e-05   3.738e-07    1.000e+00  1.000e+00       126 -6.164e+00 -6.164e+00
    -Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126)
    -Path [3] :Initial log joint density = -18.986308
    -Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
    -              5      -6.748e+00      2.996e-04   4.018e-06    1.000e+00  1.000e+00       126 -6.201e+00 -6.201e+00
    -Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126)
    -Path [4] :Initial log joint density = -8.304453
    -Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
    -              5      -6.748e+00      2.814e-04   2.034e-06    1.000e+00  1.000e+00       126 -6.221e+00 -6.221e+00
    -Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126)
    -Total log probability function evaluations:8404
    -

Pathfinder outputs a StanCSV file which contains the importance resampled draws from multi-path Pathfinder. The initial CSV comment rows contain the complete set of CmdStan configuration options. Next is the column header line, followed by the set of approximate draws. The Pathfinder algorithm first outputs lp_approx__, the log density in the approximating distribution, and lp__, the log density in the target distribution, followed by estimates of the model parameters, transformed parameters, and generated quantities.

    -
    lp_approx__,lp__,theta
    --2.4973, -8.2951, 0.0811852
    --0.87445, -7.06526, 0.160207
    --0.812285, -7.07124, 0.35819
    -...
    -

    The final lines are comment lines which give timing information.

    -
    # Elapsed Time: 0.016000 seconds (Pathfinders)
    -#               0.003000 seconds (PSIS)
    -#               0.019000 seconds (Total)
    -

    Pathfinder provides option save_single_paths which will save output from the single-path Pathfinder runs. See section Pathfinder Method for details.


    References

-Zhang, Lu, Bob Carpenter, Andrew Gelman, and Aki Vehtari. 2022. “Pathfinder: Parallel Quasi-Newton Variational Inference.” Journal of Machine Learning Research 23 (306): 1–49. http://jmlr.org/papers/v23/21-0889.html.
diff --git a/docs/2_34/cmdstan-guide/print.html b/docs/2_34/cmdstan-guide/print.html
index 59a934c48..f8b75a86c 100644
--- a/docs/2_34/cmdstan-guide/print.html
+++ b/docs/2_34/cmdstan-guide/print.html

    print (deprecated): MCMC Output Analysis


# refresh = 100 (Default)

    Note that when running multi-threaded programs which use reduce_sum for high-level parallelization, the number of threads used will also be included in this initial comment header.

    Column headers

    -

    The CSV header row lists all sampler parameters, model parameters, transformed parameters, and quantities of interest. The sampler parameters are described in detail in the output file section of the Quickstart Guide chapter on MCMC Sampling. The example model bernoulli.stan only contains one parameter theta, therefore the CSV file data table consists of 7 sampler parameter columns and one column for the model parameter:

    +

The CSV header row lists all sampler parameters, model parameters, transformed parameters, and quantities of interest. The sampler parameters are described in detail in the output file section of the chapter on MCMC Sampling. The example model bernoulli.stan contains only one parameter, theta, so the CSV file data table consists of 7 sampler parameter columns and one column for the model parameter:

    lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta

As a second example, we show the output of the eight_schools.stan model run on an example dataset. This model has 3 parameters: mu; theta, a vector whose length depends on the input data (here N = 8); and tau. The initial columns are for the 7 sampler parameters, as before. The column headers for the model parameters are:

    mu,theta.1,theta.2,theta.3,theta.4,theta.5,theta.6,theta.7,theta.8,tau
    @@ -693,25 +666,6 @@



    Variational Inference using ADVI

    -

    The CmdStan method variational uses the Automatic Differentiation Variational Inference (ADVI) algorithm of Kucukelbir et al. (2017) to provide an approximate posterior distribution of the model conditioned on the data. The approximating distribution it uses is a Gaussian in the unconstrained variable space, either a fully factorized Gaussian approximation, specified by argument algorithm=meanfield option, or a Gaussian approximation using a full-rank covariance matrix, specified by argument algorithm=fullrank. By default, ADVI uses option algorithm=meanfield.

    -

    The following is a minimal call to Stan’s variational inference algorithm using defaults for everything but the location of the data file.

    -
    > ./bernoulli variational data file=bernoulli.data.R
    -

    Executing this command prints both output to the console and to a csv file.

    -

    The first part of the console output reports on the configuration used: the default option algorithm=meanfield and the default tolerances for monitoring the algorithm’s convergence.

    -
    method = variational
    -  variational
    -    algorithm = meanfield (Default)
    -      meanfield
    -    iter = 10000 (Default)
    -    grad_samples = 1 (Default)
    -    elbo_samples = 100 (Default)
    -    eta = 1 (Default)
    -    adapt
    -      engaged = 1 (Default)
    -      iter = 50 (Default)
    -    tol_rel_obj = 0.01 (Default)
    -    eval_elbo = 100 (Default)
    -    output_samples = 1000 (Default)
    -id = 0 (Default)
    -data
    -  file = bernoulli.data.json
    -init = 2 (Default)
    -random
    -  seed = 3323783840 (Default)
    -output
    -  file = output.csv (Default)
    -  diagnostic_file =  (Default)
    -  refresh = 100 (Default)
    -

    After the configuration has been displayed, informational and timing messages are output:

    -
    ------------------------------------------------------------
    -EXPERIMENTAL ALGORITHM:
    -  This procedure has not been thoroughly tested and may be unstable
    -  or buggy. The interface is subject to change.
    -------------------------------------------------------------
    -
    -Gradient evaluation took 2.1e-05 seconds
    -1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
    -Adjust your expectations accordingly!
    -

The rest of the output describes the progression of the algorithm. An adaptation phase finds a good value for the step size scaling parameter eta. The evidence lower bound (ELBO) is the variational objective function and is evaluated based on a Monte Carlo estimate. The variational inference algorithm in Stan is stochastic, which makes it challenging to assess convergence. That is, while the algorithm appears to have converged in \(\sim\) 250 iterations, the algorithm runs for another few thousand iterations until the mean change in ELBO drops below the default tolerance of 0.01.

    -
    Begin eta adaptation.
    -Iteration:   1 / 250 [  0%]  (Adaptation)
    -Iteration:  50 / 250 [ 20%]  (Adaptation)
    -Iteration: 100 / 250 [ 40%]  (Adaptation)
    -Iteration: 150 / 250 [ 60%]  (Adaptation)
    -Iteration: 200 / 250 [ 80%]  (Adaptation)
    -Success! Found best value [eta = 1] earlier than expected.
    -
    -Begin stochastic gradient ascent.
    -  iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
    -   100           -6.131             1.000            1.000
    -   200           -6.458             0.525            1.000
    -   300           -6.300             0.359            0.051
    -   400           -6.137             0.276            0.051
    -   500           -6.243             0.224            0.027
    -   600           -6.305             0.188            0.027
    -   700           -6.289             0.162            0.025
    -   800           -6.402             0.144            0.025
    -   900           -6.103             0.133            0.025
    -  1000           -6.314             0.123            0.027
    -  1100           -6.348             0.024            0.025
    -  1200           -6.244             0.020            0.018
    -  1300           -6.293             0.019            0.017
    -  1400           -6.250             0.017            0.017
    -  1500           -6.241             0.015            0.010   MEDIAN ELBO CONVERGED
    -
    -Drawing a sample of size 1000 from the approximate posterior... 
    -COMPLETED.
    -

    The output from variational is written into the file output.csv by default. The output follows the same pattern as the output for sampling, first dumping the entire set of parameters used as CSV comments:

    -
    # stan_version_major = 2
    -# stan_version_minor = 23
    -# stan_version_patch = 0
    -# model = bernoulli_model
    -# method = variational
    -#   variational
    -#     algorithm = meanfield (Default)
    -#       meanfield
    -#     iter = 10000 (Default)
    -#     grad_samples = 1 (Default)
    -#     elbo_samples = 100 (Default)
    -#     eta = 1 (Default)
    -#     adapt
    -#       engaged = 1 (Default)
    -#       iter = 50 (Default)
    -#     tol_rel_obj = 0.01 (Default)
    -#     eval_elbo = 100 (Default)
    -#     output_samples = 1000 (Default)
    -...
    -

Next is the column header line, followed by more CSV comments reporting the adapted value for the stepsize, followed by the values. The first line is special: it is the mean of the variational approximation. The rest of the output contains output_samples draws from the variational approximation.

    -
    lp__,log_p__,log_g__,theta
    -# Stepsize adaptation complete.
    -# eta = 1
    -0,0,0,0.236261
    -0,-6.82318,-0.0929121,0.300415
    -0,-6.89701,-0.158687,0.321982
    -0,-6.99391,-0.23916,0.343643
    -0,-7.35801,-0.51787,0.401554
    -0,-7.4668,-0.539473,0.123081
    -...
    -

    The header indicates the unnormalized log probability with lp__. This is a legacy feature that we do not use for variational inference. The ELBO is not stored unless a diagnostic option is given.


    References

-Kucukelbir, Alp, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. 2017. “Automatic Differentiation Variational Inference.” Journal of Machine Learning Research.
diff --git a/docs/2_34/functions-reference-2_34.pdf b/docs/2_34/functions-reference-2_34.pdf
index cdc45896f..c896b4198 100644
Binary files a/docs/2_34/functions-reference-2_34.pdf and b/docs/2_34/functions-reference-2_34.pdf differ
diff --git a/docs/2_34/functions-reference/array_operations.html b/docs/2_34/functions-reference/array_operations.html
index 95ce8ae37..a822c158b 100644
--- a/docs/2_34/functions-reference/array_operations.html
+++ b/docs/2_34/functions-reference/array_operations.html


    Deprecated Functions

    This appendix lists currently deprecated functionality along with how to replace it.

    -

    Starting in Stan 2.29, deprecated functions with drop in replacements (such as the renaming of get_lp or multiply_log) will be removed 3 versions later e.g., functions deprecated in Stan 2.20 will be removed in Stan 2.23 and placed in [Removed Functions]. The Stan compiler can automatically update these on the behalf of the user for the entire deprecation window and at least one version following the removal.

    +

Starting in Stan 2.29, deprecated functions with drop-in replacements (such as the renaming of get_lp or multiply_log) will be removed 3 versions later, e.g., functions deprecated in Stan 2.20 will be removed in Stan 2.23 and placed in Removed Functions. The Stan compiler can automatically update these on behalf of the user for the entire deprecation window and at least one version following the removal.

    Integer division with operator/

Deprecated: Using / with two integer arguments is interpreted as integer floor division, such that, for example, 7 / 2 evaluates to 3.
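A minimal sketch of the deprecated form and its drop-in replacement, the integer division operator %/%:

int a = 7 / 2;    // deprecated when both arguments are ints; evaluates to 3
int b = 7 %/% 2;  // replacement: explicit integer division; also 3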

    @@ -729,25 +732,6 @@

    @@ -491,7 +494,7 @@

    Stan Functions Reference

This is the reference for the functions defined in the Stan math library and available in the Stan programming language.

For more information about the Stan language and inference engines, and how to use Stan for Bayesian inference, see

    • the Stan User’s Guide. The Stan user’s guide provides example models and programming techniques for coding statistical models in Stan. It also serves as an example-driven introduction to Bayesian modeling and inference:

    • @@ -517,25 +520,6 @@

      Licensing

      +
      @@ -633,7 +636,7 @@

      Casting functions

      int to_int(data real x)

      Return the value x truncated to an integer. This will throw an error if the value of x is too big to represent as a 32-bit signed integer.

      -

      This is similar to trunc (see [Rounding functions]) but the return type is of type int. For example, to_int(3.9) is 3, and to_int(-3.9) is -3.

      +

      This is similar to trunc (see Rounding functions) but the return type is of type int. For example, to_int(3.9) is 3, and to_int(-3.9) is -3.

      Available since 2.31

      @@ -646,25 +649,6 @@

      @@ -230,7 +233,7 @@

      Stan Documentation

This is the official documentation for Stan.

      • The Stan User’s Guide (pdf) provides example models and programming techniques for coding statistical models in Stan.

      • The Stan Reference Manual (pdf) specifies the Stan programming language and inference algorithms.

      • @@ -256,25 +259,6 @@

        Licensing


        In this example, y[N] is a modeled data vector. Although it is specified in the data block, and thus must have a known value before the program may be run, it is modeled as if it were generated randomly as described by the model.

        +

        In this example, y is an array of modeled data. Although it is specified in the data block, and thus must have a known value before the program may be run, it is modeled as if it were generated randomly as described by the model.

        The variable N is a typical example of unmodeled data. It is used to indicate a size that is not part of the model itself.

        The other variables declared in the data and transformed data block are examples of unmodeled parameters, also known as hyperparameters. Unmodeled parameters are parameters to probability densities that are not themselves modeled probabilistically. In Stan, unmodeled parameters that appear in the data block may be specified on a per-model execution basis as part of the data read. In the above model, mu_mu and sigma_mu are configurable unmodeled parameters.

        Unmodeled parameters that are hard coded in the model must be declared in the transformed data block. For example, the unmodeled parameters alpha and beta are both hard coded to the value 0.1. To allow such variables to be configurable based on data supplied to the program at run time, they must be declared in the data block, like the variables mu_mu and sigma_mu.
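The following sketch illustrates the layout just described; the declarations are reconstructed from the discussion above rather than quoted from the original program:

data {
  int<lower=0> N;          // unmodeled data: a size
  real mu_mu;              // configurable unmodeled parameter
  real<lower=0> sigma_mu;  // configurable unmodeled parameter
  array[N] real y;         // modeled data
}
transformed data {
  real alpha = 0.1;        // hard-coded unmodeled parameter
  real beta = 0.1;         // hard-coded unmodeled parameter
}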

        @@ -934,25 +937,6 @@

      • With multiple threads, or even running chains sequentially in a single thread, data could be read only once per set of chains. Stan was designed to be thread safe and future versions will provide a multithreading option for Markov chains.↩︎

        @@ -452,7 +455,7 @@


        Deprecated Features

        This appendix lists currently deprecated functionality along with how to replace it.

        -

        Starting with Stan 2.29, minor (syntax-level) deprecations can be removed 3 versions after release; e.g., syntax deprecated in Stan 2.20 will be removed in Stan 2.23 and placed in [Removed Features]. The Stan compiler can automatically update many of these on the behalf of the user for at least one version after they are removed.

        +

Starting with Stan 2.29, minor (syntax-level) deprecations can be removed 3 versions after release; e.g., syntax deprecated in Stan 2.20 will be removed in Stan 2.23 and placed in Removed Features. The Stan compiler can automatically update many of these on behalf of the user for at least one version after they are removed.

        Any feature which changes semantic meaning (such as the upgraded ODE solver interface) will not be removed until a major version change (e.g., Stan 3.0).

        lkj_cov distribution

        @@ -482,25 +485,6 @@


        Includes Jacobian

        -

        The log density includes the Jacobian adjustment implied by the constraints declared on variables. The Jacobian adjustment for constrained parameter transforms will be turned off if optimization is used in practice, but there is as of yet no way to turn it off in diagnostic mode.

        +

        The log density includes the Jacobian adjustment implied by the constraints declared on variables. The Jacobian adjustment for constrained parameter transforms may be turned off for optimization, but there is as of yet no way to turn it off in diagnostic mode.

        @@ -510,25 +513,6 @@

        @@ -501,11 +504,21 @@


        Recursive includes

        -

        Recursive includes will be ignored. For example, suppose a.stan contains

        +

        Recursive includes will lead to a compiler error. For example, suppose a.stan contains

        #include b.stan

        and b.stan contains

        #include a.stan
        -

        The result of processing this file will be empty, because a.stan will include b.stan, from which the include of a.stan is ignored and a warning printed.

        +

        This will result in an error explaining the circular dependency:

        +
        Syntax error in './b.stan', line 1, column 0, included from
        +'./a.stan', line 1, column 0, included from
        +'./b.stan', line 1, column 0, included from
        +'a.stan', line 1, column 0, include error:
        +   -------------------------------------------------
        +     1:  #include a.stan
        +         ^
        +   -------------------------------------------------
        +
        +File a.stan recursively included itself.

        Include paths

        @@ -520,25 +533,6 @@

        @@ -421,7 +424,7 @@

        Stan Reference Manual

This is the official reference manual for Stan’s programming language for coding probability models, inference algorithms for fitting models and making predictions, and posterior analysis tools for evaluating the results. This manual applies to all Stan interfaces.

        The first part of the reference manual provides a full specification of the Stan programming language. The language is responsible for defining a log density function conditioned on data. Typically, this is a Bayesian posterior, but it may also be a penalized likelihood function. The second part of the manual specifies the inference algorithms and posterior inference tools. The third part provides auxiliary information about the use of Stan.

        Download the pdf version of this manual.


    @@ -454,6 +458,14 @@

    SUNDIALS license

    The copyright of SUNDIALS is owned by Lawrence Livermore National Security Lab.

    +
    +

    Threaded Building Blocks (TBB) License

    +

Stan uses the Threaded Building Blocks (TBB) library for parallel computations. TBB is distributed under the Apache License 2.0.

    The copyright of TBB is owned by Intel Corporation.

    +

    Google test license

Stan uses Google Test for unit testing; it is not required to compile or execute models. Google Test is distributed under the BSD 3-clause license.

    @@ -473,25 +485,6 @@


    Universities or companies often own the copyright of computer programs developed by their employees.↩︎

    @@ -562,10 +565,11 @@


    Lvalue summary

    -

    The expressions that are legal left-hand sides of assignment statements are known as “lvalues.” In Stan, there are only two kinds of legal lvalues,

    +

    The expressions that are legal left-hand sides of assignment statements are known as “lvalues.” In Stan, there are three kinds of legal lvalues,

• a variable,
• a variable with one or more indices, or
• a comma-separated list of lvalues surrounded by ( and ).

    To be used as an lvalue, an indexed variable must have at least as many dimensions as the number of indices provided. An array of real or integer types has as many dimensions as it is declared for. A matrix has two dimensions and a vector or row vector one dimension; this also holds for the constrained types, covariance and correlation matrices and their Cholesky factors and ordered, positive ordered, and simplex vectors. An array of matrices has two more dimensions than the array and an array of vectors or row vectors has one more dimension than the array. Note that the number of indices can be less than the number of dimensions of the variable, meaning that the right hand side must itself be multidimensional to match the remaining dimensions.
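For example, all three forms may appear on the left-hand side of an assignment; this small sketch assumes the Stan 2.33+ tuple syntax for the third form:

real mu;
real sigma;
array[3] real a;

mu = 0.0;                  // a variable
a[2] = 1.5;                // a variable with an index
(mu, sigma) = (0.5, 1.5);  // a comma-separated list of lvalues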

    @@ -1061,21 +1065,20 @@


    No constraints on local variables

    -

    Local variables may not have constraints on their declaration. The only types that may be used are

    -
    int, real, vector[K], row_vector[K], matrix[M, N].
    +

    Local variables may not have constraints on their declaration. The only types that may be used are listed in the types table under “local”.

    Blocks within blocks

    A block is itself a statement, so anywhere a sequence of statements is allowed, one or more of the statements may be a block. For instance, in a for loop, it is legal to have the following

for (m in 1:M) {
  {
     int n = 2 * m;
     sum += n;
  }
  for (n in 1:N) {
    sum += x[m, n];
  }
}

    The variable declaration int n; is the first element of an embedded block and so has scope within that block. The for loop defines its own local block implicitly over the statement following it in which the loop variable is defined. As far as Stan is concerned, these two uses of n are unrelated.

    @@ -1086,56 +1089,56 @@

    Break and contin

    Break statements

    When a break statement is executed, the most deeply nested loop currently being executed is ended and execution picks up with the next statement after the loop. For example, consider the following program:

    -
    while (1) {
    -  if (n < 0) {
    -    break;
    -  }
    -  foo(n);
    -  n = n - 1;
    -}
    +
    while (1) {
    +  if (n < 0) {
    +    break;
    +  }
    +  foo(n);
    +  n = n - 1;
    +}

The while (1) loop is a “forever” loop, because 1 is the true value, so the test always succeeds. Within the loop, if the value of n is less than 0, the loop terminates; otherwise it executes foo(n) and then decrements n. The statement above does exactly the same thing as

while (n >= 0) {
  foo(n);
  n = n - 1;
}

    This case is simply illustrative of the behavior; it is not a case where a break simplifies the loop.

    Continue statements

    The continue statement ends the current operation of the loop and returns to the condition at the top of the loop. Such loops are typically used to exclude some values from calculations. For example, we could use the following loop to sum the positive values in the array x,

real sum;
sum = 0;
for (n in 1:size(x)) {
  if (x[n] <= 0) {
    continue;
  }
  sum += x[n];
}

When the continue statement is executed, control jumps back to the conditional part of the loop. With while and for loops, this causes control to return to the conditional of the loop. With for loops, this advances the loop variable, so the above program will not go into an infinite loop when faced with an x[n] less than zero. Thus the above program could be rewritten with deeper nesting by reversing the conditional,

real sum;
sum = 0;
for (n in 1:size(x)) {
  if (x[n] > 0) {
    sum += x[n];
  }
}

    While the latter form may seem more readable in this simple case, the former has the main line of execution nested one level less deep. Instead, the conditional at the top finds cases to exclude and doesn’t require the same level of nesting for code that’s not excluded. When there are several such exclusion conditions, the break or continue versions tend to be much easier to read.

    Breaking and continuing nested loops

    If there is a loop nested within a loop, a break or continue statement only breaks out of the inner loop. So

while (cond1) {
  // ...
  while (cond2) {
    // ...
    if (cond3) {
      break;
    }
    // ...
  }
  // execution continues here after break
  // ...
}

    If the break is triggered by cond3 being true, execution will continue after the nested loop.

    As with break statements, continue statements go back to the top of the most deeply nested loop in which the continue appears.

    Although break and continue must appear within loops, they may appear in nested statements within loops, such as within the conditionals shown above or within nested statements. The break and continue statements jump past any control structure other than while-loops and for-loops.

    @@ -1144,32 +1147,32 @@

    Print statements

    Stan provides print statements that can print literal strings and the values of expressions. Print statements accept any number of arguments. Consider the following for-each statement with a print statement in its body.

for (n in 1:N) { print("loop iteration: ", n); ... }

    The print statement will execute every time the body of the loop does. Each time the loop body is executed, it will print the string “loop iteration:” (with the trailing space), followed by the value of the expression n, followed by a new line.

    Non-void input

    @@ -1188,12 +1191,13 @@

    String literals

    Debug by print

Because Stan is an imperative language, print statements can be very useful for debugging. They can be used to display the values of variables or expressions at various points in the execution of a program. They are particularly useful for spotting problematic not-a-number or infinite values, both of which will be printed.

    It is particularly useful to print the value of the target log density accumulator (through the target() function), as in the following example.

vector[2] y;
y[1] = 1;
print("log density before =", target());
y ~ normal(0,1);  // bug!  y[2] not defined
print("log density after =", target());

    The example has a bug in that y[2] is not defined before the vector y is used in the sampling statement. By printing the value of the log probability accumulator before and after each sampling statement, it’s possible to isolate where the log probability becomes ill-defined (i.e., becomes not-a-number).

    +

    Note that print statements may not always be displayed immediately, but rather at the end of an operation (e.g., leapfrog step). As such, some issues such as infinite loops are difficult to debug effectively with this technique.

    @@ -1201,9 +1205,9 @@

    Reject statement

    The Stan reject statement provides a mechanism to report errors or problematic values encountered during program execution and either halt processing or reject iterations.

    Like the print statement, the reject statement accepts any number of quoted string literals or Stan expressions as arguments.

    Reject statements are typically embedded in a conditional statement in order to detect variables in illegal states. For example, the following code handles the case where a variable x’s value is negative.

if (x < 0) {
  reject("x must not be negative; found x=", x);
}

    Behavior of reject statements

    Reject statements have the same behavior as exceptions thrown by built-in Stan functions. For example, the normal_lpdf function raises an exception if the input scale is not positive and finite. The effect of a reject statement depends on the program block in which the rejection occurs.

    @@ -1226,26 +1230,26 @@

    Rejection is not for constraints

    Rejection should be used for error handling, not defining arbitrary constraints. Consider the following errorful Stan program.

parameters {
  real a;
  real<lower=a> b;
  real<lower=a, upper=b> theta;
  // ...
}
model {
  // **wrong** needs explicit truncation
  theta ~ normal(0, 1);
  // ...
}

    This program is wrong because its truncation bounds on theta depend on parameters, and thus need to be accounted for using an explicit truncation on the distribution. This is the right way to do it.

  theta ~ normal(0, 1) T[a, b];

    The conceptual issue is that the prior does not integrate to one over the admissible parameter space; it integrates to one over all real numbers and integrates to something less than one over \([a ,b]\); in these simple univariate cases, we can overcome that with the T[ , ] notation, which essentially divides by whatever the prior integrates to over \([a, b]\).

    This problem is exactly the same problem as you would get using reject statements to enforce complicated inequalities on multivariate functions. In this case, it is wrong to try to deal with truncation through constraints.

  if (theta < a || theta > b) {
    reject("theta not in (a, b)");
  }
  // still **wrong**, needs T[a,b]
  theta ~ normal(0, 1);

    In this case, the prior integrates to something less than one over the region of the parameter space where the complicated inequalities are satisfied. But we don’t generally know what value the prior integrates to, so we can’t increment the log probability function to compensate.

    Even if this adjustment to a proper probability model may seem minor in particular models where the amount of truncated posterior density is negligible or constant, we can’t sample from that truncated posterior efficiently. Programs need to use one-to-one mappings that guarantee the constraints are satisfied and only use reject statements to raise errors or help with debugging.

    @@ -1265,25 +1269,6 @@

    The adjoint component is always zero during execution for the algorithmic differentiation variables used to implement parameters, transformed parameters, and local variables in the model.↩︎

    @@ -842,7 +845,7 @@


    Indexes

    Standalone expressions used as indexes must denote either an integer (int) or an integer array (array[] int). Expressions participating in range indexes (e.g., a and b in a : b) must denote integers (int).

    -

    A second condition is that there not be more indexes provided than dimensions of the underlying expression (in general) or variable (on the left side of assignments) being indexed. A vector or row vector adds 1 to the array dimension and a matrix adds 2. That is, the type matrix[ , , ], a three-dimensional array of matrices, has five index positions: three for the array, one for the row of the matrix and one for the column.

    +

    A second condition is that there not be more indexes provided than dimensions of the underlying expression (in general) or variable (on the left side of assignments) being indexed. A vector or row vector adds 1 to the array dimension and a matrix adds 2. That is, the type array[ , , ] matrix, a three-dimensional array of matrices, has five index positions: three for the array, one for the row of the matrix and one for the column.
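For instance, in this illustrative sketch the declared type has five index positions:

array[3, 4, 5] matrix[6, 7] x;
real v = x[2, 1, 5, 3, 6];    // three array indexes, one row, one column
matrix[6, 7] m = x[2, 1, 5];  // fewer indexes leave the remaining dimensions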

    @@ -850,25 +853,6 @@

    @@ -600,7 +603,7 @@

    Array types

array[10] real x; array[6, 7] matrix[3, 3] m; array[12, 8, 15] complex z;

    declares x to be a one-dimensional array of size 10 containing real values, declares m to be a two-dimensional array of size \(6 \times 7\) containing values that are \(3 \times 3\) matrices, and declares z to be a \(12 \times 8 \times 15\) array of complex numbers.

    -

    Prior to 2.26 Stan models used a different syntax which has since been removed. See the [Removed Features] chapter for more details.

    +

    Prior to 2.26 Stan models used a different syntax which has since been removed. See the Removed Features chapter for more details.

    Tuple types

    @@ -630,8 +633,8 @@


    Integers

    Stan uses 32-bit (4-byte) integers for all of its integer representations. The maximum value that can be represented as an integer is \(2^{31}-1\); the minimum value is \(-(2^{31})\).

    -

    When integers overflow, their values wrap. Thus it is up to the Stan programmer to make sure the integer values in their programs stay in range. In particular, every intermediate expression must have an integer value that is in range.

    -

    Integer arithmetic works in the expected way for addition, subtraction, and multiplication, but rounds the result of division (see the Stan Functions Reference integer-valued arithmetic operators section for more information).

    +

    When integers overflow, their value is determined by the underlying architecture. On most, their values wrap, but this cannot be guaranteed. Thus it is up to the Stan programmer to make sure the integer values in their programs stay in range. In particular, every intermediate expression must have an integer value that is in range.

    +

    Integer arithmetic works in the expected way for addition, subtraction, and multiplication, but truncates the result of division (see the Stan Functions Reference integer-valued arithmetic operators section for more information).
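For example:

int a = 7 / 2;  // evaluates to 3; the fractional part is discarded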

    Reals

    @@ -721,7 +724,7 @@

Constrained reals
    real<lower=-1, upper=1> rho;

    Infinite constraints

    -

    Lower bounds that are negative infinity or upper bounds that are positive infinity are ignored. Stan provides constants positive_infinity() and negative_infinity() which may be used for this purpose, or they may be read as data in the dump format.

    +

    Lower bounds that are negative infinity or upper bounds that are positive infinity are ignored. Stan provides constants positive_infinity() and negative_infinity() which may be used for this purpose, or they may be supplied as data.
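For example, the following two declarations behave identically (a small sketch):

real<lower=negative_infinity(), upper=1> rho;  // infinite lower bound ignored
real<upper=1> rho;                             // same constraint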

    @@ -1621,25 +1624,6 @@

  • Stan compiles integers to int and reals to double types in C++. Precise details of rounding will depend on the compiler and hardware architecture on which the code is run.↩︎

    @@ -657,6 +660,7 @@

  • a for loop or while loop qualifies if its body qualifies, and
  • a conditional statement qualifies if it has a default else clause and all of its body statements qualify.

    An exception is made for “obviously infinite” loops like while (1), which contain a return statement and no break statements. The only way to exit such a loop is to return, so they are considered as returning statements.
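For example, the following function qualifies under this exception (a minimal sketch):

real f() {
  while (1) {
    return 1.0;  // the only way out of the loop is this return
  }
}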

    These rules disqualify

    real foo(real x) {
       if (x > 2) {
    @@ -727,25 +731,6 @@ 

  • Despite being declared constant and appearing to have a pass-by-value syntax in Stan, the implementation of the language passes function arguments by constant reference in C++.↩︎

    @@ -516,7 +519,7 @@

    Stan User’s Guide

This is the official user’s guide for Stan. It provides example models and programming techniques for coding statistical models in Stan.

    • Part 1 gives Stan code and discussions for several important classes of models.

    • Part 2 discusses various general Stan programming techniques that are not tied to any particular model.

    • @@ -544,25 +547,6 @@

      Licensing

      @@ -586,7 +589,7 @@

      On this page

      • Simulating from the posterior predictive distribution
      • Plotting multiples
• Posterior “p-values”
      • @@ -679,7 +682,7 @@

        Plotting multiples

        Now consider generating data \(y \sim \textrm{Poisson}(5)\). The resulting small multiples plot shows the original data plotted in the upper left and eight different posterior replications plotted in the remaining boxes.

Posterior predictive checks for Poisson data generating process and Poisson model.
        @@ -687,14 +690,14 @@


        Now consider generating over-dispersed data \(y \sim \textrm{negative-binomial2}(5, 1).\) This has the same mean as \(\textrm{Poisson}(5)\), namely \(5\), but a standard deviation of \(\sqrt{5 + 5^2 /1} \approx 5.5.\) There is no way to fit this data with the Poisson model, because a variable distributed as \(\textrm{Poisson}(\lambda)\) has mean \(\lambda\) and standard deviation \(\sqrt{\lambda},\) which is \(\sqrt{5}\) for \(\textrm{Poisson}(5).\) Here’s the resulting small multiples plot, again with original data in the upper left.

Posterior predictive checks for negative binomial data generating process and Poisson model.

        This time, the original data stands out in stark contrast to the replicated data sets, all of which are clearly more symmetric and lower variance than the original data. That is, the model’s not appropriately capturing the variance of the data.

Posterior “p-values”

If a model captures the data well, summary statistics such as the sample mean and standard deviation should have similar values in the original and replicated data sets. This can be tested by means of a p-value-like statistic, which here is just the probability that the test statistic \(s(\cdot)\) in a replicated data set exceeds that in the original data,
\[
\Pr\!\left[ s(y^{\textrm{rep}}) \geq s(y) \mid y \right]
= \int \textrm{I}\!\left( s(y^{\textrm{rep}}) \geq s(y) \mid y \right) \cdot p\!\left( y^{\textrm{rep}} \mid y \right) \, \textrm{d}{y^{\textrm{rep}}}.
\]
It is important to note that “p-values” is in quotes because these statistics are not classically calibrated, and thus will not in general have a uniform distribution even when the model is well specified (Bayarri and Berger 2000).

        Nevertheless, values of this statistic very close to zero or one are cause for concern that the model is not fitting the data well. Unlike a visual test, this p-value-like test is easily automated for bulk model fitting.

        To calculate event probabilities in Stan, it suffices to define indicator variables that take on value 1 if the event occurs and 0 if it does not. The posterior mean is then the event probability. For efficiency, indicator variables are defined in the generated quantities block.

generated quantities {
  // replicate the data set from the current draw
  // (lambda is the Poisson rate parameter assumed from the model above)
  array[N] int<lower=0> y_rep = poisson_rng(rep_array(lambda, N));
  // indicator variables; their posterior means are the event probabilities
  int<lower=0, upper=1> mean_gt = mean(y_rep) > mean(y);
  int<lower=0, upper=1> sd_gt = sd(y_rep) > sd(y);
}

        For the example in the previous section, where over-dispersed data generated by a negative binomial distribution was fit with a simple Poisson model, the following plot illustrates the posterior p-value calculation for the mean statistic.

Figure: Histogram of means of replicated data sets; vertical red line at mean of original data.

        The standard deviation statistic tells a different story.

Figure: Scatterplot of standard deviations of replicated data sets; the vertical red line is at standard deviation of original data.

      Nonlinear transformations

When a parameter is transformed in a non-linear fashion, an adjustment must be applied to account for distortion caused by the transform. This is discussed in depth in the Changes of variables section.

      This portion of pedantic mode tries to detect instances where such an adjustment would be necessary and remind the user.

For example, consider a program along these lines (a minimal sketch of such a transform):

parameters {
  real<lower=0> sigma;
}
model {
  // non-linear transform of a parameter on the left of ~;
  // pedantic mode reminds the user a Jacobian adjustment may be needed
  log(sigma) ~ normal(0, 1);
}


      These flags work for both .stan model files and .stanfunctions function files. They can be combined with --o to redirect the formatted output to a new file.

      Automatic formatting

Invoking stanc --auto-format <model_file> will print a version of your model which has been re-formatted. The goal is to have this automatic formatting stay as close as possible to the Stan Program Style Guide. This means spacing, indentation, and line length are all regularized. Some deprecated features, like the use of # for line comments, are replaced, but the goal is mainly to preserve the program while formatting it.

      By default, this will try to split lines at or before column 78. This number can be changed using --max-line-length.
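For instance, to print a reformatted copy wrapped at 100 columns rather than the default 78 (my_model.stan is an illustrative file name):

> stanc --auto-format --max-line-length=100 my_model.stan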



    Generating Quantities of Interest from a Fitted Model

The generate_quantities method allows you to generate additional quantities of interest from a fitted model without re-running the sampler. Instead, you write a modified version of the original Stan program with a new or modified generated quantities block that specifies how to compute the new quantities of interest. Running the generate_quantities method on the new program together with sampler outputs (i.e., a set of draws) from the fitted model runs the generated quantities block of the new program on the existing sample, plugging in the per-draw parameter estimates for the computations.

This method requires sub-argument fitted_params which takes as its value an existing Stan CSV file that contains parameter values from an equivalent model, i.e., a model with the same parameters block, conditioned on the same data.

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

• generate simulated data for model testing by forward sampling
• generate predictions for new data
• calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
• calculate posterior expectations
• transform parameters for reporting
• apply full Bayesian decision theory
• calculate log likelihoods, deviances, etc. for model comparison

    For an overview of the uses of this feature, see the Stan User’s Guide section on Stand-alone generated quantities and ongoing prediction.


    Example

To illustrate how this works, we use the generate_quantities method to do posterior predictive checks using the estimate of theta given the example bernoulli model and data, following the posterior predictive simulation procedure in the Stan User’s Guide.

We write a program bernoulli_ppc.stan which contains the following generated quantities block, with comments to explain the procedure:

generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

    The rest of the program is the same as in bernoulli.stan.

The generate_quantities method requires the sub-argument fitted_params which takes as its value the name of a Stan CSV file. The per-draw parameter values from the fitted_params file will be used to run the generated quantities block.

    If we run the bernoulli.stan program for a single chain to generate a sample in file bernoulli_fit.csv:

    > ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv

Then we can run the bernoulli_ppc.stan program to carry out the posterior predictive checks:

    > ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
                       data file=bernoulli.data.json \
                       output file=bernoulli_ppc.csv


    The output file bernoulli_ppc.csv contains only the values for the variables declared in the generated quantities block, i.e., theta_rep and the elements of y_sim:

# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 1 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2983956445 (Default)
# output
#   file = output.csv (Default)
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
0,1,0,1,0,1,0,1,0,0,0.4
1,0,0,0,0,0,0,0,0,0,0.1
0,0,0,0,0,1,1,1,0,0,0.3
0,0,1,0,1,0,0,0,0,0,0.2
1,0,1,0,1,1,0,1,1,0,0.6
...

    Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:

generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  real<lower=0, upper=1> theta_cp = theta;
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

Now the output is slightly more interpretable: theta_cp is the same as the theta used to generate the values y_sim[1] through y_sim[10]. Comparing columns theta_cp and theta_rep allows us to see how the uncertainty in our estimate of theta is carried forward into our predictions:

y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
0,1,1,0,1,0,0,1,1,0,0.545679,0.5
1,1,1,1,1,1,0,1,1,0,0.527164,0.8
1,1,1,1,0,1,1,1,1,0,0.529116,0.8
1,0,1,1,1,1,0,0,1,0,0.478844,0.6
0,1,0,0,0,0,1,0,1,0,0.238793,0.3
0,0,0,0,0,1,1,0,0,0,0.258294,0.2
1,1,1,0,0,0,0,0,0,0,0.258465,0.3

    Errors


The fitted_params file must be a Stan CSV file; attempts to use a regular CSV file will result in an error message of the form:

    Error reading fitted param names from sample csv file <filename.csv>

    The fitted_params file must contain columns corresponding to legal values for all parameters defined in the model. If any parameters are missing, the program will exit with an error message of the form:

    Error reading fitted param names from sample csv file <filename.csv>

    The parameter values of the fitted_params are on the constrained scale and must obey all constraints. For example, if we modify the contents of the first reported draw in bernoulli_fit.csv so that the value of theta is outside the declared bounds real<lower=0, upper=1>, the program will return the following error message:

Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)

    CmdStan Installation

    Installation via conda

With conda, you can install CmdStan from the conda-forge channel. This will install a pre-built version of CmdStan along with the required dependencies (i.e., a C++ compiler, a version of Make, and required libraries). The conda installation is designed so one can use the R or Python bindings to CmdStan seamlessly. Additionally, it provides the command cmdstan_model to activate the CmdStan makefile from anywhere.

Note: This requires that conda has already been installed on your machine. You can either install miniconda, a free, minimal installer for conda, or the full Anaconda system, which provides graphical installer wizards for MacOS and Windows users.

    We recommend installing CmdStan in a new conda environment:

     conda create -n stan -c conda-forge cmdstan
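After creating the environment, activate it in each shell session where you want to use CmdStan (standard conda usage):

> conda activate stan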
To check the installation, build and run the example model:

> make examples/bernoulli/bernoulli
# fit to provided data (results of 10 trials, 2 out of 10 successes)
> ./examples/bernoulli/bernoulli sample \
    data file=examples/bernoulli/bernoulli.data.json
# default output written to file `output.csv`

Note: to build using multiple cores, use the -j option to make, e.g., for 4 cores:

> make build -j4

• To open a Windows command shell, first open the Start Menu (usually in the lower left of the screen), select All Programs, then Accessories, then the Command Prompt program. Alternatively, enter [Windows+r] (both keys together on the keyboard), enter cmd into the text field that pops up in the Run window, then press [Return] on the keyboard to run.↩︎


    Laplace sampling

The laplace method produces a sample from a normal approximation centered at the mode of a distribution in the unconstrained space. If the mode is a maximum a posteriori (MAP) estimate, the samples provide an estimate of the mean and standard deviation of the posterior distribution. If the mode is a maximum likelihood estimate (MLE), the sample provides an estimate of the standard error of the likelihood. In general, the posterior mode in the unconstrained space doesn’t correspond to the mean (nor mode) in the constrained space, and thus the sample is needed to infer the mean as well as the standard deviation. (See this case study for a visual illustration.)

    This is computationally inexpensive compared to exact Bayesian inference with MCMC. The goodness of this estimate depends on both the estimate of the mode and how much the true posterior in the unconstrained space resembles a Gaussian.
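As a sketch of a typical workflow (assuming the laplace method’s mode sub-argument and illustrative file names), one might first find the mode with the optimize method and then sample from the approximation centered at it:

> ./bernoulli optimize data file=bernoulli.data.json output file=mode.csv
> ./bernoulli laplace mode=mode.csv data file=bernoulli.data.json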

    Configuration

    MCMC Sampling


    Running the sampler


    To generate a sample from the posterior distribution of the model conditioned on the data, we run the executable program with the argument sample or method=sample together with the input data. The executable can be run from any directory. Here, we run it in the directory which contains the Stan program and input data, <cmdstan-home>/examples/bernoulli:

> cd examples/bernoulli

    To execute sampling of the model under Linux or Mac, use:

> ./bernoulli sample data file=bernoulli.data.json

On Windows, the ./ prefix is not needed:

> bernoulli.exe sample data file=bernoulli.data.json

    The output is the same across all supported platforms. First, the configuration of the program is echoed to the standard output:

method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = 0 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 3252652196 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

    After the configuration has been displayed, a short timing message is given.

Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!

Next, the sampler reports the iteration number and the percentage complete.

Iteration:    1 / 2000 [  0%]  (Warmup)
....
Iteration: 2000 / 2000 [100%]  (Sampling)

    Finally, the sampler reports timing information:

 Elapsed Time: 0.007 seconds (Warm-up)
               0.017 seconds (Sampling)
               0.024 seconds (Total)

    Running multiple chains


    A Markov chain generates samples from the target distribution only after it has converged to equilibrium. In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound. In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available. One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior to other randomly initialized chains. For robust diagnostics, we recommend running 4 chains.


There are two ways of running multiple chains: with the num_chains argument, using a single executable, or by using the Unix or DOS shell to run multiple executables.


    Using the num_chains argument to run multiple chains


    The num_chains argument can be used for all of Stan’s samplers with the exception of the static HMC engine.

For example, to run 4 chains:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv

    If the model was not compiled with STAN_THREADS=true, the above command will run 4 chains sequentially and will produce the sample in output_1.csv, output_2.csv, output_3.csv, output_4.csv. A suffix with the chain id is appended to the provided output filename (output.csv in the above command).


    If the model was compiled with STAN_THREADS=true, the chains can run in parallel, with the num_threads argument defining the maximum number of threads used to run the chains. If the model uses no within-chain parallelization (map_rect or reduce_sum calls), the below command will run 4 chains in parallel, provided there are cores available:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4

If the model uses within-chain parallelization (map_rect or reduce_sum calls), the threads are automatically scheduled to run the parallel parts of a single chain or the sequential parts of other chains. The below call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads, a single thread, anything in between, or may wait for a thread to become available. The scheduling is left to the Threading Building Blocks scheduler.

./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16

    Using shell for running multiple chains


    To run multiple chains given a model and data, either sequentially or in parallel, we can also use the Unix or DOS shell for loop to set up index variables needed to identify each chain and its outputs.


    On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters is:

for NAME [in LIST]; do COMMANDS; done

    The list can be a simple sequence of numbers, or you can use the shell expansion syntax {1..N} which expands to the sequence from \(1\) to \(N\), e.g. {1..4} expands to 1 2 3 4. Note that the expression {1..N} cannot contain spaces.


    To run 4 chains for the example bernoulli model on MacOS or Linux:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv
    done

    The backslash (\) indicates a line continuation in Unix. The expression ${i} substitutes in the value of loop index variable i. To run chains in parallel, put an ampersand (&) at the end of the nested sampler command:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done

    This pushes each process into the background which allows the loop to continue without waiting for the current chain to finish.
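If a subsequent command, such as stansummary, needs all four output files, the shell’s built-in wait command can follow the loop to block until every background chain has finished; a minimal sketch:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done
> wait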


    On Windows, the DOS for-loop syntax is one of:

for %i in (SET) do COMMAND COMMAND-ARGUMENTS
for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS

    To run 4 chains in parallel on Windows:

>for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^
                                    data file=bernoulli.data.json ^
                                    output file=output_%i.csv

    The caret (^) indicates a line continuation in DOS.


    Stan CSV output file


    Each execution of the model results in draws from a single Markov chain being written to a file in comma-separated value (CSV) format. The default name of the output file is output.csv.


    The first part of the output file records the version of the underlying Stan library and the configuration as comments (i.e., lines beginning with the pound sign (#)).

# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
...
# output
#   file = output.csv (Default)
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)

    This is followed by a CSV header indicating the names of the values sampled.

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta

    The first output columns report the HMC sampler information:

• lp__ - the total log probability density (up to an additive constant) at each sample
• accept_stat__ - the average Metropolis acceptance probability over each simulated Hamiltonian trajectory
• stepsize__ - integrator step size
• treedepth__ - depth of tree used by NUTS (NUTS sampler)
• n_leapfrog__ - number of leapfrog calculations (NUTS sampler)
• divergent__ - has value 1 if trajectory diverged, otherwise 0 (NUTS sampler)
• energy__ - value of the Hamiltonian
• int_time__ - total integration time (static HMC sampler)

    Because the above header is from the NUTS sampler, it has columns treedepth__, n_leapfrog__, and divergent__ and doesn’t have column int_time__. The remaining columns correspond to model parameters. For the Bernoulli model, it is just the final column, theta.


    The header line is written to the output file before warmup begins. If option save_warmup is set to 1, the warmup draws are output directly after the header. The total number of warmup draws saved is num_warmup divided by thin, rounded up (i.e., ceiling).
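For example, with num_warmup=1000 and thin=3, \(\lceil 1000 / 3 \rceil = 334\) warmup draws are saved.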

Following the warmup draws (if any) are comments which record the results of adaptation: the step size and inverse mass metric used during sampling:

# Adaptation terminated
# Step size = 0.884484
# Diagonal elements of inverse mass matrix:
# 0.535006

    The default sampler is NUTS with an adapted step size and a diagonal inverse mass matrix. For this example, the step size is 0.884484, and the inverse mass contains the single entry 0.535006 corresponding to the parameter theta.


    Draws from the posterior distribution are printed out next, each line containing a single draw with the columns corresponding to the header.

-6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853
-6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295
-7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299
-6.88712,1,0.884484,1,1,0,7.02101,0.188229
-7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596
...

    The output ends with timing details:

#  Elapsed Time: 0.007 seconds (Warm-up)
#                0.017 seconds (Sampling)
#                0.024 seconds (Total)

    Summarizing sampler output(s) with stansummary


    The stansummary utility processes one or more output files from a run or set of runs of Stan’s HMC sampler given a model and data. For all columns in the Stan CSV output file stansummary reports a set of statistics including mean, standard deviation, percentiles, effective number of samples, and \(\hat{R}\) values.

To run stansummary on the output files generated by the for loop above (the runs of the bernoulli model) on Mac or Linux:

<cmdstan-home>/bin/stansummary output_*.csv

On Windows, use backslashes in the path to call stansummary.exe:

<cmdstan-home>\bin\stansummary.exe output_*.csv

    The stansummary output consists of one row of statistics per column in the Stan CSV output file. Therefore, the first rows in the stansummary report statistics over the sampler state. The final row of output summarizes the estimates of the model variable theta:

Inference for Stan model: bernoulli_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -7.3  1.8e-02    0.75   -8.8  -7.0  -6.8  1.8e+03  2.4e+04  1.0e+00
accept_stat__   0.89  2.7e-03    0.17   0.52  0.96   1.0  3.9e+03  5.1e+04  1.0e+00
stepsize__       1.1  7.5e-02    0.11   0.93   1.2   1.2  2.0e+00  2.6e+01  2.5e+13
treedepth__      1.4  8.1e-03    0.49    1.0   1.0   2.0  3.6e+03  4.7e+04  1.0e+00
n_leapfrog__     2.3  1.7e-02    0.98    1.0   3.0   3.0  3.3e+03  4.3e+04  1.0e+00
divergent__     0.00      nan    0.00   0.00  0.00  0.00      nan      nan      nan
energy__         7.8  2.6e-02     1.0    6.8   7.5   9.9  1.7e+03  2.2e+04  1.0e+00
theta           0.25  2.9e-03    0.12  0.079  0.23  0.46  1.7e+03  2.1e+04  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).

In this example, we conditioned the model on a dataset consisting of the outcomes of 10 Bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95% percentile values for theta reflect the uncertainty in our estimate, due to the small amount of data, given the prior of beta(1, 1).
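As a check on the reported mean: with a \(\textrm{beta}(1, 1)\) prior and 2 successes in 10 trials, the posterior is \(\textrm{beta}(1 + 2, 1 + 8) = \textrm{beta}(3, 9)\), whose mean \(3 / 12 = 0.25\) matches the estimate in the theta row.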


    Optimization


    The CmdStan executable can run Stan’s optimization algorithms which provide a deterministic method to find the posterior mode. If the posterior is not convex, there is no guarantee Stan will be able to find the global mode as opposed to a local optimum of log probability.


    The executable does not need to be recompiled in order to switch from sampling to optimization, and the data input format is the same. The following is a minimal call to Stan’s optimizer using defaults for everything but the location of the data file.

> ./bernoulli optimize data file=bernoulli.data.json

Executing this command prints output both to the console and to a CSV file.

The first part of the console output reports on the configuration used. The above command uses all default configurations, so the optimizer used is L-BFGS with its default initial step size and convergence tolerances:

 ./bernoulli optimize data file=bernoulli.data.json
method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)
        tol_obj = 1e-12 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 87122538 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

The second part of the output indicates how well the algorithm fared, here converging and terminating normally. The numbers reported indicate that it took 5 iterations and 8 gradient evaluations. This is, not surprisingly, far fewer iterations than required for sampling; even fewer iterations would be used with less stringent user-specified convergence tolerances. The alpha value is the step size used. In the final state the change in parameters was roughly \(0.002\) and the length of the gradient roughly 3e-05 (\(0.00003\)).

Initial log joint probability = -6.85653
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
       5      -5.00402    0.00184936   3.35074e-05           1           1        8   
Optimization terminated normally: 
  Convergence detected: relative gradient magnitude is below tolerance

    The output from optimization is written into the file output.csv by default. The output follows the same pattern as the output for sampling, first dumping the entire set of parameters used as comment lines:

# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = optimize
#   optimize
#     algorithm = lbfgs (Default)
...

Following the config information are two lines of output: the CSV headers and the recorded values:

lp__,theta
-5.00402,0.200003

Note that everything is a comment other than the line for the header and the line for the values. Here, the header indicates the unnormalized log probability with lp__ and the model parameter theta. The maximum log probability is -5.0 and the posterior mode for theta is 0.20. The mode exactly matches what we would expect from the data. Because the prior was uniform, the result 0.20 represents the maximum likelihood estimate (MLE) for the very simple Bernoulli model. Note that no uncertainty is reported.
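As a quick check: the data consist of 2 successes in 10 trials, so the MLE is \(\hat{\theta} = 2 / 10 = 0.2\), exactly the reported mode.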


    All of the optimizers stream per-iteration intermediate approximations to the command line console. The sub-argument save_iterations specifies whether or not to save the intermediate iterations to the output file. Allowed values are \(0\) or \(1\), corresponding to False and True respectively. The default value is \(0\), i.e., intermediate iterations are not saved to the output file. Running the optimizer with save_iterations=1 writes both the initial log joint probability and values for all iterations to the output CSV file.


    Running the example model with option save_iterations=1, i.e., the command

> ./bernoulli optimize save_iterations=1 data file=bernoulli.data.json

    produces CSV file output rows:

lp__,theta
-6.85653,0.493689
-6.10128,0.420936
-5.02953,0.22956
-5.00517,0.206107
-5.00403,0.200299
-5.00402,0.200003

    Pathfinder Method for Approximate Bayesian Inference



    The CmdStan method pathfinder uses the Pathfinder algorithm of Zhang et al. (2022), which is further described in the Stan Reference Manual.


    A single run of the Pathfinder algorithm generates a set of approximate draws. Inference is improved by running multiple Pathfinder instances and using Pareto-smoothed importance resampling (PSIS) of the resulting sets of draws. This better matches non-normal target densities and also eliminates minor modes.

    The pathfinder method runs multi-path Pathfinder by default, which returns a PSIS sample over the draws from several individual (“single-path”) Pathfinder runs. Argument num_paths specifies the number of single-path Pathfinders, the default is \(4\). If num_paths is set to 1, then only one individual Pathfinder is run without the PSIS reweighting of the sample.

The full set of configuration options available for the pathfinder method is available by using the pathfinder help-all subcommand. The arguments with their requested values or defaults are also reported at the beginning of the algorithm’s console output and in the output CSV file’s comments.

The following is a minimal call to the Pathfinder algorithm using defaults for everything but the location of the data file.

> ./bernoulli pathfinder data file=bernoulli.data.R

Executing this command prints output both to the console and to CSV files.


    The first part of the console output reports on the configuration used.

method = pathfinder
  pathfinder
    init_alpha = 0.001 (Default)
    tol_obj = 9.9999999999999998e-13 (Default)
    tol_rel_obj = 10000 (Default)
    tol_grad = 1e-08 (Default)
    tol_rel_grad = 10000000 (Default)
    tol_param = 1e-08 (Default)
    history_size = 5 (Default)
    num_psis_draws = 1000 (Default)
    num_paths = 4 (Default)
    psis_resample = 1 (Default)
    calculate_lp = 1 (Default)
    save_single_paths = 0 (Default)
    max_lbfgs_iters = 1000 (Default)
    num_draws = 1000 (Default)
    num_elbo_draws = 25 (Default)
id = 1 (Default)
data
  file = examples/bernoulli/bernoulli.data.json
init = 2 (Default)
random
  seed = 1995513073 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

    The rest of the output describes the progression of the algorithm.


    By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, then uses importance resampling on the set of returned draws to produce the specified number of draws.

Path [1] :Initial log joint density = -11.543343
Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      1.070e-03   1.707e-05    1.000e+00  1.000e+00       126 -6.220e+00 -6.220e+00
Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126)
Path [2] :Initial log joint density = -7.443345
Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      9.936e-05   3.738e-07    1.000e+00  1.000e+00       126 -6.164e+00 -6.164e+00
Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126)
Path [3] :Initial log joint density = -18.986308
Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      2.996e-04   4.018e-06    1.000e+00  1.000e+00       126 -6.201e+00 -6.201e+00
Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126)
Path [4] :Initial log joint density = -8.304453
Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes
              5      -6.748e+00      2.814e-04   2.034e-06    1.000e+00  1.000e+00       126 -6.221e+00 -6.221e+00
Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126)
Total log probability function evaluations:8404

    Pathfinder Configuration


    L-BFGS Configuration

Arguments init_alpha through history_size are the full set of arguments to the L-BFGS optimizer and have the same defaults for optimization.

    Multi-path Pathfinder CSV files

    By default, the pathfinder method uses 4 independent Pathfinder runs, each of which produces 1000 approximate draws, which are then importance resampled down to 1000 final draws. The importance resampled draws are output as a StanCSV file.

    The CSV files have the following structure:

The initial CSV comment rows contain the complete set of CmdStan configuration options.
...
# method = pathfinder
#   pathfinder
#     init_alpha = 0.001 (Default)
#     tol_obj = 9.9999999999999998e-13 (Default)
#     tol_rel_obj = 10000 (Default)
#     tol_grad = 1e-08 (Default)
#     tol_rel_grad = 10000000 (Default)
#     tol_param = 1e-08 (Default)
#     history_size = 5 (Default)
#     num_psis_draws = 1000 (Default)
#     num_paths = 4 (Default)
#     psis_resample = 1 (Default)
#     calculate_lp = 1 (Default)
#     save_single_paths = 0 (Default)
#     max_lbfgs_iters = 1000 (Default)
#     num_draws = 1000 (Default)
#     num_elbo_draws = 25 (Default)
...

Next is the column header line, followed by the set of approximate draws. The Pathfinder algorithm first outputs lp_approx__, the log density in the approximating distribution, and lp__, the log density in the target distribution, followed by estimates of the model parameters, transformed parameters, and generated quantities.

lp_approx__,lp__,theta
-2.4973, -8.2951, 0.0811852
-0.87445, -7.06526, 0.160207
-0.812285, -7.07124, 0.35819
...

    The final lines are comment lines which give timing information.

# Elapsed Time: 0.016000 seconds (Pathfinders)
#               0.003000 seconds (PSIS)
#               0.019000 seconds (Total)

    Pathfinder provides option save_single_paths which will save output from the single-path Pathfinder runs.

    Single-path Pathfinder Outputs.


Option num_paths=1 runs one single-path Pathfinder, and the output CSV file contains the draws from that run without PSIS reweighting. The combination of arguments num_paths=1 save_single_paths=1 creates just two output files, the CSV sample and the set of ELBO iterations. In this case, the default output file name is “output.csv” and the default diagnostic file name is “output.json”.
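For example, a single-path run that also saves the per-path outputs (a sketch using only the arguments documented above):

> ./bernoulli pathfinder num_paths=1 save_single_paths=1 \
    data file=bernoulli.data.json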


    References

Zhang, Lu, Bob Carpenter, Andrew Gelman, and Aki Vehtari. 2022. “Pathfinder: Parallel Quasi-Newton Variational Inference.” Journal of Machine Learning Research 23 (306): 1–49. http://jmlr.org/papers/v23/21-0889.html.

    Variational Inference using Pathfinder


    The CmdStan method pathfinder uses the Pathfinder algorithm of Zhang et al. (2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the L-BFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.


    Pathfinder differs from the ADVI method in that it uses quasi-Newton optimization on the log posterior instead of stochastic gradient descent (SGD) on the Monte Carlo computation of the evidence lower bound (ELBO). Pathfinder’s approach is both faster and more stable than that of ADVI. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.


    A single run of the Pathfinder algorithm generates a set of approximate draws. Inference is improved by running multiple Pathfinder instances and using Pareto-smoothed importance resampling (PSIS) of the resulting sets of draws. This better matches non-normal target densities and also eliminates minor modes. By default, the pathfinder method uses 4 independent Pathfinder runs, each of which produces 1000 approximate draws, which are then importance resampled down to 1000 final draws.


    Pathfinder provides option save_single_paths which will save output from the single-path Pathfinder runs. See section Pathfinder Method for details.



Note that when running multi-threaded programs which use reduce_sum for high-level parallelization, the number of threads used will also be included in the Stan CSV file’s initial comment header.

    Column headers

The CSV header row lists all sampler parameters, model parameters, transformed parameters, and quantities of interest. The sampler parameters are described in detail in the output file section of the chapter on MCMC Sampling. The example model bernoulli.stan only contains one parameter theta, therefore the CSV file data table consists of 7 sampler parameter columns and one column for the model parameter:

    lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta

As a second example, we show the output of the eight_schools.stan model run on an example dataset. This model has 3 parameters: mu; theta, a vector whose length depends on the input data (here N = 8); and tau. The initial columns are for the 7 sampler parameters, as before. The column headers for the model parameters are:

    mu,theta.1,theta.2,theta.3,theta.4,theta.5,theta.6,theta.7,theta.8,tau