
New default hyperparameters and miscellaneous bug fixes for Kriging. #374

Merged
8 commits merged into SciML:master, Jul 12, 2022

Conversation

@archermarx (Contributor) commented Jul 11, 2022

Addresses #369 and #368. This PR defines new default hyperparameters for the Kriging surrogate. Along the way, I found that the implementation of the 1D kriging surrogate was incorrect and did not work properly for theta not equal to 1, so I've fixed that as well.

To summarize, the default hyperparameters for Kriging used to be p = ones(d), theta = ones(d), where d is the problem dimension. These are very bad choices. I have changed them to

$$p_i = 2, i \in [1, d]$$

$$\theta_i = \frac{1}{2}\,\mathrm{std}(\mathbf{X}_i)^{-p_i}, \quad i \in [1, d]$$

The latter comes from A Practical Guide to Gaussian Processes and the interpretation $\theta_i = \frac{1}{2 l_i^2}$, where $l_i$ is a characteristic correlation length scale for variable $i$. Scaling the length scale with the standard deviation of the samples makes the hyperparameter initialization independent of the box size, which is a very desirable property. With this initialization, here are some comparisons to the old defaults:
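For reference, here is a minimal Julia sketch of how these defaults could be computed. The helper name `default_kriging_hyperparameters` and its signature are illustrative only, not the actual Surrogates.jl internals:

```julia
using Statistics

# Illustrative sketch of the new default hyperparameters described above.
# `x` is a vector of sample points, each with `d` components.
function default_kriging_hyperparameters(x, d)
    p = fill(2.0, d)  # p_i = 2 in every dimension
    # θ_i = (1/2) * std(X_i)^(-p_i): tying the length scale to the per-dimension
    # standard deviation makes the initialization independent of the box size.
    θ = [0.5 / std(getindex.(x, i))^p[i] for i in 1:d]
    return p, θ
end

# Example: 20 random samples in [0, 10] × [0, 5]
x = [(10rand(), 5rand()) for _ in 1:20]
p, θ = default_kriging_hyperparameters(x, 2)
```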

Sphere function
Old: [sphere_old_defaults plot]
New: [sphere_new_defaults plot]

L1-norm function
Old: [l1norm_old_defaults plot]
New: [l1norm_new_defaults plot]

Branin function
Old: [branin_old_defaults plot]
New: [branin_new_defaults plot]

Rosenbrock function
Old: [rosenbrock_old_defaults plot]
New: [rosenbrock_new_defaults plot]

Clearly, the new hyperparameters are a significant improvement. During implementation, I found that the 1D kriging surrogate only worked properly when theta = 1, so I have fixed that as well:

Old (with old default hyperparams): [plot_169]
Old (with new default hyperparams): [plot_171]
New (with new default hyperparams): [plot_170]

Still to do: update the Kriging documentation to make use of the better defaults, since the current documentation doesn't make Kriging look very effective.

@codecov (bot) commented Jul 11, 2022

Codecov Report

Merging #374 (a862299) into master (aba57f3) will decrease coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #374      +/-   ##
==========================================
- Coverage   78.99%   78.85%   -0.14%     
==========================================
  Files          16       16              
  Lines        2266     2280      +14     
==========================================
+ Hits         1790     1798       +8     
- Misses        476      482       +6     
Impacted Files         Coverage Δ
src/Kriging.jl         94.44% <100.00%> (+0.82%) ⬆️
src/GEKPLS.jl          91.20% <0.00%> (-0.93%) ⬇️
src/Optimization.jl    72.06% <0.00%> (-0.37%) ⬇️


@archermarx (Contributor, Author) commented Jul 11, 2022

I have added nugget regularization to address the problem where the correlation matrix becomes singular when points are sufficiently close together. The way it currently works is to add a small "nugget" value to the main diagonal of the correlation matrix to keep the condition number less than 1e8. This has the same effect as a tiny noise variance, and prevents the matrix from becoming singular at the cost of slightly relaxing the interpolation condition.
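In equation form, choosing the nugget $\eta$ so that the regularized condition number of $R + \eta I$ hits the cap $\kappa_{\max} = 10^8$ gives

$$\frac{\lambda_{\max} + \eta}{\lambda_{\min} + \eta} = \kappa_{\max} \quad\Longrightarrow\quad \eta = \frac{\lambda_{\max} - \kappa_{\max}\,\lambda_{\min}}{\kappa_{\max} - 1},$$

with $\eta = 0$ whenever the condition number is already below the cap (i.e. $\lambda_{\max} \le \kappa_{\max}\,\lambda_{\min}$).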

@archermarx marked this pull request as ready for review, July 11, 2022 22:15
@@ -21,27 +18,45 @@ a = 2
b = 6

#Using Kriging
my_k_SRBF1 = Kriging(x, y, lb, ub)
surrogate_optimize(objective_function, SRBF(), a, b, my_k_SRBF1, UniformSample())
begin
Member

what's the purpose of the blocks here? It's fine but a little odd.

src/Kriging.jl Outdated
Comment on lines 190 to 204
# Estimate nugget based on maximum allowed condition number.
# This regularizes R to allow for points being close to each other without R
# becoming singular, at the cost of slightly relaxing the interpolation condition.
λ = eigen(R).values

λmax = λ[end]
λmin = λ[1]

κmax = 1e8
λdiff = λmax - κmax * λmin
if λdiff ≥ 0
    nugget = λdiff / (κmax - 1)
else
    nugget = 0.0
end
Member

Comment on the source for this?

@archermarx (Contributor, Author) commented Jul 12, 2022 via email

@ChrisRackauckas (Member)

yeah probably best to remove the blocks then.

@ChrisRackauckas merged commit 5bd2810 into SciML:master, Jul 12, 2022
@ChrisRackauckas (Member)

amazing, thanks!
