GaussianProcessRegression

Purpose

The goal of the driver is to build up a regression model of a function f: \mathcal{X} \rightarrow \mathbb{R} with respect to variations of a parameter vector \mathbf{p}\in\mathcal{X}\subset\mathbb{R}^d. The search domain \mathcal{X} is bounded by box constraints l_i\leq p_i \leq u_i for 1\leq i\leq d and may be subject to several constraints c_j: \mathbb{R}^d \rightarrow \mathbb{R} such that \mathbf{p} \in \mathcal{X} only if c_j(\mathbf{p}) \leq 0 (see jcmwave_optimizer_create_study()). To this end the driver performs a Gaussian process regression.
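
For orientation, the standard textbook form of Gaussian process regression is sketched below. It is not taken from the driver itself: the symbols \mu (constant prior mean), k (covariance function), \mathbf{K}, \mathbf{k}_*, \sigma^2 and \ell_i are notation assumed here, and the driver's exact parametrization may differ. For n noiseless observations y_m = f(\mathbf{p}^{(m)}) without derivative information, the prediction at a point \mathbf{p} is normally distributed with the mean and variance given first. The last two lines give the usual Matern 5/2 covariance, which corresponds to the default value of the matern_order parameter described below.

```latex
\begin{aligned}
\overline{f}(\mathbf{p}) &= \mu + \mathbf{k}_*(\mathbf{p})^T \mathbf{K}^{-1} (\mathbf{y} - \mu \mathbf{1}), \\
\sigma_f^2(\mathbf{p}) &= k(\mathbf{p},\mathbf{p}) - \mathbf{k}_*(\mathbf{p})^T \mathbf{K}^{-1} \mathbf{k}_*(\mathbf{p}), \\
K_{mn} &= k(\mathbf{p}^{(m)},\mathbf{p}^{(n)}), \qquad [\mathbf{k}_*(\mathbf{p})]_m = k(\mathbf{p},\mathbf{p}^{(m)}), \\
k(\mathbf{p},\mathbf{p}') &= \sigma^2 \left(1 + \sqrt{5}\,r + \tfrac{5}{3} r^2\right) \exp\left(-\sqrt{5}\,r\right), \\
r &= \sqrt{\sum_{i=1}^{d} \frac{(p_i - p'_i)^2}{\ell_i^2}}.
\end{aligned}
```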

The regression model can be used to make predictions (study.predict()), determine the positions and widths of local minima (study.get_minima()), or compute averages of the objective function for parameter values with a given random distribution (study.get_statistics()).

Usage Example

addpath(fullfile(getenv('JCMROOT'), 'ThirdPartySupport', 'Matlab'));
client = jcmwave_optimizer_client();

% Definition of the search domain
domain = {
    struct('name','x1', 'type','continuous', 'domain',[-pi,pi]),...
    struct('name','x2', 'type','continuous', 'domain',[-pi,pi]),...
    struct('name','x3', 'type','continuous', 'domain',[-pi,pi]),...
};

% Creation of the study object with study_id 'example'
study = client.create_study('domain',domain,  ...
                'driver','GaussianProcessRegression',...
                'name','GaussianProcessRegression example', ...
                'study_id','GaussianProcessRegression_example');

% Definition of the objective function (Ishigami function)
a = 7; b = 0.1; % parameters of the Ishigami function
% Note: local functions do not share the script workspace, so study, a and b
% are passed in explicitly. Before MATLAB R2024a, local functions in a script
% must be placed at the end of the file.
function obs = objective(study, sample, a, b)
  pause(1.0) % makes objective expensive
  obs = study.new_observation();
  x1 = sample.x1;
  x2 = sample.x2;
  x3 = sample.x3;
  obs = obs.add(sin(x1) + a*sin(x2)^2 + b*x3^4*sin(x1));
  % derivatives w.r.t. x1, x2, x3
  obs = obs.add(cos(x1) + b*x3^4*cos(x1), 'x1');
  obs = obs.add(2*a*sin(x2)*cos(x2), 'x2');
  obs = obs.add(4*b*x3^3*sin(x1), 'x3');
end

% Set study parameters
study.set_parameters('max_iter', 150);

% Run the study
while(not(study.is_done))
    sug = study.get_suggestion();
    obs = objective(study, sug.sample, a, b);
    study.add_observation(obs, sug.id);
end

% Make prediction at (0,0,0)
prediction = study.predict('samples',[[0,0,0]],'derivatives',true);
fprintf(['\nPrediction f=%.3f (exact 0), ',...
         'grad=[%.3f %.3f %.3f] (exact [1 0 0])'],...
         prediction.means(1),...
         prediction.derivatives(1,1),...
         prediction.derivatives(1,2),...
         prediction.derivatives(1,3));

% Determine mean and variance of function by Bayesian quadrature

% Analytic mean and variance for the Ishigami function
% (named exact_* to avoid shadowing the built-in mean function)
exact_mean = a/2;
exact_variance = 1/2 + a^2/8 + b*pi^4/5 + b^2*pi^8/18;

% Run Bayesian quadrature
statistics = study.get_statistics();
fprintf('\nMean %.3f Exact %.3f', statistics.mean, exact_mean);
fprintf('\nVariance %.3f Exact %.3f\n', statistics.variance, exact_variance);
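
The analytic values used for comparison can be checked independently of the optimizer. The following sketch assumes, as in the usage example, that all three parameters are uniformly distributed on [-pi, pi]. It approximates the mean and variance with a simple midpoint rule; the grid size n and the variable names are choices made here for illustration only.

```matlab
% Ishigami parameters, as in the usage example
a = 7; b = 0.1;

% Midpoint grid on [-pi, pi] in each of the three dimensions
n = 100;                          % grid points per dimension (assumption)
h = 2*pi/n;
t = -pi + h*((1:n) - 0.5);
[x1, x2, x3] = ndgrid(t, t, t);

% Ishigami function evaluated on the full grid
f = sin(x1) + a*sin(x2).^2 + b*x3.^4.*sin(x1);

% For a uniform distribution, averages are plain means over the grid
numeric_mean = mean(f(:));
numeric_variance = mean(f(:).^2) - numeric_mean^2;

% Analytic reference values
exact_mean = a/2;
exact_variance = 1/2 + a^2/8 + b*pi^4/5 + b^2*pi^8/18;

fprintf('Mean     %.3f (exact %.3f)\n', numeric_mean, exact_mean);
fprintf('Variance %.3f (exact %.3f)\n', numeric_variance, exact_variance);
```

The midpoint rule is exact up to rounding for the trigonometric factors and has a small discretization error for the polynomial factor in x3, so the variance agrees with the analytic value only approximately.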

Parameters

The following parameters can be set by calling, e.g.

study.set_parameters('example_parameter1',[1,2,3], 'example_parameter2',true);
min_UC (float):
 The sampling is stopped when the uncertainty at the sampling point before acquisition is smaller than min_UC (default: 1e-06)
max_iter (int):
 Maximum number of evaluations of the objective function (default: inf)
max_time (int):
 Maximum run time in seconds (default: inf)
num_parallel (int):
 Number of parallel observations of the objective function (default: 1)
distribution (list):
 Definition of the random distribution of each parameter, given as a list. All continuous parameters without a specified distribution are assumed to be uniformly distributed over the parameter domain. Fixed and discrete parameters are not random parameters. The value of a discrete parameter defaults to its first listed value. (default: None)

Example:
{struct("name","param1", "distribution","normal", "mean",1.0, "variance",2.0),
 struct("name","param2", "distribution","uniform", "domain",[-5,5]),
 struct("name","param3", "distribution","gamma", "alpha",1.2, "beta",0.5),
 struct("name","param4", "distribution","fixed", "value",7.0),
 struct("name","param5", "distribution","beta", "alpha",1.5, "beta",0.8)}
optimization_step_min (int):
 Minimum number of observations of the objective before the hyperparameters are optimized. Note: Each derivative observation counts as an independent observation. (default: None, i.e. 2 times the number of dimensions)
optimization_step_max (int):
 Maximum number of observations of the objective after which no more hyperparameter optimization is performed. Note: Each derivative observation counts as an independent observation. (default: 300)
optimization_interval (int):
 Maximum number of observations of the objective after which the hyperparameters are optimized. Note: Each derivative observation counts as an independent observation. (default: 20)
compute_suggestion_in_advance (bool):
 If true, a suggestion is computed in advance to speed up the sample computation. (default: True)
num_training_samples (int):
 Number of random initial samples before the samples are drawn according to the acquisition function. (default: 0)
parameter_uncertainties (list):
 Sometimes, one is interested in minima of the objective function that are insensitive to variations of certain parameters (e.g. if the parameters have large fabrication variances). In this case, one can specify the uncertainties of those parameters, i.e. their standard deviations. The regression model of the objective is averaged over the uncertainty intervals such that very narrow local minima are averaged out whereas broad minima are not affected. Note that the averaging results in additional numerical effort that grows exponentially with the number of uncertain parameters. (default: None)

Example:
{struct("name","param1", "uncertainty",0.1),
 struct("name","param3", "uncertainty",0.2)}
optimization_level (float):
 Steers how often the hyperparameters are optimized. Small values (e.g. 0.01) lead to more frequent optimizations, large values (e.g. 1.0) to less frequent optimizations. (default: 0.05)
num_samples_hyperparameters (int):
 Number of local searches for optimal hyperparameters. If None, the number is chosen according to the number of dimensions. (default: None)
num_fantasies (int):
 Number of “fantasy” samples drawn from the Gaussian process in order to average over uncertain function values. This is used to avoid the positions of running samples or to handle noisy inputs. (default: 18)
matern_order (int):
 Order of the Matern covariance function used, i.e. for m=5 the Matern 5/2 function is used. (default: 5)
length_scales (list):
 List of length scales for each parameter on which the objective function varies. If not set, the length scales are chosen automatically. (default: None)
detect_noise (bool):
 If true, variance due to noise is modelled by two hyperparameters (noise of objective and noise of derivatives). The values of the hyperparameters are chosen to maximize the likelihood of the observations. (default: False)
noise_variance (list):
 List of noise variances for objective observations and derivative observations relative to total variance. If not set, noiseless observations are assumed. Note: If detect_noise is true, the noise variances will be optimized after optimization_step_min observations. (default: None)
warp_input (bool):
 If true, the input function values are transformed (warped) in order to maximize the likelihood of the observations. (default: False)
warping_strengths (list):
 List of lower and upper warping strength. If not set, function values are left unwarped. Note: If warp_input is true, the warping strengths will be optimized after optimization_step_min observations. (default: None)
localize (bool):
 If true, a local search is performed, i.e. samples are not drawn in regions with large uncertainty. (default: False)
horizon (int):
 Horizon of previous observations that the driver takes into account. (default: None, i.e. all previous observations are taken into account)
max_scaling (float):
 Maximum scaling parameter. The scaling parameter is used to enforce a global search after convergence to a local minimum. It is automatically increased when a local convergence has been detected. (default: 10)
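
As an illustration of how several of the parameters above can be combined in one call, the following sketch configures the study from the usage example. It assumes the study object and the parameter names x1, x2, x3 defined there. The values are arbitrary and chosen only for illustration, and the list-of-structs format is copied from the Example entries above rather than from any further knowledge of the interface.

```matlab
% Illustrative values only; assumes `study` from the usage example.
% x3 has no entry here, so per the description of `distribution` it is
% treated as uniformly distributed over its parameter domain.
distribution = {
    struct('name','x1', 'distribution','normal', 'mean',0.0, 'variance',1.0),...
    struct('name','x2', 'distribution','uniform', 'domain',[-pi,pi])...
};

study.set_parameters('max_iter', 150, ...
                     'min_UC', 1e-4, ...
                     'num_training_samples', 10, ...
                     'distribution', distribution);
```

With such a distribution set, study.get_statistics() computes averages with respect to that distribution instead of the default uniform one.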