amisc.training

Classes for storing and managing training data for surrogate models. The `TrainingData` interface also specifies how new training data should be sampled over the input space (i.e. experimental design).

Includes:

- `TrainingData` — an interface for storing surrogate training data.
- `SparseGrid` — a class for storing training data in a sparse grid format.
SparseGrid(collocation_rule='leja', knots_per_level=2, expand_latent_method='round-robin', opt_args={'locally_biased': False, 'maxfun': 300}, betas=set(), x_grids={}, yi_map={}, yi_nan_map={}, error_map={}, latent_size={})

dataclass

Bases: `TrainingData`, `PickleSerializable`
A class for storing training data in a sparse grid format. The `SparseGrid` class stores training points by their coordinate location in a larger tensor-product grid, and obtains new training data by refining a single 1d grid at a time.

MISC and sparse grids

MISC itself can be thought of as an extension of the well-known sparse grid technique, so this class readily integrates with the MISC implementation in `Component`. Sparse grids limit the curse of dimensionality up to about `dim = 10-15` for the input space (which would otherwise be infeasible with a normal full tensor-product grid of the same size).
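To see why a full tensor-product grid becomes infeasible, here is a quick back-of-the-envelope computation (plain Python, not amisc code) with 5 points per dimension:

```python
# Number of points in a full tensor-product grid with n points per dimension: n**d
n = 5
for d in (2, 5, 10, 15):
    print(f"d = {d:2d}: {n**d:,} grid points")

# d =  2: 25 grid points
# d =  5: 3,125 grid points
# d = 10: 9,765,625 grid points
# d = 15: 30,517,578,125 grid points
```

A sparse grid keeps only a small, structured subset of these points, which is what makes `dim = 10-15` tractable.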
About points in a sparse grid

A sparse grid approximates a full tensor-product grid \((N_1, N_2, ..., N_d)\), where \(N_i\) is the number of grid points along dimension \(i\), for a \(d\)-dimensional space. Each point is uniquely identified in the sparse grid by a list of indices \((j_1, j_2, ..., j_d)\), where \(j_i = 0 ... N_i\). We refer to this unique identifier as a "grid coordinate". In the `SparseGrid` data structure, these coordinates are used along with the `alpha` fidelity index to uniquely locate the training data for a given multi-index pair.
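As a concrete illustration of grid coordinates, the standalone sketch below (not the `SparseGrid` API; the grids and coordinate are made up) shows how an index tuple selects a point without storing the full tensor-product grid:

```python
import numpy as np

# Two hypothetical 1d grids: N_1 = 3 points along x, N_2 = 4 points along y
x_grid = np.array([0.0, 0.5, 1.0])
y_grid = np.array([0.0, 0.25, 0.5, 1.0])

def point_from_coord(coord, grids):
    """Map a grid coordinate (j_1, ..., j_d) to its physical location."""
    return tuple(grid[j] for j, grid in zip(coord, grids))

# The coordinate (1, 3) uniquely identifies the point (0.5, 1.0) in the
# full 3x4 tensor-product grid, without materializing all 12 points.
pt = point_from_coord((1, 3), [x_grid, y_grid])  # -> (0.5, 1.0)
```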
ATTRIBUTE | TYPE | DESCRIPTION
---|---|---
`collocation_rule` | `str` | the collocation rule to use for generating new grid points (only 'leja' is supported)
`knots_per_level` | `int` | the number of grid knots/points to add per refinement level
`expand_latent_method` | `str` | method for expanding latent grids, either 'round-robin' or 'tensor-product'
`opt_args` | `dict` | extra arguments for the global 1d point-selection optimizer
`betas` | `set` | the set of all `beta` multi-indices in the sparse grid
`x_grids` | `dict` | the 1d grid points for each input variable
`yi_map` | `dict` | the stored model outputs, indexed by grid coordinate for each multi-index pair
`yi_nan_map` | `dict` | imputed values for `nan` model outputs, indexed the same way as `yi_map`
`error_map` | `dict` | error information for failed model evaluations
`latent_size` | `dict` | the number of latent coefficients for each variable (0 if scalar)
beta_to_knots(beta, knots_per_level=None, latent_size=None, expand_latent_method=None)

Convert a `beta` multi-index to the number of knots per dimension in the sparse grid.

PARAMETER | DESCRIPTION
---|---
`beta` | refinement level indices
`knots_per_level` | level-to-grid-size multiplier, i.e. the number of new points (or knots) added for each `beta` level
`latent_size` | the number of latent coefficients for each variable (0 if scalar); the number of variables and their order should match `beta`
`expand_latent_method` | method for expanding latent grids, either 'round-robin' or 'tensor-product'

RETURNS | DESCRIPTION
---|---
`tuple` | the number of knots/points per dimension for the sparse grid

Source code in src/amisc/training.py
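For intuition, here is a hypothetical version of this mapping, assuming the common level-to-size rule `knots = knots_per_level * level + 1` (the exact rule, including latent-dimension handling, is in `src/amisc/training.py`):

```python
def level_to_knots(beta, knots_per_level=2):
    """Assumed level-to-grid-size rule: start from a 1-point grid and add
    `knots_per_level` knots per refinement level (illustrative only)."""
    return tuple(knots_per_level * b + 1 for b in beta)

# Under this assumed rule, beta = (0, 1, 2) maps to a (1, 3, 5) grid
knots = level_to_knots((0, 1, 2))  # -> (1, 3, 5)
```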
clear()

Clear all training data from the sparse grid.
collocation_1d(N, z_bds, z_pts=None, wt_fcn=None, method='leja', opt_args=None)

staticmethod

Find the next `N` points in the 1d sequence of `z_pts` using the provided collocation method.

PARAMETER | DESCRIPTION
---|---
`N` | number of new points to add to the sequence
`z_bds` | bounds on the 1d domain
`z_pts` | current univariate sequence (starts a new sequence if not provided)
`wt_fcn` | weighting function; uses a constant weight if not provided
`method` | collocation method to use; currently only 'leja' is supported (DEFAULT: 'leja')
`opt_args` | extra arguments for the global 1d point-selection optimizer (DEFAULT: None)

RETURNS | DESCRIPTION
---|---
`ndarray` | the univariate sequence
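The 'leja' rule can be sketched as a greedy weighted Leja sequence. The version below searches a dense candidate grid instead of running the global 1d optimizer that the library uses, so it is an approximation for illustration only:

```python
import numpy as np

def leja_1d(N, z_bds, z_pts=None, wt_fcn=None, num_cand=1001):
    """Greedily pick N new points, each maximizing the weighted product of
    distances to all points already in the sequence (a Leja sequence)."""
    if wt_fcn is None:
        wt_fcn = lambda z: np.ones_like(z)  # constant weight by default
    cand = np.linspace(z_bds[0], z_bds[1], num_cand)  # candidate points
    z_pts = np.empty(0) if z_pts is None else np.asarray(z_pts, dtype=float).ravel()
    for _ in range(N):
        if z_pts.size == 0:
            # Start the sequence at the maximum of the weight function
            z_new = cand[np.argmax(wt_fcn(cand))]
        else:
            # Product of distances from each candidate to the existing points
            dist = np.prod(np.abs(cand[:, None] - z_pts[None, :]), axis=1)
            z_new = cand[np.argmax(wt_fcn(cand) * dist)]
        z_pts = np.append(z_pts, z_new)
    return z_pts

pts = leja_1d(3, (-1.0, 1.0))  # endpoints first, then near the middle
```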
get(alpha, beta, y_vars=None, skip_nan=False)

Get the training data from the sparse grid for a given `alpha` and `beta` pair.
get_by_coord(alpha, coords, y_vars=None, skip_nan=False)

Get training data from the sparse grid for a given `alpha` and list of grid coordinates. Try to replace `nan` values with imputed values. Skip any data points with remaining `nan` values if `skip_nan=True`.

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`coords` | a list of grid coordinates that locate the data in the sparse grid
`y_vars` | the keys of the outputs to return (if not provided, return all outputs)
`skip_nan` | skip any data points with remaining `nan` values (DEFAULT: False)

RETURNS | DESCRIPTION
---|---
`tuple[Dataset, Dataset]` | the input and output training data
impute_missing_data(alpha, beta)
Impute missing values in the sparse grid for a given multi-index pair by linear regression imputation.
is_one_level_refinement(beta_old, beta_new)

staticmethod

Check if a new `beta` multi-index is a one-level refinement from a previous `beta`.

Example

Refining from (0, 1, 2) to the new multi-index (1, 1, 2) is a one-level refinement. But refining to either (2, 1, 2) or (1, 2, 2) is not, since more than one refinement occurs at the same time.

PARAMETER | DESCRIPTION
---|---
`beta_old` | the starting multi-index
`beta_new` | the new refined multi-index

RETURNS | DESCRIPTION
---|---
`bool` | whether `beta_new` is a one-level refinement of `beta_old`
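The check itself is simple to reimplement; here is a standalone equivalent (not the library's code) of the rule described above:

```python
def one_level_refinement(beta_old, beta_new):
    """True if exactly one index increased by exactly 1 and the rest are unchanged."""
    diffs = [new - old for old, new in zip(beta_old, beta_new)]
    return sorted(diffs) == [0] * (len(diffs) - 1) + [1]

ok = one_level_refinement((0, 1, 2), (1, 1, 2))   # -> True  (one index +1)
bad = one_level_refinement((0, 1, 2), (2, 1, 2))  # -> False (one index +2)
```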
refine(alpha, beta, input_domains, weight_fcns=None)

Refine the sparse grid for a given `alpha` and `beta` pair and given collocation rules. Return any new grid points that do not have model evaluations saved yet.

Note

The `beta` multi-index is used to determine the number of collocation points in each input dimension. The length of `beta` should therefore match the number of variables in `x_vars`.
set(alpha, beta, coords, yi_dict)

Store model output `yi_dict` values.

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
`coords` | a list of grid coordinates that locate the `yi_dict` values in the sparse grid
`yi_dict` | a `dict` of model outputs to store, with one entry per output variable
set_errors(alpha, beta, coords, errors)

Store error information in the sparse grid for a given multi-index pair.
TrainingData

Bases: `Serializable`, `ABC`

Interface for storing and collecting surrogate training data. `TrainingData` objects should:

- `get` - retrieve the training data
- `set` - store the training data
- `refine` - generate new design points for the parent `Component` model
- `clear` - clear all training data
- `set_errors` - store error information (if desired)
- `impute_missing_data` - fill in missing values in the training data (if desired)
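A toy in-memory implementation can make this contract concrete. The class below is an illustration only; it does not subclass amisc's `TrainingData` or reproduce its real storage format:

```python
class DictTrainingData:
    """Toy stand-in for the TrainingData contract, backed by a plain dict."""

    def __init__(self):
        self._store = {}  # {(alpha, beta): {coord: yi_dict}}

    def refine(self, alpha, beta, input_domains, weight_fcns=None):
        """Propose one new design point (the domain midpoint) and its coordinate."""
        existing = self._store.get((alpha, beta), {})
        coords = [len(existing)]  # next free slot acts as the "coordinate"
        x_train = {var: [(lo + hi) / 2] for var, (lo, hi) in input_domains.items()}
        return coords, x_train

    def set(self, alpha, beta, coords, yi_dict):
        """Store model outputs at the coordinates returned by refine()."""
        slot = self._store.setdefault((alpha, beta), {})
        for coord in coords:
            slot[coord] = yi_dict

    def get(self, alpha, beta, y_vars=None, skip_nan=False):
        """Return everything stored for this multi-index pair."""
        return self._store.get((alpha, beta), {})

    def clear(self):
        self._store.clear()

td = DictTrainingData()
coords, x_new = td.refine((0,), (1,), {'x': (0.0, 1.0)})
td.set((0,), (1,), coords, {'y': [0.5]})
```

The `refine` → model evaluation → `set` ordering mirrors how a parent `Component` would drive any such storage object.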
clear()

abstractmethod

Clear all training data.

from_dict(config)

classmethod

Create a `TrainingData` object from a `dict` configuration. Currently, only `method='sparse-grid'` is supported for the `SparseGrid` class.

Source code in src/amisc/training.py
get(alpha, beta, y_vars=None, skip_nan=False)

abstractmethod

Return the training data for a given multi-index pair.

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
`y_vars` | the keys of the outputs to return (if not provided, return all outputs)
`skip_nan` | skip any data points with remaining `nan` values (DEFAULT: False)

RETURNS | DESCRIPTION
---|---
`tuple[Dataset, Dataset]` | the input and output training data
impute_missing_data(alpha, beta)

abstractmethod

Impute missing values in the training data for a given multi-index pair (just pass if you don't care).

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
refine(alpha, beta, input_domains, weight_fcns=None)

abstractmethod

Return new design/training points for a given multi-index pair and their coordinates/locations in the `TrainingData` storage structure.

Example

The returned data coordinates `coords` should be any object that can be used to locate the corresponding `x_train` training points in the `TrainingData` storage structure. These `coords` will be passed back to the `set` function to store the training data at a later time (i.e. after model evaluation).

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
`input_domains` | a `dict` of the domain bounds for each input variable
`weight_fcns` | a `dict` of weighting functions for each input variable

RETURNS | DESCRIPTION
---|---
`tuple[list[Any], Dataset]` | a list of new data coordinates and the corresponding new training points
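The round trip described above can be sketched as a driver loop. Everything here is hypothetical (the stub storage, the toy model); it only illustrates that `coords` flows from `refine` back into `set` unchanged:

```python
class _StubData:
    """Minimal stand-in for a TrainingData implementation (illustration only)."""
    def __init__(self):
        self.stored = {}
    def refine(self, alpha, beta, input_domains, weight_fcns=None):
        coords = [0]  # one new storage slot
        x_train = {v: [(lo + hi) / 2] for v, (lo, hi) in input_domains.items()}
        return coords, x_train
    def set(self, alpha, beta, coords, yi_dict):
        self.stored[(alpha, beta)] = (coords, yi_dict)

def training_step(training_data, model, alpha, beta, input_domains):
    """One acquisition step: refine -> evaluate the model -> set."""
    coords, x_train = training_data.refine(alpha, beta, input_domains)
    yi_dict = model(x_train)  # (expensive) model evaluation happens here
    training_data.set(alpha, beta, coords, yi_dict)  # same coords passed back
    return coords, yi_dict

data = _StubData()
coords, yi = training_step(data, lambda x: {'y': [2 * xi for xi in x['x']]},
                           alpha=(0,), beta=(1,), input_domains={'x': (0.0, 1.0)})
```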
set(alpha, beta, coords, yi_dict)

abstractmethod

Store training data for a given multi-index pair.

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
`coords` | locations for storing the `yi_dict` data in the underlying data structure
`yi_dict` | a `dict` of model outputs to store
set_errors(alpha, beta, coords, errors)

abstractmethod

Store error information for a given multi-index pair (just pass if you don't care).

PARAMETER | DESCRIPTION
---|---
`alpha` | the model fidelity indices
`beta` | the surrogate fidelity indices
`coords` | locations for storing the error information in the underlying data structure
`errors` | a list of error dictionaries; should be the same length as `coords`