Use variables
Variables are the basic objects used as inputs or outputs in a model. In this guide, we will learn how to construct and use variables within amisc.
Construct a variable
In its most basic form, a variable is just a placeholder with a name, just like \(x\) in the equation \(y=x^2\).
from amisc import Variable
x = Variable() # implicitly named 'x'
y = Variable('y') # explicitly named 'y'
Variables can also have several descriptive attributes assigned to them.
x = Variable(name='x1',                       # the main identification string of the variable
             nominal=1,                       # a nominal value
             description='My first variable', # a lengthier description
             units='rad/s',                   # units
             tex='$x_1$',                     # a latex representation (for plotting/displaying)
             category='calibration')          # for further classification (can be anything)
The variable's name is the key identifier of the variable, and allows the variable to be treated symbolically as a string. For example:
x = Variable('x')
assert x == 'x'
d = {x: 'You can use the variable as a key in hash structures'}
assert d[x] == d['x']
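The string-like behavior above can be illustrated with a minimal sketch. The NamedVar class below is hypothetical and is not amisc's implementation; it only shows one way an object can compare equal to its name and hash like it, which is what makes lookups by either the object or the plain string land on the same dictionary entry.

```python
class NamedVar:
    """Hypothetical stand-in for a variable identified by its name (not amisc's class)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Compare by name against another NamedVar or a plain string
        if isinstance(other, NamedVar):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return NotImplemented

    def __hash__(self):
        # Hash exactly like the name, so a string key finds the same dict slot
        return hash(self.name)


x = NamedVar('x')
d = {x: 'stored under the variable'}
assert x == 'x'
assert d['x'] == d[x]
```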
In addition, a useful data structure for lists of Variables is the VariableList:
from amisc import VariableList, Variable
var_list = VariableList(['a', 'b', 'c'])
assert var_list['a'] == 'a' # can use 'dict'-like access of variables
assert isinstance(var_list['a'], Variable) # stores the actual Variable objects
assert var_list[2] == var_list['c'] # can also use normal 'list' indexing
An important attribute of Variables in the context of amisc is their domain, which must be defined when building surrogates.
There are three more attributes of variables that we will cover in the next sections: normalization, distributions (for random variables), and compression (for field quantities).
Normalization
In the context of surrogates, it is sometimes advantageous to approximate over a transformed, or normalized, input space. For example, a variable defined over the domain \(x\in (0.001, 100)\) covers many orders of magnitude, which may be difficult to directly approximate using a polynomial surrogate. There are four basic normalizations provided by amisc:
from amisc.transform import Log, Linear, Minmax, Zscore
log = Log((10, 0)) # base 10 log with 0 offset
linear = Linear((0.5, 1)) # slope of 0.5 and offset of 1
minmax = Minmax((-20, 20, 0, 1)) # scale from (-20, 20) -> (0, 1)
zscore = Zscore((5, 2)) # (x - mu) / sigma
These may also be specified as an equivalent string representation. The transform method should be passed as the norm attribute of the variable, after which values can be normalized or denormalized directly by the variable:
import numpy as np
x = Variable(norm='log10')
values = 10 ** (np.random.rand(20))
assert np.allclose(x.denormalize(x.normalize(values)), values)
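To make the four normalizations concrete, here is a minimal NumPy sketch of the arithmetic suggested by the comments in the example above. The function names and the exact formulas (for instance, where the log offset is applied) are assumptions for illustration; amisc's own conventions may differ in detail.

```python
import numpy as np

# Hypothetical helper functions; defaults mirror the tuples in the example above.

def log_norm(x, base=10.0, offset=0.0):
    # Logarithm of (x + offset) in the given base (offset placement is an assumption)
    return np.log(x + offset) / np.log(base)

def linear_norm(x, slope=0.5, offset=1.0):
    # Straight-line scaling
    return slope * x + offset

def minmax_norm(x, lb=-20.0, ub=20.0, lb_norm=0.0, ub_norm=1.0):
    # Map (lb, ub) linearly onto (lb_norm, ub_norm)
    return (x - lb) / (ub - lb) * (ub_norm - lb_norm) + lb_norm

def zscore_norm(x, mu=5.0, sigma=2.0):
    # Standardize: (x - mu) / sigma
    return (x - mu) / sigma

values = np.array([1.0, 10.0, 100.0])
assert np.allclose(log_norm(values), [0.0, 1.0, 2.0])
assert np.allclose(minmax_norm(np.array([-20.0, 0.0, 20.0])), [0.0, 0.5, 1.0])
```

Each of these maps is invertible on its valid domain, which is what allows a round trip through normalize and denormalize to recover the original values.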
When a variable has a norm, the surrogate will select new training points in the transformed space and also compute the approximation on normalized inputs. If a variable is an output and has a norm, then the surrogate will fit the approximation to the normalized output.
Building a surrogate in normalized space
Consider a variable with a log10 norm whose domain spans \((10^{-3}, 10^{2})\). The surrogate will construct an approximation over the transformed domain \((-3, 2)\). When predicting with the surrogate, inputs will automatically have the same transform applied, \(\tilde{x} = \log_{10}(x)\in(-3, 2)\), before computing the surrogate.
New transforms can be created by extending the amisc.transform.Transform base class. In addition, multiple transforms can be applied in series by passing a list of transforms to the norm attribute. For example, x = Variable(norm=['log10', 'minmax']) will apply a minmax transform over the log10 space of x.
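As a rough illustration of chaining, the sketch below applies a log10 transform followed by a min-max scaling in plain NumPy. It assumes a variable domain of (1e-3, 1e2), which matches the transformed domain \((-3, 2)\) discussed above, and it assumes the min-max step rescales the log-space domain onto (0, 1). Both function names are hypothetical.

```python
import numpy as np

domain = (1e-3, 1e2)  # assumed original domain of x

def log10_then_minmax(x, domain):
    # Step 1: log10 maps (1e-3, 1e2) onto (-3, 2)
    log_lb, log_ub = np.log10(domain[0]), np.log10(domain[1])
    x_log = np.log10(x)
    # Step 2: min-max maps (-3, 2) onto (0, 1)
    return (x_log - log_lb) / (log_ub - log_lb)

def minmax_then_log10_inverse(x_norm, domain):
    # Undo the steps in reverse order: inverse min-max, then inverse log10
    log_lb, log_ub = np.log10(domain[0]), np.log10(domain[1])
    return 10.0 ** (x_norm * (log_ub - log_lb) + log_lb)

x = np.array([1e-3, 1.0, 1e2])
x_norm = log10_then_minmax(x, domain)
assert np.allclose(x_norm, [0.0, 0.6, 1.0])
assert np.allclose(minmax_then_log10_inverse(x_norm, domain), x)
```

Note that the inverse applies the transforms in the opposite order from the forward pass.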
Random variables
A common use of surrogates is to permit propagating uncertain random variable inputs through a complicated simulation to quantify output uncertainty or to calibrate the model parameters. To this end, a Variable can be given a PDF through the distribution attribute. Several common PDFs are provided in amisc.distribution.
uniform = Variable(distribution='U(0, 1)')
normal = Variable(distribution='N(0, 1)')
log_uniform = Variable(distribution='LU(1e-3, 1e2)')
log_normal = Variable(distribution='LN(-2, 1)')
With a distribution, variables can sample from the PDF or evaluate the PDF of values under the distribution.
When a variable has a distribution, the surrogate will select new training points during fit() that are clustered closer to areas of greater weight. New distributions can be created by extending the amisc.distribution.Distribution base class.
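To show what sampling from and evaluating a PDF involve, here is a small NumPy sketch of a log-uniform distribution. It assumes the bounds are given in the original space and that the variable is uniform in base-10 log space; this convention, and both function names, are assumptions rather than a description of amisc's internals.

```python
import numpy as np

def sample_log_uniform(lb, ub, size, rng):
    # Draw uniformly in log10 space, then map back to the original space
    return 10.0 ** rng.uniform(np.log10(lb), np.log10(ub), size)

def pdf_log_uniform(x, lb, ub):
    # Change of variables: f(x) = 1 / (x * ln(10) * (log10(ub) - log10(lb))) on [lb, ub]
    x = np.asarray(x, dtype=float)
    inside = (x >= lb) & (x <= ub)
    density = 1.0 / (x * np.log(10.0) * (np.log10(ub) - np.log10(lb)))
    return np.where(inside, density, 0.0)

rng = np.random.default_rng(0)
samples = sample_log_uniform(1e-3, 1e2, 10_000, rng)
assert samples.min() >= 1e-3 and samples.max() <= 1e2
assert np.isclose(pdf_log_uniform(1.0, 1e-3, 1e2), 1.0 / (5.0 * np.log(10.0)))
```

Because the density is proportional to 1/x, samples are spread evenly across orders of magnitude instead of crowding near the upper bound.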
Field quantities
By default, all variables are treated as scalar quantities. However, it is sometimes possible to have high-dimensional variables, such as the solution of a simulation on a PDE mesh -- we refer to these variables as "field quantities". For field quantities to be useful in the context of amisc surrogates, we must be able to "compress" them to lower dimension such that we can effectively build surrogate approximations in an appropriate low-dimensional "latent" space.
To this end, a field quantity is defined by giving a compression attribute to a variable. A compression method must:
- define a set of coordinates on which the field quantity exists (i.e. the Cartesian points from a simulation mesh grid),
- define a "map" that both compresses field quantity data into the latent space and reconstructs the full field quantity back from the latent space, and
- have a predetermined size (or "rank") of the latent space.
Compression coordinates should be an array of shape \((N, D)\), where \(N\) is the total number of grid points and \(D\) is the Cartesian dimension (i.e. 1d, 2d, etc.). A single field quantity Variable may contain several QoIs on the same grid coordinates, so that the total number of "degrees of freedom" (DoF) of the variable is equal to \(N\times Q\), where \(Q\) is the number of QoIs.
For example, say a simulation outputs the \(x, y, z\) components of velocity on an unstructured mesh of 1000 nodes. We might define a velocity field quantity as:
vel = Variable('velocity', compression=dict(coords=sim_coords,
                                            fields=['ux', 'uy', 'uz'],
                                            method='svd'))
print(sim_coords.shape) # (num_pts, dim)
assert vel.compression.dof == 1000 * 3 # (num_pts * num_qoi)
Here, sim_coords is the array of grid coordinates that we extracted from our simulation. Currently, SVD is the only available compression method, but other methods can be used by implementing the amisc.compression.Compression base class.
In order to make use of this field quantity when building surrogates, we'll need to call compression.compute_map(), which for SVD requires passing a data matrix and a desired rank of the truncation.
SVD compression
To use the SVD compression method, we need to form a "data matrix" of shape (dof, num_samples), where dof is the length of the original (N, Q) field quantity after flattening, and num_samples is the number of samples of the full field quantity (such as for varying simulation inputs). In other words, each column of the data matrix is a "snapshot" of the simulation output for this field quantity.
sim_coords = np.random.rand(1000, 3) # (i.e. load actual Cartesian coords from a result file)
num_samples = 200
dof = 3000
data_matrix = np.empty((dof, num_samples))
for i in range(num_samples):
    simulation_data = np.random.rand(1000, 3) # (N, Q) simulation data (i.e. load from a result file)
    data_matrix[:, i] = np.ravel(simulation_data)
vel = Variable(compression=dict(coords=sim_coords, fields=['ux', 'uy', 'uz'], method='svd'))
vel.compression.compute_map(data_matrix, rank=10)
# Now we can use the compression map to compress/reconstruct new values
new_sim_data = {'ux': np.random.rand(1000), 'uy': ..., 'uz': ...}
latent_data = vel.compress(new_sim_data)
reconstructed = vel.reconstruct(latent_data)
Once the compression map has been computed, we can compress or reconstruct new field quantity data:
new_sim_data = {'field1': ..., 'field2': ...} # arrays of shape (num_pts,) for each QoI in compression.fields
latent_data = vel.compress(new_sim_data) # a single array of shape (rank,) with the key 'latent'
reconstructed = vel.reconstruct(latent_data) # arrays of shape (num_pts,) for each reconstructed QoI
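For intuition about what an SVD compression map does, the following NumPy-only sketch builds a truncated basis from a data matrix and uses it to compress and reconstruct a snapshot. It is a generic illustration of truncated SVD under simplifying assumptions (synthetic low-rank data, no centering or normalization); amisc's own compute_map() may handle these details differently.

```python
import numpy as np

rng = np.random.default_rng(0)
dof, num_samples, rank = 3000, 200, 10

# Synthetic snapshots that lie exactly in a rank-10 subspace (illustration only)
data_matrix = rng.standard_normal((dof, rank)) @ rng.standard_normal((rank, num_samples))

# Truncated SVD: keep the first `rank` left singular vectors as the basis
u, s, vt = np.linalg.svd(data_matrix, full_matrices=False)
basis = u[:, :rank]  # shape (dof, rank)

def compress(field):
    # Project a flattened field of shape (dof,) onto the basis -> (rank,)
    return basis.T @ field

def reconstruct(latent):
    # Expand latent coefficients of shape (rank,) back to (dof,)
    return basis @ latent

snapshot = data_matrix[:, 0]
latent = compress(snapshot)
assert latent.shape == (rank,)
assert np.allclose(reconstruct(latent), snapshot)
```

The reconstruction is exact here only because the synthetic data has exactly rank 10. For real simulation data, the truncation discards the smaller singular values, so the round trip is an approximation whose error depends on the chosen rank.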
You can optionally pass new coordinates to compress() and reconstruct(), so that the data will be interpolated between those coordinates and the original compression.coords (e.g. if the new data is not defined on the same grid).
If you also pass a norm method to a field quantity Variable, then raw simulation data will be normalized first by the indicated method before compression. In general, the compression workflow is interpolate → normalize → compress, and the reverse for reconstruction. The interpolate step is required to make sure the data aligns with the compression map's coordinates. See Variable.compress for more details.
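The interpolate → normalize → compress order can be sketched on a 1D grid with NumPy. Everything here is an assumption made for illustration: the synthetic sine-mode snapshots, the z-score normalization, linear interpolation via np.interp, and the helper names. It is not amisc's implementation of Variable.compress.

```python
import numpy as np

rng = np.random.default_rng(1)
coords = np.linspace(0.0, 1.0, 50)  # grid the compression map is defined on

# Synthetic snapshots: an offset plus random mixes of three sine modes
modes = np.stack([np.sin((k + 1) * np.pi * coords) for k in range(3)], axis=1)  # (50, 3)
snapshots = 5.0 + 2.0 * modes @ rng.standard_normal((3, 40))                    # (50, 40)

# Build the map in normalized space (a z-score is used here as an example norm)
mu, sigma = snapshots.mean(), snapshots.std()
u, _, _ = np.linalg.svd((snapshots - mu) / sigma, full_matrices=False)
basis = u[:, :4]  # the offset and three modes span a 4-dimensional subspace

def compress(values, new_coords):
    on_grid = np.interp(coords, new_coords, values)  # interpolate onto the map's grid
    return basis.T @ ((on_grid - mu) / sigma)        # normalize, then compress

def reconstruct(latent, new_coords):
    raw = (basis @ latent) * sigma + mu              # reconstruct, then denormalize
    return np.interp(new_coords, coords, raw)        # interpolate to the requested grid

# New data defined on a finer grid than the compression map
fine_coords = np.linspace(0.0, 1.0, 200)
fine_values = 5.0 + 2.0 * np.sin(np.pi * fine_coords)
latent = compress(fine_values, fine_coords)
recovered = reconstruct(latent, fine_coords)
print(latent.shape, np.max(np.abs(recovered - fine_values)))
```

The printed error is small but nonzero, since linear interpolation between the two grids introduces a little error on top of the compression itself.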
Unlike scalar variables, the domain of a field quantity Variable should be a list of domains, one for each "latent" dimension. Since it's typically not practical to know these domains ahead of time, you can either:
- Use the Variable to compress some example data and extract the latent domains manually,
- Use the built-in Compression.estimate_latent_ranges() function (which for SVD will compress the data_matrix and estimate latent ranges from there),
- Specify a single, conservative domain (like (-10, 10)) that will be used for all the latent dimensions at runtime, or
- Leave the domain empty, and have System.fit() estimate and update the domains from a test set.
The only time you would need to worry about specifying the latent domains is if you intend to use a field quantity as an input to any of your component models.
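The first two options in the list above amount to compressing example data and reading off the range of each latent coefficient. A minimal NumPy sketch of that idea follows; the random data, the padding factor, and the variable names are assumptions, and this is not the code behind Compression.estimate_latent_ranges().

```python
import numpy as np

rng = np.random.default_rng(2)
dof, num_samples, rank = 300, 50, 5
data_matrix = rng.standard_normal((dof, num_samples))  # stand-in for real snapshots

# Truncated SVD basis, as in a generic SVD compression
u, _, _ = np.linalg.svd(data_matrix, full_matrices=False)
basis = u[:, :rank]

# Compress every snapshot, then take the min and max of each latent coefficient
latent = basis.T @ data_matrix  # shape (rank, num_samples)
latent_min = latent.min(axis=1)
latent_max = latent.max(axis=1)

# Hypothetical 10% padding so that unseen data is less likely to fall outside
pad = 0.1 * (latent_max - latent_min)
latent_domains = [(float(lo - p), float(hi + p))
                  for lo, hi, p in zip(latent_min, latent_max, pad)]

assert len(latent_domains) == rank  # one domain per latent dimension
```

The result is a list of (lower, upper) tuples, one per latent dimension, which has the shape of domain described above for field quantities.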
As a final note on field quantities, once you've defined and computed the compression map, amisc will internally use the compression map during training or prediction to convert the field quantity to/from the latent space. If you have a field quantity named "vel", for example, amisc will generate latent coefficients with the names "vel_LATENT0", "vel_LATENT1", and so on up to the total size of the latent space. These temporary latent coefficients will be used as inputs and outputs until they are converted back to the full field quantity. So if you ever inspect raw data arrays returned by amisc, you may find these temporary latent coefficients floating around. See the amisc.to_model_dataset utility function for reconstructing such arrays.
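If you come across such latent coefficients in a raw data dictionary, the naming pattern makes them easy to pick out. The helpers below are hypothetical and only follow the "vel_LATENT0" pattern quoted above; for real use, prefer the amisc utility mentioned in the text.

```python
import numpy as np

LATENT_TAG = '_LATENT'  # naming pattern taken from the text above

def split_latent(name, latent):
    # Hypothetical helper: one named scalar entry per latent coefficient
    return {f'{name}{LATENT_TAG}{i}': float(v) for i, v in enumerate(latent)}

def gather_latent(name, data):
    # Hypothetical helper: collect the coefficients back into one array, in index order
    prefix = f'{name}{LATENT_TAG}'
    keys = sorted((k for k in data if k.startswith(prefix)),
                  key=lambda k: int(k[len(prefix):]))
    return np.array([data[k] for k in keys])

raw = split_latent('vel', [0.1, 0.2, 0.3])
raw['pressure'] = 101.3  # unrelated scalar entries are left alone
assert list(split_latent('vel', [0.1, 0.2, 0.3])) == ['vel_LATENT0', 'vel_LATENT1', 'vel_LATENT2']
assert np.allclose(gather_latent('vel', raw), [0.1, 0.2, 0.3])
```

Sorting by the numeric suffix, not alphabetically, keeps the coefficients in order once there are ten or more latent dimensions.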