Observables
Observables define outputs or derived quantities produced by the simulator which are of interest to the user, either for the purpose of inference (i.e. conditioning on data), validation, or debugging.
The observables API is one of the key features of SimulationBasedInference
that separates the package from other software tools for automated Bayesian inference. Many such tools consider only models of the form $y = f(\theta)$ where $\theta$ are the uknowns to be inferred and $y$ are the outputs of the simulator $f$ to be compared to data. While mathematically convenient, this formulation ignores some basic practial realities of many scientific and engineering use cases:
- The simulator may be computaionally non-trivial to evaluate, i.e. $f(\theta)$ may require significant computational resources to evaluate.
- The output of the simulator is often itself large and complex, e.g. a solution to a set of partial differential equations.
- The output of the simulator is not always directly comparable to the available data.
In light of this, SimulationBasedInference
formally treats the simulator inference pipeline as a composition of two parts:
\[\begin{align*} u &= f(\theta) \\ y_i &= g_i(u) \end{align*}\]
where $u$ is the output or latent state of the simulator and $g_i$ for $i=1\dots m$ are $m$ quantities derived from $u$. More importantly, for the case where $f$ is a dynamical system defined by an ODEProblem
or similar and the solution $u(t)$ is a function over time (and possibly space), observables $g_i$ can be defined as functionals which are evaluated during each simulation thereby circumventing the need to store the solution u
explicitly.
Basic usage
Observables are constructed using the SimulatorObservable
type which consists of the following components:
- A name/identifier represented as a
Symbol
(e.g.:y1
) - A function of the form $y_i = g_i(u)$ where $u$ is the (possibly transient) state of the simulator and $y_i$ is the output.
- An output type which defines the behavior of the observable and how the outputs are stored.
- A set of coordinates for $y_i$ which define its shape and indices.
Currently, SimulationBasedInference
provides two types of observable outputs:
- Transient output types (with alias
TransientObservable
) which naively evaluateg_i
and store a pointer to the return value. Repeated evaluations of the observable overwrite previous values. - TimeSampled output types (with alias
TimeSampledObservable
) for dynamical systems which define a sampling frequency and a set of save times. Outputs ofg_i
at each sample time are buffered and then aggregated at each save time according to some reducer function (e.g.mean
,sum
, etc.).
Type specific dispatches of the SimulatorObservable
constructor for each output type as a convenience.
SimulationBasedInference.SimulatorObservable
— MethodSimulatorObservable(name::Symbol, f, coords::Tuple)
Constructs a Transient
observable with state mapping function f
and coordinates coords
.
where coords
are a Tuple
of dimension sizes (e.g. (2,)
for a 2-dimensional output) or coordinates (e.g. (1:2,)
) that index the dimensions of the output. Dimensions from DimensionalData
can also be directly supplied (e.g. (X(1:10),Y(1:10))
). Note that the output of g_i
must match the shape specified by output_dims
.
For time sampled observables of dynamical systems, it is necessary to additionally provide an initial time stamp t0
and a vector of timestamps ts
which specify the time points at which the the buffered observable values should be saved.
SimulationBasedInference.SimulatorObservable
— MethodSimulatorObservable(
name::Symbol,
obsfunc,
t0::tType,
tsave::AbstractVector{tType},
output_shape_or_coords::Tuple;
reducer=mean,
samplerate=default_sample_rate(tsave),
) where {tType}
Constructs a TimeSampled
observable which iteratively samples and stores outputs on each call to observe!
.
Here is a simple example for a generic ODEProblem
from the SciML
package ecosystem:
prob = ODEProblem(dudt, u0 tspan)
output_dims = size(u0)
t0 = tspan[1]
ts = LinRange(tspan[1], tspan[2], 100)
# this observable will simply save the full state at each save point in `ts`
obs = SimulatorObservable(:yt, state -> state.u, t0, ts, output_dims)
For time sampled observables, output_dims
represents the shape of the output at each time step, i.e. without the time dimension which is imlicitly defined by the number of save points.
Defining custom observables
In general, it is recommended to define custom observables using one of the pre-defined output types of SimulatorObservable
described above.
However, in some cases, it may be easier to define a new observable type that is tailored to custom-tailored to a particular simulator or use case.
SimulationBasedInference.Observable
— TypeObservable{outputType<:SimulatorOutput}
Base type for observables with the given outputType
.
This can be done by extending SimulatorObservable
with a new SimulatorOutput
type, or by defining an entirely new subtype of Observable. Generally the former is recommended.
In either case, subtypes of Observable
must implement the following methods.
SimulationBasedInference.initialize!
— Methodinitialize!(::Observable, state)
Initialize the Observable
from the given simulator state.
SimulationBasedInference.observe!
— Methodobserve!(::Observable, state)
Computes and stores the relevant state variables from the given simulator state.
SimulationBasedInference.getvalue
— Methodgetvalue(::Observable, ::Type{T}=Any) where {T}
Retreive the obsevable at all coordinates.
SimulationBasedInference.setvalue!
— Methodsetvalue!(obs::Observable, value)
Overwrites the value of this observable. The type of value
will depend on the type of the observable. This should generally only be used for testing and emulation purposes.
SimulationBasedInference.coordinates
— Methodcoordinates(obs::Observable)
Retrieves coordinates for each dimension of the observables as a Tuple
with length matching the number of dimensions.
New subtypes of Observable
should additionally implement Base.nameof to return a Symbol
corresponding to the unique identifier of the observable.