Multivariate Gaussian

Data Type: Sequence[float]

The probability density function of a d-dimensional multivariate Gaussian random variable \((X_1, X_2, ..., X_d)\) with mean \(\boldsymbol{\mu}\) and positive definite covariance matrix \(\Sigma\) is given by

\[f(\boldsymbol{x} | \boldsymbol{\mu}, \Sigma) = \left(\frac{1}{2\pi}\right)^{d/2} \vert\Sigma\vert^{-1/2} \exp{\left(-\frac{1}{2}\left(\boldsymbol{x} - \boldsymbol{\mu}\right)^{t} \Sigma^{-1} \left(\boldsymbol{x} - \boldsymbol{\mu}\right)\right)}.\]
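As an illustration, the density can be evaluated directly with NumPy. This is a minimal sketch and not the library's implementation; the function name `mvn_density` is hypothetical, and the normalizing constant used is \((2\pi)^{-d/2} \vert\Sigma\vert^{-1/2}\).

```python
import numpy as np


def mvn_density(x, mu, covar):
    """Evaluate the multivariate Gaussian density at a single point x.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    covar = np.asarray(covar, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^t Sigma^{-1} (x - mu), computed with a
    # linear solve rather than an explicit matrix inverse.
    quad = diff @ np.linalg.solve(covar, diff)
    # Normalizing constant (2*pi)^(-d/2) * |Sigma|^(-1/2).
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(covar) ** -0.5
    return float(norm * np.exp(-0.5 * quad))
```

For d = 1 with zero mean and unit variance this reduces to the standard normal density, which makes a convenient sanity check.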

If you assume \(Cov(X_i, X_j) = 0\) for all \(i \neq j\), the Diagonal Multivariate Gaussian is a faster option.
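To see why the diagonal case is cheaper, note that with zero off-diagonal covariance the log-density splits into a sum of d univariate Gaussian log-densities, so no matrix factorization is needed. The sketch below compares the two computations; both function names are hypothetical and neither is taken from the library.

```python
import numpy as np


def diag_mvn_log_density(x, mu, var):
    """Log-density when the covariance is diagonal with entries `var`."""
    x, mu, var = (np.asarray(a, dtype=float) for a in (x, mu, var))
    # Sum of independent univariate Gaussian log-densities: O(d) work.
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var))


def full_mvn_log_density(x, mu, covar):
    """Log-density for a general positive definite covariance matrix."""
    x, mu, covar = (np.asarray(a, dtype=float) for a in (x, mu, covar))
    d = mu.shape[0]
    diff = x - mu
    _, log_det = np.linalg.slogdet(covar)
    quad = diff @ np.linalg.solve(covar, diff)
    return float(-0.5 * (d * np.log(2.0 * np.pi) + log_det + quad))
```

When `covar` equals `np.diag(var)`, the two functions agree up to floating-point error.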

For more info see Multivariate Normal Distribution.

MultivariateGaussianDistribution

class dml.stats.mvn.MultivariateGaussianDistribution(mu, covar, name=None, keys=None)

MultivariateGaussianDistribution object for a multivariate Gaussian with mean ‘mu’ and covariance ‘covar’.

dim

Dimension N of the multivariate normal.

Type:

int

mu

Mean, a length N numpy array.

Type:

np.ndarray

covar

N by N numpy array for Covariance matrix.

Type:

np.ndarray

chol

Cholesky decomposition of covar.

Type:

np.ndarray

lower

True if chol is the lower-triangular Cholesky factor, False if it is upper-triangular.

Type:

bool

name

Set name to object.

Type:

Optional[str]

keys

Set keys for distribution.

Type:

Optional[str]

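use_lstsq

True if the Cholesky decomposition of covar does not exist, in which case a least-squares approximation is used.

Type:

bool

chol_const

Determinant term computed from covar, for use when the least-squares approximation is in effect.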

Type:

float
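The fallback described by these two attributes can be pictured roughly as follows. This is only a sketch of one plausible approach, assuming the factorization is attempted with `np.linalg.cholesky` and the fallback uses `np.linalg.lstsq`; the library's actual logic may differ, and the helper names `factor_covariance` and `quad_form` are hypothetical.

```python
import numpy as np


def factor_covariance(covar):
    """Try a Cholesky factorization; report whether a fallback is needed.

    Returns (chol, use_lstsq). Hypothetical helper, not the library's code.
    """
    covar = np.asarray(covar, dtype=float)
    try:
        # Lower-triangular factor L with covar = L @ L.T.
        return np.linalg.cholesky(covar), False
    except np.linalg.LinAlgError:
        # covar is not positive definite, so no Cholesky factor exists.
        return None, True


def quad_form(diff, covar, chol, use_lstsq):
    """Evaluate diff^t covar^{-1} diff, with a least-squares fallback."""
    diff = np.asarray(diff, dtype=float)
    if use_lstsq:
        # Least-squares solution of covar @ sol = diff.
        sol = np.linalg.lstsq(np.asarray(covar, dtype=float), diff, rcond=None)[0]
        return float(diff @ sol)
    # With covar = L L^t, the quadratic form equals ||L^{-1} diff||^2.
    z = np.linalg.solve(chol, diff)
    return float(z @ z)
```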

__init__(mu, covar, name=None, keys=None)

MultivariateGaussianDistribution object.

Parameters:
  • mu (Union[List[float], np.ndarray]) – N-dimensional mean.

  • covar (Union[List[List[float]], np.ndarray]) – Covariance matrix, should be N by N and positive definite.

  • name (Optional[str]) – Set name to object.

  • keys (Optional[str]) – Set keys for distribution.

density(x)

Evaluate the density at x.

Parameters:

x (np.ndarray) – Observation from multivariate Gaussian distribution.

Returns:

Density at x.

Return type:

float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

MultivariateGaussianDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Returns:

ParameterEstimator

log_density(x)

Evaluate the log-density at x.

Parameters:

x (np.ndarray) – Observation from multivariate Gaussian distribution.

Returns:

Log-density at x.

Return type:

float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray
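As a rough picture of what a vectorized log-density evaluation involves, the sketch below scores many observations at once using a single Cholesky factorization. It takes a plain (n, d) NumPy array rather than an EncodedDataSequence, and the function name `batch_log_density` is hypothetical; it is not the library's implementation.

```python
import numpy as np


def batch_log_density(xs, mu, covar):
    """Log-density of each row of xs under N(mu, covar).

    Hypothetical helper; xs is an (n, d) array of observations.
    """
    xs = np.atleast_2d(np.asarray(xs, dtype=float))
    mu = np.asarray(mu, dtype=float)
    chol = np.linalg.cholesky(np.asarray(covar, dtype=float))  # lower factor L
    d = mu.shape[0]
    # Solve L z = (x - mu) for every observation at once; z has shape (d, n).
    z = np.linalg.solve(chol, (xs - mu).T)
    quad = np.sum(z * z, axis=0)
    # log|Sigma| = 2 * sum(log(diag(L))).
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)
```

The result is a length-n array with one log-density per observation.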

MultivariateGaussianEstimator

class dml.stats.mvn.MultivariateGaussianEstimator(dim=None, pseudo_count=(None, None), suff_stat=(None, None), name=None, keys=None)

MultivariateGaussianEstimator object for estimating a multivariate normal distribution from sufficient statistics.

dim

Dimension of multivariate normal.

Type:

int

pseudo_count

Regularize mean and/or covariance.

Type:

Optional[Tuple[Optional[float], Optional[float]]]

prior_mu

Mean from prior data or used to regularize.

Type:

Optional[np.ndarray]

prior_covar

Covariance matrix from prior data or used to regularize.

Type:

Optional[np.ndarray]

name

Set name to object.

Type:

Optional[str]

keys

Keys for merging sufficient statistics.

Type:

Optional[str]

__init__(dim=None, pseudo_count=(None, None), suff_stat=(None, None), name=None, keys=None)

MultivariateGaussianEstimator object.

Parameters:
  • dim (Optional[int]) – Dimension of multivariate normal. Inferred from ‘suff_stat’ if None.

  • pseudo_count (Optional[Tuple[Optional[float], Optional[float]]]) – Regularize mean and/or covariance.

  • suff_stat (Optional[Tuple[Optional[np.ndarray], Optional[np.ndarray]]]) – Mean and covariance estimated from previous data or used to regularize.

  • name (Optional[str]) – Set name for object instance.

  • keys (Optional[str]) – Set keys for estimator.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

MultivariateGaussianAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a SequenceEncodableProbabilityDistribution from sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for the multivariate Gaussian distribution.

Return type:

MultivariateGaussianDistribution

Returns:

SequenceEncodableProbabilityDistribution
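For intuition, the maximum likelihood estimates of a multivariate Gaussian depend on the data only through the observation count, the sum of the observations, and the sum of their outer products. The sketch below uses those three quantities as inputs; this layout is an assumption made for illustration and is not the four-element suff_stat tuple documented above. The function name `estimate_mvn` is hypothetical, and no pseudo_count regularization is applied.

```python
import numpy as np


def estimate_mvn(nobs, sum_x, sum_xxt):
    """Maximum likelihood mean and covariance from accumulated statistics.

    nobs    : (weighted) number of observations
    sum_x   : sum of observations, shape (d,)
    sum_xxt : sum of outer products x x^t, shape (d, d)
    """
    sum_x = np.asarray(sum_x, dtype=float)
    sum_xxt = np.asarray(sum_xxt, dtype=float)
    mu = sum_x / nobs
    # E[x x^t] - mu mu^t gives the (biased) maximum likelihood covariance.
    covar = sum_xxt / nobs - np.outer(mu, mu)
    return mu, covar
```

Given an (n, d) data array `data`, the inputs would be `data.shape[0]`, `data.sum(axis=0)`, and `data.T @ data`.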

MultivariateGaussianSampler

class dml.stats.mvn.MultivariateGaussianSampler(dist, seed=None)

MultivariateGaussianSampler object for sampling from MultivariateGaussianDistribution.

rng

Seeded random number generator used for generating samples.

Type:

RandomState

dist

MultivariateGaussianDistribution to sample from.

Type:

MultivariateGaussianDistribution

sample(size=None)

Generate samples from MultivariateGaussianDistribution.

Parameters:

size (Optional[int]) – Number of samples to generate.

Returns:

Samples as an array of shape (size, dim).

Return type:

np.ndarray
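One standard way to draw such samples is to transform independent standard normal draws with the Cholesky factor of the covariance. The sketch below follows that approach with a seeded `np.random.RandomState`; whether the library samples this way is an assumption, the function name `sample_mvn` is hypothetical, and returning a single length-dim vector when size is None is a choice made for this sketch.

```python
import numpy as np


def sample_mvn(mu, covar, size=None, seed=None):
    """Draw samples from N(mu, covar) using the Cholesky factor of covar.

    Hypothetical helper for illustration only.
    """
    rng = np.random.RandomState(seed)
    mu = np.asarray(mu, dtype=float)
    chol = np.linalg.cholesky(np.asarray(covar, dtype=float))  # lower factor L
    n = 1 if size is None else size
    # Rows of z are independent standard normal vectors of length dim.
    z = rng.standard_normal((n, mu.shape[0]))
    # If z ~ N(0, I), then mu + L z ~ N(mu, L L^t) = N(mu, covar).
    samples = mu + z @ chol.T
    return samples[0] if size is None else samples
```

Passing the same seed twice reproduces the same samples, which is useful for testing.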