Heterogeneous Mixture Distribution

The heterogeneous mixture distribution lets the components of a mixture distribution come from different distribution families. For example, consider observing a positive float x. We can define a two-component mixture with \(f_1(x \vert \lambda) \sim Exp(\lambda)\) and \(f_2(x \vert \mu, \sigma) \sim LogNormal(\mu, \sigma)\). The only requirement is that every component distribution has a support matching the data type of x.
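To make the example concrete, the following minimal sketch evaluates the density of such a two-component mixture using only the Python standard library. It does not use the dmx API: the helper names (exp_pdf, lognormal_pdf, mixture_pdf) and the parameter values are hypothetical, and \(\lambda\) is assumed to be a rate parameter.

```python
import math


def exp_pdf(x, lam):
    """Density of Exp(lam), assuming lam is a rate; supported on x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def lognormal_pdf(x, mu, sigma):
    """Density of LogNormal(mu, sigma); supported on x > 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, w, lam, mu, sigma):
    """f(x) = w[0] * f_1(x | lam) + w[1] * f_2(x | mu, sigma)."""
    return w[0] * exp_pdf(x, lam) + w[1] * lognormal_pdf(x, mu, sigma)


# Hypothetical parameter values, chosen only for illustration.
density_at_one = mixture_pdf(1.0, w=[0.3, 0.7], lam=2.0, mu=0.0, sigma=1.0)
```

Both components are defined on positive floats, which is the shared-support requirement described above.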

HeterogeneousMixtureDistribution

class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureDistribution(components, w, name=None, keys=(None, None))

HeterogeneousMixtureDistribution object defined by component distributions and weights.

components

List of component distributions (data type T).

Type:

Sequence[SequenceEncodableProbabilityDistribution]

w

Mixture weights, taken from the w argument.

Type:

np.ndarray

name

String name for the HeterogeneousMixtureDistribution object.

Type:

Optional[str]

zw

Boolean mask over the weights: True where a weight is 0.0, else False.

Type:

np.ndarray

log_w

Log of weights (w). Set to -np.inf where zw is True.

Type:

np.ndarray
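As an illustration of how zw and log_w relate to w, here is a small sketch that follows the descriptions above. It uses plain Python lists in place of NumPy arrays, and the weight values are hypothetical; it is not taken from the library source.

```python
import math

w = [0.5, 0.0, 0.5]  # hypothetical weights; the second component is switched off

# zw flags the zero weights.
zw = [wk == 0.0 for wk in w]

# log_w is log(w) where the weight is positive and -inf where zw is True,
# which avoids calling log(0.0).
log_w = [-math.inf if is_zero else math.log(wk) for wk, is_zero in zip(w, zw)]
```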

num_components

Number of components in HeterogeneousMixtureDistribution instance.

Type:

int

keys

Keys for weights and components.

Type:

Tuple[Optional[str], Optional[str]]

__init__(components, w, name=None, keys=(None, None))

Initialize HeterogeneousMixtureDistribution.

Parameters:
  • components (Sequence[SequenceEncodableProbabilityDistribution]) – Set component distributions. Must all be compatible with type T.

  • w (Union[Sequence[float], np.ndarray]) – Mixture weights, must sum to 1.0.

  • name (Optional[str], optional) – Assign string name to HeterogeneousMixtureDistribution object.

  • keys (Tuple[Optional[str], Optional[str]], optional) – Keys for weights and components.

component_log_density(x)

Evaluate component-wise log-density of heterogeneous mixture distribution at observation x.

Returns a 1-d array of length num_components with \(\log(f_k(x))\) in entry k.

Parameters:

x (T) – Single observation from mixture distribution. T is data type of components.

Returns:

Component-wise log-density at x.

Return type:

np.ndarray

density(x)

Evaluate density of heterogeneous mixture distribution at observation x.

Parameters:

x (T) – Single observation from heterogeneous mixture distribution. T is data type of components.

Returns:

Density at x.

Return type:

float

dist_to_encoder()

Return a HeterogeneousMixtureDataEncoder for this distribution.

Returns:

Encoder object.

Return type:

HeterogeneousMixtureDataEncoder

estimator(pseudo_count=None)

Return a HeterogeneousMixtureEstimator for this distribution.

Parameters:

pseudo_count (Optional[float], optional) – Pseudo-count for regularization.

Returns:

Estimator object.

Return type:

HeterogeneousMixtureEstimator

log_density(x)

Evaluate log-density of heterogeneous mixture distribution at observation x.

\[\log{f(x)} = \log{\left(\sum_{k=1}^{K} f_k(x) \pi_k\right)}.\]

Parameters:

x (T) – Single observation from mixture distribution. T is data type of components.

Returns:

Log-density at x.

Return type:

float
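The log-density formula above is usually evaluated with the log-sum-exp trick so that very small component densities do not underflow. The sketch below shows that technique on plain Python lists; the function name mixture_log_density is hypothetical, and whether the library computes log_density this way is an assumption.

```python
import math


def mixture_log_density(comp_log_density, log_w):
    """Stable evaluation of log(sum_k exp(log f_k(x) + log pi_k))."""
    terms = [ll + lw for ll, lw in zip(comp_log_density, log_w)]
    m = max(terms)
    if m == -math.inf:
        # Every component contributes zero density at x.
        return -math.inf
    # Subtract the maximum before exponentiating, then add it back.
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical inputs: exp(-1000.0) underflows to 0.0 in floating point,
# so a naive log(sum(exp(...))) would fail here.
value = mixture_log_density([-1000.0, -1001.0], [math.log(0.5), math.log(0.5)])
```

A component with weight 0.0 enters with log weight -inf and therefore drops out of the sum.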

posterior(x)

Obtain the posterior distribution for each heterogeneous mixture component at observation x.

\[f(z=k \vert x ) = \frac{f_k(x) \pi_k}{\sum_{j=1}^{K} f_j(x) \pi_j}\]

Parameters:

x (T) – Single observation from mixture distribution. T is data type of components.

Returns:

Posterior distribution at observation x.

Return type:

np.ndarray
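The posterior formula can be sketched directly from component log-densities and weights. This is an illustration only: the function name mixture_posterior and the input values are hypothetical, plain lists stand in for arrays, and the sketch assumes at least one component has positive weight and finite log-density at x.

```python
import math


def mixture_posterior(comp_log_density, w):
    """Posterior f(z=k | x) for each component, normalized in log space."""
    # Components with zero weight get log-term -inf and posterior 0.0.
    terms = [
        ll + math.log(wk) if wk > 0.0 else -math.inf
        for ll, wk in zip(comp_log_density, w)
    ]
    m = max(terms)  # assumed finite, see the note above
    unnorm = [math.exp(t - m) for t in terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical values: the first two components fit x equally well,
# so their posterior probabilities equal their weights.
post = mixture_posterior([-1.0, -1.0, -3.0], [0.25, 0.75, 0.0])
```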

sampler(seed=None)

Return a HeterogeneousMixtureSampler for this distribution.

Parameters:

seed (Optional[int], optional) – Seed for random number generator.

Returns:

Sampler object.

Return type:

HeterogeneousMixtureSampler

seq_component_log_density(x)

Vectorized evaluation of component-wise log-density for encoded sequence x.

Parameters:

x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.

Returns:

2-d array of shape (n_samples, n_components).

Return type:

np.ndarray

seq_log_density(x)

Vectorized evaluation of log-density for encoded sequence x.

Parameters:

x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.

Returns:

Log-density of each observation in the encoded sequence.

Return type:

np.ndarray

seq_posterior(x)

Vectorized evaluation of posterior of HeterogeneousMixtureDistribution for encoded sequence x.

Parameters:

x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.

Returns:

Posterior probabilities for each observation in encoded sequence.

Return type:

np.ndarray

HeterogeneousMixtureEstimator

class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureEstimator(estimators, fixed_weights=None, suff_stat=None, pseudo_count=None, name=None, keys=(None, None))

Estimator for HeterogeneousMixtureDistribution from aggregated sufficient statistics.

estimators

Estimators for the mixture components.

Type:

Sequence[ParameterEstimator]

fixed_weights

Fixed weights for the mixture (if any).

Type:

Optional[np.ndarray]

suff_stat

Sufficient statistics for the weights.

Type:

Optional[np.ndarray]

pseudo_count

Pseudo-count for regularization.

Type:

Optional[float]

name

Name for the estimator.

Type:

Optional[str]

keys

Keys for the weights and component distributions.

Type:

Tuple[Optional[str], Optional[str]]

__init__(estimators, fixed_weights=None, suff_stat=None, pseudo_count=None, name=None, keys=(None, None))

Initialize HeterogeneousMixtureEstimator.

Parameters:
  • estimators (Sequence[ParameterEstimator]) – Estimators for the mixture components.

  • fixed_weights (Optional[np.ndarray], optional) – Fixed weights for the mixture.

  • suff_stat (Optional[np.ndarray], optional) – Sufficient statistics for the weights.

  • pseudo_count (Optional[float], optional) – Pseudo-count for regularization.

  • name (Optional[str], optional) – Name for the estimator.

  • keys (Tuple[Optional[str], Optional[str]], optional) – Keys for the weights and component distributions.

Raises:

TypeError – If keys is not a tuple of two elements, each either a string or None.

accumulator_factory()

Return a HeterogeneousMixtureAccumulatorFactory for this estimator.

Returns:

Factory object.

Return type:

HeterogeneousMixtureAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a HeterogeneousMixtureDistribution from sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Number of observations (not used).

  • suff_stat (Tuple[np.ndarray, Tuple[Any, ...]]) – Sufficient statistics.

Returns:

Estimated distribution.

Return type:

HeterogeneousMixtureDistribution
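For intuition, the weight part of such an estimate can be sketched as a normalization of per-component counts. Everything here is an assumption rather than the library's confirmed behavior: the function name estimate_weights is hypothetical, the counts are taken to be the (possibly fractional) number of observations attributed to each component, and the pseudo-count is spread evenly over components, which is one common convention.

```python
def estimate_weights(counts, pseudo_count=None, fixed_weights=None):
    """Turn per-component counts into mixture weights (illustrative only)."""
    if fixed_weights is not None:
        # Fixed weights are returned unchanged; the counts are ignored.
        return list(fixed_weights)
    k = len(counts)
    p = 0.0 if pseudo_count is None else pseudo_count
    # Assumed convention: add p / k to each count, then normalize.
    total = sum(counts) + p
    return [(c + p / k) / total for c in counts]


# Hypothetical counts for a two-component mixture.
plain = estimate_weights([30.0, 70.0])
smoothed = estimate_weights([30.0, 70.0], pseudo_count=10.0)
```

Under this convention a positive pseudo-count pulls the weights toward uniform and keeps a component with zero count from receiving weight exactly 0.0.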

HeterogeneousMixtureSampler

class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureSampler(dist, seed=None)

Sampler for HeterogeneousMixtureDistribution.

dist

Distribution to sample from.

Type:

HeterogeneousMixtureDistribution

rng

Seeded RandomState for sampling.

Type:

RandomState

comp_samplers

List of DistributionSampler objects for each mixture component.

Type:

List[DistributionSampler]

sample(size=None)

Draw iid samples from a heterogeneous mixture distribution.

Parameters:

size (Optional[int], optional) – Number of iid samples to draw.

Returns:

Single sample or list of samples.

Return type:

Any or Sequence[Any]
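Sampling from a mixture is typically done in two stages: pick a component according to the weights, then draw from that component. The sketch below shows this with the standard library's random module. It is not the dmx implementation: sample_mixture, the lambda-based component samplers, and all parameter values are hypothetical.

```python
import random


def sample_mixture(w, comp_samplers, size=None, seed=None):
    """Draw from a mixture: choose a component by weight, then sample it."""
    rng = random.Random(seed)

    def draw_one():
        # Stage 1: component index k drawn with probability w[k].
        k = rng.choices(range(len(w)), weights=w)[0]
        # Stage 2: one draw from the chosen component.
        return comp_samplers[k](rng)

    if size is None:
        return draw_one()  # single sample
    return [draw_one() for _ in range(size)]  # list of samples


# Hypothetical components: Exp(rate=2.0) and LogNormal(mu=0.0, sigma=1.0).
comp_samplers = [
    lambda rng: rng.expovariate(2.0),
    lambda rng: rng.lognormvariate(0.0, 1.0),
]
draws = sample_mixture([0.3, 0.7], comp_samplers, size=5, seed=0)
```

As in the documented sample method, omitting size yields a single sample and passing an integer yields a list.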