Heterogeneous Mixture Distribution
The heterogeneous mixture distribution lets the components of a mixture come from different distribution families. For example, suppose we observe a positive float x. We can define a two-component mixture with \(f_1(x \vert \lambda) \sim Exp(\lambda)\) and \(f_2(x \vert \mu, \sigma) \sim LogNormal(\mu, \sigma)\). The only requirement is that every component distribution has the same support as the data type of x.
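As a minimal sketch of the two-component example above, the mixture density can be written out by hand with the standard library. This does not use the dmx API: the helper names `exp_pdf`, `lognormal_pdf` and `mixture_pdf`, the weights, and the parameter values are illustrative assumptions, and \(\lambda\) is assumed to be a rate parameter.

```python
import math

def exp_pdf(x, lam):
    """Density of Exp(lam), assuming lam is a rate; support is x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def lognormal_pdf(x, mu, sigma):
    """Density of LogNormal(mu, sigma); support is x > 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, lam, mu, sigma):
    """Weighted sum of the two component densities."""
    w1, w2 = weights
    return w1 * exp_pdf(x, lam) + w2 * lognormal_pdf(x, mu, sigma)

# Illustrative weights and parameters (not taken from the documentation).
density_at_one = mixture_pdf(1.0, (0.4, 0.6), lam=2.0, mu=0.0, sigma=1.0)
```

Both components are supported on the positive reals, which is why they can share a mixture over a positive float observation.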
HeterogeneousMixtureDistribution
- class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureDistribution(components, w, name=None, keys=(None, None))
HeterogeneousMixtureDistribution object defined by component distributions and weights.
- components
List of component distributions (data type T).
- Type:
Sequence[SequenceEncodableProbabilityDistribution]
- w
Mixture weights, taken from the w argument.
- Type:
np.ndarray
- name
String name for the HeterogeneousMixtureDistribution object.
- Type:
Optional[str]
- zw
Boolean mask that is True where the corresponding weight is 0.0 and False elsewhere.
- Type:
np.ndarray
- log_w
Log of weights (w). Set to -np.inf where zw is True.
- Type:
np.ndarray
- num_components
Number of components in HeterogeneousMixtureDistribution instance.
- Type:
int
- keys
Keys for weights and components.
- Type:
Tuple[Optional[str], Optional[str]]
- __init__(components, w, name=None, keys=(None, None))
Initialize HeterogeneousMixtureDistribution.
- Parameters:
components (Sequence[SequenceEncodableProbabilityDistribution]) – Component distributions. All must be compatible with data type T.
w (Union[Sequence[float], np.ndarray]) – Mixture weights, must sum to 1.0.
name (Optional[str], optional) – Assign string name to HeterogeneousMixtureDistribution object.
keys (Tuple[Optional[str], Optional[str]], optional) – Keys for weights and components.
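The relationship between w, zw and log_w described in the attributes can be sketched with NumPy as below. The helper `prepare_weights` is hypothetical, and the sum-to-one check is an assumption added for illustration; the library's own validation behavior is not documented here.

```python
import numpy as np

def prepare_weights(w):
    """Hypothetical helper: derive the zero-weight mask and log-weights from w."""
    w = np.asarray(w, dtype=float)
    # Assumption for illustration: reject weights that do not sum to 1.0.
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1.0")
    zw = (w == 0.0)                      # True where a weight is exactly 0.0
    log_w = np.full(w.shape, -np.inf)    # -inf wherever zw is True
    log_w[~zw] = np.log(w[~zw])          # log only taken over nonzero weights
    return w, zw, log_w

w, zw, log_w = prepare_weights([0.25, 0.0, 0.75])
```

Taking the log only where the weight is nonzero avoids a divide-by-zero warning while still producing -np.inf for empty components.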
- component_log_density(x)
Evaluate component-wise log-density of heterogeneous mixture distribution at observation x.
Returns a num_components-dimensional array with \(\log(f_k(x))\) in each entry.
- Parameters:
x (T) – Single observation from mixture distribution. T is data type of components.
- Returns:
Component-wise log-density at x.
- Return type:
np.ndarray
- density(x)
Evaluate density of heterogeneous mixture distribution at observation x.
- Parameters:
x (T) – Single observation from heterogeneous mixture distribution. T is data type of components.
- Returns:
Density at x.
- Return type:
float
- dist_to_encoder()
Return a HeterogeneousMixtureDataEncoder for this distribution.
- Returns:
Encoder object.
- Return type:
HeterogeneousMixtureDataEncoder
- estimator(pseudo_count=None)
Return a HeterogeneousMixtureEstimator for this distribution.
- Parameters:
pseudo_count (Optional[float], optional) – Pseudo-count for regularization.
- Returns:
Estimator object.
- Return type:
HeterogeneousMixtureEstimator
- log_density(x)
Evaluate log-density of heterogeneous mixture distribution at observation x.
\[\log{f(x)} = \log{\left(\sum_{k=1}^{K} f_k(x) \pi_k\right)}.\]
- Parameters:
x (T) – Single observation from mixture distribution. T is data type of components.
- Returns:
Log-density at x.
- Return type:
float
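The log-density formula above is usually evaluated with the log-sum-exp trick, so that very small component densities do not underflow. The following is a sketch under that assumption, not the library's implementation; `mixture_log_density` is a hypothetical helper that takes the component log-densities \(\log f_k(x)\) and the log-weights \(\log \pi_k\).

```python
import numpy as np

def mixture_log_density(comp_log_density, log_w):
    """log(sum_k f_k(x) * pi_k), computed stably from log f_k(x) and log pi_k."""
    terms = np.asarray(comp_log_density, dtype=float) + np.asarray(log_w, dtype=float)
    m = terms.max()
    if m == -np.inf:
        # Every term has zero mass, so the mixture density is zero.
        return -np.inf
    # Subtract the maximum before exponentiating, then add it back.
    return float(m + np.log(np.exp(terms - m).sum()))

ll = mixture_log_density(np.log([0.2, 0.5]), np.log([0.4, 0.6]))
```

A component with log-weight -np.inf contributes exp(-inf) = 0 to the sum, so zero-weight components drop out without special handling.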
- posterior(x)
Obtain the posterior distribution for each heterogeneous mixture component at observation x.
\[f(z=k \vert x ) = \frac{f_k(x) \pi_k}{\sum_{j=1}^{K} f_j(x) \pi_j}\]
- Parameters:
x (T) – Single observation from mixture distribution. T is data type of components.
- Returns:
Posterior distribution at observation x.
- Return type:
np.ndarray
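The posterior formula above can be sketched in log space, which amounts to a softmax over \(\log f_k(x) + \log \pi_k\). The helper `mixture_posterior` is hypothetical, and it assumes at least one component has nonzero density and weight at x.

```python
import numpy as np

def mixture_posterior(comp_log_density, log_w):
    """Posterior probability of each component given log f_k(x) and log pi_k."""
    terms = np.asarray(comp_log_density, dtype=float) + np.asarray(log_w, dtype=float)
    # Shift by the maximum for numerical stability; assumes it is finite.
    unnorm = np.exp(terms - terms.max())
    return unnorm / unnorm.sum()

post = mixture_posterior(np.log([0.2, 0.5]), np.log([0.4, 0.6]))
```

The returned array has one entry per component and sums to 1.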
- sampler(seed=None)
Return a HeterogeneousMixtureSampler for this distribution.
- Parameters:
seed (Optional[int], optional) – Seed for random number generator.
- Returns:
Sampler object.
- Return type:
HeterogeneousMixtureSampler
- seq_component_log_density(x)
Vectorized evaluation of component-wise log-density for encoded sequence x.
- Parameters:
x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.
- Returns:
2-d array of shape (n_samples, n_components).
- Return type:
np.ndarray
- seq_log_density(x)
Vectorized evaluation of log-density for encoded sequence x.
- Parameters:
x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.
- Returns:
Log-density of each observation in the encoded sequence.
- Return type:
np.ndarray
- seq_posterior(x)
Vectorized evaluation of posterior of HeterogeneousMixtureDistribution for encoded sequence x.
- Parameters:
x (HeterogeneousMixtureEncodedDataSequence) – EncodedDataSequence for Heterogeneous Mixture.
- Returns:
Posterior probabilities for each observation in encoded sequence.
- Return type:
np.ndarray
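The seq_* methods work on a whole sequence at once. As a rough sketch of what that vectorization looks like, the code below starts from a plain (n_samples, n_components) array of component log-densities, like the one seq_component_log_density is documented to return, and derives per-observation log-densities and posteriors. The names `batch_log_density` and `batch_posterior` are hypothetical, and no encoded-sequence object is involved.

```python
import numpy as np

def batch_log_density(comp_ll, log_w):
    """Row-wise log-sum-exp of comp_ll + log_w; returns shape (n_samples,)."""
    terms = np.asarray(comp_ll, dtype=float) + np.asarray(log_w, dtype=float)
    m = terms.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(terms - m).sum(axis=1, keepdims=True)))[:, 0]

def batch_posterior(comp_ll, log_w):
    """Row-wise posterior over components; returns shape (n_samples, n_components)."""
    terms = np.asarray(comp_ll, dtype=float) + np.asarray(log_w, dtype=float)
    unnorm = np.exp(terms - terms.max(axis=1, keepdims=True))
    return unnorm / unnorm.sum(axis=1, keepdims=True)

# Three observations, two components (illustrative values).
comp_ll = np.log(np.array([[0.2, 0.5], [0.1, 0.1], [0.9, 0.3]]))
log_w = np.log(np.array([0.4, 0.6]))
ll = batch_log_density(comp_ll, log_w)
post = batch_posterior(comp_ll, log_w)
```

Broadcasting adds the length-K log-weight vector to every row, so no Python-level loop over observations is needed.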
HeterogeneousMixtureEstimator
- class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureEstimator(estimators, fixed_weights=None, suff_stat=None, pseudo_count=None, name=None, keys=(None, None))
Estimator for HeterogeneousMixtureDistribution from aggregated sufficient statistics.
- estimators
Estimators for the mixture components.
- Type:
Sequence[ParameterEstimator]
- fixed_weights
Fixed weights for the mixture (if any).
- Type:
Optional[np.ndarray]
- suff_stat
Sufficient statistics for the weights.
- Type:
Optional[np.ndarray]
- pseudo_count
Pseudo-count for regularization.
- Type:
Optional[float]
- name
Name for the estimator.
- Type:
Optional[str]
- keys
Keys for the weights and component distributions.
- Type:
Tuple[Optional[str], Optional[str]]
- __init__(estimators, fixed_weights=None, suff_stat=None, pseudo_count=None, name=None, keys=(None, None))
Initialize HeterogeneousMixtureEstimator.
- Parameters:
estimators (Sequence[ParameterEstimator]) – Estimators for the mixture components.
fixed_weights (Optional[np.ndarray], optional) – Fixed weights for the mixture.
suff_stat (Optional[np.ndarray], optional) – Sufficient statistics for the weights.
pseudo_count (Optional[float], optional) – Pseudo-count for regularization.
name (Optional[str], optional) – Name for the estimator.
keys (Tuple[Optional[str], Optional[str]], optional) – Keys for the weights and component distributions.
- Raises:
TypeError – If keys is not a tuple whose two entries are each a string or None.
- accumulator_factory()
Return a HeterogeneousMixtureAccumulatorFactory for this estimator.
- Returns:
Factory object.
- Return type:
HeterogeneousMixtureAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate a HeterogeneousMixtureDistribution from sufficient statistics.
- Parameters:
nobs (Optional[float]) – Number of observations (not used).
suff_stat (Tuple[np.ndarray, Tuple[Any, ...]]) – Sufficient statistics.
- Returns:
Estimated distribution.
- Return type:
HeterogeneousMixtureDistribution
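To illustrate how mixture weights might be re-estimated from sufficient statistics, the sketch below assumes the first element of suff_stat holds per-component (posterior-weighted) counts. The helper `estimate_weights` is hypothetical, and spreading pseudo_count evenly across components is one plausible regularization convention, not necessarily the one the library uses.

```python
import numpy as np

def estimate_weights(counts, pseudo_count=None, fixed_weights=None):
    """Hypothetical weight update from per-component counts."""
    if fixed_weights is not None:
        # Fixed weights bypass estimation entirely.
        return np.asarray(fixed_weights, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if pseudo_count is not None:
        # Assumption: share the pseudo-count equally among components.
        counts = counts + pseudo_count / counts.size
    return counts / counts.sum()

w_mle = estimate_weights([30.0, 70.0])
w_reg = estimate_weights([30.0, 70.0], pseudo_count=10.0)
```

With a pseudo-count, the estimated weights are pulled toward uniform, which keeps a component from collapsing to weight zero when it receives few observations.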
HeterogeneousMixtureSampler
- class dmx.stats.heterogeneous_mixture.HeterogeneousMixtureSampler(dist, seed=None)
Sampler for HeterogeneousMixtureDistribution.
- dist
Distribution to sample from.
- Type:
HeterogeneousMixtureDistribution
- rng
Seeded RandomState for sampling.
- Type:
RandomState
- comp_samplers
List of DistributionSampler objects for each mixture component.
- Type:
List[DistributionSampler]
- sample(size=None)
Draw iid samples from a heterogeneous mixture distribution.
- Parameters:
size (Optional[int], optional) – Number of iid samples to draw.
- Returns:
Single sample or list of samples.
- Return type:
Any or Sequence[Any]
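Sampling from a mixture is commonly done in two stages: draw a component index according to the weights, then draw from that component. The sketch below follows that scheme with NumPy's RandomState. The function `sample_mixture` and the per-component draw callables are hypothetical stand-ins for the component samplers, and the weights and parameters are illustrative.

```python
import numpy as np
from numpy.random import RandomState

def sample_mixture(w, comp_draws, rng, size=None):
    """Draw iid samples: pick a component by weight, then sample from it."""
    n = 1 if size is None else size
    z = rng.choice(len(w), size=n, p=w)        # latent component labels
    out = [comp_draws[k](rng) for k in z]      # one draw per label
    return out[0] if size is None else out

rng = RandomState(0)
# Hypothetical component samplers: Exp with rate 2 (scale 0.5) and LogNormal(0, 1).
comp_draws = [
    lambda r: float(r.exponential(scale=0.5)),
    lambda r: float(r.lognormal(mean=0.0, sigma=1.0)),
]
samples = sample_mixture([0.4, 0.6], comp_draws, rng, size=5)
single = sample_mixture([0.4, 0.6], comp_draws, rng)
```

As in the documented sample method, passing size=None returns a single sample and passing an integer returns a list of that many samples.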