Composite Distribution

The composite distribution is the staple distribution of DMixLearn for modeling heterogeneous tuples of data. Suppose we observe a d-dimensional tuple \(x=(x_1, x_2, \dots, x_d)\) with component-wise data types \((T_1, T_2, \dots, T_d)\). The composite distribution models the tuple with the likelihood

\[f(x_1, \dots, x_d \vert \theta_1, \dots, \theta_d) = \prod_{i=1}^{d} f(x_i \vert \theta_i)\]

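where each factor \(f(x_i \vert \theta_i)\) is a distribution with parameters \(\theta_i\) that is compatible with the component data type \(T_i\). Because the likelihood is a product, the components are modeled as independent given their parameters.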
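To make the factorization concrete, here is a minimal sketch in plain Python. The functions gaussian_log_density, categorical_log_density and composite_log_density are hypothetical illustrations, not DMixLearn API. The composite log-density is the sum of the component log-densities, and exponentiating it recovers the product in the likelihood above.

```python
import math
from typing import Any, Callable, Sequence, Tuple


# Hypothetical component log-densities (plain functions, not DMixLearn classes).
# Each scores one tuple entry x_i under its own parameters theta_i.
def gaussian_log_density(x: float, mu: float, sigma2: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)


def categorical_log_density(x: str, probs: dict) -> float:
    # Values outside the support get zero probability, i.e. log-density -inf.
    return math.log(probs[x]) if x in probs else -math.inf


def composite_log_density(
    x: Tuple[Any, ...],
    components: Sequence[Tuple[Callable[..., float], Tuple[Any, ...]]],
) -> float:
    """log f(x | theta_1..theta_d) = sum_i log f(x_i | theta_i)."""
    if len(x) != len(components):
        raise ValueError("tuple length must equal the number of components")
    return sum(log_f(x_i, *theta_i) for x_i, (log_f, theta_i) in zip(x, components))


# T_1 = float (Gaussian component), T_2 = str (categorical component).
components = [
    (gaussian_log_density, (0.0, 1.0)),
    (categorical_log_density, ({"a": 0.25, "b": 0.75},)),
]

x = (0.0, "b")
log_density = composite_log_density(x, components)
density = math.exp(log_density)  # equals N(0 | 0, 1) * 0.75
```

Summing log-densities avoids multiplying many small numbers; the density itself is recovered with a single exponential.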

CompositeDistribution

class dml.stats.composite.CompositeDistribution(dists, name=None, keys=None)

CompositeDistribution for modeling tuples of heterogeneous data.

dists

Distributions for each component.

Type:

Tuple[SequenceEncodableProbabilityDistribution, …]

count

Number of components (i.e. len(dists)).

Type:

int

name

Name of the object.

Type:

Optional[str]

keys

Key for marking shared parameters.

Type:

Optional[str]

__init__(dists, name=None, keys=None)

Create an instance of CompositeDistribution.

Parameters:
  • dists (Sequence[SequenceEncodableProbabilityDistribution]) – Component distributions.

  • name (Optional[str], optional) – Name of the object. Defaults to None.

  • keys (Optional[str], optional) – Key for marking shared parameters. Defaults to None.

density(x)

Evaluate the density of the CompositeDistribution at a single observation tuple x.

Parameters:

x (Tuple[Any, ...]) – Tuple of length len(dists); the data type of the k-th entry must be consistent with dists[k].

Returns:

Density value.

Return type:

float

dist_to_encoder()

Return a CompositeDataEncoder for this distribution.

Returns:

Encoder object.

Return type:

CompositeDataEncoder
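The internals of CompositeDataEncoder are not documented in this section. A natural way to encode a list of tuples for a product distribution is to split it into one column per component and let each component's encoder handle its own column. The sketch below shows only that column split, under that assumption; split_into_columns is a hypothetical helper, not part of DMixLearn.

```python
from typing import Any, List, Sequence, Tuple


def split_into_columns(data: Sequence[Tuple[Any, ...]], count: int) -> Tuple[List[Any], ...]:
    """Hypothetical first step of composite encoding: transpose rows into per-component columns."""
    columns: Tuple[List[Any], ...] = tuple([] for _ in range(count))
    for row in data:
        # Every observation must have exactly one entry per component.
        if len(row) != count:
            raise ValueError(f"expected tuples of length {count}, got {len(row)}")
        for k, value in enumerate(row):
            columns[k].append(value)
    return columns


data = [(1.0, "a"), (2.5, "b"), (0.3, "a")]
float_column, label_column = split_into_columns(data, count=2)
```

Each column would then be passed to the encoder of the matching component distribution.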

estimator(pseudo_count=None)

Create a CompositeEstimator for estimating a CompositeDistribution.

Parameters:

pseudo_count (Optional[float], optional) – Pseudo-count used to inflate sufficient statistics during estimation. Defaults to None.

Returns:

Estimator object.

Return type:

CompositeEstimator

log_density(x)

Evaluate the log-density of the CompositeDistribution at a single observation tuple x.

Parameters:

x (Tuple[Any, ...]) – Tuple of length len(dists); the data type of the k-th entry must be consistent with dists[k].

Returns:

Log-density value.

Return type:

float

sampler(seed=None)

Create a CompositeSampler for sampling from this CompositeDistribution instance.

Parameters:

seed (Optional[int], optional) – Seed used to initialize the RandomState for sampling. Defaults to None.

Returns:

Sampler object.

Return type:

CompositeSampler

seq_log_density(x)

Vectorized evaluation of the log-density for a CompositeEncodedDataSequence.

Parameters:

x (CompositeEncodedDataSequence) – EncodedDataSequence for a CompositeDistribution.

Returns:

Log-density evaluated at all encoded data points.

Return type:

np.ndarray

Raises:

Exception – If input is not a CompositeEncodedDataSequence.
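The vectorized evaluation can be sketched with NumPy under an assumption about the encoding: here the encoded sequence is a hypothetical container holding one already-encoded array per component (the actual layout of CompositeEncodedDataSequence is not described in this section). Each component contributes one vectorized log-density call, and the results are summed element-wise across observations. All names below are illustrative, not DMixLearn API.

```python
from typing import Any, Callable, Sequence, Tuple

import numpy as np


class HypotheticalEncodedComposite:
    """Stand-in for CompositeEncodedDataSequence: one encoded array per component (assumption)."""

    def __init__(self, columns: Sequence[np.ndarray]) -> None:
        self.columns = tuple(columns)


def gaussian_seq_log_density(x: np.ndarray, mu: float, sigma2: float) -> np.ndarray:
    return -0.5 * np.log(2.0 * np.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)


def categorical_seq_log_density(idx: np.ndarray, probs: np.ndarray) -> np.ndarray:
    # Categories are assumed to be pre-encoded as integer indices into probs.
    return np.log(probs[idx])


def composite_seq_log_density(
    enc: HypotheticalEncodedComposite,
    components: Sequence[Tuple[Callable[..., np.ndarray], Tuple[Any, ...]]],
) -> np.ndarray:
    if not isinstance(enc, HypotheticalEncodedComposite):
        # TypeError is a subclass of Exception, matching the documented "Raises".
        raise TypeError("expected an encoded composite sequence")
    rv = np.zeros(len(enc.columns[0]), dtype=np.float64)
    # One vectorized call per component, accumulated across all observations.
    for column, (seq_log_f, theta) in zip(enc.columns, components):
        rv += seq_log_f(column, *theta)
    return rv


enc = HypotheticalEncodedComposite([np.array([0.0, 1.0, -1.0]), np.array([1, 0, 1])])
components = [
    (gaussian_seq_log_density, (0.0, 1.0)),
    (categorical_seq_log_density, (np.array([0.25, 0.75]),)),
]
ll = composite_seq_log_density(enc, components)  # one log-density per observation
```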

CompositeEstimator

class dml.stats.composite.CompositeEstimator(estimators, keys=None, name=None)

Estimator for CompositeDistribution.

estimators

Estimators for each component.

Type:

Sequence[ParameterEstimator]

keys

Keys used for merging sufficient statistics.

Type:

Optional[str]

count

Number of components.

Type:

int

name

Name of the object.

Type:

Optional[str]

__init__(estimators, keys=None, name=None)

Initialize CompositeEstimator.

Parameters:
  • estimators (Sequence[ParameterEstimator]) – Estimators for each component.

  • keys (Optional[str], optional) – Keys used for merging sufficient statistics. Defaults to None.

  • name (Optional[str], optional) – Name of the object. Defaults to None.

Raises:

TypeError – If keys is not a string or None.
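A minimal sketch of the documented constructor behavior, assuming only what is listed above: count is derived from the number of estimators, and a keys value that is neither a string nor None raises TypeError. CompositeEstimatorSketch is a hypothetical stand-in, not the DMixLearn class.

```python
from typing import Any, Optional, Sequence


class CompositeEstimatorSketch:
    """Hypothetical stand-in showing the documented attributes and the keys check."""

    def __init__(
        self,
        estimators: Sequence[Any],
        keys: Optional[str] = None,
        name: Optional[str] = None,
    ) -> None:
        # Reject anything other than a string or None before storing state.
        if keys is not None and not isinstance(keys, str):
            raise TypeError(f"keys must be a str or None, got {type(keys).__name__}")
        self.estimators = tuple(estimators)
        self.count = len(self.estimators)  # number of components
        self.keys = keys
        self.name = name


est = CompositeEstimatorSketch(["est_a", "est_b"], keys="shared")
```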

accumulator_factory()

Return a CompositeAccumulatorFactory for this estimator.

Returns:

Factory object.

Return type:

CompositeAccumulatorFactory
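The accumulator produced by the factory is not specified in this section. A common pattern for product distributions, sketched here under that assumption, is a composite accumulator that forwards the k-th tuple entry to the k-th component accumulator and reports one sufficient statistic per component. All classes and functions below (GaussianAccumulatorSketch, CategoricalAccumulatorSketch, CompositeAccumulatorSketch, composite_accumulator_factory) are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


class GaussianAccumulatorSketch:
    """Hypothetical: weighted sums (sum w, sum w*x, sum w*x^2) for a Gaussian component."""

    def __init__(self) -> None:
        self.sw = 0.0
        self.swx = 0.0
        self.swx2 = 0.0

    def update(self, x: float, weight: float) -> None:
        self.sw += weight
        self.swx += weight * x
        self.swx2 += weight * x * x

    def value(self) -> Tuple[float, float, float]:
        return (self.sw, self.swx, self.swx2)


class CategoricalAccumulatorSketch:
    """Hypothetical: weighted counts per category."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, x: Any, weight: float) -> None:
        self.counts[x] += weight

    def value(self) -> Dict[Any, float]:
        return dict(self.counts)


class CompositeAccumulatorSketch:
    """Routes the k-th tuple entry to the k-th component accumulator."""

    def __init__(self, accumulators: List[Any]) -> None:
        self.accumulators = accumulators

    def update(self, x: Tuple[Any, ...], weight: float) -> None:
        for acc, x_k in zip(self.accumulators, x):
            acc.update(x_k, weight)

    def value(self) -> Tuple[Any, ...]:
        # One sufficient statistic per component, in component order.
        return tuple(acc.value() for acc in self.accumulators)


def composite_accumulator_factory() -> CompositeAccumulatorSketch:
    # Each call hands out a fresh, empty accumulator.
    return CompositeAccumulatorSketch([GaussianAccumulatorSketch(), CategoricalAccumulatorSketch()])


acc = composite_accumulator_factory()
for row in [(1.0, "a"), (3.0, "b"), (2.0, "b")]:
    acc.update(row, weight=1.0)
suff_stat = acc.value()
```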

estimate(nobs, suff_stat)

Estimate a CompositeDistribution from aggregated sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations used to form suff_stat.

  • suff_stat (Tuple[Any, ...]) – Tuple of sufficient statistics for each estimator.

Returns:

Estimated distribution.

Return type:

CompositeDistribution
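Because the composite log-likelihood is a sum of per-component terms with separate parameters, the maximum-likelihood fit splits into independent per-component fits, assuming no parameters are tied through keys. The sketch below illustrates this with hypothetical component estimators that consume a tuple of sufficient statistics (weighted Gaussian sums and weighted category counts); it is not DMixLearn's estimate implementation.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple


def estimate_gaussian(nobs: Optional[float], ss: Tuple[float, float, float]) -> Tuple[float, float]:
    # Hypothetical component estimator: weighted MLE of mean and variance.
    sw, swx, swx2 = ss
    mu = swx / sw
    return mu, swx2 / sw - mu * mu


def estimate_categorical(nobs: Optional[float], ss: Dict[Any, float]) -> Dict[Any, float]:
    # Hypothetical component estimator: normalized weighted counts.
    total = sum(ss.values())
    return {k: v / total for k, v in ss.items()}


def composite_estimate(
    nobs: Optional[float],
    suff_stat: Tuple[Any, ...],
    component_estimators: Sequence[Callable[[Optional[float], Any], Any]],
) -> Tuple[Any, ...]:
    if len(suff_stat) != len(component_estimators):
        raise ValueError("need one sufficient statistic per component")
    # The log-likelihood is a sum of per-component terms, so each component
    # is fit independently from its own sufficient statistic.
    return tuple(est(nobs, ss_k) for est, ss_k in zip(component_estimators, suff_stat))


suff_stat = ((3.0, 6.0, 14.0), {"a": 1.0, "b": 2.0})
params = composite_estimate(3.0, suff_stat, [estimate_gaussian, estimate_categorical])
```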

CompositeSampler

class dml.stats.composite.CompositeSampler(dist, seed=None)

CompositeSampler used to generate samples from CompositeDistribution.

dist

CompositeDistribution to draw samples from.

Type:

CompositeDistribution

rng

RandomState, seeded with seed if one is provided.

Type:

RandomState

dist_samplers

List of DistributionSamplers, one for each component.

Type:

List[DistributionSampler]

sample(size=None)

Generate independent samples from a CompositeDistribution.

If size is None, draw one sample and return it as a tuple of length len(dists). If size > 0, draw size samples and return a list of length size whose entries are tuples of length len(dists).

Parameters:

size (Optional[int], optional) – If None, draw a single sample. Otherwise, draw size iid samples. Defaults to None.

Returns:

A tuple of length len(dists) if size is None; otherwise a list of length size whose entries are tuples of length len(dists).

Return type:

Union[List[Tuple[Any, …]], Tuple[Any, …]]
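The documented return shapes can be illustrated with a small stand-in. CompositeSamplerSketch is hypothetical: here every component draws from one shared numpy RandomState, whereas the documented class keeps a separate DistributionSampler per component (dist_samplers), and how those are seeded is not specified in this section.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple, Union

import numpy as np

# A component sampler is assumed to be any callable that draws one value from a RandomState.
ComponentSampler = Callable[[np.random.RandomState], Any]


class CompositeSamplerSketch:
    """Hypothetical sampler: one draw per component, all from one seeded RandomState."""

    def __init__(self, component_samplers: Sequence[ComponentSampler], seed: Optional[int] = None) -> None:
        self.rng = np.random.RandomState(seed)
        self.component_samplers = list(component_samplers)

    def sample(self, size: Optional[int] = None) -> Union[List[Tuple[Any, ...]], Tuple[Any, ...]]:
        if size is None:
            # A single observation: a tuple with one entry per component.
            return tuple(draw(self.rng) for draw in self.component_samplers)
        # size iid observations: a list of such tuples.
        return [self.sample() for _ in range(size)]


labels = ["a", "b"]
components = [
    lambda rng: rng.normal(0.0, 1.0),                   # float component
    lambda rng: labels[rng.choice(2, p=[0.25, 0.75])],  # categorical component
]
one = CompositeSamplerSketch(components, seed=0).sample()
many = CompositeSamplerSketch(components, seed=0).sample(size=4)
```

With the same seed, the first tuple of many matches one, since both samplers consume the random stream in the same order.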