Joint Mixture Distribution

The Joint Mixture Distribution is a mixture of mixtures (see Mixture Distribution). It is particularly useful when observations can belong to multiple latent groups simultaneously, and it can capture multi-level clustering and dependencies between the levels. For \(K_1 = K_2\), the model can be viewed as a single-step Hidden Markov Distribution. The generative process for a Joint Mixture Model with \(K_1\) outer states and \(K_2\) inner states is

\[\begin{split} \begin{array}{ll} z_1 &\sim \boldsymbol{\pi} \\ z_2 \vert z_1 = k_1 &\sim \boldsymbol{\tau}_{k_1} \\ x \vert z_2 = k_2 &\sim f_{k_2}(x \vert \theta_{k_2}) \end{array}\end{split}\]

where the initial group membership is drawn with probability \(P(Z_1 = k_1) = \pi_{k_1}\) and the transition probability is given by \(P(Z_2 = k_2 \vert Z_1 = k_1) = \tau_{k_1, k_2}\).
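Summing out both latent states in the process above gives the marginal density of \(x\). The following worked step is standard marginalization rather than anything taken from the library source; \(\boldsymbol{\tau}\) here denotes the \(K_1 \times K_2\) matrix with entries \(\tau_{k_1, k_2}\):

\[ p(x) = \sum_{k_1=1}^{K_1} \pi_{k_1} \sum_{k_2=1}^{K_2} \tau_{k_1, k_2} \, f_{k_2}(x \vert \theta_{k_2}) = \sum_{k_2=1}^{K_2} \Big( \sum_{k_1=1}^{K_1} \pi_{k_1} \tau_{k_1, k_2} \Big) f_{k_2}(x \vert \theta_{k_2}) \]

Marginally, then, \(x\) follows an ordinary mixture whose weights are the entries of \(\boldsymbol{\pi}^\top \boldsymbol{\tau}\).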

JointMixtureDistribution

class pysp.stats.jmixture.JointMixtureDistribution(components1, components2, w1, w2, taus12, taus21, keys=(None, None, None), name=None)

JointMixtureDistribution object for defining a joint mixture distribution.

Notes

Data type is Tuple[T0, T1], where all components1 entries are compatible with T0 and all components2 entries are compatible with T1.

components1

Mixture components for mixture of X1.

Type:: Sequence[SequenceEncodableProbabilityDistribution]

components2

Mixture components for mixture of X2.

Type:: Sequence[SequenceEncodableProbabilityDistribution]

w1

Probability of drawing X1 from component i.

Type:: np.ndarray

w2

Probability of drawing X2 from component j.

Type:: np.ndarray

num_components1

Number of mixture components for X1.

Type:: int

num_components2

Number of mixture components for X2.

Type:: int

taus12

2-d NumPy array with probabilities of drawing X2 from component j given that X1 was drawn from component i. Rows index the X1 component.

Type:: np.ndarray

taus21

2-d NumPy array with probabilities of drawing X1 from component i given that X2 was drawn from component j. Rows index the X1 component.

Type:: np.ndarray

log_w1

Log-probability of drawing X1 from component i.

Type:: np.ndarray

log_w2

Log-probability of drawing X2 from component j.

Type:: np.ndarray

log_taus12

2-d NumPy array with log-probabilities of drawing X2 from component j given that X1 was drawn from component i. Rows index the X1 component.

Type:: np.ndarray

log_taus21

2-d NumPy array with log-probabilities of drawing X1 from component i given that X2 was drawn from component j. Rows index the X1 component.

Type:: np.ndarray

keys

Set keys for weights, mixture components of X1, mixture components of X2.

Type:: Optional[Tuple[Optional[str], Optional[str], Optional[str]]]

name

Set name to object.

Type:: Optional[str]

__init__(components1, components2, w1, w2, taus12, taus21, keys=(None, None, None), name=None)

JointMixtureDistribution object.

Parameters:

components1 (Sequence[SequenceEncodableProbabilityDistribution]) – Mixture components for mixture of X1.
components2 (Sequence[SequenceEncodableProbabilityDistribution]) – Mixture components for mixture of X2.
w1 (np.ndarray) – Probability of drawing X1 from component i.
w2 (np.ndarray) – Probability of drawing X2 from component j.
taus12 (np.ndarray) – 2-d NumPy array with probabilities of drawing X2 from component j given that X1 was drawn from component i. Rows index the X1 component.
taus21 (np.ndarray) – 2-d NumPy array with probabilities of drawing X1 from component i given that X2 was drawn from component j. Rows index the X1 component.
keys (Optional[Tuple[Optional[str], Optional[str], Optional[str]]]) – Set keys for weights, mixture components of X1, mixture components of X2.
name (Optional[str]) – Set name to object.
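The four weight arrays describe one joint distribution over component pairs, so they are tied together by Bayes' rule. The sketch below uses plain NumPy (no pysp calls) to derive w2 and taus21 from w1 and taus12. It assumes the layout stated above, in which rows of both matrices index the X1 component; the numeric values are made up for illustration, and whether the constructor validates this consistency is not stated in this reference.

```python
import numpy as np

# Illustrative values only: 2 components for X1, 3 components for X2.
w1 = np.array([0.6, 0.4])                 # P(Z1 = i)
taus12 = np.array([[0.5, 0.3, 0.2],       # P(Z2 = j | Z1 = i); each row sums to 1
                   [0.1, 0.1, 0.8]])

# Joint probability of the component pair (i, j).
joint = w1[:, None] * taus12              # shape (2, 3)

# Marginal weight of each X2 component.
w2 = joint.sum(axis=0)                    # shape (3,)

# Bayes' rule with rows still indexed by the X1 component (assumed layout):
# taus21[i, j] = P(Z1 = i | Z2 = j), so each column sums to 1.
taus21 = joint / w2[None, :]
```

With this layout, the rows of taus12 and the columns of taus21 each sum to one, which gives a quick sanity check before building a distribution.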

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:: DataSequenceEncoder
Returns:: DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:: pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
Return type:: JointMixtureEstimator
Returns:: ParameterEstimator

log_density(x)

Evaluate the log-density of the distribution at a single observation.

Parameters:: x (Tuple[T0, T1]) – Observation of the form (X1, X2).
Return type:: float
Returns:: float
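As a rough illustration of what a log-density over pairs involves, the sketch below evaluates \(\log \sum_{i,j} w1_i \, \tau_{i,j} \, f^{(1)}_i(x_1) \, f^{(2)}_j(x_2)\) with a log-sum-exp for stability. It is not the library's implementation: the helper names, the univariate normal components, and the (mean, sd) parameter tuples are all assumptions made for the example.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log-density of a univariate normal (stand-in component, assumed)."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def joint_mixture_log_density(x, w1, taus12, params1, params2):
    """Hypothetical helper: log p(x1, x2) for normal components.

    params1 and params2 are lists of (mean, sd) tuples, one per component.
    Assumes all entries of w1 and taus12 are strictly positive.
    """
    x1, x2 = x
    ll1 = np.array([normal_logpdf(x1, m, s) for m, s in params1])  # (K1,)
    ll2 = np.array([normal_logpdf(x2, m, s) for m, s in params2])  # (K2,)
    # Log of every term in the double sum over (i, j), shape (K1, K2).
    terms = np.log(w1)[:, None] + np.log(taus12) + ll1[:, None] + ll2[None, :]
    # Log-sum-exp over all component pairs.
    top = terms.max()
    return float(top + np.log(np.exp(terms - top).sum()))

w1 = np.array([0.6, 0.4])
taus12 = np.array([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])
params1 = [(0.0, 1.0), (5.0, 1.0)]
params2 = [(-2.0, 1.0), (0.0, 1.0), (3.0, 2.0)]
lp = joint_mixture_log_density((0.3, 2.5), w1, taus12, params1, params2)
```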

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:: seed (Optional[int]) – Set seed for drawing samples from distribution.
Return type:: JointMixtureSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:: x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.
Return type:: ndarray
Returns:: np.ndarray
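A vectorized evaluation can replace the per-observation double sum with one matrix product per batch. The sketch below shows that idea in plain NumPy; it does not use EncodedDataSequence or any pysp encoder, and the function name, the normal components, and the (mean, sd) tuples are assumptions for illustration.

```python
import numpy as np

def normal_logpdf(x, mean, sd):
    """Log-density of a univariate normal (stand-in component, assumed)."""
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2

def seq_joint_mixture_log_density(x1, x2, w1, taus12, params1, params2):
    """Hypothetical helper: log p(x1[n], x2[n]) for every n at once."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    means1, sds1 = np.array(params1, dtype=float).T
    means2, sds2 = np.array(params2, dtype=float).T
    # Component log-densities; the X1 weights are folded into ll1.
    ll1 = normal_logpdf(x1[:, None], means1, sds1) + np.log(w1)  # (N, K1)
    ll2 = normal_logpdf(x2[:, None], means2, sds2)               # (N, K2)
    # Subtract per-row maxima before exponentiating to avoid underflow.
    m1 = ll1.max(axis=1, keepdims=True)
    m2 = ll2.max(axis=1, keepdims=True)
    p1 = np.exp(ll1 - m1)
    p2 = np.exp(ll2 - m2)
    # sum_{i,j} p1[n, i] * taus12[i, j] * p2[n, j] for each row n.
    mix = ((p1 @ taus12) * p2).sum(axis=1)
    return np.log(mix) + m1[:, 0] + m2[:, 0]

w1 = np.array([0.6, 0.4])
taus12 = np.array([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])
params1 = [(0.0, 1.0), (5.0, 1.0)]
params2 = [(-2.0, 1.0), (0.0, 1.0), (3.0, 2.0)]
out = seq_joint_mixture_log_density([0.3, 4.8, 2.0], [2.5, -1.0, 0.0],
                                    w1, taus12, params1, params2)
```

The result has one entry per observation, matching the np.ndarray return type listed above.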

JointMixtureEstimator

class pysp.stats.jmixture.JointMixtureEstimator(estimators1, estimators2, suff_stat=None, pseudo_count=None, keys=(None, None, None), name=None)

JointMixtureEstimator object for estimating a joint mixture distribution from aggregated sufficient statistics.

estimators1

Estimators for mixture component of X1.

Type:: Sequence[ParameterEstimator]

estimators2

Estimators for mixture component of X2.

Type:: Sequence[ParameterEstimator]

suff_stat

pseudo_count

Used to re-weight the state counts in estimation.

Type:: Optional[Tuple[float, float, float]]

keys

Set keys for weights, mixture components of X1, mixture components of X2.

Type:: Optional[Tuple[Optional[str], Optional[str], Optional[str]]]

name

Set name to object.

Type:: Optional[str]

__init__(estimators1, estimators2, suff_stat=None, pseudo_count=None, keys=(None, None, None), name=None)

JointMixtureEstimator object.

Parameters:

estimators1 (Sequence[ParameterEstimator]) – Estimators for mixture component of X1.
estimators2 (Sequence[ParameterEstimator]) – Estimators for mixture component of X2.
suff_stat (Optional[Tuple[ndarray, ndarray, ndarray, Tuple[TypeVar(E0), ...], Tuple[TypeVar(E1), ...]]])
pseudo_count (Optional[Tuple[float, float, float]]) – Used to re-weight the state counts in estimation.
keys (Optional[Tuple[Optional[str], Optional[str], Optional[str]]]) – Set keys for weights, mixture components of X1, mixture components of X2.
name (Optional[str]) – Set name to object.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:: JointMixtureEstimatorAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a SequenceEncodableProbabilityDistribution from sufficient statistics.

Parameters:

nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.

Return type:

JointMixtureDistribution

Returns:

SequenceEncodableProbabilityDistribution
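For intuition about the weight portion of the estimation step, the sketch below normalizes a \(K_1 \times K_2\) matrix of expected joint component counts into w1, w2, taus12, and taus21. This is an assumption-laden illustration, not the library's estimate method: the layout of suff_stat and the way pseudo_count is applied are not specified in this reference, so the single-matrix input, the scalar pseudo_count, and the function name are hypothetical.

```python
import numpy as np

def estimate_weights(joint_counts, pseudo_count=None):
    """Hypothetical helper: turn expected pair counts into weight arrays.

    joint_counts[i, j] is the (expected) number of observations assigned
    to X1 component i and X2 component j.
    """
    counts = np.asarray(joint_counts, dtype=float)
    if pseudo_count is not None:
        # Assumed regularization: spread the pseudo-count evenly over cells.
        counts = counts + pseudo_count / counts.size
    total = counts.sum()
    w1 = counts.sum(axis=1) / total
    w2 = counts.sum(axis=0) / total
    # Rows index the X1 component in both matrices (assumed layout).
    # A row or column with zero total count would divide by zero here.
    taus12 = counts / counts.sum(axis=1, keepdims=True)  # rows sum to 1
    taus21 = counts / counts.sum(axis=0, keepdims=True)  # columns sum to 1
    return w1, w2, taus12, taus21

w1, w2, taus12, taus21 = estimate_weights([[3.0, 1.0], [0.0, 4.0]])
```

The component parameters themselves would be re-estimated separately by the entries of estimators1 and estimators2.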

JointMixtureSampler

class pysp.stats.jmixture.JointMixtureSampler(dist, seed=None)

JointMixtureSampler object for sampling from a joint mixture distribution.

rng

RandomState for seeding samples.

Type:: RandomState

dist

Distribution to sample from.

Type:: JointMixtureDistribution

comp_sampler1

Inner-mixture sampler.

Type:: DistributionSampler

comp_sampler2

Outer-mixture sampler.

Type:: DistributionSampler

sample(size=None)

Generate samples from distribution.

Parameters:: size (Optional[int]) – Number of samples to generate.
Return type:: Union[Tuple[Any, Any], Sequence[Tuple[Any, Any]]]
Returns:: Samples from distribution.
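The return type above suggests a single (X1, X2) tuple when size is None and a sequence of tuples otherwise. The sketch below mimics that behavior with ancestral sampling in plain NumPy: draw the X1 component, draw the X2 component conditionally, then draw each observation from its component. The function name, the normal components, and the (mean, sd) tuples are assumptions; the actual JointMixtureSampler internals are not shown in this reference.

```python
import numpy as np

def sample_joint_mixture(rng, w1, taus12, params1, params2, size=None):
    """Hypothetical helper: ancestral sampling of (x1, x2) pairs."""
    def draw_one():
        z1 = rng.choice(len(w1), p=w1)                   # X1 component
        z2 = rng.choice(taus12.shape[1], p=taus12[z1])   # X2 component given z1
        x1 = rng.normal(*params1[z1])                    # (mean, sd) of component z1
        x2 = rng.normal(*params2[z2])                    # (mean, sd) of component z2
        return (x1, x2)

    if size is None:
        return draw_one()
    return [draw_one() for _ in range(size)]

rng = np.random.RandomState(0)
w1 = np.array([0.6, 0.4])
taus12 = np.array([[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])
params1 = [(0.0, 1.0), (5.0, 1.0)]
params2 = [(-2.0, 1.0), (0.0, 1.0), (3.0, 2.0)]

one = sample_joint_mixture(rng, w1, taus12, params1, params2)
many = sample_joint_mixture(rng, w1, taus12, params1, params2, size=5)
```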