Abstract Classes

DMixLearn captures most distributions in the exponential family. For a detailed walkthrough of defining a custom distribution class, see User Defined Classes. The abstract classes in DMixLearn are listed below.

ProbabilityDistribution

class dml.stats.pdist.ProbabilityDistribution

Defines ProbabilityDistribution Abstract Class.

Note

This is generally used as a base class for SequenceEncodableProbabilityDistribution.

__init__()

abstract estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

ParameterEstimator

Returns:

ParameterEstimator

abstract log_density(x)

Evaluate the log-density of the distribution at x.

Return type:

float

Returns:

float

abstract sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

DistributionSampler
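
As a rough illustration of the interface above, the following sketch defines a local stand-in base class with the three documented abstract methods and a toy exponential distribution that implements log_density. Nothing here is imported from dml.stats.pdist: ProbabilityDistributionSketch, ExponentialSketch, and the rate parameter are hypothetical names, and estimator and sampler are left unimplemented to keep the example short.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Optional


class ProbabilityDistributionSketch(ABC):
    """Local stand-in mirroring the documented abstract methods (assumption)."""

    @abstractmethod
    def log_density(self, x: Any) -> float: ...

    @abstractmethod
    def estimator(self, pseudo_count: Optional[float] = None) -> Any: ...

    @abstractmethod
    def sampler(self, seed: Optional[int] = None) -> Any: ...


class ExponentialSketch(ProbabilityDistributionSketch):
    """Hypothetical exponential distribution with density rate * exp(-rate * x)."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate

    def log_density(self, x: float) -> float:
        # log(rate * exp(-rate * x)) = log(rate) - rate * x for x >= 0.
        if x < 0:
            return -math.inf
        return math.log(self.rate) - self.rate * x

    def estimator(self, pseudo_count: Optional[float] = None) -> Any:
        raise NotImplementedError("omitted from this sketch")

    def sampler(self, seed: Optional[int] = None) -> Any:
        raise NotImplementedError("omitted from this sketch")


dist = ExponentialSketch(rate=2.0)
value = dist.log_density(0.5)  # equals log(2) - 1
```

Because every method on the stand-in base is abstract, a subclass that omits any of them cannot be instantiated, which is the main reason to route custom distributions through an abstract class.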

SequenceEncodableProbabilityDistribution

class dml.stats.pdist.SequenceEncodableProbabilityDistribution

Extends ProbabilityDistribution to handle vectorized calls.

abstract dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

DataSequenceEncoder

Returns:

DataSequenceEncoder

abstract seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray
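
The encode-then-evaluate flow implied by dist_to_encoder and seq_log_density might look like the sketch below. It is a standard-library illustration only: plain Python lists stand in for np.ndarray, and every class name ending in Sketch is hypothetical rather than part of DMixLearn.

```python
import math
from typing import Any, List, Sequence


class EncodedDataSequenceSketch:
    """Stand-in container for data already encoded for vectorized calls."""

    def __init__(self, data: Any) -> None:
        self.data = data


class ExponentialEncoderSketch:
    """Hypothetical encoder: converts raw observations to a list of floats."""

    def seq_encode(self, x: Sequence[float]) -> EncodedDataSequenceSketch:
        return EncodedDataSequenceSketch([float(v) for v in x])


class ExponentialSketch:
    """Hypothetical exponential distribution exposing both call styles."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def log_density(self, x: float) -> float:
        return math.log(self.rate) - self.rate * x if x >= 0 else -math.inf

    def dist_to_encoder(self) -> ExponentialEncoderSketch:
        return ExponentialEncoderSketch()

    def seq_log_density(self, x: EncodedDataSequenceSketch) -> List[float]:
        # Check the container type, then evaluate all observations in one pass.
        if not isinstance(x, EncodedDataSequenceSketch):
            raise TypeError("expected an EncodedDataSequenceSketch")
        log_rate = math.log(self.rate)
        return [log_rate - self.rate * v if v >= 0 else -math.inf for v in x.data]


dist = ExponentialSketch(rate=2.0)
encoded = dist.dist_to_encoder().seq_encode([0.0, 0.5, 1.0])
log_densities = dist.seq_log_density(encoded)
```

Encoding once and reusing the encoded sequence avoids repeating per-observation conversion on every density evaluation, which matters when the same data is scored many times during iterative fitting.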

DistributionSampler

class dml.stats.pdist.DistributionSampler(dist, seed=None)

DistributionSampler is an abstract class for distribution samplers.

dist

Distribution to sample from.

Type:

SequenceEncodableProbabilityDistribution

rng

Random number generator.

Type:

RandomState

__init__(dist, seed=None)

Initialize DistributionSampler.

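Parameters:
  • dist (SequenceEncodableProbabilityDistribution) – Distribution to sample from.

  • seed (Optional[int]) – Seed for the random number generator.

new_seed()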

Generates a new seed from rng.

Return type:

int

abstract sample(size=None)

Generate samples from distribution.

Parameters:

size (Optional[int]) – Number of samples to generate.

Return type:

Any

Returns:

Samples from distribution.
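
A minimal sketch of a sampler with the attributes and methods listed above follows. To stay within the standard library it uses random.Random as a stand-in for the RandomState attribute, and the convention that size=None yields a single draw while an integer yields a list is an assumption, as are all the class names.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Optional, Union


class DistributionSamplerSketch(ABC):
    """Local stand-in for the documented sampler interface (assumption)."""

    def __init__(self, dist: Any, seed: Optional[int] = None) -> None:
        self.dist = dist
        # random.Random stands in for the documented RandomState attribute.
        self.rng = random.Random(seed)

    def new_seed(self) -> int:
        # Draw a fresh integer seed from rng, e.g. to seed a child sampler.
        return self.rng.randrange(2**31)

    @abstractmethod
    def sample(self, size: Optional[int] = None) -> Any: ...


class ExponentialDistSketch:
    """Hypothetical distribution object carrying only a rate parameter."""

    def __init__(self, rate: float) -> None:
        self.rate = rate


class ExponentialSamplerSketch(DistributionSamplerSketch):
    def sample(self, size: Optional[int] = None) -> Union[float, List[float]]:
        # Assumed convention: one draw when size is None, otherwise a list.
        if size is None:
            return self.rng.expovariate(self.dist.rate)
        return [self.rng.expovariate(self.dist.rate) for _ in range(size)]


sampler = ExponentialSamplerSketch(ExponentialDistSketch(rate=2.0), seed=7)
draws = sampler.sample(size=5)
```

Two samplers built with the same seed produce identical draws, which is what makes seeded runs reproducible.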

ConditionalSampler

class dml.stats.pdist.ConditionalSampler

Abstract class for conditional samplers.

Note

This is only implemented for samplers of conditional distributions.

abstract sample_given(x)

Sample at conditional value.

Parameters:

x (Any) – Conditioned on x, sample from dist.

Returns:

Sample from conditional distribution.
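
To make sample_given concrete, here is a small hedged sketch of a conditional sampler for a linear-Gaussian model, where the distribution of the sample depends on the conditioning value x. The model, the class names, and the constructor parameters are all hypothetical, and random.Random is used in place of a NumPy generator.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Optional


class ConditionalSamplerSketch(ABC):
    """Local stand-in for the documented interface (assumption)."""

    @abstractmethod
    def sample_given(self, x: Any) -> Any: ...


class LinearGaussianSamplerSketch(ConditionalSamplerSketch):
    """Hypothetical sampler for y | x ~ Normal(slope * x + intercept, sigma)."""

    def __init__(self, slope: float, intercept: float, sigma: float,
                 seed: Optional[int] = None) -> None:
        self.slope = slope
        self.intercept = intercept
        self.sigma = sigma
        self.rng = random.Random(seed)

    def sample_given(self, x: float) -> float:
        # The conditioning value only shifts the mean in this toy model.
        return self.rng.gauss(self.slope * x + self.intercept, self.sigma)


noisy = LinearGaussianSamplerSketch(slope=2.0, intercept=1.0, sigma=0.5, seed=3)
y = noisy.sample_given(1.0)

# With sigma = 0 the draw collapses to the conditional mean.
exact = LinearGaussianSamplerSketch(slope=2.0, intercept=1.0, sigma=0.0).sample_given(1.0)
```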

StatisticAccumulator

class dml.stats.pdist.StatisticAccumulator

abstract combine(suff_stat)

Method for combining aggregated sufficient statistics.

Parameters:

suff_stat (SS) – Sufficient statistics.

Return type:

StatisticAccumulator

Returns:

StatisticAccumulator

abstract from_value(x)

Set sufficient statistics equal to passed value.

Parameters:

x (SS) – Generic sufficient statistic for instance of StatisticAccumulator.

Return type:

SequenceEncodableStatisticAccumulator

initialize(x, weight, rng)

Initialize sufficient statistics for a single data observation.

Note

Used for debugging only.

Parameters:
  • x (Any) – Data type corresponding to StatisticAccumulator object.

  • weight (float) – Weight associated with single observation.

  • rng (np.random.RandomState) – Random number generator used for initialization.

Return type:

None

abstract key_merge(stats_dict)

Merge sufficient statistics with matching keys.

Parameters:

stats_dict (Dict[str, Any]) – Dict mapping keys to sufficient statistic value or accumulator.

Return type:

None

abstract key_replace(stats_dict)

Set sufficient statistics of accumulator instance to keyed values.

Parameters:

stats_dict (Dict[str, Any]) – Dict mapping keys to sufficient statistic value or accumulator.

Return type:

None

update(x, weight, estimate)

Accumulate sufficient statistics for a single data observation.

Note

Used for debugging only.

Parameters:
  • x (Any) – Data type corresponding to StatisticAccumulator object.

  • weight (float) – Weight associated with single observation.

  • estimate (SequenceEncodableProbabilityDistribution) – Previous estimate of distribution.

Return type:

None

abstract value()

Return sufficient statistics of StatisticAccumulator.

Return type:

TypeVar(SS)
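
The accumulator methods above can be sketched for a toy exponential model whose sufficient statistics are a weighted count and a weighted sum. This is an illustration under assumptions, not DMixLearn's implementation: the class name, the optional key argument, and the exact behaviour of key_merge and key_replace (pooling statistics across accumulators that share a key) are hypothetical readings of the descriptions above. combine returns the accumulator itself, following the documented return type.

```python
from typing import Any, Dict, Optional, Tuple

SuffStat = Tuple[float, float]  # (weighted count, weighted sum)


class ExponentialAccumulatorSketch:
    """Hypothetical accumulator for an exponential distribution."""

    def __init__(self, key: Optional[str] = None) -> None:
        self.count = 0.0
        self.total = 0.0
        self.key = key

    def update(self, x: float, weight: float, estimate: Any) -> None:
        # The previous estimate is unused for this simple model.
        self.count += weight
        self.total += weight * x

    def initialize(self, x: float, weight: float, rng: Any) -> None:
        # No randomness is needed here, so initialization is a plain update.
        self.update(x, weight, None)

    def combine(self, suff_stat: SuffStat) -> "ExponentialAccumulatorSketch":
        self.count += suff_stat[0]
        self.total += suff_stat[1]
        return self

    def value(self) -> SuffStat:
        return self.count, self.total

    def from_value(self, x: SuffStat) -> "ExponentialAccumulatorSketch":
        self.count, self.total = x
        return self

    def key_merge(self, stats_dict: Dict[str, Any]) -> None:
        # Assumed semantics: the first accumulator seen for a key is registered,
        # later ones fold their statistics into it.
        if self.key is not None:
            if self.key in stats_dict:
                stats_dict[self.key].combine(self.value())
            else:
                stats_dict[self.key] = self

    def key_replace(self, stats_dict: Dict[str, Any]) -> None:
        # Assumed semantics: copy the pooled statistics back to this instance.
        if self.key is not None and self.key in stats_dict:
            self.from_value(stats_dict[self.key].value())


a = ExponentialAccumulatorSketch(key="shared")
b = ExponentialAccumulatorSketch(key="shared")
for obs in (1.0, 2.0):
    a.update(obs, 1.0, None)
b.update(3.0, 1.0, None)

stats: Dict[str, Any] = {}
a.key_merge(stats)    # registers a under "shared"
b.key_merge(stats)    # folds b's statistics into a
a.key_replace(stats)
b.key_replace(stats)  # b now holds the pooled statistics
```

Under these assumptions, both accumulators end up with the same pooled count and sum, which is how parameters could be tied across components that share a key.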

SequenceEncodableStatisticAccumulator

class dml.stats.pdist.SequenceEncodableStatisticAccumulator

abstract acc_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableStatisticAccumulator instance.

Return type:

DataSequenceEncoder

abstract seq_initialize(x, weights, rng)

Vectorized initialization of sufficient statistics.

Parameters:
  • x (EncodedDataSequence) – EncodedDataSequence for given SequenceEncodableStatisticAccumulator type.

  • weights (np.ndarray) – Weights for observations.

  • rng (np.random.RandomState) – RandomState object for setting seed on initialization.

Return type:

None

abstract seq_update(x, weights, estimate)

Vectorized accumulation of sufficient statistics for EM updates.

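Parameters:
  • x (EncodedDataSequence) – EncodedDataSequence for given SequenceEncodableStatisticAccumulator type.

  • weights (np.ndarray) – Weights for observations.

  • estimate (SequenceEncodableProbabilityDistribution) – Previous estimate of distribution.

Return type: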

None

ParameterEstimator

class dml.stats.pdist.ParameterEstimator(*args)

Abstract class for ParameterEstimator object.

abstract __init__(*args)

Subclasses must implement a constructor for ParameterEstimator.

abstract accumulator_factory()

Create a StatisticAccumulatorFactory for SequenceEncodableStatisticAccumulator objects.

Return type:

StatisticAccumulatorFactory

abstract estimate(nobs, suff_stat)

Estimate SequenceEncodableProbabilityDistribution from sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for Dirichlet distribution.

Return type:

SequenceEncodableProbabilityDistribution

Returns:

SequenceEncodableProbabilityDistribution
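
The estimation step can be illustrated with a toy exponential model, for which the maximum-likelihood rate is the weighted count divided by the weighted sum. The sketch below is hypothetical throughout: the class names, the (count, sum) layout of suff_stat, and in particular the way pseudo_count is applied (as extra pseudo-observations with an assumed prior_mean) are illustrative choices, not DMixLearn's documented behaviour.

```python
from typing import Optional, Tuple


class ExponentialSketch:
    """Hypothetical distribution object returned by the estimator."""

    def __init__(self, rate: float) -> None:
        self.rate = rate


class ExponentialEstimatorSketch:
    """Hypothetical estimator for an exponential distribution."""

    def __init__(self, pseudo_count: Optional[float] = None,
                 prior_mean: float = 1.0) -> None:
        self.pseudo_count = pseudo_count
        self.prior_mean = prior_mean  # assumed prior, used only with pseudo_count

    def estimate(self, nobs: Optional[float],
                 suff_stat: Tuple[float, float]) -> ExponentialSketch:
        # nobs is unused here because the weighted count is part of suff_stat.
        count, total = suff_stat
        if self.pseudo_count is not None:
            # Assumed regularization: add pseudo_count observations at prior_mean.
            count += self.pseudo_count
            total += self.pseudo_count * self.prior_mean
        if total <= 0:
            raise ValueError("cannot estimate a rate from a non-positive sum")
        # Maximum-likelihood rate for an exponential: count / sum.
        return ExponentialSketch(rate=count / total)


plain = ExponentialEstimatorSketch().estimate(3.0, (3.0, 6.0))
regularized = ExponentialEstimatorSketch(pseudo_count=1.0).estimate(3.0, (3.0, 6.0))
```

With three observations summing to 6, the unregularized rate is 3 / 6; adding one pseudo-observation at mean 1 gives 4 / 7, pulling the estimate toward the assumed prior.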

DataSequenceEncoder

class dml.stats.pdist.DataSequenceEncoder

abstract seq_encode(x)

Create an EncodedDataSequence from iid observations of a SequenceEncodableProbabilityDistribution.

Parameters:

x (Any) – Sequence of observations from corresponding distribution.

Return type:

EncodedDataSequence

Returns:

EncodedDataSequence

EncodedDataSequence

class dml.stats.pdist.EncodedDataSequence(data)

EncodedDataSequence is the data structure output by DataSequenceEncoder. The object is used for vectorized functions and type checks.

__init__(data)

Create instance of EncodedDataSequence.

Parameters:

data (Any) – Store the data encoded for vectorized calls.