Dirichlet Distribution

Data Type: Sequence[float]

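The Dirichlet distribution is a distribution on the probability simplex. A d-dimensional Dirichlet random variable \((X_1, X_2, \ldots, X_d)\) with concentration parameters \(\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_d)\), \(\alpha_i > 0\), has density function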

\[f(\boldsymbol{x} \vert \boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})}\prod_{i=1}^{d} x_i^{\alpha_i-1}, \;0 \leq x_i \leq 1\]

where \(B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{d}\Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{d}\alpha_i\right)}\) and \(\sum_{i=1}^{d} x_i = 1\).
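The formula can be checked numerically. The sketch below is a minimal standalone implementation, assuming NumPy and SciPy are available; dirichlet_log_density is a hypothetical helper, not part of dml.stats.dirichlet. It evaluates log B(α) with the log-gamma function and compares the result with SciPy's own Dirichlet log-density.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet


def dirichlet_log_density(x, alpha):
    """Hypothetical helper: log f(x | alpha) for a single point x on the simplex."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_b = gammaln(alpha).sum() - gammaln(alpha.sum())
    # log f = -log B(alpha) + sum_i (alpha_i - 1) log x_i
    return -log_b + np.sum((alpha - 1.0) * np.log(x))


alpha = [2.0, 3.0, 4.0]
x = [0.2, 0.3, 0.5]
print(dirichlet_log_density(x, alpha))
print(dirichlet.logpdf(x, alpha))  # SciPy reference value for comparison
```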

For more information, see Dirichlet Distribution.

DirichletDistribution

class dml.stats.dirichlet.DirichletDistribution(alpha, name=None, keys=None)

DirichletDistribution object defining a Dirichlet distribution with concentration parameter alpha.

dim

Number of components (dimension) of the Dirichlet distribution.

Type:

int

alpha

Concentration parameters of length dim.

Type:

np.ndarray

alpha_ma

Boolean mask for positive alpha entries.

Type:

np.ndarray

log_const

Logarithm of the normalizing constant of the distribution.

Type:

float

has_invalid

True if any entry of alpha is less than or equal to 0.

Type:

bool

name

Optional name for object instance.

Type:

Optional[str]

keys

Optional key for merging sufficient statistics.

Type:

Optional[str]

__init__(alpha, name=None, keys=None)

Initialize DirichletDistribution.

Parameters:
  • alpha (Union[List[float], np.ndarray]) – Array of alpha values. Determines size of Dirichlet distribution.

  • name (Optional[str], optional) – Name for distribution.

  • keys (Optional[str], optional) – Key for merging sufficient statistics.
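The attributes listed above are all determined by alpha. The sketch below shows one way they could be derived with NumPy and SciPy; dirichlet_attributes is a hypothetical helper, not the dml constructor, and storing log_const as \(-\log B(\boldsymbol{\alpha})\) is an assumed sign convention.

```python
import numpy as np
from scipy.special import gammaln


def dirichlet_attributes(alpha):
    """Hypothetical re-derivation of dim, alpha_ma, log_const and has_invalid."""
    alpha = np.asarray(alpha, dtype=float)
    dim = len(alpha)
    alpha_ma = alpha > 0.0                 # Boolean mask of positive entries
    has_invalid = bool(np.any(~alpha_ma))  # True if any alpha_i <= 0
    # Assumed convention: log_const = -log B(alpha), so that
    # log f(x) = log_const + sum_i (alpha_i - 1) log x_i.
    log_const = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return dim, alpha_ma, log_const, has_invalid


print(dirichlet_attributes([1.0, 1.0, 1.0]))
print(dirichlet_attributes([1.0, 0.0]))  # has_invalid is True here
```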

density(x)

Evaluate the density of a Dirichlet observation.

Parameters:

x (Union[List[float], np.ndarray]) – A single Dirichlet observation.

Returns:

Density evaluated at x.

Return type:

float
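The density is the exponential of the log-density. A convenient sanity check: with α = (1, ..., 1) the Dirichlet is uniform on the simplex and B(α) = 1/Γ(K) = 1/(K - 1)!, so the density equals (K - 1)! at every interior point. The helper dirichlet_density below is hypothetical and only illustrates this relationship.

```python
import numpy as np
from scipy.special import gammaln


def dirichlet_density(x, alpha):
    """Hypothetical helper: f(x | alpha) computed as exp of the log-density."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    log_b = gammaln(alpha).sum() - gammaln(alpha.sum())
    return float(np.exp(-log_b + np.sum((alpha - 1.0) * np.log(x))))


# Uniform case, K = 3: the density is Gamma(3) = 2 everywhere inside the simplex.
print(dirichlet_density([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # approximately 2.0
print(dirichlet_density([0.6, 0.1, 0.3], [1.0, 1.0, 1.0]))  # approximately 2.0
```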

dist_to_encoder()

Create a DirichletDataEncoder object for encoding sequences of iid Dirichlet observations.

Returns:

Encoder object.

Return type:

DirichletDataEncoder

estimator(pseudo_count=None)

Return a DirichletEstimator for this distribution.

Parameters:

pseudo_count (Optional[float], optional) – Pseudo-count for regularization.

Returns:

Estimator object.

Return type:

DirichletEstimator

log_density(x)

Evaluate the log-density of a Dirichlet observation.

The log-density of a Dirichlet distribution with dim = K is given by

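\[\log f(\boldsymbol{x} \vert \boldsymbol{\alpha}) = -\log B(\boldsymbol{\alpha}) + \sum_{k=0}^{K-1} (\alpha_k - 1)\log x_k, \quad \sum_{k=0}^{K-1} x_k = 1,\]

where

\[\log B(\boldsymbol{\alpha}) = \sum_{k=0}^{K-1} \log\Gamma(\alpha_k) - \log\Gamma\left(\sum_{k=0}^{K-1} \alpha_k\right).\]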

Parameters:

x (Union[List[float], np.ndarray]) – A single Dirichlet observation.

Returns:

Log-density evaluated at x.

Return type:

float
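Working in log space matters numerically. For moderately large concentration parameters, Γ(α_k) overflows double precision, so evaluating B(α) directly breaks down while the log-space form stays finite. The illustration below assumes NumPy and SciPy and is not taken from the dml source.

```python
import numpy as np
from scipy.special import gamma, gammaln

alpha = np.array([100.0, 100.0])
x = np.array([0.5, 0.5])

# Direct evaluation of B(alpha) = prod Gamma(alpha_i) / Gamma(sum alpha_i):
# the product and Gamma(200) both overflow float64, giving inf / inf = nan.
with np.errstate(over="ignore", invalid="ignore"):
    b_direct = np.prod(gamma(alpha)) / gamma(alpha.sum())

# The log-space evaluation of the same density stays finite.
log_b = gammaln(alpha).sum() - gammaln(alpha.sum())
log_f = -log_b + np.sum((alpha - 1.0) * np.log(x))

print(b_direct)  # nan
print(log_f)
```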

sampler(seed=None)

Return a DirichletSampler for this distribution.

Parameters:

seed (Optional[int], optional) – Seed for random number generator.

Returns:

Sampler object.

Return type:

DirichletSampler

seq_log_density(x)

Vectorized evaluation of the log-density of each observation in an encoded data sequence.

Parameters:

x (DirichletEncodedDataSequence) – Encoded data sequence.

Returns:

Log-density values.

Return type:

np.ndarray
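Vectorization pays off because log x can be computed once per observation at encoding time, after which the log-densities of all n observations reduce to a single matrix-vector product. The sketch below illustrates the idea with NumPy; encode and seq_log_density are hypothetical helpers and do not reproduce the internals of DirichletDataEncoder or DirichletEncodedDataSequence, which are not documented here.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet


def encode(observations):
    """Hypothetical encoder: stack n observations into an (n, dim) array of log(x)."""
    return np.log(np.asarray(observations, dtype=float))


def seq_log_density(log_x, alpha):
    """Log-density of every row of log_x via one matrix-vector product."""
    alpha = np.asarray(alpha, dtype=float)
    log_b = gammaln(alpha).sum() - gammaln(alpha.sum())
    return log_x @ (alpha - 1.0) - log_b  # shape (n,)


alpha = [2.0, 3.0, 4.0]
data = [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8], [0.25, 0.25, 0.5]]
result = seq_log_density(encode(data), alpha)

# Row-by-row reference values from SciPy for comparison.
ref = np.array([dirichlet.logpdf(np.asarray(row), alpha) for row in data])
print(result)
print(ref)
```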

DirichletEstimator

class dml.stats.dirichlet.DirichletEstimator(dim, pseudo_count=None, suff_stat=None, delta=1e-08, keys=None, use_mpe=False, name=None)

DirichletEstimator object for estimating a DirichletDistribution from sufficient statistics.

dim

Dimension of Dirichlet distribution to estimate.

Type:

int

pseudo_count

Pseudo count for sufficient statistics.

Type:

Optional[float]

delta

Tolerance for shape estimation from sufficient statistics.

Type:

Optional[float]

suff_stat

Sufficient statistics.

Type:

Optional[np.ndarray]

keys

Optional key string for shape parameter.

Type:

Optional[str]

use_mpe

If True, use max posterior estimate.

Type:

bool

name

Name for object.

Type:

Optional[str]

__init__(dim, pseudo_count=None, suff_stat=None, delta=1e-08, keys=None, use_mpe=False, name=None)

Initialize DirichletEstimator.

Parameters:
  • dim (int) – Dimension of Dirichlet distribution to estimate.

  • pseudo_count (Optional[float], optional) – Pseudo count for sufficient statistics.

  • suff_stat (Optional[np.ndarray], optional) – Sufficient statistics.

  • delta (Optional[float], optional) – Tolerance for shape estimation from sufficient statistics.

  • keys (Optional[str], optional) – Optional key string for shape parameter.

  • use_mpe (bool, optional) – If True, use max posterior estimate.

  • name (Optional[str], optional) – Name for object.

Raises:

TypeError – If keys is not a string or None.

accumulator_factory()

Return a DirichletAccumulatorFactory for this estimator.

Returns:

Factory object.

Return type:

DirichletAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a DirichletDistribution from sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Number of observations.

  • suff_stat (Tuple[float, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics.

Returns:

Estimated distribution.

Return type:

DirichletDistribution
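One standard way to estimate α by maximum likelihood is Minka's fixed-point iteration, which needs only the number of observations n and the per-component sums of log x_k. The sketch below implements that technique; whether DirichletEstimator uses this exact algorithm is not stated here, and the documented suff_stat tuple carries additional arrays that this sketch ignores. The helper names inv_digamma and estimate_alpha, the use of delta as a convergence tolerance, and the starting point α = (1, ..., 1) are all assumptions.

```python
import numpy as np
from scipy.special import digamma, polygamma


def inv_digamma(y, iters=5):
    """Invert the digamma function with Newton's method (Minka's initialization)."""
    y = np.asarray(y, dtype=float)
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(iters):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return x


def estimate_alpha(nobs, sum_log_x, delta=1e-8, max_iter=1000):
    """Hypothetical fixed-point MLE of alpha from (nobs, per-component sum of log x)."""
    mean_log_x = np.asarray(sum_log_x, dtype=float) / nobs
    alpha = np.ones_like(mean_log_x)  # any positive starting point
    for _ in range(max_iter):
        new_alpha = inv_digamma(digamma(alpha.sum()) + mean_log_x)
        if np.max(np.abs(new_alpha - alpha)) < delta:
            return new_alpha
        alpha = new_alpha
    return alpha


# Accumulate the statistics (count, sum of log x) from simulated data.
rng = np.random.RandomState(0)
true_alpha = np.array([2.0, 5.0, 3.0])
x = rng.dirichlet(true_alpha, size=20000)
alpha_hat = estimate_alpha(len(x), np.log(x).sum(axis=0))
print(alpha_hat)  # should be close to true_alpha
```

Each step solves ψ(α_k) = ψ(Σ_j α_j) + (1/n) Σ_i log x_ik for α_k with the sum Σ_j α_j held at its current value; the digamma function ψ is inverted with a few Newton steps.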

DirichletSampler

class dml.stats.dirichlet.DirichletSampler(dist, seed=None)

DirichletSampler object for drawing samples from a Dirichlet distribution.

rng

RandomState object for generating seeded samples.

Type:

RandomState

dist

DirichletDistribution object to draw samples from.

Type:

DirichletDistribution

sample(size=None)

Draw samples from the Dirichlet distribution.

Parameters:

size (Optional[int], optional) – Number of samples to draw.

Returns:

Array of samples with shape (size, dim).

Return type:

np.ndarray
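Sampling of this kind can be sketched with NumPy's RandomState.dirichlet, which returns an array of shape (dim,) when size is None and (size, dim) when size is an integer. SimpleDirichletSampler below is a hypothetical stand-in written under that assumption, not the dml class; it shows that two samplers created with the same seed produce identical draws.

```python
import numpy as np


class SimpleDirichletSampler:
    """Hypothetical stand-in for DirichletSampler built on NumPy's RandomState."""

    def __init__(self, alpha, seed=None):
        self.alpha = np.asarray(alpha, dtype=float)
        self.rng = np.random.RandomState(seed)

    def sample(self, size=None):
        # Shape (dim,) when size is None, (size, dim) when size is an int.
        return self.rng.dirichlet(self.alpha, size=size)


s1 = SimpleDirichletSampler([2.0, 3.0, 4.0], seed=1)
s2 = SimpleDirichletSampler([2.0, 3.0, 4.0], seed=1)
draws = s1.sample(size=5)
print(draws.shape)        # (5, 3)
print(draws.sum(axis=1))  # each row sums to 1 (up to rounding)
```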