Dirichlet Distribution
Data Type: Sequence[float]
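The Dirichlet distribution is a distribution on the probability simplex. A d-dimensional Dirichlet random variable \((X_1, X_2, ..., X_d)\) with concentration parameters \(\boldsymbol{\alpha} = (\alpha_1, ..., \alpha_d)\), \(\alpha_i > 0\), has the density function
\[f(x_1, ..., x_d; \boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{d} x_i^{\alpha_i - 1},\]
where \(B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{d}\Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{d}\alpha_i\right)}\) and \(\sum_{i=1}^{d} x_i = 1\).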
For more info see Dirichlet Distribution.
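As a rough illustration of the density definition, the sketch below evaluates it with only the Python standard library. The helper name `dirichlet_density` is hypothetical and is not part of `dml`; treating points that are not strictly inside the simplex as having density 0 is a simplifying assumption of this sketch.

```python
import math
from typing import Sequence


def dirichlet_density(x: Sequence[float], alpha: Sequence[float]) -> float:
    """Density of Dirichlet(alpha) at x (hypothetical helper, not the dml API)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(a <= 0.0 for a in alpha):
        raise ValueError("all alpha_i must be positive")
    # Simplifying assumption: points not strictly inside the simplex get density 0.
    if any(v <= 0.0 for v in x) or not math.isclose(sum(x), 1.0, abs_tol=1e-9):
        return 0.0
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    # sum_i (alpha_i - 1) * log(x_i)
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return math.exp(log_kernel - log_b)


# With alpha = (1, 1, 1) the density is constant on the simplex: 1 / B(alpha) = Gamma(3) = 2.
uniform_value = dirichlet_density([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
```

Working in log space and exponentiating at the end avoids overflow in the Gamma function for large concentration parameters.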
DirichletDistribution
- class dml.stats.dirichlet.DirichletDistribution(alpha, name=None, keys=None)
DirichletDistribution object defining Dirichlet distribution with parameter alpha.
- dim
Number of categories in Dirichlet.
- Type:
int
- alpha
Concentration parameters of length dim.
- Type:
np.ndarray
- alpha_ma
Boolean mask for positive alpha entries.
- Type:
np.ndarray
- log_const
Log of the normalizing constant for the distribution.
- Type:
float
- has_invalid
True if any entry of alpha is less than or equal to 0.
- Type:
bool
- name
Optional name for object instance.
- Type:
Optional[str]
- keys
Optional key for merging sufficient statistics.
- Type:
Optional[str]
- __init__(alpha, name=None, keys=None)
Initialize DirichletDistribution.
- Parameters:
alpha (Union[List[float], np.ndarray]) – Array of alpha values. Determines size of Dirichlet distribution.
name (Optional[str], optional) – Name for distribution.
keys (Optional[str], optional) – Key for merging sufficient statistics.
- density(x)
Evaluate the density of a Dirichlet observation.
- Parameters:
x (Union[List[float], np.ndarray]) – A single Dirichlet observation.
- Returns:
Density evaluated at x.
- Return type:
float
- dist_to_encoder()
Create DirichletDataEncoder object for encoding sequences of iid Dirichlet observations.
- Returns:
Encoder object.
- Return type:
DirichletDataEncoder
- estimator(pseudo_count=None)
Return a DirichletEstimator for this distribution.
- Parameters:
pseudo_count (Optional[float], optional) – Pseudo-count for regularization.
- Returns:
Estimator object.
- Return type:
DirichletEstimator
- log_density(x)
Evaluate the log-density of a Dirichlet observation.
The log-density of a Dirichlet with dim = K is given by
log(p(x)) = -log(Const) + sum_{k=0}^{K-1} (alpha_k - 1)*log(x_k), for sum_{k=0}^{K-1} x_k = 1.0,
where
log(Const) = sum_{k=0}^{K-1} log(Gamma(alpha_k)) - log(Gamma(sum_{k=0}^{K-1} alpha_k)).
- Parameters:
x (Union[List[float], np.ndarray]) – A single Dirichlet observation.
- Returns:
Log-density evaluated at x.
- Return type:
float
- sampler(seed=None)
Return a DirichletSampler for this distribution.
- Parameters:
seed (Optional[int], optional) – Seed for random number generator.
- Returns:
Sampler object.
- Return type:
DirichletSampler
- seq_log_density(x)
Vectorized log-density for encoded data.
- Parameters:
x (DirichletEncodedDataSequence) – Encoded data sequence.
- Returns:
Log-density values.
- Return type:
np.ndarray
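The log-density formula given for log_density can be applied to a whole batch of observations at once, which is the idea behind a vectorized evaluation. The following is a minimal NumPy sketch of that formula only. The function `dirichlet_log_density` is a hypothetical name, it takes a plain (n, K) array rather than a DirichletEncodedDataSequence, and it does not handle non-positive alpha entries or zeros in x.

```python
import math

import numpy as np


def dirichlet_log_density(x: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Row-wise Dirichlet log-density for an (n, K) array (hypothetical helper)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    alpha = np.asarray(alpha, dtype=float)
    # log(Const) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    log_const = sum(math.lgamma(a) for a in alpha) - math.lgamma(float(alpha.sum()))
    # -log(Const) + sum_k (alpha_k - 1) * log(x_k), one value per row
    return np.log(x) @ (alpha - 1.0) - log_const


obs = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])
log_p = dirichlet_log_density(obs, np.array([2.0, 3.0, 5.0]))
```

Since log(Const) depends only on alpha, it is computed once and reused for every row.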
DirichletEstimator
- class dml.stats.dirichlet.DirichletEstimator(dim, pseudo_count=None, suff_stat=None, delta=1e-08, keys=None, use_mpe=False, name=None)
DirichletEstimator object for estimating a DirichletDistribution from sufficient statistics.
- dim
Dimension of Dirichlet distribution to estimate.
- Type:
int
- pseudo_count
Pseudo count for sufficient statistics.
- Type:
Optional[float]
- delta
Tolerance for shape estimation from sufficient statistics.
- Type:
Optional[float]
- suff_stat
Sufficient statistics.
- Type:
Optional[np.ndarray]
- keys
Optional key string for shape parameter.
- Type:
Optional[str]
- use_mpe
If True, use max posterior estimate.
- Type:
bool
- name
Name for object.
- Type:
Optional[str]
- __init__(dim, pseudo_count=None, suff_stat=None, delta=1e-08, keys=None, use_mpe=False, name=None)
Initialize DirichletEstimator.
- Parameters:
dim (int) – Dimension of Dirichlet distribution to estimate.
pseudo_count (Optional[float], optional) – Pseudo count for sufficient statistics.
suff_stat (Optional[np.ndarray], optional) – Sufficient statistics.
delta (Optional[float], optional) – Tolerance for shape estimation from sufficient statistics.
keys (Optional[str], optional) – Optional key string for shape parameter.
use_mpe (bool, optional) – If True, use max posterior estimate.
name (Optional[str], optional) – Name for object.
- Raises:
TypeError – If keys is not a string or None.
- accumulator_factory()
Return a DirichletAccumulatorFactory for this estimator.
- Returns:
Factory object.
- Return type:
DirichletAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate a DirichletDistribution from sufficient statistics.
- Parameters:
nobs (Optional[float]) – Number of observations.
suff_stat (Tuple[float, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics.
- Returns:
Estimated distribution.
- Return type:
DirichletDistribution
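To give a feel for what estimating the concentration parameters from data involves, here is a small moment-matching sketch. This is a swapped-in technique chosen for brevity: it is not claimed to be the method DirichletEstimator uses, and it ignores pseudo_count, delta, and use_mpe. The function `moment_match_alpha` is a hypothetical name.

```python
import numpy as np


def moment_match_alpha(x: np.ndarray) -> np.ndarray:
    """Method-of-moments estimate of alpha from an (n, K) sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)  # estimates alpha_i / alpha_0
    var = x.var(axis=0)
    # Var[X_i] = m_i (1 - m_i) / (alpha_0 + 1)  =>  alpha_0 = m_i (1 - m_i) / Var[X_i] - 1
    # Average the per-coordinate estimates of alpha_0 = sum_i alpha_i.
    alpha0 = np.mean(mean * (1.0 - mean) / var - 1.0)
    return mean * alpha0


rng = np.random.RandomState(0)
data = rng.dirichlet([2.0, 3.0, 5.0], size=20000)
alpha_hat = moment_match_alpha(data)  # should be close to (2, 3, 5)
```

A moment-matching estimate like this is often used as a starting point for iterative likelihood-based estimation, which is where a convergence tolerance such as delta would typically come in.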
DirichletSampler
- class dml.stats.dirichlet.DirichletSampler(dist, seed=None)
DirichletSampler object for drawing samples from Dirichlet distribution.
- rng
RandomState object for generating seeded samples.
- Type:
RandomState
- dist
DirichletDistribution object to draw samples from.
- Type:
DirichletDistribution
- sample(size=None)
Draw samples from Dirichlet distribution.
- Parameters:
size (Optional[int], optional) – Number of samples to draw.
- Returns:
Array of samples with shape (size, dim).
- Return type:
np.ndarray
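For comparison, seeded Dirichlet draws can be produced directly with NumPy's `RandomState`, the generator type listed for the rng attribute. Whether DirichletSampler delegates to `RandomState.dirichlet` internally is an assumption; the sketch only illustrates the seeding behaviour and the (size, dim) output shape.

```python
import numpy as np

alpha = np.array([2.0, 3.0, 5.0])

# Seeded generator: the same seed reproduces the same draws.
rng = np.random.RandomState(seed=42)
samples = rng.dirichlet(alpha, size=4)  # shape (size, dim) = (4, 3)

# Every row lies on the simplex: entries are positive and sum to 1.
row_sums = samples.sum(axis=1)
```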