Categorical

Data Type: str

The Categorical distribution, sometimes loosely called the multinomial distribution (it is the single-trial case of the multinomial), is a probability distribution over a set of possible categories. Although the listed data type is str, this distribution can be defined over any set of objects. Over a set of values \(V=\{v_1, ..., v_k\}\), the probability mass function is given by

\[\begin{split}f(x|\boldsymbol{p}) = \left\{ \begin{array}{ll} p_i, & x=v_i \\ 0, & x \notin V \end{array} \right.\end{split}\]

where \(\sum_{i=1}^{k} p_i = 1\). Note that any finite set of values V can be enumerated and mapped to the integers \(0, 1, ..., k-1\), in which case the user can use the faster Integer Categorical Distribution instead.
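As a minimal sketch of the definitions above, written in plain Python and independent of pysp, the mass function and the integer re-indexing might look as follows. The names categorical_pmf and enumerate_values are hypothetical and are not part of the library.

```python
from typing import Dict, Hashable, List, Tuple


def categorical_pmf(x: Hashable, pmap: Dict[Hashable, float]) -> float:
    """Return p_i when x equals v_i, and 0.0 when x is not in V."""
    return pmap.get(x, 0.0)


def enumerate_values(
    pmap: Dict[Hashable, float],
) -> Tuple[Dict[Hashable, int], List[float]]:
    """Map the values v_1..v_k to integers 0..k-1 (hypothetical helper).

    Returns the value-to-integer lookup and the probabilities in the
    same integer order.
    """
    values = list(pmap)
    index = {v: i for i, v in enumerate(values)}
    probs = [pmap[v] for v in values]
    return index, probs


pmap = {"a": 0.5, "b": 0.3, "c": 0.2}
index, probs = enumerate_values(pmap)
```

After the mapping, probs[index[v]] recovers the probability of any value v in V, which is the form an integer-indexed categorical would work with.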

For more info see Categorical Distribution.

CategoricalDistribution

class pysp.stats.categorical.CategoricalDistribution(pmap, default_value=0.0, name=None, keys=None)

Defines a CategoricalDistribution object for data type T.

name

Assigns a name to the CategoricalDistribution object.

Type:: Optional[str]

pmap

Keys (x_i) form the support of the categorical; each value is the probability of its key (p_i).

Type:: Dict[Any, float]

default_value

Probability assigned to an observation outside the support of the CategoricalDistribution. Defaults to 0.0.

Type:: float

no_default

True if a non-zero default value is given.

Type:: bool

log_default_value

log(default_value).

Type:: float

log1p_default_value

log(1+default_value).

Type:: float

keys

Key for distribution.

Type:: Optional[str]

__init__(pmap, default_value=0.0, name=None, keys=None)

Initializes a CategoricalDistribution object.

Parameters:

pmap (Dict[Any, float]) – Keys (x_i) form the support of the categorical; each value is the probability of its key (p_i).
default_value (float, optional) – Probability assigned to an observation outside the support of the CategoricalDistribution. Defaults to 0.0.
name (Optional[str], optional) – Assigns a name to the CategoricalDistribution object. Defaults to None.
keys (Optional[str], optional) – Key for distribution. Defaults to None.

density(x)

Evaluates the density of the CategoricalDistribution at a given value.

Parameters:: x (Any) – Value at which to evaluate the density.
Returns:: Density value at x.
Return type:: float

dist_to_encoder()

Returns a CategoricalDataEncoder for this distribution.

Returns:: Encoder for categorical data.
Return type:: CategoricalDataEncoder

estimator(pseudo_count=None)

Creates a CategoricalEstimator for estimating parameters of the CategoricalDistribution.

Parameters:: pseudo_count (Optional[float], optional) – If set, inflates counts for currently set sufficient statistic (pmap). Defaults to None.
Returns:: Estimator object for the distribution.
Return type:: CategoricalEstimator

log_density(x)

Evaluates the log-density of the CategoricalDistribution at a given value.

Parameters:: x (Any) – Value at which to evaluate the log-density.
Returns:: Log-density of Categorical distribution evaluated at x.
Return type:: float
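The relationship between density, log_density, and default_value can be illustrated with a standalone sketch. This is not the library's implementation: the functions below are hypothetical, and they assume that an observation outside the support simply receives default_value, without modelling any renormalization the library may apply.

```python
import math
from typing import Dict, Hashable


def density(x: Hashable, pmap: Dict[Hashable, float],
            default_value: float = 0.0) -> float:
    """Probability of x; falls back to default_value outside the support."""
    return pmap.get(x, default_value)


def log_density(x: Hashable, pmap: Dict[Hashable, float],
                default_value: float = 0.0) -> float:
    """Log-probability of x; -inf when the probability is zero."""
    p = density(x, pmap, default_value)
    return math.log(p) if p > 0.0 else -math.inf


pmap = {"a": 0.5, "b": 0.5}
in_support = log_density("a", pmap)          # log(0.5)
out_of_support = log_density("z", pmap)      # -inf with the default of 0.0
smoothed = log_density("z", pmap, 1e-6)      # finite when default_value > 0
```

A non-zero default_value keeps the log-density finite for unseen categories, which can matter when scoring data that contains labels absent from pmap.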

sampler(seed=None)

Creates a CategoricalSampler for sampling from the CategoricalDistribution.

Parameters:: seed (Optional[int], optional) – Seed for setting random number generator used to sample. Defaults to None.
Returns:: Sampler object for the distribution.
Return type:: CategoricalSampler

seq_log_density(x)

Vectorized log-density evaluation for a sequence of encoded categorical data.

Parameters:: x (CategoricalEncodedDataSequence) – Encoded sequence of categorical data.
Returns:: Array of log-density values for the sequence.
Return type:: np.ndarray
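One way to picture vectorized evaluation over encoded data is to look up the log-density once per distinct value and then gather the results by index. The sketch below uses plain Python lists rather than np.ndarray, and its encode function and encoded layout are assumptions for illustration; the actual structure of CategoricalEncodedDataSequence is not described here.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Encoded = Tuple[List[Hashable], List[int]]


def encode(data: Sequence[Hashable]) -> Encoded:
    """Hypothetical encoding: distinct values plus one index per observation."""
    uniques: List[Hashable] = []
    position: Dict[Hashable, int] = {}
    idx: List[int] = []
    for x in data:
        if x not in position:
            position[x] = len(uniques)
            uniques.append(x)
        idx.append(position[x])
    return uniques, idx


def seq_log_density(encoded: Encoded, pmap: Dict[Hashable, float],
                    default_value: float = 0.0) -> List[float]:
    """Compute one log-density per distinct value, then gather by index."""
    uniques, idx = encoded
    table = []
    for v in uniques:
        p = pmap.get(v, default_value)
        table.append(math.log(p) if p > 0.0 else -math.inf)
    return [table[i] for i in idx]


encoded = encode(["a", "b", "a", "z"])
log_probs = seq_log_density(encoded, {"a": 0.5, "b": 0.5})
```

The dictionary lookup and logarithm run once per distinct category instead of once per observation, which is the main benefit of encoding the sequence up front.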

CategoricalEstimator

class pysp.stats.categorical.CategoricalEstimator(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)

CategoricalEstimator used to estimate CategoricalDistribution.

pseudo_count

Inflate sufficient statistic counts by pseudo_count.

Type:: Optional[float]

suff_stat

Dictionary with category labels as keys and probabilities as values.

Type:: Optional[Dict[Any, float]]

default_value

True if default value should be set.

Type:: bool

name

Assign name to be passed to Distribution, Accumulator, etc.

Type:: Optional[str]

keys

Assign key to Estimator so that all estimators with the same key are later combined during accumulation.

Type:: Optional[str]

__init__(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)

Initializes a CategoricalEstimator object.

Parameters:

pseudo_count (Optional[float], optional) – Inflate sufficient statistic counts by pseudo_count. Defaults to None.
suff_stat (Optional[Dict[Any, float]], optional) – Dictionary with category labels as keys and probabilities as values. Defaults to None.
default_value (bool, optional) – True if default value should be set. Defaults to False.
name (Optional[str], optional) – Assign name to be passed to Distribution, Accumulator, etc. Defaults to None.
keys (Optional[str], optional) – Assign key to Estimator so that all estimators with the same key are later combined during accumulation. Defaults to None.

accumulator_factory()

Returns a CategoricalAccumulatorFactory for this estimator.

Returns:: Factory for creating accumulators.
Return type:: CategoricalAccumulatorFactory

estimate(nobs, suff_stat)

Estimates a CategoricalDistribution from sufficient statistics.

Parameters:

nobs (Optional[float]) – Not used. Kept for consistency with ParameterEstimator.estimate.
suff_stat (Dict[Any, float]) – Dict with categories as keys and counts as values from accumulated data.

Returns:

Estimated distribution.

Return type:

CategoricalDistribution
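Estimation from counts amounts to normalizing the accumulated counts into probabilities. The standalone sketch below is not the library's code: estimate_from_counts is a hypothetical name, and spreading pseudo_count evenly over the observed categories is an assumption about how the count inflation could work.

```python
from typing import Dict, Hashable, Optional


def estimate_from_counts(
    counts: Dict[Hashable, float],
    pseudo_count: Optional[float] = None,
) -> Dict[Hashable, float]:
    """Turn category counts into a probability map (hypothetical helper).

    Assumption: pseudo_count, when given, is split evenly across the
    observed categories before normalizing.
    """
    if not counts:
        return {}
    extra_total = 0.0 if pseudo_count is None else float(pseudo_count)
    extra_each = extra_total / len(counts)
    total = sum(counts.values()) + extra_total
    if total <= 0.0:
        raise ValueError("counts must sum to a positive value")
    return {k: (v + extra_each) / total for k, v in counts.items()}


plain = estimate_from_counts({"a": 3.0, "b": 1.0})
smoothed = estimate_from_counts({"a": 3.0, "b": 1.0}, pseudo_count=2.0)
```

Under this assumption a larger pseudo_count pulls the estimate toward a uniform distribution over the observed categories.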

CategoricalSampler

class pysp.stats.categorical.CategoricalSampler(dist, seed=None)

CategoricalSampler object used to generate samples from CategoricalDistribution.

rng

RandomState seeded with seed if provided; otherwise an unseeded RandomState().

Type:: RandomState

levels

Category labels for the CategoricalDistribution.

Type:: List[Any]

probs

Probabilities for each category in CategoricalDistribution.

Type:: List[float]

num_levels

Total number of categories, i.e. len(levels).

Type:: int

sample(size=None)

Draws samples from the CategoricalSampler object.

Parameters:: size (Optional[int], optional) – Number of samples to draw. If None, draws a single sample. Defaults to None.
Returns:: List of sampled levels if size > 1, else a single sample drawn from levels with probabilities probs.
Return type:: Union[Any, List[Any]]
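The sampler attributes and the sample method can be mimicked with the standard library alone. The class below is a hypothetical stand-in, not CategoricalSampler itself: it uses random.Random in place of RandomState, and it assumes that any non-None size returns a list.

```python
import random
from typing import Any, Dict, List, Optional, Union


class SketchSampler:
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(self, pmap: Dict[Any, float], seed: Optional[int] = None):
        self.rng = random.Random(seed)              # seeded generator
        self.levels: List[Any] = list(pmap.keys())  # category labels
        self.probs: List[float] = [pmap[v] for v in self.levels]
        self.num_levels: int = len(self.levels)

    def sample(self, size: Optional[int] = None) -> Union[Any, List[Any]]:
        """Draw one level when size is None, else a list of size levels."""
        if size is None:
            return self.rng.choices(self.levels, weights=self.probs, k=1)[0]
        return self.rng.choices(self.levels, weights=self.probs, k=size)


pmap = {"a": 0.5, "b": 0.3, "c": 0.2}
first = SketchSampler(pmap, seed=1).sample(5)
second = SketchSampler(pmap, seed=1).sample(5)
single = SketchSampler(pmap, seed=1).sample()
```

Building two samplers with the same seed yields the same draws, which is the practical reason for exposing seed in the constructor.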