Categorical

Data Type: str

The Categorical distribution, also known as the multinomial distribution (strictly, its single-trial case), is a probability distribution over a finite set of possible categories. Although the listed data type is str, the distribution can be defined over any set of objects. The probability mass function over a set of values \(V=\{v_1, ..., v_k\}\) is given by

\[\begin{split}f(x|\boldsymbol{p}) = \left\{ \begin{array}{ll} p_i, & x=v_i \\ 0, & x \notin V \end{array} \right.\end{split}\]

where \(\sum_{i=1}^{k} p_i = 1\). Note that any set of values V can be enumerated and mapped to the set of integers \(0, 1, ..., k-1\). The user can then refer to the faster Integer Categorical Distribution.
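The enumeration described above can be sketched in plain Python. This is only an illustration and does not use the pysp API; the names `values`, `probs`, `value_to_index`, and `pmf` are hypothetical.

```python
# Hypothetical example: a categorical over arbitrary hashable values.
values = ["red", "green", "blue"]   # V = {v_1, ..., v_k}
probs = [0.5, 0.3, 0.2]             # p_1, ..., p_k, summing to 1

# Enumerate V and map each value to an integer in 0, 1, ..., k-1.
value_to_index = {v: i for i, v in enumerate(values)}


def pmf(x):
    """Return p_i if x == v_i, and 0 if x is not in V."""
    i = value_to_index.get(x)
    return probs[i] if i is not None else 0.0


# Integer-encoded data can then be handled by an integer-indexed distribution.
encoded = [value_to_index[v] for v in ["blue", "red", "red"]]
```

Once the data is encoded this way, probabilities can be looked up by list index rather than by hashing arbitrary objects, which is presumably where the speed advantage of the integer variant comes from.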

For more info see Categorical Distribution.

CategoricalDistribution

class pysp.stats.categorical.CategoricalDistribution(pmap, default_value=0.0, name=None, keys=None)

Defines a CategoricalDistribution object for data type T.

name

Assigns a name to the CategoricalDistribution object.

Type:

Optional[str]

pmap

Keys (x_i) form the support of the categorical; each value is the probability of its key (p_i).

Type:

Dict[Any, float]

default_value

Probability assigned to observations outside the support of the CategoricalDistribution. Defaults to 0.0.

Type:

float

no_default

True if a non-zero default value is given.

Type:

bool

log_default_value

log(default_value).

Type:

float

log1p_default_value

log(1+default_value).

Type:

float

keys

Key for the distribution.

Type:

Optional[str]

__init__(pmap, default_value=0.0, name=None, keys=None)

Initializes a CategoricalDistribution object.

Parameters:
  • pmap (Dict[Any, float]) – Keys (x_i) form the support of the categorical; each value is the probability of its key (p_i).

  • default_value (float, optional) – Probability assigned to observations outside the support of the CategoricalDistribution. Defaults to 0.0.

  • name (Optional[str], optional) – Assigns a name to the CategoricalDistribution object. Defaults to None.

  • keys (Optional[str], optional) – Key for distribution. Defaults to None.

density(x)

Evaluates the density of the CategoricalDistribution at a given value.

Parameters:

x (Any) – Value at which to evaluate the density.

Returns:

Density value at x.

Return type:

float
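A minimal sketch of how a dictionary-backed density with a fallback value might behave. It assumes that an out-of-support lookup simply returns `default_value`; the source does not say whether the library renormalizes. The standalone function `density` below is hypothetical and is not the library method.

```python
def density(pmap, x, default_value=0.0):
    """Look up the probability of x, falling back to default_value."""
    return pmap.get(x, default_value)


pmap = {"a": 0.7, "b": 0.3}

in_support = density(pmap, "a")                          # key present in pmap
out_of_support = density(pmap, "z")                      # falls back to 0.0
with_default = density(pmap, "z", default_value=1e-6)    # falls back to 1e-6
```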

dist_to_encoder()

Returns a CategoricalDataEncoder for this distribution.

Returns:

Encoder for categorical data.

Return type:

CategoricalDataEncoder

estimator(pseudo_count=None)

Creates a CategoricalEstimator for estimating parameters of the CategoricalDistribution.

Parameters:

pseudo_count (Optional[float], optional) – If set, inflates counts for currently set sufficient statistic (pmap). Defaults to None.

Returns:

Estimator object for the distribution.

Return type:

CategoricalEstimator

log_density(x)

Evaluates the log-density of the CategoricalDistribution at a given value.

Parameters:

x (Any) – Value at which to evaluate the log-density.

Returns:

Log-density of Categorical distribution evaluated at x.

Return type:

float
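The log-density is what makes a data set's log-likelihood a simple sum. The sketch below is a standalone illustration, not the library code: the function `log_density` is hypothetical, and it assumes that a probability of zero maps to negative infinity.

```python
import math


def log_density(pmap, x, default_value=0.0):
    """Return log p(x), or -inf when the probability is zero."""
    p = pmap.get(x, default_value)
    return math.log(p) if p > 0.0 else -math.inf


pmap = {"a": 0.5, "b": 0.5}
data = ["a", "b", "a"]

# Log-likelihood of independent observations is the sum of log-densities.
log_likelihood = sum(log_density(pmap, x) for x in data)
```

With a default value of 0.0, a single out-of-support observation drives the whole log-likelihood to negative infinity, which is one reason a small non-zero `default_value` can be useful.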

sampler(seed=None)

Creates a CategoricalSampler for sampling from the CategoricalDistribution.

Parameters:

seed (Optional[int], optional) – Seed for setting random number generator used to sample. Defaults to None.

Returns:

Sampler object for the distribution.

Return type:

CategoricalSampler

seq_log_density(x)

Vectorized log-density evaluation for a sequence of encoded categorical data.

Parameters:

x (CategoricalEncodedDataSequence) – Encoded sequence of categorical data.

Returns:

Array of log-density values for the sequence.

Return type:

np.ndarray
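One plausible way to vectorize the evaluation is to compute the log-density once per distinct value and then gather the results by index. The sketch below assumes an encoding of (unique values, index per observation); the actual layout of CategoricalEncodedDataSequence is not described here, so `encode` and `seq_log_density` are hypothetical stand-ins, and plain lists are used instead of np.ndarray to keep the example dependency-free.

```python
import math


def encode(data):
    """Hypothetical encoding: unique values plus one index per observation."""
    uniques, positions, idx = [], {}, []
    for x in data:
        if x not in positions:
            positions[x] = len(uniques)
            uniques.append(x)
        idx.append(positions[x])
    return uniques, idx


def seq_log_density(pmap, encoded, default_value=0.0):
    """Evaluate log p once per unique value, then gather by index."""
    uniques, idx = encoded
    per_unique = []
    for v in uniques:
        p = pmap.get(v, default_value)
        per_unique.append(math.log(p) if p > 0.0 else -math.inf)
    return [per_unique[i] for i in idx]


pmap = {"a": 0.25, "b": 0.75}
log_probs = seq_log_density(pmap, encode(["a", "b", "a", "c"]))
```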

CategoricalEstimator

class pysp.stats.categorical.CategoricalEstimator(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)

CategoricalEstimator used to estimate CategoricalDistribution.

pseudo_count

Inflate sufficient statistic counts by pseudo_count.

Type:

Optional[float]

suff_stat

Dictionary with category labels as keys and probabilities as values.

Type:

Optional[Dict[Any, float]]

default_value

True if default value should be set.

Type:

bool

name

Assign name to be passed to Distribution, Accumulator, etc.

Type:

Optional[str]

keys

Key assigned to the Estimator; estimators that share the same key are combined during accumulation.

Type:

Optional[str]

__init__(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)

Initializes a CategoricalEstimator object.

Parameters:
  • pseudo_count (Optional[float], optional) – Inflate sufficient statistic counts by pseudo_count. Defaults to None.

  • suff_stat (Optional[Dict[Any, float]], optional) – Dictionary with category labels as keys and probabilities as values. Defaults to None.

  • default_value (bool, optional) – True if default value should be set. Defaults to False.

  • name (Optional[str], optional) – Assign name to be passed to Distribution, Accumulator, etc. Defaults to None.

  • keys (Optional[str], optional) – Key assigned to the Estimator; estimators that share the same key are combined during accumulation. Defaults to None.

accumulator_factory()

Returns a CategoricalAccumulatorFactory for this estimator.

Returns:

Factory for creating accumulators.

Return type:

CategoricalAccumulatorFactory

estimate(nobs, suff_stat)

Estimates a CategoricalDistribution from sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Not used. Kept for consistency with ParameterEstimator.estimate.

  • suff_stat (Dict[Any, float]) – Dict with categories as keys and counts as values from accumulated data.

Returns:

Estimated distribution.

Return type:

CategoricalDistribution
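Estimating from counts amounts to normalizing them into probabilities. The sketch below is a standalone illustration rather than the library's estimator: the function `estimate` is hypothetical, and the way it applies `pseudo_count` (split evenly across the observed categories) is an assumption, since the exact smoothing rule is not stated here.

```python
from collections import Counter


def estimate(suff_stat, pseudo_count=None):
    """Normalize category counts into a probability map."""
    if not suff_stat:
        return {}
    extra = pseudo_count or 0.0
    # Assumption: the pseudo count is shared equally by the observed categories.
    per_level = extra / len(suff_stat)
    total = sum(suff_stat.values()) + extra
    return {k: (c + per_level) / total for k, c in suff_stat.items()}


counts = dict(Counter(["a", "b", "a", "a"]))   # {"a": 3, "b": 1}
mle = estimate(counts)                         # plain relative frequencies
smoothed = estimate(counts, pseudo_count=2.0)  # pulled toward uniform
```

Under this assumption, a larger `pseudo_count` pulls the estimate further toward a uniform distribution over the observed categories.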

CategoricalSampler

class pysp.stats.categorical.CategoricalSampler(dist, seed=None)

CategoricalSampler object used to generate samples from CategoricalDistribution.

rng

RandomState seeded with seed if provided; otherwise an unseeded RandomState().

Type:

RandomState

levels

Category labels for the CategoricalDistribution.

Type:

List[Any]

probs

Probabilities for each category in CategoricalDistribution.

Type:

List[float]

num_levels

Total number of categories, i.e. len(levels).

Type:

int

sample(size=None)

Draws samples from the CategoricalSampler object.

Parameters:

size (Optional[int], optional) – Number of samples to draw. If None, draws a single sample. Defaults to None.

Returns:

List of sampled levels if size > 1, else a single sample drawn from levels with probabilities probs.

Return type:

Union[Any, List[Any]]
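The sampler's attributes and return convention can be illustrated with a small standalone class. This sketch does not use the library: `SimpleCategoricalSampler` is a hypothetical name, and Python's `random.Random` stands in for the RandomState mentioned above.

```python
import random


class SimpleCategoricalSampler:
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(self, pmap, seed=None):
        self.rng = random.Random(seed)       # seeded generator, or unseeded if None
        self.levels = list(pmap.keys())      # category labels
        self.probs = list(pmap.values())     # matching probabilities
        self.num_levels = len(self.levels)

    def sample(self, size=None):
        """Return one level if size is None, else a list of `size` levels."""
        if size is None:
            return self.rng.choices(self.levels, weights=self.probs, k=1)[0]
        return self.rng.choices(self.levels, weights=self.probs, k=size)


sampler = SimpleCategoricalSampler({"a": 0.2, "b": 0.8}, seed=1)
one = sampler.sample()          # a single label
many = sampler.sample(size=5)   # a list of five labels
```

Fixing the seed makes the draws reproducible, which is convenient for tests and for comparing runs.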