Categorical
Data Type: str
The Categorical distribution, also known as the multinoulli distribution (a multinomial distribution with a single trial), is a probability distribution over a finite set of possible categories. Although the listed data type is str, the distribution can be defined over any set of objects. The probability mass function over a set of values \(V=\{v_1, ..., v_k\}\) is given by
\[P(X = v_i) = p_i, \quad i = 1, ..., k,\]
where \(p_i \ge 0\) and \(\sum_{i=1}^{k} p_i = 1\). Note that any set of values V can be enumerated and mapped to the set of integers \(0, 1, ..., k-1\). The user can then use the faster Integer Categorical Distribution.
For more info see Categorical Distribution.
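The enumeration of values onto the integers \(0, 1, ..., k-1\) can be sketched in plain Python. This is a minimal illustration that does not use pysp; the names `values`, `probs`, `index_of`, and `pmf` are hypothetical and chosen only for this example.

```python
import math

# Hypothetical example: a categorical distribution over arbitrary hashable objects.
values = ["red", "green", ("blue", 1)]
probs = [0.5, 0.3, 0.2]

# The probabilities must sum to 1.
assert math.isclose(sum(probs), 1.0)

# Enumerate the values and map each one to an integer in 0, 1, ..., k-1.
index_of = {v: i for i, v in enumerate(values)}


def pmf(v):
    """Return P(X = v), or 0.0 if v is not one of the enumerated values."""
    i = index_of.get(v)
    return probs[i] if i is not None else 0.0
```

Once the values are replaced by their integer indices, the probabilities can live in a flat list or array, which is presumably what makes an integer-keyed distribution faster.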
CategoricalDistribution
- class pysp.stats.categorical.CategoricalDistribution(pmap, default_value=0.0, name=None, keys=None)
Defines a CategoricalDistribution object for data type T.
- name
Assigns a name to the CategoricalDistribution object.
- Type:
Optional[str]
- pmap
Keys (x_i) form the support of the distribution; each value is the probability (p_i) of its key.
- Type:
Dict[Any, float]
- default_value
Probability assigned to observations outside the support of the CategoricalDistribution. Defaults to 0.0.
- Type:
float
- no_default
True if a non-zero default value is given.
- Type:
bool
- log_default_value
log(default_value).
- Type:
float
- log1p_default_value
log(1+default_value).
- Type:
float
- keys
Key for the distribution.
- Type:
Optional[str]
- __init__(pmap, default_value=0.0, name=None, keys=None)
Initializes a CategoricalDistribution object.
- Parameters:
pmap (Dict[Any, float]) – Keys (x_i) form the support of the distribution; each value is the probability (p_i) of its key.
default_value (float, optional) – Probability assigned to observations outside the support of the CategoricalDistribution. Defaults to 0.0.
name (Optional[str], optional) – Assigns a name to the CategoricalDistribution object. Defaults to None.
keys (Optional[str], optional) – Key for the distribution. Defaults to None.
- density(x)
Evaluates the density of the CategoricalDistribution at a given value.
- Parameters:
x (Any) – Value at which to evaluate the density.
- Returns:
Density value at x.
- Return type:
float
- dist_to_encoder()
Returns a CategoricalDataEncoder for this distribution.
- Returns:
Encoder for categorical data.
- Return type:
CategoricalDataEncoder
- estimator(pseudo_count=None)
Creates a CategoricalEstimator for estimating parameters of the CategoricalDistribution.
- Parameters:
pseudo_count (Optional[float], optional) – If set, inflates counts for currently set sufficient statistic (pmap). Defaults to None.
- Returns:
Estimator object for the distribution.
- Return type:
CategoricalEstimator
- log_density(x)
Evaluates the log-density of the CategoricalDistribution at a given value.
- Parameters:
x (Any) – Value at which to evaluate the log-density.
- Returns:
Log-density of Categorical distribution evaluated at x.
- Return type:
float
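The relationship between pmap, default_value, density, and log_density can be sketched as follows. This is a standalone approximation and not the pysp implementation: it assumes a plain dictionary lookup with a fallback and ignores any renormalization the library may apply when default_value is non-zero. The module-level `pmap` and `default_value` here are hypothetical stand-ins for the constructor arguments.

```python
import math

# Hypothetical stand-ins for the pmap and default_value arguments.
pmap = {"a": 0.6, "b": 0.4}
default_value = 0.0


def density(x):
    # Fall back to default_value for observations outside the support.
    return pmap.get(x, default_value)


def log_density(x):
    p = density(x)
    # math.log(0.0) raises ValueError, so map zero probability to -inf explicitly.
    return math.log(p) if p > 0.0 else -math.inf
```

With the default of 0.0, any value outside the support has density 0.0 and log-density of negative infinity in this sketch.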
- sampler(seed=None)
Creates a CategoricalSampler for sampling from the CategoricalDistribution.
- Parameters:
seed (Optional[int], optional) – Seed for the random number generator used to sample. Defaults to None.
- Returns:
Sampler object for the distribution.
- Return type:
CategoricalSampler
- seq_log_density(x)
Vectorized log-density evaluation for a sequence of encoded categorical data.
- Parameters:
x (CategoricalEncodedDataSequence) – Encoded sequence of categorical data.
- Returns:
Array of log-density values for the sequence.
- Return type:
np.ndarray
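One way to picture evaluation over an encoded sequence is sketched below. The encoding shown (distinct values plus an index per observation) is an assumption made for illustration; the actual layout of CategoricalEncodedDataSequence is not described here, and the functions `encode` and `seq_log_density` are hypothetical pure-Python stand-ins that return lists rather than NumPy arrays.

```python
import math

pmap = {"a": 0.5, "b": 0.25, "c": 0.25}  # hypothetical distribution


def encode(data):
    """Return (unique_values, indices) so that data[j] == unique_values[indices[j]]."""
    unique_values = []
    position = {}
    indices = []
    for x in data:
        if x not in position:
            position[x] = len(unique_values)
            unique_values.append(x)
        indices.append(position[x])
    return unique_values, indices


def seq_log_density(encoded):
    unique_values, indices = encoded
    # Evaluate the log-density once per distinct value ...
    log_p = [
        math.log(pmap[v]) if pmap.get(v, 0.0) > 0.0 else -math.inf
        for v in unique_values
    ]
    # ... then gather the results back into the original order.
    return [log_p[i] for i in indices]
```

The benefit of encoding first is that the dictionary lookup and logarithm are computed once per distinct category rather than once per observation.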
CategoricalEstimator
- class pysp.stats.categorical.CategoricalEstimator(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)
CategoricalEstimator used to estimate CategoricalDistribution.
- pseudo_count
Inflate sufficient statistic counts by pseudo_count.
- Type:
Optional[float]
- suff_stat
Dictionary with category labels as keys and probabilities as values.
- Type:
Optional[Dict[Any, float]]
- default_value
True if default value should be set.
- Type:
bool
- name
Assign name to be passed to Distribution, Accumulator, etc.
- Type:
Optional[str]
- keys
Key assigned to the estimator. Estimators that share the same key are combined during accumulation.
- Type:
Optional[str]
- __init__(pseudo_count=None, suff_stat=None, default_value=False, name=None, keys=None)
Initializes a CategoricalEstimator object.
- Parameters:
pseudo_count (Optional[float], optional) – Inflate sufficient statistic counts by pseudo_count. Defaults to None.
suff_stat (Optional[Dict[Any, float]], optional) – Dictionary with category labels as keys and probabilities as values. Defaults to None.
default_value (bool, optional) – True if default value should be set. Defaults to False.
name (Optional[str], optional) – Assign name to be passed to Distribution, Accumulator, etc. Defaults to None.
keys (Optional[str], optional) – Key assigned to the estimator. Estimators that share the same key are combined during accumulation. Defaults to None.
- accumulator_factory()
Returns a CategoricalAccumulatorFactory for this estimator.
- Returns:
Factory for creating accumulators.
- Return type:
CategoricalAccumulatorFactory
- estimate(nobs, suff_stat)
Estimates a CategoricalDistribution from sufficient statistics.
- Parameters:
nobs (Optional[float]) – Not used. Kept for consistency with ParameterEstimator.estimate.
suff_stat (Dict[Any, float]) – Dict with categories as keys and counts as values from accumulated data.
- Returns:
Estimated distribution.
- Return type:
CategoricalDistribution
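The step from accumulated counts to probabilities can be sketched as below. This is not the pysp implementation: the standalone `estimate` function is hypothetical, and how it applies pseudo_count (spread evenly over the observed categories) is an assumption made only for illustration.

```python
def estimate(suff_stat, pseudo_count=None):
    """Turn category counts into a probability map.

    Assumption: pseudo_count is split evenly across the observed categories.
    """
    k = len(suff_stat)
    extra = (pseudo_count / k) if (pseudo_count and k) else 0.0
    total = sum(suff_stat.values()) + extra * k
    if total <= 0.0:
        return {}
    # Normalize the (possibly inflated) counts so they sum to 1.
    return {v: (c + extra) / total for v, c in suff_stat.items()}
```

For example, counts of 3 and 1 give probabilities 0.75 and 0.25 without smoothing; with a pseudo_count of 4 split evenly, they become 0.625 and 0.375, pulling the estimate toward uniform.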
CategoricalSampler
- class pysp.stats.categorical.CategoricalSampler(dist, seed=None)
CategoricalSampler object used to generate samples from CategoricalDistribution.
- rng
RandomState seeded with seed if one is provided; otherwise an unseeded RandomState().
- Type:
RandomState
- levels
Category labels for the CategoricalDistribution.
- Type:
List[Any]
- probs
Probabilities for each category in CategoricalDistribution.
- Type:
List[float]
- num_levels
Total number of categories, i.e., len(levels).
- Type:
int
- sample(size=None)
Draws samples from the CategoricalSampler object.
- Parameters:
size (Optional[int], optional) – Number of samples to draw. If None, draws a single sample. Defaults to None.
- Returns:
A list of sampled levels if size > 1; otherwise a single level drawn from levels with probabilities probs.
- Return type:
Union[Any, List[Any]]
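The sampler's attributes and the single-versus-list return behaviour can be sketched with the standard library. The class `SimpleCategoricalSampler` is a hypothetical stand-in, not the pysp class: it takes a probability map directly instead of a distribution object, uses `random.Random` instead of a NumPy RandomState, and returns a list whenever size is given (including size 1), which may differ from the library.

```python
import random


class SimpleCategoricalSampler:
    """Hypothetical stand-in for a categorical sampler over a probability map."""

    def __init__(self, pmap, seed=None):
        # Stdlib RNG used for this sketch; the documented class holds a RandomState.
        self.rng = random.Random(seed)
        self.levels = list(pmap.keys())
        self.probs = list(pmap.values())
        self.num_levels = len(self.levels)

    def sample(self, size=None):
        if size is None:
            # Single draw: return one level rather than a list.
            return self.rng.choices(self.levels, weights=self.probs, k=1)[0]
        # Multiple draws: return a list of levels of length size.
        return self.rng.choices(self.levels, weights=self.probs, k=size)
```

Two samplers built with the same seed produce the same sequence of draws, which is the usual reason to pass a seed.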