Integer Categorical
Data Type: int
The Integer Categorical distribution is a categorical distribution defined on an integer support. The probability mass function is given by
\(P(x = \text{min\_val} + i) = p_i, \quad i = 0, 1, \dots, n - 1,\)
where \(\sum_{i} p_i = 1\) and n is a user-defined length. A categorical distribution defined over sets of objects can be used if the user does not want to map a given set of values to the integers (see Categorical Distribution).
For more info see Integer Categorical Distribution.
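As a minimal sketch of the definition above, written in plain Python rather than taken from the pysp implementation, the mass at an integer x can be read from the probability vector by offsetting x with the minimum value. The function name `integer_categorical_pmf` is hypothetical, and returning 0.0 outside the support is an assumption about how out-of-range values are treated.

```python
def integer_categorical_pmf(x, min_val, p_vec):
    """Probability of integer x when p_vec[i] is the mass at min_val + i."""
    idx = x - min_val
    if 0 <= idx < len(p_vec):
        return p_vec[idx]
    return 0.0  # assumption: values outside the support have zero mass


p_vec = [0.2, 0.5, 0.3]  # support is {3, 4, 5} when min_val = 3
pmf_at_4 = integer_categorical_pmf(4, 3, p_vec)  # second entry of p_vec
pmf_at_9 = integer_categorical_pmf(9, 3, p_vec)  # outside the support
```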
IntegerCategoricalDistribution
- class pysp.stats.intrange.IntegerCategoricalDistribution(min_val, p_vec, name=None, keys=None)
IntegerCategoricalDistribution object defining an integer categorical distribution.
- p_vec
Must sum to 1.0. The first entry is the probability of x = min_val.
- Type:
np.ndarray[float]
- min_val
Minimum value in support of integer categorical
- Type:
int
- max_val
Maximum value in the support of the integer categorical, set to min_val + length(p_vec) - 1.
- Type:
int
- log_p_vec
Log of p_vec.
- Type:
np.ndarray[float]
- num_vals
Total number of values in support of IntegerCategoricalDistribution instance.
- Type:
int
- name
Name for object.
- Type:
Optional[str]
- keys
Key for parameter.
- Type:
Optional[str]
- __init__(min_val, p_vec, name=None, keys=None)
IntegerCategoricalDistribution object.
- Parameters:
min_val (int) – Minimum value of the integer categorical support.
p_vec (Union[List[float], np.ndarray]) – Probability vector containing probability of each integer in the support range.
name (Optional[str]) – Assign name to IntegerCategoricalDistribution object.
keys (Optional[str]) – Key for parameter.
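The attributes listed above are all determined by min_val and p_vec. The following plain-Python sketch shows how they relate to one another; it does not reproduce the library's constructor, and mapping a zero probability to -inf in log space is an assumption.

```python
import math

min_val = -1
p_vec = [0.1, 0.6, 0.3]

num_vals = len(p_vec)             # number of integers in the support
max_val = min_val + num_vals - 1  # last integer in the support
# assumption: a zero probability maps to -inf in log space
log_p_vec = [math.log(p) if p > 0.0 else -math.inf for p in p_vec]
support = list(range(min_val, max_val + 1))
```

With these values the support is the three consecutive integers starting at min_val, one per entry of p_vec.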
- density(x)
Evaluate the density of the integer categorical at observation x.
- Parameters:
x (int) – Integer value.
- Returns:
Density at x.
- Return type:
float
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Return type:
IntegerCategoricalDataEncoder
- Returns:
DataSequenceEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
IntegerCategoricalEstimator
- Returns:
ParameterEstimator
- log_density(x)
Evaluate the log-density of the integer categorical at observation x.
- Parameters:
x (int) – Integer value.
- Returns:
Log-density at x.
- Return type:
float
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
IntegerCategoricalSampler
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.
- Return type:
ndarray
- Returns:
np.ndarray
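To illustrate what evaluating the log-density over a whole sequence amounts to, here is a hedged sketch in plain Python. It takes a list of integers instead of an EncodedDataSequence, the function name is only a stand-in for the method, and returning -inf outside the support is an assumption.

```python
import math


def seq_log_density(xs, min_val, log_p_vec):
    """Log-density of every observation in xs, one table lookup each."""
    out = []
    for x in xs:
        idx = x - min_val
        in_support = 0 <= idx < len(log_p_vec)
        # assumption: observations outside the support get -inf
        out.append(log_p_vec[idx] if in_support else -math.inf)
    return out


log_p_vec = [math.log(0.25), math.log(0.75)]  # support {0, 1}
xs = [0, 1, 1, 7]
log_densities = seq_log_density(xs, 0, log_p_vec)
```

With an array type such as np.ndarray, the loop would typically become a single indexed lookup into the table of log-probabilities.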
IntegerCategoricalEstimator
- class pysp.stats.intrange.IntegerCategoricalEstimator(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)
IntegerCategoricalEstimator object for estimating IntegerCategoricalDistribution
Notes
Either both min_val and max_val must be set, or suff_stat must be passed as an argument.
- min_val
Minimum value of integer categorical.
- Type:
Optional[int]
- max_val
Maximum value of integer categorical.
- Type:
Optional[int]
- pseudo_count
Used to re-weight suff_stat when merged with new aggregated data.
- Type:
Optional[float]
- suff_stat
Minimum value and probability vector.
- Type:
Tuple[int, np.ndarray]
- name
Name for the IntegerCategoricalEstimator object.
- Type:
Optional[str]
- keys
Keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.
- Type:
Optional[str]
- __init__(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)
IntegerCategoricalEstimator object.
- Parameters:
min_val (Optional[int]) – Set minimum value of integer categorical.
max_val (Optional[int]) – Set maximum value of integer categorical.
pseudo_count (Optional[float]) – Used to re-weight suff_stat member variables in merging of sufficient statistics.
suff_stat (Optional[Tuple[int,ndarray]]) – Set sufficient statistics. See above for details.
name (Optional[str]) – Assign a name to IntegerCategoricalEstimator object.
keys (Optional[str]) – Set keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.
- accumulator_factory()
Create SequenceEncodableStatisticAccumulator object.
- Return type:
IntegerCategoricalAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.
- Parameters:
nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.
- Return type:
IntegerCategoricalDistribution
- Returns:
SequenceEncodableProbabilityDistribution
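The estimation step can be pictured as normalizing per-value counts into a probability vector. The sketch below is plain Python and not the library's code. The helper `estimate_p_vec` is hypothetical, and spreading pseudo_count evenly over the support is only one common smoothing scheme, assumed here for illustration. The way pysp combines pseudo_count with suff_stat may differ.

```python
def estimate_p_vec(counts, pseudo_count=None):
    """Normalize per-value counts into a probability vector."""
    if pseudo_count is not None:
        # assumption: pseudo_count is spread evenly over the support
        bump = pseudo_count / len(counts)
        counts = [c + bump for c in counts]
    total = sum(counts)
    return [c / total for c in counts]


observations = [2, 3, 3, 4, 3, 2]
min_val, max_val = 2, 5
counts = [0.0] * (max_val - min_val + 1)
for x in observations:
    counts[x - min_val] += 1.0  # count vector indexed by x - min_val

p_mle = estimate_p_vec(counts)                       # plain relative frequencies
p_smooth = estimate_p_vec(counts, pseudo_count=4.0)  # no zero entries remain
```

Without smoothing, the value 5 never appears in the data and gets probability zero. With the pseudo-count, every value in the support keeps some mass.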
IntegerCategoricalSampler
- class pysp.stats.intrange.IntegerCategoricalSampler(dist, seed=None)
IntegerCategoricalSampler object for sampling from IntegerCategoricalDistribution.
- dist
IntegerCategoricalDistribution instance to sample from.
- rng
RandomState object with seed set if passed.
- Type:
RandomState
- sample(size=None)
Draw iid samples from IntegerCategoricalSampler object.
Note: If size is None, a single sample is returned as an integer. If size > 0, a List of integers with length equal to size is returned.
- Parameters:
size (Optional[int]) – Number of iid samples to draw.
- Return type:
Union[int,List[int]]
- Returns:
Integer or List[int] of iid samples from IntegerCategoricalSampler instance.
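The return convention described in the note (a single integer when size is None, otherwise a list) can be sketched with the standard library. This is an illustration only: `sample_integer_categorical` is a hypothetical helper, and `random.Random` stands in for the RandomState object held by the sampler.

```python
import random


def sample_integer_categorical(min_val, p_vec, size=None, seed=None):
    """Draw iid integers whose probabilities are given by p_vec."""
    rng = random.Random(seed)  # stand-in for the sampler's RandomState
    support = range(min_val, min_val + len(p_vec))
    if size is None:
        # single draw, returned as a bare integer
        return rng.choices(support, weights=p_vec, k=1)[0]
    return rng.choices(support, weights=p_vec, k=size)


one = sample_integer_categorical(3, [0.2, 0.5, 0.3], seed=0)
many = sample_integer_categorical(3, [0.2, 0.5, 0.3], size=5, seed=0)
```

Passing the same seed reproduces the same draws, which is the usual reason for exposing a seed argument on a sampler.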